Separate columns in R based on the second occurrence of a dot ("\\.")

May 11, 2023

I am having a hard time separating a column in my data set. I start with this:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

tibble(sample=c("AM.F10.T1", "AM.F10.T2","DA.AD.1","DA.AD.2", "ES.AD.1"))
#> # A tibble: 5 × 1
#>   sample   
#>   <chr>    
#> 1 AM.F10.T1
#> 2 AM.F10.T2
#> 3 DA.AD.1  
#> 4 DA.AD.2  
#> 5 ES.AD.1

Created on 2023-05-11 with reprex v2.0.2

and I would like to split `sample` at the second dot, so that the result looks like this:

#>   sample    col1   col2
#>   <chr>     <chr>  <chr>
#> 1 AM.F10.T1 AM.F10 T1
#> 2 AM.F10.T2 AM.F10 T2
#> 3 DA.AD.1   DA.AD  1
#> 4 DA.AD.2   DA.AD  2
#> 5 ES.AD.1   ES.AD  1

Thank you for taking the time to read my post.

>Solution :

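You can do this with tidyr::separate_wider_regex(), which was added in tidyr 1.3.0. Its `patterns` argument lets you be explicit about what goes into the first and second columns and what separates them: each named element becomes an output column, while an unnamed element (here, the second dot) is matched but dropped from the result.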

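library(tidyr)
tibble(sample=c("AM.F10.T1", "AM.F10.T2","DA.AD.1","DA.AD.2", "ES.AD.1")) |> 
  separate_wider_regex(
    cols = sample, 
    # first:   word characters, a literal dot, more word characters
    # unnamed: the second dot, matched but not kept
    # second:  the remaining word characters
    patterns = c(first  = "\\w*\\.\\w*", "\\.", second = "\\w*")
  )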
#> # A tibble: 5 × 2
#>   first  second
#>   <chr>  <chr> 
#> 1 AM.F10 T1    
#> 2 AM.F10 T2    
#> 3 DA.AD  1     
#> 4 DA.AD  2     
#> 5 ES.AD  1

Created on 2023-05-11 with reprex v2.0.2
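
Note that the output above drops the original `sample` column, whereas the question asks to keep it. A minimal sketch of one way to get that layout, assuming tidyr 1.3.0 or later: pass `cols_remove = FALSE` so the input column is retained, and name the pattern pieces `col1` and `col2` as in the question. The object names `df` and `result` are only illustrative, and the final column selection is there just to put `sample` first.

```r
library(tidyr)

# Example data from the question (`df` is an illustrative name)
df <- tibble(sample = c("AM.F10.T1", "AM.F10.T2", "DA.AD.1", "DA.AD.2", "ES.AD.1"))

result <- df |>
  separate_wider_regex(
    cols = sample,
    # named pieces become columns; the unnamed "\\." is matched and discarded
    patterns = c(col1 = "\\w*\\.\\w*", "\\.", col2 = "\\w*"),
    # keep the original `sample` column instead of removing it
    cols_remove = FALSE
  )

# Reorder so the original column comes first, as in the desired output
result <- result[c("sample", "col1", "col2")]
result
```

Because `\\w` does not match a dot, the pattern can only split at the second dot for values with exactly two dots. A value with a different number of dots will not match the pattern, and by default `separate_wider_regex()` raises an error for it; see the `too_few` argument if such rows need to be handled.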