Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Divide rows by more than one columns containing commas

I am trying to separate some rows of a dataframe where commas exist.

This is an example fragment:

name  ref   alt    exon   impact score
ID1    A    C,T    1,2     l,h     5
ID2    T    A       6      h       8
ID3    T    G,C,A  2,5,7   l,l,h   10,2,3
ID4    C    G       3      m       2

How could I perform the following transformation?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

name   ref   alt   exon   impact  score
ID1    A     C      1       l      5
ID1    A     T      2       h      5
ID2    T     A      6       h      8
ID3    T     G      2       l      10
ID3    T     C      5       l      2
ID3    T     A      7       h      3
ID4    C     G      3       m      2

I was trying something as:

df %>% 
    mutate(alt=strsplit(alt, ",")) %>% 
    unnest(alt)

However I do not wish to divide the rows just based on one column, but to take into account all the columns where there are more than one value (and to repeat the values where no commas exist).

How can I achieve that?

>Solution :

The function you are looking for is separate_rows from the package tidyr.

library(tidyr)

separate_rows(df, everything(), sep = ",")

# A tibble: 7 × 6
  name  ref   alt   exon  impact score
  <chr> <chr> <chr> <chr> <chr>  <chr>
1 ID1   A     C     1     l      5    
2 ID1   A     T     2     h      5    
3 ID2   T     A     6     h      8    
4 ID3   T     G     2     l      10   
5 ID3   T     C     5     l      2    
6 ID3   T     A     7     h      3    
7 ID4   C     G     3     m      2    

Data

df <- structure(list(name = c("ID1", "ID2", "ID3", "ID4"), ref = c("A", 
"T", "T", "C"), alt = c("C,T", "A", "G,C,A", "G"), exon = c("1,2", 
"6", "2,5,7", "3"), impact = c("l,h", "h", "l,l,h", "m"), score = c("5", 
"8", "10,2,3", "2")), class = "data.frame", row.names = c(NA, 
-4L))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading