I am trying to separate some rows of a dataframe where commas exist.
This is an example fragment:
name ref alt exon impact score
ID1 A C,T 1,2 l,h 5
ID2 T A 6 h 8
ID3 T G,C,A 2,5,7 l,l,h 10,2,3
ID4 C G 3 m 2
How could I perform the following transformation?
name ref alt exon impact score
ID1 A C 1 l 5
ID1 A T 2 h 5
ID2 T A 6 h 8
ID3 T G 2 l 10
ID3 T C 5 l 2
ID3 T A 7 h 3
ID4 C G 3 m 2
I was trying something as:
df %>%
mutate(alt=strsplit(alt, ",")) %>%
unnest(alt)
However I do not wish to divide the rows just based on one column, but to take into account all the columns where there are more than one value (and to repeat the values where no commas exist).
How can I achieve that?
>Solution :
The function you are looking for is separate_rows from the package tidyr.
library(tidyr)
separate_rows(df, everything(), sep = ",")
# A tibble: 7 × 6
name ref alt exon impact score
<chr> <chr> <chr> <chr> <chr> <chr>
1 ID1 A C 1 l 5
2 ID1 A T 2 h 5
3 ID2 T A 6 h 8
4 ID3 T G 2 l 10
5 ID3 T C 5 l 2
6 ID3 T A 7 h 3
7 ID4 C G 3 m 2
Data
df <- structure(list(name = c("ID1", "ID2", "ID3", "ID4"), ref = c("A",
"T", "T", "C"), alt = c("C,T", "A", "G,C,A", "G"), exon = c("1,2",
"6", "2,5,7", "3"), impact = c("l,h", "h", "l,l,h", "m"), score = c("5",
"8", "10,2,3", "2")), class = "data.frame", row.names = c(NA,
-4L))