Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to duplicate specific rows but changing the value in one column by splitting by the comma separated values of an original cell in R

To simplify this problem I’ll use a very basic subset of what the dataset might look like:

library(dplyr)
DF <- tibble(id = seq(1:4),
             label = c("A", "B", "C", "D"),
             val = c(NA, "5, 10", "20", "6, 7, 8"))
DF
# A tibble: 4 × 3
#      id label val    
#   <int> <chr> <chr>  
# 1     1 A     NA     
# 2     2 B     5, 10  
# 3     3 C     20     
# 4     4 D     6, 7, 8

Note: val just contains random numbers that have no meaning in this example, but consider them to be a sort of ID number if that helps).

In this example, I would like to:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  • identify rows containing commas in the val column (as these are the only rows to be changed)
  • duplicate each row n times such that the only values that change are in the val column and consist of a single numeric value (where n is the number of comma separated values)
  • e.g. 2 duplicate rows for row 2, and 3 duplicate rows for row 4

So far I’ve only worked out the filter step as below:

DF %>% filter(val %>% stringr::str_detect(pattern = "[,]") == TRUE)

But I’m unsure of how to proceed from here.

This is what my desired output is:

#      id label val    
#   <int> <chr> <chr>  
# 1     1 A     NA     
# 2     2 B     5
# 3     2 B     10
# 4     3 C     20     
# 5     4 D     6
# 6     4 D     7
# 7     4 D     8

Any help is appreciated, but I’d prefer to avoid any loops.

Thanks 🙂

>Solution :

As explained in the answers found from the link pasted in the comments, there are a few ways you can solve this.

The most efficient would probably be to do the following:

separate_rows(DF, val, sep = ", ")

You get:

# A tibble: 7 × 3
     id label val  
  <int> <chr> <chr>
1     1 A     NA   
2     2 B     5    
3     2 B     10   
4     3 C     20   
5     4 D     6    
6     4 D     7    
7     4 D     8    
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading