Split delimited entries in a data frame column into separate rows in R

I have the data frame df below.

 df <- data.frame(id = c(1:12),
               A = c("alpha", "alpha", "beta", "beta", "gamma", "gamma", "gamma", "delta", 
                     "epsilon", "epsilon", "zeta", "eta"),
               B = c("a", "a; b", "a", "c; d; e", "e", "e", "c; f", "g", "a", "g; h", "f", "d"),
               C = c(NA, 4, 2, 7, 4, NA, 9, 1, 1, NA, 3, NA),
               D = c("ii", "ii", "i", "iii", "iv", "v", "viii", "v", "viii", "i", "iii", "i"))

Column ‘B’ contains four entries with semicolon-separated values. How can I duplicate each of these rows so that every copy holds exactly one of the separated values in column ‘B’?

The expected result df2 is:

 df2 <- data.frame(id = c(1, 2, 2, 3, 4, 4, 4, 5, 6, 7, 7, 8, 9, 10, 10, 11, 12),
               A = c(rep("alpha", 3), rep("beta", 4), rep("gamma", 4), "delta", rep("epsilon", 3), 
                     "zeta", "eta"),
               B = c("a", "a", "b", "a", "c", "d", "e", "e", "e", "c", "f", "g", "a", "g", "h", "f", "d"),
               C = c(NA, 4, 4, 2, 7, 7, 7, 4, NA, 9, 9, 1, 1, NA, NA, 3, NA),
               D = c("ii", "ii", "ii", "i", "iii", "iii", "iii", "iv", "v", "viii", "viii", "v", "viii", "i", "i", "iii", "i"))

I tried this, but no luck:

 df2 <- df
 # split the values in column B
 df2$B <- unlist(strsplit(as.character(df2$B), "; "))
 # repeat the rows for each value in column B
 df2 <- df2[rep(seq_len(nrow(df2)), sapply(strsplit(as.character(df1$B), "; "), length)),]
 # match the number of rows in column B with the number of rows in df2
 df2$id <- rep(df2$id, sapply(strsplit(as.character(df1$B), "; "), length))
 # sort the dataframe by id
 df2 <- df2[order(df2$id),]

Solution:

We can use separate_rows() from tidyr here: set sep to a semicolon followed by zero or more whitespace characters (";\\s*"), and each delimited value in B is expanded into its own row.

library(tidyr)
df_new <- separate_rows(df, B, sep = ";\\s*")

Checking against the OP’s expected output:

> all.equal(df_new, df2, check.attributes = FALSE)
[1] TRUE
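
The separator pattern can also be explored on its own in base R. The sketch below is only an illustration: the vector x is made up (only "c; d; e" appears in the question's data), and it compares ";\\s*" with the stricter ";\\s+", which requires at least one whitespace character after the semicolon.

```r
# Illustrative inputs; only "c; d; e" comes from the question's data
x <- c("c; d; e", "a;b", "g")

# ";\\s*" matches a semicolon followed by zero or more whitespace characters
pieces_star <- strsplit(x, ";\\s*")

# ";\\s+" needs at least one whitespace character after the semicolon,
# so "a;b" is left unsplit
pieces_plus <- strsplit(x, ";\\s+")

pieces_star[[2]]  # "a" "b"
pieces_plus[[2]]  # "a;b"
```

If the data might contain semicolons without a trailing space, ";\\s*" is the safer choice of the two.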

In base R, we can repeat each row index by the number of pieces that strsplit() returns for that row, and then fill B with the unlisted pieces.

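# split each B entry on a semicolon plus optional whitespace (as.character guards against factors)
lst1 <- strsplit(as.character(df$B), ";\\s*")
# repeat each row once per piece, then overwrite B with the flattened pieces
df_new2 <- transform(df[rep(seq_len(nrow(df)), lengths(lst1)),], B = unlist(lst1))
# reset the duplicated row names (2, 2.1, ...) to 1..n
row.names(df_new2) <- NULL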
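
To make the row-repetition idea concrete, here is a minimal sketch on a three-row data frame that prints the intermediate values. The data frame small and the names pieces, n, idx and out are hypothetical, chosen only for this illustration.

```r
# Small hypothetical data frame, used only to show the intermediate values
small <- data.frame(id = 1:3, B = c("a", "a; b", "c; d; e"),
                    stringsAsFactors = FALSE)

pieces <- strsplit(small$B, ";\\s*")   # one character vector per row
n <- lengths(pieces)                   # pieces per row: 1 2 3
idx <- rep(seq_len(nrow(small)), n)    # row index repeated: 1 2 2 3 3 3

out <- small[idx, ]                    # duplicate rows according to idx
out$B <- unlist(pieces)                # one separated value per row
row.names(out) <- NULL                 # replace row names such as 2.1 with 1..n
out
```

Because unlist(pieces) and idx are built from the same list in the same order, the separated values line up with the duplicated rows.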