How to combine members of a column, collect them in a new data frame, and give them a new name in R?

I want to build a new data frame from my old one by combining the values of a specific column under new names.
For example, this is my old data frame:

df <- structure(list(ID= c("x1", "x1", "x1", "x1", "x1", "x1", "x2", "x2", "x2", "x2", "x2", "x2", "x3", "x3", "x3", "x3", "x3", "x3", "x1", "x1", "x1", "x1", "x1", "x1", "x2", "x2", "x2", "x2", "x2", "x2", "x3", "x3", "x3", "x3", "x3", "x3"),
col1=c("a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a1","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2","a2"),
col2= c("a", "b", "c", "d", "e", "f", "a", "b", "c", "d", "e", "f","a", "b", "c", "d", "e", "f","a", "b", "c", "d", "e", "f", "a", "b", "c", "d", "e", "f","a", "b", "c", "d", "e", "f"),
col3= c(2,13,1,21,0,5,3,0,6,4,50,0,0,0,0,9,5,0,51,3,6,0,0,9,89,4,29,1,4,17,6,16,9,1,0,0)), 
                class = "data.frame", row.names = c(NA,-36L))

In the new data frame I want a new column based on col2: wherever col2 is a, b, or c, combine them and name the result abc.1; wherever it is d or e, name it de.5; and wherever it is f, name it f.10. The new.col3 column should hold the sum of the corresponding col3 values, computed within each col1 group.

The result would be:

df2<- structure(list(col1=c("a1","a1","a1","a2","a2","a2"),
new.col2= c("abc.1", "de.5", "f.10", "abc.1", "de.5", "f.10"),
new.col3=c(25,89,5,213,6,26)),
                class = "data.frame", row.names = c(NA,-6L))

Solution:

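Create a grouping key with case_when, then use summarise to collapse the rows of each group and compute the sum of col3 per group. The helper column gp holds the numeric suffix (1, 5, or 10), and new.col2 is built by pasting the unique col2 values of each group together with that suffix.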

library(dplyr)

df %>% 
  # group by col1 and a numeric key gp derived from col2
  group_by(col1, gp = case_when(col2 %in% c("a", "b", "c") ~ 1,
                                col2 %in% c("d", "e") ~ 5,
                                col2 == "f" ~ 10)) %>% 
  # label each group (e.g. "abc.1") and sum col3 within it
  summarise(new.col2 = paste(paste0(unique(col2), collapse = ""), unique(gp), sep = "."),
            new.col3 = sum(col3))

Output:

# A tibble: 6 × 4
# Groups:   col1 [2]
  col1     gp new.col2 new.col3
  <chr> <dbl> <chr>       <dbl>
1 a1        1 abc.1          25
2 a1        5 de.5           89
3 a1       10 f.10            5
4 a2        1 abc.1         213
5 a2        5 de.5            6
6 a2       10 f.10           26
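
The output above still carries the helper column gp and stays grouped by col1, so it does not quite match the three-column df2 from the question. Here is a minimal sketch of one variant, assuming dplyr 1.0.0 or later (for the .groups argument): write the labels directly in case_when and drop the grouping at the end. The name result and the compact rep()-based construction of df are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Same values as df in the question, built compactly with rep()
df <- data.frame(
  ID   = rep(rep(c("x1", "x2", "x3"), each = 6), times = 2),
  col1 = rep(c("a1", "a2"), each = 18),
  col2 = rep(c("a", "b", "c", "d", "e", "f"), times = 6),
  col3 = c(2, 13, 1, 21, 0, 5, 3, 0, 6, 4, 50, 0, 0, 0, 0, 9, 5, 0,
           51, 3, 6, 0, 0, 9, 89, 4, 29, 1, 4, 17, 6, 16, 9, 1, 0, 0)
)

result <- df %>%
  # map each col2 value straight to its final label
  mutate(new.col2 = case_when(
    col2 %in% c("a", "b", "c") ~ "abc.1",
    col2 %in% c("d", "e")      ~ "de.5",
    col2 == "f"                ~ "f.10"
  )) %>%
  group_by(col1, new.col2) %>%
  # sum col3 per (col1, label) pair and return an ungrouped result
  summarise(new.col3 = sum(col3), .groups = "drop") %>%
  # convert the tibble to a plain data frame, as in the question
  as.data.frame()

print(result)
```

Here result has only the columns col1, new.col2 and new.col3, and the new.col3 values are the same sums as in the output above. Hard-coding the labels is less general than pasting unique(col2) together, but it keeps the mapping readable in one place.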