Collapse rows based on duplicate values in another column in R

I have a df:

A <- c("A", "A123", "A123", "B123", "B123", "B")
B <- c("NA", "as", "bp", "df", "kl", "c")

df <- data.frame(A, B) 

and I would like to create a df in which the output would be

A <- c("A", "A123", "B123", "B")
C <- c("NA", "as;bp", "df;kl", "c")
df2 <- data.frame(A,C)

The new column C is built from column A: where a value in A is duplicated, the corresponding values in column B are combined into one string separated by ";". Where a value in A is unique, its B value is carried over to column C unchanged.

Any help with code that produces column C would be appreciated, as I don't know where to begin.
Thank you!

>Solution :

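Use dplyr's reframe() (available from dplyr 1.1.0, which also introduced the .by argument) to paste the non-missing 'B' values for each 'A' group, joined with ";". If all values in a group are missing, B is returned unchanged. Note that in the example data "NA" is a character string, not a missing value, so it is carried over as-is.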

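library(dplyr)    # reframe() and .by need dplyr >= 1.1.0
library(stringr)  # str_c()

df %>%
   # per 'A' group: keep B as-is when every value is missing,
   # otherwise join the non-missing values with ";"
   reframe(C = if(all(is.na(B))) B else 
     str_c(B[complete.cases(B)], collapse = ";"), .by = "A")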

-output

     A     C
1    A    NA
2 A123 as;bp
3 B123 df;kl
4    B     c
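
For comparison, the same grouping can be sketched in base R without extra packages. This is an alternative to the dplyr answer, not part of it, and it assumes, as in the question's data, that "NA" is a literal string; a real missing value would be pasted in as the text "NA" rather than dropped. The name `collapsed` is only illustrative.

```r
A <- c("A", "A123", "A123", "B123", "B123", "B")
B <- c("NA", "as", "bp", "df", "kl", "c")
df <- data.frame(A, B, stringsAsFactors = FALSE)

# Group B by A, keeping groups in order of first appearance,
# and join each group's values with ";"
collapsed <- tapply(df$B, factor(df$A, levels = unique(df$A)),
                    paste, collapse = ";")

# Rebuild a data frame from the named result
df2 <- data.frame(A = names(collapsed), C = as.vector(collapsed),
                  stringsAsFactors = FALSE)
df2
```

Setting the factor levels to `unique(df$A)` keeps the rows in their original order; without it, the groups would come back sorted alphabetically.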