Flatten rows with same identifier in R

I have a dataframe:

   Number Superclass                                Class                                      Subclass                                      
    <dbl> <chr>                                     <chr>                                      <chr>                                         
 1      3 NA                                        Class: Benzene and substituted derivatives NA                                            
 2      3 Superclass: Benzenoids                    NA                                         NA 
 3      4 Superclass: Painkiller                    NA                                         NA

I’d like to flatten the dataframe so that the Superclass, Class and Subclass for each Number end up on the same row:

   Number Superclass                                Class                                      Subclass                                      
    <dbl> <chr>                                     <chr>                                      <chr>                                         
 1      3 Superclass: Benzenoids                    Class: Benzene and substituted derivatives NA                                            
 2      4 Superclass: Painkiller                    NA                                         NA

I’ve tried:

df%>%
  group_by(Number) %>%
  summarise_all(na.omit)

but this only keeps a Number when all three class columns have a value, and drops any Number that has only a superclass, or only a superclass and a class.
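
A likely reason for this, sketched in base R: na.omit() returns a zero-length vector when every value is NA, so there is no single value left to summarise for that column. The two vectors below are made-up stand-ins for one group's columns, not part of the original data. How summarise handles a zero-length result depends on the dplyr version, so treat this as an illustration rather than a full diagnosis.

```r
# Hypothetical stand-ins for two columns of the group Number == 4
superclass <- c("Superclass: Painkiller")
class_col  <- c(NA_character_)

# na.omit() keeps the single non-missing value ...
length(na.omit(superclass))  # 1
# ... but leaves nothing behind when every value is NA
length(na.omit(class_col))   # 0
```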

Solution:

After grouping by ‘Number’, summarise across the remaining columns with across(everything(), ...). The scoped variants (_all/_at/_if) are superseded in favor of across. For each column within a group, if all values are NA, return the first element (a single NA); otherwise paste the non-NA elements together with toString, which gives comma-separated values.

library(dplyr)

df %>%
  group_by(Number) %>%
  # all-NA column in a group: keep one NA; otherwise join the non-NA values
  summarise(across(everything(),
                   ~ if (all(is.na(.x))) first(.x) else toString(.x[!is.na(.x)])))

-output

# A tibble: 2 × 4
  Number Superclass             Class                                      Subclass
   <int> <chr>                  <chr>                                      <lgl>   
1      3 Superclass: Benzenoids Class: Benzene and substituted derivatives NA      
2      4 Superclass: Painkiller <NA>                                       NA      
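
The per-column rule used in the summarise call can also be tried on plain vectors. Here is a minimal base R sketch; the helper name collapse_non_na is hypothetical and does not come from dplyr or the original answer.

```r
# Hypothetical helper wrapping the rule from the answer's lambda
collapse_non_na <- function(x) {
  if (all(is.na(x))) {
    x[1]                    # all missing: keep a single NA of the same type
  } else {
    toString(x[!is.na(x)])  # otherwise join the non-missing values with ", "
  }
}

collapse_non_na(c(NA, "Superclass: Benzenoids"))  # "Superclass: Benzenoids"
collapse_non_na(c(NA, NA))                        # NA
collapse_non_na(c("a", NA, "b"))                  # "a, b"
```

A helper like this could be passed directly as summarise(across(everything(), collapse_non_na)). Note that when a group holds several non-NA values in one column, they are joined into one string rather than one being dropped.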

data

df <- structure(list(Number = c(3L, 3L, 4L), Superclass = c(NA, "Superclass: Benzenoids", 
"Superclass: Painkiller"), Class = c("Class: Benzene and substituted derivatives", 
NA, NA), Subclass = c(NA, NA, NA)), class = "data.frame", row.names = c("1", 
"2", "3"))