How to unnest a data frame containing list columns of varying lengths?

I was trying to unnest the following data frame.

df.org <- structure(list(Gene = "ARIH1", Description = "E3 ubiquitin-protein ligase ARIH1", 
    condition2_cellline = list(c("MCF7", "Jurkat")), condition2_activity = list(
        c(40.8284023668639, 13.26973)), condition2_concentration = list(
        c("100uM", "100uM")), condition3_cellline = list("Jurkat"), 
    condition3_activity = list(-4.60251), condition3_concentration = list(
        "100uM")), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame"))

This is my code:

df.output <- df.org %>% 
  unnest(where(is.list), keep_empty = TRUE)

This is what I got:

structure(list(Gene = c("ARIH1", "ARIH1"), Description = c("E3 ubiquitin-protein ligase ARIH1", 
"E3 ubiquitin-protein ligase ARIH1"), condition2_cellline = c("MCF7", 
"Jurkat"), condition2_activity = c(40.8284023668639, 13.26973
), condition2_concentration = c("100uM", "100uM"), condition3_cellline = c("Jurkat", 
"Jurkat"), condition3_activity = c(-4.60251, -4.60251), condition3_concentration = c("100uM", 
"100uM")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-2L))

Is there a way to avoid duplicating the values of the shorter list columns? The following output is what I want to get.

df.desired <- structure(list(Gene = c("ARIH1", "ARIH1"), Description = c("E3 ubiquitin-protein ligase ARIH1", 
"E3 ubiquitin-protein ligase ARIH1"), condition2_cellline = c("MCF7", 
"Jurkat"), condition2_activity = c(40.8284023668639, 13.26973
), condition2_concentration = c("100uM", "100uM"), condition3_cellline = c(NA, 
"Jurkat"), condition3_activity = c(NA, -4.60251), condition3_concentration = c(NA, 
"100uM")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-2L))

Thanks so much for any help!

>Solution :

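Here is a suggestion for how this could work:

  1. Use pivot_longer to gather all list columns into a single column.
  2. Apply the `length<-` function to pad every list element with NA up to the length of the longest one.
  3. Pivot back to wide format and unnest.
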
library(dplyr)
library(tidyr)

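df.org %>% 
  # gather every list column into one long list column called `value`
  pivot_longer(cols = starts_with("condition")) %>% 
  # pad each element with NA up to the length of the longest element
  mutate(value = lapply(value, `length<-`, max(lengths(value)))) %>% 
  # restore the original columns, then expand the equal-length lists into rows
  pivot_wider(names_from = name, values_from = value) %>% 
  unnest(cols = c(condition2_cellline, condition2_activity, condition2_concentration, 
                  condition3_cellline, condition3_activity, condition3_concentration)) 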
  Gene  Description        condition2_cell~ condition2_acti~ condition2_conc~ condition3_cell~ condition3_acti~ condition3_conc~
  <chr> <chr>              <chr>                       <dbl> <chr>            <chr>                       <dbl> <chr>           
1 ARIH1 E3 ubiquitin-prot~ MCF7                         40.8 100uM            Jurkat                      -4.60 100uM           
2 ARIH1 E3 ubiquitin-prot~ Jurkat                       13.3 100uM            NA                          NA    NA              
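
The padding in step 2 can be looked at in isolation. The sketch below uses base R only and a small list that mirrors three of the columns in the question. It shows that `length<-` appends NA at the end of a shorter vector, which is why the NA values end up in the last row of the output above. The helper pad_front is a hypothetical name, not part of the answer or of any package; it is included only to illustrate padding at the front instead.

```r
# `length<-` is the replacement form of length(); called as a function,
# it returns the vector extended with NA (or truncated) to the given length.
padded <- `length<-`("Jurkat", 2)

# A small list mirroring the shapes found in the question's list columns.
vals <- list(c("MCF7", "Jurkat"), "Jurkat", -4.60251)
n <- max(lengths(vals))

# Same call as in the answer: NA is appended at the END of shorter elements.
padded_end <- lapply(vals, `length<-`, n)

# Hypothetical helper (an assumption, not from the answer):
# put the NA padding at the FRONT of shorter elements instead.
pad_front <- function(x, n) c(rep(NA, n - length(x)), x)
padded_front <- lapply(vals, pad_front, n)

str(padded_end)
str(padded_front)
```

If the NA should sit in the first row, as in df.desired, replacing `length<-` with a front-padding function like pad_front in the mutate step would be one possible way to get there. This variant was not part of the original answer and has not been checked against other data.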