How can I get the benefit of list_rbind()'s names_to when I have a list of lists of differing lengths?

I have two large objects that came from JSON downloads. In raw form, order is important: length(species) is the same as length(colors), and each element of colors belongs to the species at the same position. I’m happy to add another column or use row names. The names_to argument of list_rbind() would be perfect, but…

  • list_rbind() doesn’t work, because colors is not a list of data frames.
  • as.data.frame() doesn’t work, because colors contains lists of varying lengths.
  • unlist() loses information: the flat result no longer says which species each color belongs to.

    species <- c("roses", "tulips", "lilies")
    colors <- list(list("red"), list("white", "yellow"), list("pink", "white"))

Desired result

      species colors
    1   roses    red
    2  tulips  white
    3  tulips yellow
    4  lilies   pink
    5  lilies  white

I can brute-force the desired result with a for loop, but I’ve got a million species, each with a minimum of one color and an average of eight, so a smarter and faster approach is needed. And no, the real-world data is not character strings.

The related question "unnest list of lists of different lengths to dataframe" does not seem to address my challenge.

Real world

> str(pg, max.level = 2)
'data.frame':   1206169 obs. of  3 variables:
 $ PG : int  1 2 3 4 5 6 7 8 9 10 ...
 $ npi:List of 1206169
  ..$ : int  1376032029 1184159188 1629504501 1598703019 1487200408 1801443619
  ..$ : int 1588809248
  ..$ : int 1497791297

Solution:

base R

data.frame(
  species = rep(species, times = lengths(colors)),
  colors = unlist(colors)
)
#   species colors
# 1   roses    red
# 2  tulips  white
# 3  tulips yellow
# 4  lilies   pink
# 5  lilies  white
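
The base R answer works because lengths() reports how many elements each entry of colors holds, so rep() can repeat each species label exactly that many times. The same counts can also supply a position column, similar in spirit to what names_to provides in list_rbind(). The following is a minimal sketch under the assumption that every inner element is a single value; the names n, flat and id are introduced here for illustration and are not part of the original answer.

```r
species <- c("roses", "tulips", "lilies")
colors <- list(list("red"), list("white", "yellow"), list("pink", "white"))

# Number of elements held by each top-level entry of `colors`
n <- lengths(colors)

flat <- data.frame(
  id      = rep(seq_along(colors), times = n),  # position in the original list
  species = rep(species, times = n),            # label repeated once per color
  colors  = unlist(colors)                      # all colors in original order
)

# Sanity check: one flattened value per repeated label
stopifnot(nrow(flat) == sum(n))
```

Because rep() and unlist() are both vectorised, this avoids an explicit loop over the species. If an inner element could itself hold several values, lengths() would undercount and the check at the end would fail rather than silently misalign rows.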

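tidyverse (dplyr + tidyr)

library(dplyr)
library(tidyr)  # unnest() is exported by tidyr, not dplyr
tibble(species, colors) %>%
  unnest(colors) %>%                 # one row per inner list element
  mutate(colors = unlist(colors))    # flatten the remaining list column to a plain vector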
# # A tibble: 5 × 2
#   species colors
#   <chr>   <chr> 
# 1 roses   red   
# 2 tulips  white 
# 3 tulips  yellow
# 4 lilies  pink  
# 5 lilies  white 

With a semblance of your real data:

dat <- data.frame(PG=1:3)
dat$npi <- list(c(1376032029L, 1184159188L, 1629504501L, 1598703019L, 1487200408L, 1801443619L), 1588809248L, 1497791297L)
str(dat)
# 'data.frame': 3 obs. of  2 variables:
#  $ PG : int  1 2 3
#  $ npi:List of 3
#   ..$ : int  1376032029 1184159188 1629504501 1598703019 1487200408 1801443619
#   ..$ : int 1588809248
#   ..$ : int 1497791297

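# base R: repeat each row once per element of its npi entry, then attach the flattened values
dat[, -2, drop = FALSE][rep(seq_len(nrow(dat)), times = lengths(dat$npi)), , drop = FALSE] |>
  cbind(npi = unlist(dat$npi))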
#     PG        npi
# 1    1 1376032029
# 1.1  1 1184159188
# 1.2  1 1629504501
# 1.3  1 1598703019
# 1.4  1 1487200408
# 1.5  1 1801443619
# 2    2 1588809248
# 3    3 1497791297

# tidyr (unnest() is exported by tidyr, not dplyr)
tidyr::unnest(dat, npi)
# # A tibble: 8 × 2
#      PG        npi
#   <int>      <int>
# 1     1 1376032029
# 2     1 1184159188
# 3     1 1629504501
# 4     1 1598703019
# 5     1 1487200408
# 6     1 1801443619
# 7     2 1588809248
# 8     3 1497791297
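
If the dotted row names (1.1, 1.2, ...) in the base R output are unwanted, one possible variation is to build the long data frame column by column instead of indexing rows. This is a sketch that assumes PG is the only column to carry along; the name long is introduced here for illustration.

```r
dat <- data.frame(PG = 1:3)
dat$npi <- list(
  c(1376032029L, 1184159188L, 1629504501L, 1598703019L, 1487200408L, 1801443619L),
  1588809248L,
  1497791297L
)

# Repeat each PG once per npi value, then flatten the list column
long <- data.frame(
  PG  = rep(dat$PG, times = lengths(dat$npi)),
  npi = unlist(dat$npi)
)

# Row names are now plain 1..n, and npi keeps its integer type
str(long)
```

With more columns to carry along, the row-indexing form shown in the answer scales better, since it repeats every column at once.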