How to get the benefit of list_rbind() names_to when I have a list of lists of differing lengths?

November 5, 2023

I have two large objects that I got from JSON downloads. In their raw form, order is important.
length(species) is the same as length(colors). I’m happy to add another column or use row names. The names_to argument of list_rbind() would be perfect, but…

list_rbind() doesn’t work, because colors is not a list of data frames.
as.data.frame() doesn’t work, because colors contains lists of varying lengths.
unlist() loses the information about which color belongs to which species.

    species <- c("roses", "tulips", "lilies")
    colors <- list(list("red"), list("white", "yellow"), list("pink", "white"))

Desired result

      species colors
    1   roses    red
    2  tulips  white
    3  tulips yellow
    4  lilies   pink
    5  lilies  white

I can brute-force the desired result using a for loop, but I’ve got a million species, each with a minimum of one color and an average of eight, so a smarter and faster approach is needed. And no, the real-world data is not character strings.

The existing question "unnest list of lists of different lengths to dataframe" does not seem to address my challenge.

Real world

> str(pg, max.level = 2)
'data.frame':   1206169 obs. of  3 variables:
 $ PG : int  1 2 3 4 5 6 7 8 9 10 ...
 $ npi:List of 1206169
  ..$ : int  1376032029 1184159188 1629504501 1598703019 1487200408 1801443619
  ..$ : int 1588809248
  ..$ : int 1497791297

Solution:

base R

data.frame(
  species = rep(species, times = lengths(colors)),
  colors = unlist(colors)
)
#   species colors
# 1   roses    red
# 2  tulips  white
# 3  tulips yellow
# 4  lilies   pink
# 5  lilies  white
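
The base R version works because lengths() reports how many colors each species has, and rep() repeats each species that many times so it lines up with the flattened vector. Below is a minimal sketch of the same idea that also records the original list position, which is roughly what names_to would have provided. The column name id is my own choice for illustration, not something from the question.

```r
species <- c("roses", "tulips", "lilies")
colors <- list(list("red"), list("white", "yellow"), list("pink", "white"))

# Number of colors per species; here 1, 2 and 2
n <- lengths(colors)

res <- data.frame(
  id      = rep(seq_along(colors), times = n),  # hypothetical index column
  species = rep(species, times = n),            # one copy per color
  colors  = unlist(colors)                      # flattened in original order
)
```

Since both rep() and unlist() are vectorised, this should avoid the per-element overhead of a for loop, though I have not timed it on a million elements.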

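dplyr and tidyr

library(dplyr)
library(tidyr)  # unnest() comes from tidyr, not dplyr
tibble(species, colors) %>%
  unnest(colors) %>%                 # one row per inner list element
  mutate(colors = unlist(colors))    # flatten the remaining length-1 lists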
# # A tibble: 5 × 2
#   species colors
#   <chr>   <chr> 
# 1 roses   red   
# 2 tulips  white 
# 3 tulips  yellow
# 4 lilies  pink  
# 5 lilies  white

With a semblance of your real data:

dat <- data.frame(PG=1:3)
dat$npi <- list(c(1376032029L, 1184159188L, 1629504501L, 1598703019L, 1487200408L, 1801443619L), 1588809248L, 1497791297L)
str(dat)
# 'data.frame': 3 obs. of  2 variables:
#  $ PG : int  1 2 3
#  $ npi:List of 3
#   ..$ : int  1376032029 1184159188 1629504501 1598703019 1487200408 1801443619
#   ..$ : int 1588809248
#   ..$ : int 1497791297

# base R
dat[, -2, drop = FALSE][rep(seq_len(nrow(dat)), times = lengths(dat$npi)), , drop = FALSE] |>
  cbind(npi = unlist(dat$npi))
#     PG        npi
# 1    1 1376032029
# 1.1  1 1184159188
# 1.2  1 1629504501
# 1.3  1 1598703019
# 1.4  1 1487200408
# 1.5  1 1801443619
# 2    2 1588809248
# 3    3 1497791297

# tidyr (unnest() is a tidyr function, not a dplyr one)
tidyr::unnest(dat, npi)
# # A tibble: 8 × 2
#      PG        npi
#   <int>      <int>
# 1     1 1376032029
# 2     1 1184159188
# 3     1 1629504501
# 4     1 1598703019
# 5     1 1487200408
# 6     1 1801443619
# 7     2 1588809248
# 8     3 1497791297
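
Since you mention that order is important, here is a sketch that also records each value's position within its original list element, using sequence() from base R. The column name pos and the object name long are hypothetical, chosen only for this example.

```r
dat <- data.frame(PG = 1:3)
dat$npi <- list(
  c(1376032029L, 1184159188L, 1629504501L, 1598703019L, 1487200408L, 1801443619L),
  1588809248L,
  1497791297L
)

n <- lengths(dat$npi)            # elements per row of the list column

long <- data.frame(
  PG  = rep(dat$PG, times = n),  # repeat each PG once per npi value
  pos = sequence(n),             # hypothetical column: 1..n within each PG
  npi = unlist(dat$npi)
)

# Sanity check: no values were dropped or duplicated
stopifnot(nrow(long) == sum(n))
```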