Convert a list with inconsistent naming to a data frame, with variable depth

Consider the following list:

x <- list("a" = list("b", "c"),
          "d" = list("e", "f" = list("g", "h")),
          "i" = list("j", "k" = list("l" = list("m", "n" = list("o", "p")))))

It is worth noting that:

  • Names and elements are not necessarily a single character long.
  • The depth of nesting is not known in advance.

Given x, my aim is to output the data frame:

y <- data.frame(
  main_level = c(rep("a", 2), rep("d", 3), rep("i", 4)),
  level1 = c("b", "c", "e", rep("f", 2), "j", rep("k", 3)),
  level2 = c(NA, NA, NA, "g", "h", NA, "l", "l", "l"),
  level3 = c(NA, NA, NA,  NA,  NA, NA, "m", "n", "n"), 
  level4 = c(NA, NA, NA,  NA,  NA, NA, NA, "o", "p")
)
> y
  main_level level1 level2 level3 level4
1          a      b   <NA>   <NA>   <NA>
2          a      c   <NA>   <NA>   <NA>
3          d      e   <NA>   <NA>   <NA>
4          d      f      g   <NA>   <NA>
5          d      f      h   <NA>   <NA>
6          i      j   <NA>   <NA>   <NA>
7          i      k      l      m   <NA>
8          i      k      l      n      o
9          i      k      l      n      p

NOTE that a typo was corrected in y above.

The above implies that there will be a variable number of columns as well, depending on the depth of the nesting.
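
As a rough illustration of that dependence, the nesting depth can be measured with a few lines of base R. This is only a sketch: list_depth is a hypothetical helper name, not something from the question, and the match between depth and column count assumes lists shaped like x, where every nested list is named and the leaves are unnamed.

```r
x <- list("a" = list("b", "c"),
          "d" = list("e", "f" = list("g", "h")),
          "i" = list("j", "k" = list("l" = list("m", "n" = list("o", "p")))))

# Hypothetical helper: a non-list leaf counts as depth 0,
# and every enclosing list adds one level.
list_depth <- function(node) {
  if (!is.list(node) || length(node) == 0) return(0L)
  1L + max(vapply(node, list_depth, integer(1)))
}

# Number of columns the target data frame needs for this x
n_cols <- list_depth(x)
n_cols
```

For this x the result equals the number of columns in y (main_level plus level1 to level4).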

The solutions I’ve found online for nested lists assume either that the naming structure is more or less consistent or that the depth is identical throughout, and neither holds here. For instance, the answers to "How to convert a nested lists to dataframe in R?" and "Converting nested list to dataframe" do not apply, because the lists there are named much more consistently.

Solution:

Here’s a way mainly relying on rrapply:

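# Melt the list into one row per leaf, with the name (or position,
# for unnamed elements) at each level followed by the leaf value.
rrapply::rrapply(x, how = "melt") |>
  apply(1, function(row){
    # Keep only entries containing a letter: drops positions and NA padding
    newrow <- row[grep("[A-Za-z]", row)]
    # Pad with NA to a common length (purrr >= 1.0.0 prefers pluck_depth())
    length(newrow) <- purrr::vec_depth(x) - 1
    newrow
  }) |>
  t() |> as.data.frame() |>
  # Derive the column names from the same depth instead of a hard-coded 1:4
  `colnames<-`(c("main_level", sprintf("level%d", seq_len(purrr::vec_depth(x) - 2))))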

Output:

  main_level level1 level2 level3 level4
1          a      b   <NA>   <NA>   <NA>
2          a      c   <NA>   <NA>   <NA>
3          d      e   <NA>   <NA>   <NA>
4          d      f      g   <NA>   <NA>
5          d      f      h   <NA>   <NA>
6          i      j   <NA>   <NA>   <NA>
7          i      k      l      m   <NA>
8          i      k      l      n      o
9          i      k      l      n      p

Note that this is still quite crude, and there may be a better way to reshape the output of rrapply. For instance, row[grep("[A-Za-z]", row)] keeps only entries that contain at least one ASCII letter, so it would also drop names or values made up solely of digits or symbols. I have also not tested whether length(newrow) <- purrr::vec_depth(x) - 1 is a reliable way of guessing the number of columns, but it works here.
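
For comparison, the regex step can be avoided by walking the list recursively in base R and recording only the names that actually exist. This is a sketch rather than a tested general solution: leaf_paths and paths_to_df are hypothetical helper names, and it assumes every leaf is a single value.

```r
x <- list("a" = list("b", "c"),
          "d" = list("e", "f" = list("g", "h")),
          "i" = list("j", "k" = list("l" = list("m", "n" = list("o", "p")))))

# Hypothetical helper: returns one character vector per leaf, made of the
# names of the enclosing named elements followed by the leaf value.
leaf_paths <- function(node, path = character()) {
  if (!is.list(node)) {
    return(list(c(path, as.character(node))))
  }
  nms <- names(node)
  if (is.null(nms)) nms <- rep("", length(node))
  out <- list()
  for (i in seq_along(node)) {
    # Unnamed elements add no level of their own to the path
    child_path <- if (nzchar(nms[i])) c(path, nms[i]) else path
    out <- c(out, leaf_paths(node[[i]], child_path))
  }
  out
}

# Hypothetical helper: pads every path with NA to the longest one
# and binds the paths into a data frame, one row per leaf.
paths_to_df <- function(paths) {
  width <- max(lengths(paths))
  padded <- lapply(paths, function(p) { length(p) <- width; p })
  df <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(df) <- c("main_level", sprintf("level%d", seq_len(width - 1)))
  df
}

res <- paths_to_df(leaf_paths(x))
res
```

Because unnamed positions are never added to the path, names or values made only of digits are kept, and the number of columns follows from the longest path instead of being guessed.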
