Merge list of dfs AND extract index as a new column

March 18, 2024

I have this list of dfs:

my_list <- list(structure(list(observations = c(1L, 5L), variables = c(4L, 8L)), class = "data.frame",
                          row.names = c("asp_202003...Copy.xlsx", "asp_202003.xlsx")),
                structure(list(observations = c(3L, 1L), variables = 5:4), class = "data.frame",
                          row.names = c("eay_201008_a.xlsx", "eay_202003.xlsx")),
                structure(list(observations = 3:4, variables = c(4L, 6L)), class = "data.frame",
                          row.names = c("wana_202309...Copy.xlsx", "wana_202309.xlsx")))

I merge the dfs like so:

library(dplyr); library(purrr)  # dplyr: %>% and full_join(); purrr: reduce()
my_merge <- my_list %>% reduce(full_join)

Output:
my_merge

#  observations variables
#1            1         4
#2            5         8
#3            3         5
#4            3         4
#5            4         6

But I would like to keep the row names (or extract them) in a new column called ‘file’, like so:

Desired output:

# file                      observations     variables
# asp_202003...Copy.xlsx               1             4
# asp_202003.xlsx                      5             8
# etc.

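Also note that the desired output should have 6 rows, not the 5 in the current my_merge object. Because full_join() matches on both shared columns, the row with observations = 1 and variables = 4, which appears in two of the data frames, is matched and kept only once, so one row is ‘lost’. This is another reason I want to keep the file name with each row.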
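
The row loss can be reproduced on a smaller scale. The following is a minimal sketch, assuming dplyr is installed; the names a, b, joined and stacked are made up for illustration and are not part of the original data:

```r
library(dplyr)

# Two small data frames that share one identical row: (1, 4)
a <- data.frame(observations = c(1L, 5L), variables = c(4L, 8L))
b <- data.frame(observations = c(3L, 1L), variables = c(5L, 4L))

# Joining on both shared columns treats the identical row as a match,
# so it appears only once in the result.
joined <- full_join(a, b, by = c("observations", "variables"))
nrow(joined)   # 3

# Stacking appends rows without matching, so nothing is collapsed.
stacked <- bind_rows(a, b)
nrow(stacked)  # 4
```

Spelling out `by` only silences the message that full_join() prints when it picks the join columns itself; the matching behaviour is the same as in the reduce(full_join) call.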

Solution:

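You could convert each data frame to a tibble, storing its row names in a new column called file, and then stack the results with bind_rows(). Unlike full_join(), bind_rows() simply appends rows, so identical rows from different files are all kept.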

library(dplyr)
my_list <-  list(structure(list(observations = c(1L, 5L), variables = c(4L, 8L)), class = "data.frame", 
                           row.names = c("asp_202003...Copy.xlsx", "asp_202003.xlsx")), 
                 structure(list(observations = c(3L, 1L), variables = 5:4), class = "data.frame", 
                           row.names = c("eay_201008_a.xlsx", "eay_202003.xlsx")), 
                 structure(list(observations = 3:4, variables = c(4L, 6L)), class = "data.frame", 
                           row.names = c("wana_202309...Copy.xlsx", "wana_202309.xlsx")))

bind_rows(purrr::map(my_list, ~as_tibble(.x, rownames="file")))
#> # A tibble: 6 × 3
#>   file                    observations variables
#>   <chr>                          <int>     <int>
#> 1 asp_202003...Copy.xlsx             1         4
#> 2 asp_202003.xlsx                    5         8
#> 3 eay_201008_a.xlsx                  3         5
#> 4 eay_202003.xlsx                    1         4
#> 5 wana_202309...Copy.xlsx            3         4
#> 6 wana_202309.xlsx                   4         6

Created on 2024-03-18 with reprex v2.0.2
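
If you would rather avoid extra packages, the same idea (copy the row names into a column, then stack) can be sketched in base R. This is only one possible variant under that assumption; with_file and combined are illustrative names, and the list is rebuilt with data.frame() so the snippet runs on its own:

```r
# Same data as in the question, rebuilt with data.frame() for readability
my_list <- list(
  data.frame(observations = c(1L, 5L), variables = c(4L, 8L),
             row.names = c("asp_202003...Copy.xlsx", "asp_202003.xlsx")),
  data.frame(observations = c(3L, 1L), variables = c(5L, 4L),
             row.names = c("eay_201008_a.xlsx", "eay_202003.xlsx")),
  data.frame(observations = c(3L, 4L), variables = c(4L, 6L),
             row.names = c("wana_202309...Copy.xlsx", "wana_202309.xlsx"))
)

# Copy each data frame's row names into a 'file' column and reset
# the row names to the default 1..n
with_file <- lapply(my_list, function(d) {
  data.frame(file = rownames(d), d, row.names = NULL)
})

# Stack all pieces; rbind() appends rows and does not drop duplicates
combined <- do.call(rbind, with_file)
nrow(combined)  # 6
```

The result is a plain data.frame rather than a tibble, with the columns file, observations and variables.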