Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to create a list of dataframes based on two other dataframes

I have two dataframes.
The first one has thousands of columns which represent a given city followed by the year, like "London_2001", "London_2002", and some measurements in the rows.
The second dataframe has two columns. The first one is a territorial region, and the second one a list of cities. Something like:


df1 <- data.frame(London_2021 = c(3,6,83,25,4),
                  London_2022 = c(3,6,83,25,4),
                  Berlin_2021 = c(3,6,83,25,4),
                  Berlin_2022 = c(3,6,83,25,4),
                  BerlinSuburbs_2021 = c(3,6,83,25,4),
                  BerlinSuburbs_2022 = c(3,6,83,25,4),
                  Tokyo_2021 = c(3,6,83,25,4),
                  Tokyo_2022 = c(3,6,83,25,4))

df2 <- data.frame(TerritorialRegion = c("Pacific", "Europe"),
                  Cities = c("Tokyo", "London, Berlin"))

I want to create a list, where each of the elements of the list will be a dataframe.
This dataframe will be called as the territorial region, and will have as columns each of the cities.

In this case your list will be something like:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

List
 [1] Pacific
 [2] Europe

And if you access europe as in list[2] you would get:

London_2021 | London_2022 | Berlin_2021 | Berlin_2022

As you can see, we want strictly the same name, so BerlibSuburbswouldn’t appear in our dataframe,

I tried this, but is not working:

library(dplyr)
library(tidyr)

# step 1
df2 <- df2 %>% 
  separate_rows(Cities, sep = ",\\s*")

# step 2
df1_long <- df1 %>% 
  pivot_longer(cols = everything(), 
               names_to = c("City", "Year"), 
               names_sep = "_")

# step 3
df_merged <- df2 %>% 
  left_join(df1_long, by = c("Cities" = "City"))

# step 4
df_list <- df_merged %>% 
  split(.$TerritorialRegion)

>Solution :

You could get this output using purrr mapping and dplyr::select

purrr::map(df2$Cities, function(x) {
  select(df1, starts_with(paste0(strsplit(x, ",\\s*")[[1]], "_")))
}) |> 
  purrr::set_names(df2$TerritorialRegion)

which returns

$Pacific
  Tokyo_2021 Tokyo_2022
1          3          3
2          6          6
3         83         83
4         25         25
5          4          4

$Europe
  London_2021 London_2022 Berlin_2021 Berlin_2022
1           3           3           3           3
2           6           6           6           6
3          83          83          83          83
4          25          25          25          25
5           4           4           4           4
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading