I have two dataframes.
The first one has thousands of columns which represent a given city followed by the year, like "London_2001", "London_2002", and some measurements in the rows.
The second dataframe has two columns. The first one is a territorial region, and the second one a list of cities. Something like:
df1 <- data.frame(London_2021 = c(3,6,83,25,4),
London_2022 = c(3,6,83,25,4),
Berlin_2021 = c(3,6,83,25,4),
Berlin_2022 = c(3,6,83,25,4),
BerlinSuburbs_2021 = c(3,6,83,25,4),
BerlinSuburbs_2022 = c(3,6,83,25,4),
Tokyo_2021 = c(3,6,83,25,4),
Tokyo_2022 = c(3,6,83,25,4))
df2 <- data.frame(TerritorialRegion = c("Pacific", "Europe"),
Cities = c("Tokyo", "London, Berlin"))
I want to create a list, where each of the elements of the list will be a dataframe.
This dataframe will be called as the territorial region, and will have as columns each of the cities.
In this case your list will be something like:
List
[1] Pacific
[2] Europe
And if you access europe as in list[2]
you would get:
London_2021 | London_2022 | Berlin_2021 | Berlin_2022
As you can see, we want strictly the same name, so BerlibSuburbs
wouldn’t appear in our dataframe,
I tried this, but is not working:
library(dplyr)
library(tidyr)
# step 1
df2 <- df2 %>%
separate_rows(Cities, sep = ",\\s*")
# step 2
df1_long <- df1 %>%
pivot_longer(cols = everything(),
names_to = c("City", "Year"),
names_sep = "_")
# step 3
df_merged <- df2 %>%
left_join(df1_long, by = c("Cities" = "City"))
# step 4
df_list <- df_merged %>%
split(.$TerritorialRegion)
>Solution :
You could get this output using purrr
mapping and dplyr::select
purrr::map(df2$Cities, function(x) {
select(df1, starts_with(paste0(strsplit(x, ",\\s*")[[1]], "_")))
}) |>
purrr::set_names(df2$TerritorialRegion)
which returns
$Pacific
Tokyo_2021 Tokyo_2022
1 3 3
2 6 6
3 83 83
4 25 25
5 4 4
$Europe
London_2021 London_2022 Berlin_2021 Berlin_2022
1 3 3 3 3
2 6 6 6 6
3 83 83 83 83
4 25 25 25 25
5 4 4 4 4