I practice data merging using R nowadays. Here are simple two data df1 and df2.
df1<-data.frame(id=c(1,1,1,2,2,2,2),
year_month=c(202205,202206,202207,202204,202205,202206,202207),
points=c(65,58,47,21,25,27,43))
df2<-data.frame(id=c(1,1,1,2,2,2),
year_month_week=c(2022052,2022053,2022061,2022043,2022051,2022052),
temperature=c(36.1,36.3,36.6,34.3,34.9,35.3))
For df1, 202205 in year_month column means May 2022.
For df2, 2022052 in year_month_week column means 2nd week of May, 2022.
I want to merge df1 and df2 with respect to year_month_week. So, all the elements of df2 are left, but some values of df2 can be copied.
For example, 202205 in year_month includes 2022052 and 2022053. There is no column points in df2. In this case, 65 is copied. My expected output looks like this:
df<-data.frame(id=c(1,1,1,2,2,2),
year_month_week=c(2022052,2022053,2022061,2022043,2022051,2022052),
temperature=c(36.1,36.3,36.6,34.3,34.9,35.3),
points=c(65,65,58,21,25,25))
>Solution :
Create a temporary year_month column in df2 by taking the first six characters of year_month_week, then do a left join on df1 by year_month and id before removing the temporary column.
Using tidyverse, we could do this as follows:
library(tidyverse)
df2 %>%
mutate(year_month = as.numeric(substr(year_month_week, 1, 6))) %>%
left_join(df1, by = c('year_month', 'id')) %>%
select(-year_month)
#> id year_month_week temperature points
#> 1 1 2022052 36.1 65
#> 2 1 2022053 36.3 65
#> 3 1 2022061 36.6 58
#> 4 2 2022043 34.3 21
#> 5 2 2022051 34.9 25
#> 6 2 2022052 35.3 25
Or in base R using merge:
df2$year_month <- substr(df2$year_month_week, 1, 6)
merge(df2, df1, by = c('year_month', 'id'))[-1]
#> id year_month_week temperature points
#> 1 2 2022043 34.3 21
#> 2 1 2022052 36.1 65
#> 3 1 2022053 36.3 65
#> 4 2 2022051 34.9 25
#> 5 2 2022052 35.3 25
#> 6 1 2022061 36.6 58
If