Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Merge two datasets but one of them is year_month and the other is year_month_week

I practice data merging using R nowadays. Here are simple two data df1 and df2.

df1<-data.frame(id=c(1,1,1,2,2,2,2),
                year_month=c(202205,202206,202207,202204,202205,202206,202207),
                points=c(65,58,47,21,25,27,43))

df2<-data.frame(id=c(1,1,1,2,2,2),
                year_month_week=c(2022052,2022053,2022061,2022043,2022051,2022052),
                temperature=c(36.1,36.3,36.6,34.3,34.9,35.3))

For df1, 202205 in year_month column means May 2022.
For df2, 2022052 in year_month_week column means 2nd week of May, 2022.
I want to merge df1 and df2 with respect to year_month_week. So, all the elements of df2 are left, but some values of df2 can be copied.
For example, 202205 in year_month includes 2022052 and 2022053. There is no column points in df2. In this case, 65 is copied. My expected output looks like this:

df<-data.frame(id=c(1,1,1,2,2,2),
               year_month_week=c(2022052,2022053,2022061,2022043,2022051,2022052),
               temperature=c(36.1,36.3,36.6,34.3,34.9,35.3),
               points=c(65,65,58,21,25,25))

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Create a temporary year_month column in df2 by taking the first six characters of year_month_week, then do a left join on df1 by year_month and id before removing the temporary column.

Using tidyverse, we could do this as follows:

library(tidyverse)

df2 %>%
  mutate(year_month = as.numeric(substr(year_month_week, 1, 6))) %>%
  left_join(df1, by = c('year_month', 'id')) %>%
  select(-year_month)
#>   id year_month_week temperature points
#> 1  1         2022052        36.1     65
#> 2  1         2022053        36.3     65
#> 3  1         2022061        36.6     58
#> 4  2         2022043        34.3     21
#> 5  2         2022051        34.9     25
#> 6  2         2022052        35.3     25

Or in base R using merge:

df2$year_month <- substr(df2$year_month_week, 1, 6)
merge(df2, df1, by = c('year_month', 'id'))[-1]
#>   id year_month_week temperature points
#> 1  2         2022043        34.3     21
#> 2  1         2022052        36.1     65
#> 3  1         2022053        36.3     65
#> 4  2         2022051        34.9     25
#> 5  2         2022052        35.3     25
#> 6  1         2022061        36.6     58

If

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading