Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R dplyr full_join – no common key, need common columns to blend together

I have these two dataframes for example:

dates = c('2020-11-19', '2020-11-20', '2020-11-21')
df1 <- data.frame(dates, area = c('paris', 'london', 'newyork'), 
                  rating = c(10, 5, 6),
                  rating2 = c(5, 6, 7))

df2 <- data.frame(dates, area = c('budapest', 'moscow', 'valencia'), 
                  rating = c(1, 2, 1))
> df1
       dates    area rating rating2
1 2020-11-19   paris     10       5
2 2020-11-20  london      5       6
3 2020-11-21 newyork      6       7
> df2
       dates     area rating
1 2020-11-19 budapest      1
2 2020-11-20   moscow      2
3 2020-11-21 valencia      1

When performing an outer join using dplyr:

df <- df1 %>%
  full_join(df2, by = c('dates', 'area'))

the result is like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

       dates     area rating.x rating2 rating.y
1 2020-11-19    paris       10       5       NA
2 2020-11-20   london        5       6       NA
3 2020-11-21  newyork        6       7       NA
4 2020-11-19 budapest       NA      NA        1
5 2020-11-20   moscow       NA      NA        2
6 2020-11-21 valencia       NA      NA        1

i.e. the rating columns from the two dataframes are not blended together but two separate columns are created.

How do I get a result like this?

       dates     area rating   rating2 
1 2020-11-19    paris       10       5       
2 2020-11-20   london        5       6       
3 2020-11-21  newyork        6       7       
4 2020-11-19 budapest        1      NA        
5 2020-11-20   moscow        2      NA        
6 2020-11-21 valencia        1      NA        

>Solution :

What you’re looking for is dplyr::bind_rows(), which will preserve common columns and fill NA for columns that only exist in one of the data frames:

> bind_rows(df1, df2)
       dates     area rating rating2
1 2020-11-19    paris     10       5
2 2020-11-20   london      5       6
3 2020-11-21  newyork      6       7
4 2020-11-19 budapest      1      NA
5 2020-11-20   moscow      2      NA
6 2020-11-21 valencia      1      NA

Note that you could also continue using full_join() – but you must ensure that all common columns between the data frames are included as keys if you don’t want columns to be split:

> full_join(
+   df1, df2,
+   by = c("dates", "area", "rating")
+ )
       dates     area rating rating2
1 2020-11-19    paris     10       5
2 2020-11-20   london      5       6
3 2020-11-21  newyork      6       7
4 2020-11-19 budapest      1      NA
5 2020-11-20   moscow      2      NA
6 2020-11-21 valencia      1      NA

The documentation for dplyr joins mentions:

Output columns include all x columns and all y columns. If columns in x and y have the same name (and aren’t included in by), suffixes are added to disambiguate.

You could also avoid this issue by not specifying by, in which case dplyr will use all common columns.

> full_join(df1, df2)
Joining, by = c("dates", "area", "rating")
       dates     area rating rating2
1 2020-11-19    paris     10       5
2 2020-11-20   london      5       6
3 2020-11-21  newyork      6       7
4 2020-11-19 budapest      1      NA
5 2020-11-20   moscow      2      NA
6 2020-11-21 valencia      1      NA

As far as I know, both methods are good for your use case. In fact, I believe that the practical advantage full_join() has over bind_rows() is precisely this behaviour you wanted to avoid here, i.e. splitting columns that aren’t keys.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading