I am trying to merge two dataframes in quite a specific way. They are each similar to these two:
df1 <- data.frame(Countries = c("Portugal", "Andorra", "Spain", "Portugal", "Portugal", "Portugal", "Spain"),
Year_of_order = c("2015", "2016", "2014", "2016", "2014", "2015", "2015"),
Type_of_order = c("vegetables", "meat", "fruits", "fruits", "meat", "vegetable", "meat")
)
df2 <- data.frame(Names = c("Andorra", "Andorra", "Andorra", "Andorra", "Portugal", "Portugal", "Portugal", "Spain", "Spain", "Spain"),
Year = c("2014", "2015", "2016", "2017", "2014", "2015", "2016", "2014", "2015", "2016"),
Budget_spent = c("452", "444", "429", "456", "854", "876", "833", "102", "104", "96")
)
The second dataframe is the product of a pivot_longer , that is why it looks a bit weird.
What I want is to add a new column to df1, called Budget_spent : its values should be those indicated in df2 for the corresponding Year and Names. Basically, when df1 both Year_of_order and Countries match df2’s Year and Names , I want to have the value of df2’s Budget_spent. To my understanding, I would need to do a double-conditioned match, but maybe I’m wrong.
In case my poor explanation was not clear, here’s what I would like the output to look like:
>df1
Countries Year_of_order Type_of_order Budget_spent
1 Portugal 2015 vegetables 876
2 Andorra 2016 meat 429
3 Spain 2014 fruits 102
4 Portugal 2016 fruits 833
5 Portugal 2014 meat 854
6 Portugal 2015 vegetable 876
7 Spain 2015 meat 104
I tried quite a lot of things (match, merge, if, when , for row), with little success, mostly because I can’t even grasp conceptually what process should I apply:
#Here's the most complete attempt, the rest look quite torn apart
df1$Budget_spent <- for (row in df1$Countries) {
match(df1$Countries, df2$Names)+
match(df1$Year_of_order, df2$Year)+
df2$Budget_spent}
#Error message: argument non-numeric for binary operator
I would be very grateful of any help or indication on how to address the issue.
>Solution :
Use left-join from dplyr:
library(tidyverse)
left_join(df1, df2, join_by(Countries == Names, Year_of_order == Year))
Countries Year_of_order Type_of_order Budget_spent
1 Portugal 2015 vegetables 876
2 Andorra 2016 meat 429
3 Spain 2014 fruits 102
4 Portugal 2016 fruits 833
5 Portugal 2014 meat 854
6 Portugal 2015 vegetable 876
7 Spain 2015 meat 104