Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to add values of one dataframe to another, double-conditioned by the receiving dataframe?

I am trying to merge two dataframes in quite a specific way. They are each similar to these two:

df1 <- data.frame(Countries = c("Portugal", "Andorra", "Spain", "Portugal", "Portugal", "Portugal", "Spain"),
                 Year_of_order = c("2015", "2016", "2014", "2016", "2014", "2015", "2015"),
                 Type_of_order = c("vegetables", "meat", "fruits", "fruits", "meat", "vegetable", "meat")
)

df2 <- data.frame(Names = c("Andorra", "Andorra", "Andorra", "Andorra", "Portugal", "Portugal", "Portugal", "Spain", "Spain", "Spain"),
                 Year = c("2014", "2015", "2016", "2017", "2014", "2015", "2016", "2014", "2015", "2016"),
                 Budget_spent = c("452", "444", "429", "456", "854", "876", "833", "102", "104", "96")
)

The second dataframe is the product of a pivot_longer , that is why it looks a bit weird.

What I want is to add a new column to df1, called Budget_spent : its values should be those indicated in df2 for the corresponding Year and Names. Basically, when df1 both Year_of_order and Countries match df2’s Year and Names , I want to have the value of df2’s Budget_spent. To my understanding, I would need to do a double-conditioned match, but maybe I’m wrong.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

In case my poor explanation was not clear, here’s what I would like the output to look like:

>df1
   Countries  Year_of_order  Type_of_order  Budget_spent
1  Portugal   2015           vegetables     876
2  Andorra    2016           meat           429
3  Spain      2014           fruits         102
4  Portugal   2016           fruits         833
5  Portugal   2014           meat           854
6  Portugal   2015           vegetable      876
7  Spain      2015           meat           104

I tried quite a lot of things (match, merge, if, when , for row), with little success, mostly because I can’t even grasp conceptually what process should I apply:

#Here's the most complete attempt, the rest look quite torn apart

df1$Budget_spent <- for (row in df1$Countries) {
  match(df1$Countries, df2$Names)+
  match(df1$Year_of_order, df2$Year)+
  df2$Budget_spent}

#Error message: argument non-numeric for binary operator

I would be very grateful of any help or indication on how to address the issue.

>Solution :

Use left-join from dplyr:

library(tidyverse)
left_join(df1, df2, join_by(Countries == Names, Year_of_order == Year))

  Countries Year_of_order Type_of_order Budget_spent
1  Portugal          2015    vegetables          876
2   Andorra          2016          meat          429
3     Spain          2014        fruits          102
4  Portugal          2016        fruits          833
5  Portugal          2014          meat          854
6  Portugal          2015     vegetable          876
7     Spain          2015          meat          104
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading