Combining two dataframes based on two variables when one variable is a single column format in DF1 and across several columns in DF2

This is my first post so hopefully I have included all the necessary information. I don’t think it replicates previously answered questions.

Eventually I am trying to combine numerous datasets together to create a single one with all the necessary variables for our project. So far I have been using dplyr to combine these sets based on by = c(‘Country’ and ‘Year’). This has been successful when the data are presented in the same way in both e.g. a column with country, a column for year, and then the variables in the following columns (see below).

Country Year Temperature
Germany 2010 20
Germany 2009 19
Germany 2008 18
Australia 2010 24
Australia 2009 23
Australia 2018 22
Spain 2010 23
Spain 2009 24
Spain 2008 21

However, I can’t seem to work out how to combine them when they are in different formats e.g. one as the example above and the other as in the example below (containing data on annual rainfall mm).

Country 2010 2009 2008
Germany 800 900 1000
Australia 700 600 750
Spain 800 650 550

I would like a combined table in the format of the first example but now with an additional column for annual rainfall taken from the second df.

Any suggestions would be greatly appreciated as I have been trying without success for quite a while. I thought some form of Pivot (?) but unfortunately I don’t have any examples of my attempts to include because I am at a complete loss.

>Solution :

Your second dataframe is in wide format, so you should convert it into long format

library(dplyr, tidyr)

df2 <- read.table(text="Country 2010    2009    2008
Germany 800 900 1000
Australia   700 600 750
Spain   800 650 550", header=TRUE)


df2 %>% 
  pivot_longer(-Country, names_to = "Year", values_to = "Temperature") %>% 
  mutate(Year = as.integer(sub("X", "", Year))) 

  Country    Year Temperature
  <chr>     <int>       <int>
1 Germany    2010         800
2 Germany    2009         900
3 Germany    2008        1000
4 Australia  2010         700
5 Australia  2009         600
6 Australia  2008         750
7 Spain      2010         800
8 Spain      2009         650
9 Spain      2008         550

Then you can use merge as you said

Leave a Reply