Merge wide and long dataset into wide dataset in R (by combining certain column headers and certain cells into new column headers)

I’ve 2 datasets

  1. Participant performance: 1 participant/row
  2. Participant ratings: >1 participants/row (1 rater/row)

I need to merge the 2 datasets into 1 dataset with 1 participant/row (including NAs).

I’ve included reproducible toy examples of the 2 datasets I have as well as a reproducible toy example of the merged dataset that I need below.

pPerformance <- data.frame(Participant = c(1, 2, 3, 4, 5),
                           Session = c("A", "B", "B", "A", "C"),
                           Group = c(1, 1, 2, 1, 1),
                           Answer1 = c("incorrect", "correct", "correct", "incorrect", "correct"),
                           Condition = c("chat", "essay", "essay", "chat", "essay"),
                           Answer2 = c("correct", "correct", "correct", "incorrect", "incorrect"),
                           Duration = c(96, 43, 56, 75, 23)
                           )
pPerformance
  Participant Session Group   Answer1 Condition   Answer2 Duration
1           1       A     1 incorrect      chat   correct       96
2           2       B     1   correct     essay   correct       43
3           3       B     2   correct     essay   correct       56
4           4       A     1 incorrect      chat incorrect       75
5           5       C     1   correct     essay incorrect       23

pRatings <- data.frame(Participant = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
                       Session = c("A", "B", "B", "A", "C", "A", "B", "B", "A", "C"),
                       Group = c(1, 1, 2, 1, 1, 1, 1, 2, 1, 1),
                       Rater = c("Fran", "Fran", "Fran", "Fran", "Fran",
                                 "Fred", "Fred", "Fred", "Fred", "Fred"),
                       Rating1 = c("Yes", "No", "Yes", "Yes", "No", "No", "Yes", "Yes", "Yes", NA),
                       Rating2 = c(3, 0, 1, 2, 0, 2, 1, 1, 2, 1)
                       )
pRatings
   Participant Session Group Rater Rating1 Rating2
1            1       A     1  Fran     Yes       3
2            2       B     1  Fran      No       0
3            3       B     2  Fran     Yes       1
4            4       A     1  Fran     Yes       2
5            5       C     1  Fran      No       0
6            1       A     1  Fred      No       2
7            2       B     1  Fred     Yes       1
8            3       B     2  Fred     Yes       1
9            4       A     1  Fred     Yes       2
10           5       C     1  Fred      NA       1

pMerged <- data.frame(Participant = c(1, 2, 3, 4, 5),
                      Session = c("A", "B", "B", "A", "C"),
                      Group = c(1, 1, 2, 1, 1),
                      Answer1 = c("incorrect", "correct", "correct", "incorrect", "correct"),
                      Condition = c("chat", "essay", "essay", "chat", "essay"),
                      Answer2 = c("correct", "correct", "correct", "incorrect", "incorrect"),
                      Duration = c(96, 43, 56, 75, 23),
                      Fran_Rating1 = c("Yes", "No", "Yes", "Yes", "No"),
                      Fred_Rating1 = c("No", "Yes", "Yes", "Yes", NA),
                      Fran_Rating2 = c(3, 0, 1, 2, 0),
                      Fred_Rating2 = c(2, 1, 1, 2, 1)
                      )
pMerged
  Participant Session Group   Answer1 Condition   Answer2 Duration Fran_Rating1 Fred_Rating1
1           1       A     1 incorrect      chat   correct       96          Yes           No
2           2       B     1   correct     essay   correct       43           No          Yes
3           3       B     2   correct     essay   correct       56          Yes          Yes
4           4       A     1 incorrect      chat incorrect       75          Yes          Yes
5           5       C     1   correct     essay incorrect       23           No           NA
  Fran_Rating2 Fred_Rating2
1            3            2
2            0            1
3            1            1
4            2            2
5            0            1

Currently, I am merging by "Participant", but that’s resulting in multiple rows per participant (10 rows instead of just 5 in the toy example). I am not sure how to combine the "Rater" cells and "Rating…" column headers the way they need to be combined (e.g., "Fran_Rating1" in the "pMerged" dataframe above).

pMergedReproducibly <- merge(pPerformance, pRatings, by = c("Participant"))      
pMergedReproducibly
   Participant Session.x Group.x   Answer1 Condition   Answer2 Duration Session.y Group.y Rater
1            1         A       1 incorrect      chat   correct       96         A       1  Fran
2            1         A       1 incorrect      chat   correct       96         A       1  Fred
3            2         B       1   correct     essay   correct       43         B       1  Fred
4            2         B       1   correct     essay   correct       43         B       1  Fran
5            3         B       2   correct     essay   correct       56         B       2  Fred
6            3         B       2   correct     essay   correct       56         B       2  Fran
7            4         A       1 incorrect      chat incorrect       75         A       1  Fred
8            4         A       1 incorrect      chat incorrect       75         A       1  Fran
9            5         C       1   correct     essay incorrect       23         C       1  Fran
10           5         C       1   correct     essay incorrect       23         C       1  Fred
   Rating1 Rating2
1      Yes       3
2       No       2
3      Yes       1
4       No       0
5      Yes       1
6      Yes       1
7      Yes       2
8      Yes       2
9       No       0
10      NA       1

Advice very welcome!

>Solution :

pPerformance %>% 
   left_join(pivot_wider(pRatings, names_from = Rater,
                          values_from = starts_with('Rating')))

  Participant Session Group   Answer1 Condition   Answer2 Duration Rating1_Fran Rating1_Fred Rating2_Fran Rating2_Fred
1           1       A     1 incorrect      chat   correct       96          Yes           No            3            2
2           2       B     1   correct     essay   correct       43           No          Yes            0            1
3           3       B     2   correct     essay   correct       56          Yes          Yes            1            1
4           4       A     1 incorrect      chat incorrect       75          Yes          Yes            2            2
5           5       C     1   correct     essay incorrect       23           No         <NA>            0            1

Note that if you need the namings to be in the same order as you did, include, the following in your pivot_wider function

That is:

pivot_wider(pRatings, names_from = Rater, names_glue = "{Rater}_{.value}",
    values_from = starts_with('Rating'))

Leave a Reply