I’ve 2 datasets
- Participant performance: 1 participant/row
- Participant ratings: >1 participants/row (1 rater/row)
I need to merge the 2 datasets into 1 dataset with 1 participant/row (including NAs).
I’ve included reproducible toy examples of the 2 datasets I have as well as a reproducible toy example of the merged dataset that I need below.
pPerformance <- data.frame(Participant = c(1, 2, 3, 4, 5),
Session = c("A", "B", "B", "A", "C"),
Group = c(1, 1, 2, 1, 1),
Answer1 = c("incorrect", "correct", "correct", "incorrect", "correct"),
Condition = c("chat", "essay", "essay", "chat", "essay"),
Answer2 = c("correct", "correct", "correct", "incorrect", "incorrect"),
Duration = c(96, 43, 56, 75, 23)
)
pPerformance
Participant Session Group Answer1 Condition Answer2 Duration
1 1 A 1 incorrect chat correct 96
2 2 B 1 correct essay correct 43
3 3 B 2 correct essay correct 56
4 4 A 1 incorrect chat incorrect 75
5 5 C 1 correct essay incorrect 23
pRatings <- data.frame(Participant = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
Session = c("A", "B", "B", "A", "C", "A", "B", "B", "A", "C"),
Group = c(1, 1, 2, 1, 1, 1, 1, 2, 1, 1),
Rater = c("Fran", "Fran", "Fran", "Fran", "Fran",
"Fred", "Fred", "Fred", "Fred", "Fred"),
Rating1 = c("Yes", "No", "Yes", "Yes", "No", "No", "Yes", "Yes", "Yes", NA),
Rating2 = c(3, 0, 1, 2, 0, 2, 1, 1, 2, 1)
)
pRatings
Participant Session Group Rater Rating1 Rating2
1 1 A 1 Fran Yes 3
2 2 B 1 Fran No 0
3 3 B 2 Fran Yes 1
4 4 A 1 Fran Yes 2
5 5 C 1 Fran No 0
6 1 A 1 Fred No 2
7 2 B 1 Fred Yes 1
8 3 B 2 Fred Yes 1
9 4 A 1 Fred Yes 2
10 5 C 1 Fred NA 1
pMerged <- data.frame(Participant = c(1, 2, 3, 4, 5),
Session = c("A", "B", "B", "A", "C"),
Group = c(1, 1, 2, 1, 1),
Answer1 = c("incorrect", "correct", "correct", "incorrect", "correct"),
Condition = c("chat", "essay", "essay", "chat", "essay"),
Answer2 = c("correct", "correct", "correct", "incorrect", "incorrect"),
Duration = c(96, 43, 56, 75, 23),
Fran_Rating1 = c("Yes", "No", "Yes", "Yes", "No"),
Fred_Rating1 = c("No", "Yes", "Yes", "Yes", NA),
Fran_Rating2 = c(3, 0, 1, 2, 0),
Fred_Rating2 = c(2, 1, 1, 2, 1)
)
pMerged
Participant Session Group Answer1 Condition Answer2 Duration Fran_Rating1 Fred_Rating1
1 1 A 1 incorrect chat correct 96 Yes No
2 2 B 1 correct essay correct 43 No Yes
3 3 B 2 correct essay correct 56 Yes Yes
4 4 A 1 incorrect chat incorrect 75 Yes Yes
5 5 C 1 correct essay incorrect 23 No NA
Fran_Rating2 Fred_Rating2
1 3 2
2 0 1
3 1 1
4 2 2
5 0 1
Currently, I am merging by "Participant", but that’s resulting in multiple rows per participant (10 rows instead of just 5 in the toy example). I am not sure how to combine the "Rater" cells and "Rating…" column headers the way they need to be combined (e.g., "Fran_Rating1" in the "pMerged" dataframe above).
pMergedReproducibly <- merge(pPerformance, pRatings, by = c("Participant"))
pMergedReproducibly
Participant Session.x Group.x Answer1 Condition Answer2 Duration Session.y Group.y Rater
1 1 A 1 incorrect chat correct 96 A 1 Fran
2 1 A 1 incorrect chat correct 96 A 1 Fred
3 2 B 1 correct essay correct 43 B 1 Fred
4 2 B 1 correct essay correct 43 B 1 Fran
5 3 B 2 correct essay correct 56 B 2 Fred
6 3 B 2 correct essay correct 56 B 2 Fran
7 4 A 1 incorrect chat incorrect 75 A 1 Fred
8 4 A 1 incorrect chat incorrect 75 A 1 Fran
9 5 C 1 correct essay incorrect 23 C 1 Fran
10 5 C 1 correct essay incorrect 23 C 1 Fred
Rating1 Rating2
1 Yes 3
2 No 2
3 Yes 1
4 No 0
5 Yes 1
6 Yes 1
7 Yes 2
8 Yes 2
9 No 0
10 NA 1
Advice very welcome!
>Solution :
pPerformance %>%
left_join(pivot_wider(pRatings, names_from = Rater,
values_from = starts_with('Rating')))
Participant Session Group Answer1 Condition Answer2 Duration Rating1_Fran Rating1_Fred Rating2_Fran Rating2_Fred
1 1 A 1 incorrect chat correct 96 Yes No 3 2
2 2 B 1 correct essay correct 43 No Yes 0 1
3 3 B 2 correct essay correct 56 Yes Yes 1 1
4 4 A 1 incorrect chat incorrect 75 Yes Yes 2 2
5 5 C 1 correct essay incorrect 23 No <NA> 0 1
Note that if you need the namings to be in the same order as you did, include, the following in your pivot_wider
function
That is:
pivot_wider(pRatings, names_from = Rater, names_glue = "{Rater}_{.value}",
values_from = starts_with('Rating'))