Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Merge wide and long dataset into wide dataset in R (by combining certain column headers and certain cells into new column headers)

I’ve 2 datasets

  1. Participant performance: 1 participant/row
  2. Participant ratings: >1 participants/row (1 rater/row)

I need to merge the 2 datasets into 1 dataset with 1 participant/row (including NAs).

I’ve included reproducible toy examples of the 2 datasets I have as well as a reproducible toy example of the merged dataset that I need below.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

pPerformance <- data.frame(Participant = c(1, 2, 3, 4, 5),
                           Session = c("A", "B", "B", "A", "C"),
                           Group = c(1, 1, 2, 1, 1),
                           Answer1 = c("incorrect", "correct", "correct", "incorrect", "correct"),
                           Condition = c("chat", "essay", "essay", "chat", "essay"),
                           Answer2 = c("correct", "correct", "correct", "incorrect", "incorrect"),
                           Duration = c(96, 43, 56, 75, 23)
                           )
pPerformance
  Participant Session Group   Answer1 Condition   Answer2 Duration
1           1       A     1 incorrect      chat   correct       96
2           2       B     1   correct     essay   correct       43
3           3       B     2   correct     essay   correct       56
4           4       A     1 incorrect      chat incorrect       75
5           5       C     1   correct     essay incorrect       23

pRatings <- data.frame(Participant = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
                       Session = c("A", "B", "B", "A", "C", "A", "B", "B", "A", "C"),
                       Group = c(1, 1, 2, 1, 1, 1, 1, 2, 1, 1),
                       Rater = c("Fran", "Fran", "Fran", "Fran", "Fran",
                                 "Fred", "Fred", "Fred", "Fred", "Fred"),
                       Rating1 = c("Yes", "No", "Yes", "Yes", "No", "No", "Yes", "Yes", "Yes", NA),
                       Rating2 = c(3, 0, 1, 2, 0, 2, 1, 1, 2, 1)
                       )
pRatings
   Participant Session Group Rater Rating1 Rating2
1            1       A     1  Fran     Yes       3
2            2       B     1  Fran      No       0
3            3       B     2  Fran     Yes       1
4            4       A     1  Fran     Yes       2
5            5       C     1  Fran      No       0
6            1       A     1  Fred      No       2
7            2       B     1  Fred     Yes       1
8            3       B     2  Fred     Yes       1
9            4       A     1  Fred     Yes       2
10           5       C     1  Fred      NA       1

pMerged <- data.frame(Participant = c(1, 2, 3, 4, 5),
                      Session = c("A", "B", "B", "A", "C"),
                      Group = c(1, 1, 2, 1, 1),
                      Answer1 = c("incorrect", "correct", "correct", "incorrect", "correct"),
                      Condition = c("chat", "essay", "essay", "chat", "essay"),
                      Answer2 = c("correct", "correct", "correct", "incorrect", "incorrect"),
                      Duration = c(96, 43, 56, 75, 23),
                      Fran_Rating1 = c("Yes", "No", "Yes", "Yes", "No"),
                      Fred_Rating1 = c("No", "Yes", "Yes", "Yes", NA),
                      Fran_Rating2 = c(3, 0, 1, 2, 0),
                      Fred_Rating2 = c(2, 1, 1, 2, 1)
                      )
pMerged
  Participant Session Group   Answer1 Condition   Answer2 Duration Fran_Rating1 Fred_Rating1
1           1       A     1 incorrect      chat   correct       96          Yes           No
2           2       B     1   correct     essay   correct       43           No          Yes
3           3       B     2   correct     essay   correct       56          Yes          Yes
4           4       A     1 incorrect      chat incorrect       75          Yes          Yes
5           5       C     1   correct     essay incorrect       23           No           NA
  Fran_Rating2 Fred_Rating2
1            3            2
2            0            1
3            1            1
4            2            2
5            0            1

Currently, I am merging by "Participant", but that’s resulting in multiple rows per participant (10 rows instead of just 5 in the toy example). I am not sure how to combine the "Rater" cells and "Rating…" column headers the way they need to be combined (e.g., "Fran_Rating1" in the "pMerged" dataframe above).

pMergedReproducibly <- merge(pPerformance, pRatings, by = c("Participant"))      
pMergedReproducibly
   Participant Session.x Group.x   Answer1 Condition   Answer2 Duration Session.y Group.y Rater
1            1         A       1 incorrect      chat   correct       96         A       1  Fran
2            1         A       1 incorrect      chat   correct       96         A       1  Fred
3            2         B       1   correct     essay   correct       43         B       1  Fred
4            2         B       1   correct     essay   correct       43         B       1  Fran
5            3         B       2   correct     essay   correct       56         B       2  Fred
6            3         B       2   correct     essay   correct       56         B       2  Fran
7            4         A       1 incorrect      chat incorrect       75         A       1  Fred
8            4         A       1 incorrect      chat incorrect       75         A       1  Fran
9            5         C       1   correct     essay incorrect       23         C       1  Fran
10           5         C       1   correct     essay incorrect       23         C       1  Fred
   Rating1 Rating2
1      Yes       3
2       No       2
3      Yes       1
4       No       0
5      Yes       1
6      Yes       1
7      Yes       2
8      Yes       2
9       No       0
10      NA       1

Advice very welcome!

>Solution :

pPerformance %>% 
   left_join(pivot_wider(pRatings, names_from = Rater,
                          values_from = starts_with('Rating')))

  Participant Session Group   Answer1 Condition   Answer2 Duration Rating1_Fran Rating1_Fred Rating2_Fran Rating2_Fred
1           1       A     1 incorrect      chat   correct       96          Yes           No            3            2
2           2       B     1   correct     essay   correct       43           No          Yes            0            1
3           3       B     2   correct     essay   correct       56          Yes          Yes            1            1
4           4       A     1 incorrect      chat incorrect       75          Yes          Yes            2            2
5           5       C     1   correct     essay incorrect       23           No         <NA>            0            1

Note that if you need the namings to be in the same order as you did, include, the following in your pivot_wider function

That is:

pivot_wider(pRatings, names_from = Rater, names_glue = "{Rater}_{.value}",
    values_from = starts_with('Rating'))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading