Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pivoting wider with two separate keys simultaneously in dplyr

I need to pivot a tidy dataset into a wider format based on two set of keys using dplyr. I am not great with the terminology of pivoting so please excuse if ‘keys’ was not the right term. Here is some toy data to illustrate. These are data from two fictional participants: three different measures taken once on each of four days. For each measure we have a total score summed across the four days, expressed in the ‘tot’ column. This value will be constant over the four days for each measure.

library(dplyr)

df <- data.frame(id = rep(c("DFE3",
                            "DFE76"),
                          each = 12),
                 measure = rep(letters[1:3],
                               each = 4,
                               length.out = 24),
                 day = rep(1:4,
                           times = 3,
                           length.out = 24),
                 score = sample(0:5,
                                24,
                                replace = T)) %>%
        arrange(id,measure,day) %>%
          group_by(id, measure) %>%
            mutate(tot = sum(score)) %>%
              ungroup
  
         

df

# # A tibble: 24 x 5
#      id  measure   day score   tot
#   <fct>  <fct>   <int> <int> <int>
#  1 DFE3  a           1     5    12
#  2 DFE3  a           2     2    12
#  3 DFE3  a           3     5    12
#  4 DFE3  a           4     0    12
#  5 DFE3  b           1     1     9
#  6 DFE3  b           2     2     9
#  7 DFE3  b           3     5     9
#  8 DFE3  b           4     1     9
#  9 DFE3  c           1     0    15
# 10 DFE3  c           2     5    15
# # i 14 more rows
# # i Use `print(n = ...)` to see more rows

Now what I want to do is pivot so that I get one column for each measure and week for the score column AND one column for each measure only for the tot column.

When I ran this code…

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df %>%
  pivot_wider(names_from = c(measure,
                             day),
              values_from = c(score, tot)) 

# A tibble: 2 x 25
# id      score_a_1 score_a_2 score_a_3 score_a_4 score_b_1 score_b_2 score_b_3 score_b_4 score_c_1 score_c_2 score_c_3 score_c_4 tot_a_1
# <fct>       <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>   <int>
# 1 DFE3          2         1         3         3         4         4         5         0         2         0         3         5       9
# 2 DFE76         1         4         4         2         1         2         2         4         2         3         2         5      11
# # i 11 more variables: tot_a_2 <int>, tot_a_3 <int>, tot_a_4 <int>, tot_b_1 <int>, tot_b_2 <int>, tot_b_3 <int>, tot_b_4 <int>,
# #   tot_c_1 <int>, tot_c_2 <int>, tot_c_3 <int>, tot_c_4 <int>

…it gave me what I wanted for the score variable – spread across measure and week – but it did the same thing for the tot column, which is not what I want (there should only be three columns per id, one for each measure.

Is there any way to do these processes simultaneously using pivot_wider?

>Solution :

This might be simplest to think of as two tables keyed to id, one pivoted wide reflecting measure and day, the other pivoted just using measure, and then joined together based on id:

library(dplyr); library(tidyr)
df %>%
  select(-tot) %>%
  pivot_wider(names_from = c(measure, day), values_from = score, 
              names_prefix = "score_") %>%
  left_join(df %>%
              distinct(id, measure, tot) %>%
              pivot_wider(names_from = measure, values_from = tot, 
              names_prefix = "tot_"))

Result

Joining with `by = join_by(id)`
# A tibble: 2 × 16
  id    score_a_1 score_a_2 score_a_3 score_a_4 score_b_1 score_b_2 score_b_3 score_b_4 score_c_1 score_c_2 score_c_3 score_c_4 tot_a tot_b tot_c
  <chr>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int> <int> <int> <int>
1 DFE3          0         4         0         4         4         0         3         2         5         2         5         1     8     9    13
2 DFE76         4         5         5         5         2         2         3         5         4         4         2         2    19    12    12
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading