How to properly use pivot_wider() to align the values of two variables?

March 18, 2024

I have one dataset as below.

library(dplyr)
library(tidyr)

df= tibble::tibble(
    variety=rep(c("CV1", "CV2", "CV3"), each=16L),
    irrigation=rep(rep(c("yes", "no"), 3), each=8L),
    fertilizer=rep(rep(c("Organic", "Urea"), 6), each=4L),
    reps=c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 3, 4, 3, 1, 3, 4, 3, 2, 1, 4,
    2, 3, 1, 4, 1, 3, 1, 2, 2, 4, 1, 2, 1, 4, 2, 3, 1, 4, 2, 3, 2, 4),
    yield=c(8.379842, 8.058658, 9.73285, 9.224371999999999, NA, 6.996108000000001,
    9.865782, 7.112071666666666, 5.968758, 8.976471666666667, 7.980724, 9.35065,
    5.5111574999999995, 6.998728, 6.164252, 5.118412857142857, 7.748125, 8.58071,
    NA, NA, 7.673354999999999, 7.91948, NA, NA, 11.190445, 8.463484999999999,
    9.61818, 10.89841, 7.83943, 8.44905, 9.844165, 9.98026, 10.130675, 9.59432,
    NA, NA, 9.502525, 9.216965, NA, NA, 7.807259999999999, 9.94434, 7.92808,
    11.88664, 10.700185000000001, 10.723835000000001, 11.363140000000001,
    11.846934999999998),
    nutrients=c(0.42549600000000004, 0.417924, 0.47264, 0.45002, NA, 0.381154, 0.484084,
    0.3597316666666666, 0.32555, 0.45681666666666665, 0.38164600000000004,
    0.456822, 0.30655, 0.363892, 0.350876, 0.30200857142857146, 0.26754,
    0.30954499999999996, NA, NA, 0.328395, 0.30893, NA, NA, 0.37877, 0.33532,
    0.40417000000000003, 0.4581, 0.32077500000000003, 0.33331500000000003,
    0.39925, 0.40179000000000004, 0.40585499999999997, 0.339465, NA, NA, 0.339545,
    0.34077500000000005, NA, NA, 0.3227, 0.37770000000000004, 0.34663, 0.48564,
    0.43601500000000004, 0.38200500000000004, 0.47248500000000004, 0.506255),
)
head(df,5)
variety irrigation  fertilizer  reps  yield     nutrients
CV1     yes         Organic     1     8.379842  0.425496
CV1     yes         Organic     2     8.058658  0.417924
CV1     yes         Organic     3     9.732850  0.472640
CV1     yes         Organic     4     9.224372  0.450020
CV1     yes         Urea        1     NA        NA
.
.
.

I want to align the yields for Organic fertilizer with those for Urea fertilizer, as well as the nutrients between the two fertilizers to create regression graph between two different yield and nutrients. Initially, I attempted to use pivot_wider().

df2= data.frame(df %>%
                group_by(variety, irrigation) %>%
                pivot_wider(names_from=fertilizer, values_from=nutrients))
head(df2,8)
variety irrigation  reps     nutrients  Organic   Urea
1   CV1     yes         1    0.4254960  8.379842  NA
2   CV1     yes         2    0.4179240  8.058658  NA
3   CV1     yes         3    0.4726400  9.732850  NA
4   CV1     yes         4    0.4500200  9.224372  NA
5   CV1     yes         1    NA         NA        NA
6   CV1     yes         2    0.3811540  NA        6.996108
7   CV1     yes         3    0.4840840  NA        9.865782
8   CV1     yes         4    0.3597317  NA        7.112072
    .
    .
    .

Currently, the yields for Organic and Urea are not aligned next to each other. What I’m aiming for is a layout like the following.

variety irrigation  reps     nutrients  Organic   Urea
1   CV1     yes         1    0.4254960  8.379842  NA
2   CV1     yes         2    0.4179240  8.058658  6.996108
3   CV1     yes         3    0.4726400  9.732850  9.865782
4   CV1     yes         4    0.4500200  9.224372  7.112072
    .
    .
    .

How can I solve this problem? Also, is there any way to transpose both yield and nutrients at once?

Thanks,

>Solution :

Here’s my best guess. It looks like in a few cases you have multiple observations with the same variety/irrigation/reps/fertilizer. To address that, I add a variable obs to distinguish these.

df |>
  arrange(variety, irrigation, reps) %>% 
  mutate(obs = row_number(), .by = c(variety, irrigation, reps, fertilizer)) %>%
  pivot_wider(names_from = fertilizer, values_from = c(yield, nutrients))

Result

# A tibble: 28 × 8
   variety irrigation  reps   obs yield_Organic yield_Urea nutrients_Organic nutrients_Urea
   <chr>   <chr>      <dbl> <int>         <dbl>      <dbl>             <dbl>          <dbl>
 1 CV1     no             1     1          5.97       5.51             0.326          0.307
 2 CV1     no             2     1          8.98       7.00             0.457          0.364
 3 CV1     no             3     1          7.98       6.16             0.382          0.351
 4 CV1     no             4     1          9.35       5.12             0.457          0.302
 5 CV1     yes            1     1          8.38      NA                0.425         NA    
 6 CV1     yes            2     1          8.06       7.00             0.418          0.381
 7 CV1     yes            3     1          9.73       9.87             0.473          0.484
 8 CV1     yes            4     1          9.22       7.11             0.450          0.360
 9 CV2     no             1     1         11.2        7.84             0.379          0.321
10 CV2     no             1     2         NA          9.84            NA              0.399
# ℹ 18 more rows