In R, how to use string values in one column to ID and select rows from another column in the same dataframe, varying per group?

I have a dataframe that contains all possible order permutations that participants could be presented with, and a column that contains a permutation ID (i.e. presentation order) that each individual participant actually was presented with.

How do I use the permutation ID to select the values of the corresponding column for each participant?

To illustrate with dummy data, in this example, the ID for the presentation order each participant saw is given in column presentation_type. I would like to use that ID to select the values from the corresponding order* columns.

library(dplyr)

df1 <- tibble::tribble(
  ~participant_id, ~presentation_type, ~trial_number, ~response, ~order1, ~order2, ~order3,
             "p1",           "order3",            1L,     "yes",     "a",     "b",     "c",
             "p1",           "order3",            2L,     "yes",     "b",     "c",     "a",
             "p1",           "order3",            3L,      "no",     "c",     "a",     "b",
             "p2",           "order1",            1L,      "no",     "a",     "b",     "c",
             "p2",           "order1",            2L,     "yes",     "b",     "c",     "a",
             "p2",           "order1",            3L,      "no",     "c",     "a",     "b",
             "p3",           "order2",            1L,      "no",     "a",     "b",     "c",
             "p3",           "order2",            2L,     "yes",     "b",     "c",     "a",
             "p3",           "order2",            3L,     "yes",     "c",     "a",     "b"
  )

In other words, the desired outcome is a datafame with a column that contains the actual stimuli each participant saw (stimulus_presented), like this:

desired_outcome <- tibble::tribble(
                     ~participant_id, ~presentation_type, ~trial_number, ~response, ~stimulus_presented,
                                "p1",           "order3",            1L,     "yes",              "c",
                                "p1",           "order3",            2L,     "yes",              "a",
                                "p1",           "order3",            3L,      "no",              "b",
                                "p2",           "order1",            1L,      "no",              "a",
                                "p2",           "order1",            2L,     "yes",              "b",
                                "p2",           "order1",            3L,      "no",              "c",
                                "p3",           "order2",            1L,      "no",              "b",
                                "p3",           "order2",            2L,     "yes",              "c",
                                "p3",           "order2",            3L,     "yes",              "a"
                     )

I thought I could get there with something like the code below, but it just seems to assign the order for participant_id "p1" to all participants. Do I need to somehow group by/map for individual participants?

## Attempt so far - incorrect
# make single column of the stimulus presented on each row
stimuli_presented <- df1 %>% 
 select(stimulus_presented = .$presentation_type[1])

# bind our newly created column back onto the original dataframe, then remove "order1", "order2", etc. 
df2 <- bind_cols(df1, stimuli_presented) %>% 
  relocate(stimulus_presented, .after = response) %>% 
  select(-c(starts_with("order")))

Thanks.

>Solution :

With dplyr, you can use rowwise and get:

library(dplyr)
df1 %>% 
  rowwise() %>% 
  mutate(stimulus_presented = get(presentation_type)) %>% 
  select(-starts_with("order"))

# A tibble: 9 × 5
# Rowwise: 
  participant_id presentation_type trial_number response stimulus_presented
  <chr>          <chr>                    <int> <chr>    <chr>             
1 p1             order3                       1 yes      c                 
2 p1             order3                       2 yes      a                 
3 p1             order3                       3 no       b                 
4 p2             order1                       1 no       a                 
5 p2             order1                       2 yes      b                 
6 p2             order1                       3 no       c                 
7 p3             order2                       1 no       b                 
8 p3             order2                       2 yes      c                 
9 p3             order2                       3 yes      a                 

With base R:

diag(as.matrix(df1[match(df1$presentation_type, colnames(df1))]))
#or
unlist(sapply(seq(nrow(df1)), \(x) df1[x, match(df1$presentation_type[x], colnames(df1))]))

#[1] "c" "a" "b" "a" "b" "c" "b" "c" "a"

Leave a Reply