Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

In R, how to use string values in one column to ID and select rows from another column in the same dataframe, varying per group?

I have a dataframe that contains all possible order permutations that participants could be presented with, and a column that contains a permutation ID (i.e. presentation order) that each individual participant actually was presented with.

How do I use the permutation ID to select the values of the corresponding column for each participant?

To illustrate with dummy data, in this example, the ID for the presentation order each participant saw is given in column presentation_type. I would like to use that ID to select the values from the corresponding order* columns.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

library(dplyr)

df1 <- tibble::tribble(
  ~participant_id, ~presentation_type, ~trial_number, ~response, ~order1, ~order2, ~order3,
             "p1",           "order3",            1L,     "yes",     "a",     "b",     "c",
             "p1",           "order3",            2L,     "yes",     "b",     "c",     "a",
             "p1",           "order3",            3L,      "no",     "c",     "a",     "b",
             "p2",           "order1",            1L,      "no",     "a",     "b",     "c",
             "p2",           "order1",            2L,     "yes",     "b",     "c",     "a",
             "p2",           "order1",            3L,      "no",     "c",     "a",     "b",
             "p3",           "order2",            1L,      "no",     "a",     "b",     "c",
             "p3",           "order2",            2L,     "yes",     "b",     "c",     "a",
             "p3",           "order2",            3L,     "yes",     "c",     "a",     "b"
  )

In other words, the desired outcome is a datafame with a column that contains the actual stimuli each participant saw (stimulus_presented), like this:

desired_outcome <- tibble::tribble(
                     ~participant_id, ~presentation_type, ~trial_number, ~response, ~stimulus_presented,
                                "p1",           "order3",            1L,     "yes",              "c",
                                "p1",           "order3",            2L,     "yes",              "a",
                                "p1",           "order3",            3L,      "no",              "b",
                                "p2",           "order1",            1L,      "no",              "a",
                                "p2",           "order1",            2L,     "yes",              "b",
                                "p2",           "order1",            3L,      "no",              "c",
                                "p3",           "order2",            1L,      "no",              "b",
                                "p3",           "order2",            2L,     "yes",              "c",
                                "p3",           "order2",            3L,     "yes",              "a"
                     )

I thought I could get there with something like the code below, but it just seems to assign the order for participant_id "p1" to all participants. Do I need to somehow group by/map for individual participants?

## Attempt so far - incorrect
# make single column of the stimulus presented on each row
stimuli_presented <- df1 %>% 
 select(stimulus_presented = .$presentation_type[1])

# bind our newly created column back onto the original dataframe, then remove "order1", "order2", etc. 
df2 <- bind_cols(df1, stimuli_presented) %>% 
  relocate(stimulus_presented, .after = response) %>% 
  select(-c(starts_with("order")))

Thanks.

>Solution :

With dplyr, you can use rowwise and get:

library(dplyr)
df1 %>% 
  rowwise() %>% 
  mutate(stimulus_presented = get(presentation_type)) %>% 
  select(-starts_with("order"))

# A tibble: 9 Ă— 5
# Rowwise: 
  participant_id presentation_type trial_number response stimulus_presented
  <chr>          <chr>                    <int> <chr>    <chr>             
1 p1             order3                       1 yes      c                 
2 p1             order3                       2 yes      a                 
3 p1             order3                       3 no       b                 
4 p2             order1                       1 no       a                 
5 p2             order1                       2 yes      b                 
6 p2             order1                       3 no       c                 
7 p3             order2                       1 no       b                 
8 p3             order2                       2 yes      c                 
9 p3             order2                       3 yes      a                 

With base R:

diag(as.matrix(df1[match(df1$presentation_type, colnames(df1))]))
#or
unlist(sapply(seq(nrow(df1)), \(x) df1[x, match(df1$presentation_type[x], colnames(df1))]))

#[1] "c" "a" "b" "a" "b" "c" "b" "c" "a"
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading