I have some data from a "select all that apply" survey data and I now have something like this:
ID <- c(1,2,3,4,5)
answer <- c("apple, orange", "pinneaple, apple", "orange, pinneaple", "pinneaple, orange","apple")
df <- data.frame(ID,answer)
> df
ID answer
1 1 apple, orange
2 2 pinneaple, apple
3 3 orange, pinneaple
4 4 pinneaple, orange
5 5 apple
Now, I am trying to separate this and spread the results, for this I am using separate from dplyr and it does only partially what I am looking for:
df %>%
separate(answer, into = c("ans1", "ans2","ans3"), sep = ", ")
> df
ID ans1 ans2 ans3
1 1 apple orange <NA>
2 2 pinneaple apple <NA>
3 3 orange pinneaple <NA>
4 4 pinneaple orange <NA>
5 5 apple <NA> <NA>
Now, I would like to have the responses ordered. In the example that would be assigning ans1 to apple and so on, until we would end up with something like this:
> target_df
ID ans1 ans2 ans3
1 1 apple orange <NA>
2 2 apple <NA> pinneaple
3 3 <NA> orange pinneaple
4 4 <NA> orange pinneaple
5 5 apple <NA> <NA>
Is there a way to add this order attribute to the separate function?
>Solution :
One approach could be as follows, although it does not use separate:
library(dplyr)
library(tidyr)
library(stringr)
ID <- c(1,2,3,4,5)
answer <- c("apple, orange", "pinneaple, apple", "orange, pinneaple", "pinneaple, orange","apple")
df1 <- data.frame(ID,answer)
df1 |>
separate_rows(answer) |>
mutate(value = answer,
value = str_replace_all(value, c("apple" = "ans1", "orange" = "ans2", "pinneaple" = "ans3"))) |>
pivot_wider(names_from = value, values_from = answer)
#> # A tibble: 5 × 4
#> ID ans1 ans2 ans3
#> <dbl> <chr> <chr> <chr>
#> 1 1 apple orange <NA>
#> 2 2 apple <NA> pinneaple
#> 3 3 <NA> orange pinneaple
#> 4 4 <NA> orange pinneaple
#> 5 5 apple <NA> <NA>
Created on 2023-07-04 with reprex v2.0.2