Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Order elements into different columns using separate()

I have some data from a "select all that apply" survey data and I now have something like this:

ID <- c(1,2,3,4,5)
answer <- c("apple, orange", "pinneaple, apple", "orange, pinneaple", "pinneaple, orange","apple")

df <- data.frame(ID,answer)

> df
  ID            answer
1  1     apple, orange
2  2  pinneaple, apple
3  3 orange, pinneaple
4  4 pinneaple, orange
5  5             apple

Now, I am trying to separate this and spread the results, for this I am using separate from dplyr and it does only partially what I am looking for:

df %>% 
  separate(answer, into = c("ans1", "ans2","ans3"), sep = ", ")

> df
  ID      ans1      ans2 ans3
1  1     apple    orange <NA>
2  2 pinneaple     apple <NA>
3  3    orange pinneaple <NA>
4  4 pinneaple    orange <NA>
5  5     apple      <NA> <NA>

Now, I would like to have the responses ordered. In the example that would be assigning ans1 to apple and so on, until we would end up with something like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

> target_df
  ID  ans1   ans2      ans3
1  1 apple orange      <NA>
2  2 apple   <NA> pinneaple
3  3  <NA> orange pinneaple
4  4  <NA> orange pinneaple
5  5 apple   <NA>      <NA>

Is there a way to add this order attribute to the separate function?

>Solution :

One approach could be as follows, although it does not use separate:

library(dplyr)
library(tidyr)
library(stringr)


ID <- c(1,2,3,4,5)
answer <- c("apple, orange", "pinneaple, apple", "orange, pinneaple", "pinneaple, orange","apple")

df1 <- data.frame(ID,answer)

df1 |> 
  separate_rows(answer) |> 
  mutate(value = answer, 
         value = str_replace_all(value, c("apple" = "ans1", "orange" = "ans2", "pinneaple" = "ans3"))) |> 
  pivot_wider(names_from = value, values_from = answer)
#> # A tibble: 5 × 4
#>      ID ans1  ans2   ans3     
#>   <dbl> <chr> <chr>  <chr>    
#> 1     1 apple orange <NA>     
#> 2     2 apple <NA>   pinneaple
#> 3     3 <NA>  orange pinneaple
#> 4     4 <NA>  orange pinneaple
#> 5     5 apple <NA>   <NA>

Created on 2023-07-04 with reprex v2.0.2

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading