How can I decode a column with text from another column in R?

I have a data frame with encoded survey answers in the answer column and the keys stored as a single string in a character column:

df <- data.frame(answer = c(1, 2, 1, 3, 1),
                 key = c("1 = Answer One 2 = Answer Two 3 = Answer Three", "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI", 
                         "1 = Answer abc 2 = Answer def 3 = Answer ghi", "1 = Answer One 2 = Answer Two 3 = Answer Three",
                         "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"))

print(df)

  answer                                            key
1      1 "1 = Answer One 2 = Answer Two 3 = Answer Three"
2      2   "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"
3      1   "1 = Answer abc 2 = Answer def 3 = Answer ghi"
4      3 "1 = Answer One 2 = Answer Two 3 = Answer Three"
5      1   "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"

How can I decode the answer column with the data from the key column so that I get this result?

df_result <- data.frame(answer = c(1, 2, 1, 3, 1),
                 key = c("1 = Answer One 2 = Answer Two 3 = Answer Three", "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI", 
                         "1 = Answer abc 2 = Answer def 3 = Answer ghi", "1 = Answer One 2 = Answer Two 3 = Answer Three",
                         "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"),
                 answer_decoded = c("Answer One", "Answer DEF", "Answer abc", "Answer Three","Answer ABC"))

print(df_result)

  answer                                            key answer_decoded
1      1 "1 = Answer One 2 = Answer Two 3 = Answer Three"     "Answer One"
2      2   "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"     "Answer DEF"
3      1   "1 = Answer abc 2 = Answer def 3 = Answer ghi"     "Answer abc"
4      3 "1 = Answer One 2 = Answer Two 3 = Answer Three"   "Answer Three"
5      1   "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI"     "Answer ABC"

I cannot use factor labels since I have too many different items to manually create them.

Solution:

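One option is to extract the substring that matches each row's 'answer' value. str_c builds the pattern per row by pasting the 'answer' code, a space, =, a space, and one or more non-digit characters (\\D+), so str_extract returns, for example, "1 = Answer One " from the first key. trimws with a custom whitespace regex then strips the prefix up to and including the = and the spaces around the label. Because the label is matched with \\D+, this assumes the answer labels themselves contain no digits.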

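library(stringr)
library(dplyr)

df %>%
  mutate(answer_decoded = trimws(
    # per-row pattern, e.g. "1 = \\D+": the code, " = ", then the label up to the next digit
    str_extract(key, str_c(answer, " = \\D+")),
    # strip the leading "<code> = " prefix and any trailing space
    whitespace = ".*=\\s+|\\s+"
  ))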

Output:

  answer                                            key answer_decoded
1      1 1 = Answer One 2 = Answer Two 3 = Answer Three     Answer One
2      2   1 = Answer ABC 2 = Answer DEF 3 = Answer GHI     Answer DEF
3      1   1 = Answer abc 2 = Answer def 3 = Answer ghi     Answer abc
4      3 1 = Answer One 2 = Answer Two 3 = Answer Three   Answer Three
5      1   1 = Answer ABC 2 = Answer DEF 3 = Answer GHI     Answer ABC
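
If some labels might contain digits (for example "Top 10"), the \\D+ pattern would cut them short. A minimal base R sketch of an alternative, assuming every entry in the key has the form "<code> = <label>" and entries are separated by whitespace, is to split each key into a lookup vector first. The helper name parse_key is hypothetical and not part of the original answer:

```r
df <- data.frame(
  answer = c(1, 2, 1, 3, 1),
  key = c("1 = Answer One 2 = Answer Two 3 = Answer Three",
          "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI",
          "1 = Answer abc 2 = Answer def 3 = Answer ghi",
          "1 = Answer One 2 = Answer Two 3 = Answer Three",
          "1 = Answer ABC 2 = Answer DEF 3 = Answer GHI")
)

# Hypothetical helper: turn one key string into a named character vector,
# e.g. c("1" = "Answer One", "2" = "Answer Two", "3" = "Answer Three").
parse_key <- function(key) {
  # Split at whitespace that is directly followed by "<digits> ="
  parts  <- strsplit(key, "\\s+(?=\\d+\\s*=)", perl = TRUE)[[1]]
  codes  <- sub("\\s*=.*$", "", parts)                # text before the "="
  labels <- trimws(sub("^\\d+\\s*=\\s*", "", parts))  # text after the "="
  setNames(labels, codes)
}

# Look up each row's answer code in that row's own key;
# a code that is missing from the key yields NA.
df$answer_decoded <- mapply(
  function(a, k) unname(parse_key(k)[as.character(a)]),
  df$answer, df$key,
  USE.NAMES = FALSE
)

print(df$answer_decoded)
```

For the example data this should give the same answer_decoded column as the stringr version. The trade-off is that each key is parsed row by row, which may be slower on very large data frames.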