Recoding multiple factors using regexp

I have data from a survey, where several questions are in the format

"Do you think that [xxxxxxx]"

The possible answers to the questions are in the format

"I am certain that [xxxxxxx]"
"I think it is possible that [xxxxxx]"
"I don’t know if [xxxxxx]"

and so on.

I would now like to recode these factors so that "I am certain" = 1, "I think it is possible" = 2 and so on. I have been playing with dplyr::recode but it does not seem to work with regular expressions.

For example:

library(dplyr)

set.seed(12345)

possible_answers <- c(
    "I am certain that", "I think it is possible that",
    "I don't know if is possible that", "I think it is not possible that",
    "I am certain that it is not possible that", "It is impossible for me to know if"
)

num_answers <- 10
survey <- data.frame(
    Q1 = paste(
        sample(possible_answers, num_answers, replace = TRUE),
        "topic 1"
    ),
    Q2 = paste(
        sample(possible_answers, num_answers, replace = TRUE),
        "topic 2"
    ),
    Q3 = paste(
        sample(possible_answers, num_answers, replace = TRUE),
        "topic 3"
    ),
    Q4 = paste(
        sample(possible_answers, num_answers, replace = TRUE),
        "topic 4"
    ),
    Q5 = paste(
        sample(possible_answers, num_answers, replace = TRUE),
        "topic 5"
    )
)

I can do something like

survey %>% 
    mutate_at("Q1", recode,
                "I am certain that topic 1" = 1,
                "I think it is possible that topic 1" = 2,
                "I don't know if is possible that topic 1" = 3,
                "I think it is not possible that topic 1" = 4,
                "I am certain that it is not possible that topic 1" = 5,
                "It is impossible for me to know if topic 1" = 6)

but doing it for all questions would be cumbersome.

I would like to do

survey %>% 
    mutate_at(vars(starts_with("Q")), recode,
                "I am certain that (.*)" = 1,
                "I think it is possible that (.*)" = 2,
                "I don't know if is possible that (.*)" = 3,
                "I think it is not possible that (.*)" = 4,
                "I am certain that it is not possible that (.*)" = 5,
                "It is impossible for me to know if (.*)" = 6)

But this changes everything to NA, because it does not see the strings as regular expressions.

Solution:

I haven't run this against your data, but mutate(across(...)) with case_when() should do it. Because the pattern "I am certain that" also matches "I am certain that it is not possible that", the longer pattern has to come first: case_when() uses the first condition that is TRUE, so by the time "I am certain that" is tested, only the positive cases are left.

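library(dplyr)

# case_when() returns the value of the first condition that is TRUE;
# answers that match none of the patterns become NA.
survey %>%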
  mutate(across(starts_with("Q"), 
                ~case_when(
                  grepl("I am certain that it is not possible that", .x) ~ 5,
                  grepl("I am certain that", .x) ~ 1, 
                  grepl("I think it is possible that", .x) ~ 2, 
                  grepl("I don't know if is possible that", .x) ~ 3, 
                  grepl("I think it is not possible that", .x) ~ 4,
                  grepl("It is impossible for me to know if", .x) ~ 6)))
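
If the list of answers grows, the ordering constraint above is easy to get wrong by hand. One possible variation, sketched here in base R, keeps the phrases in a named lookup vector and always tries the longest phrase first. The helper name recode_by_prefix, the vector prefix_codes, and the small survey_small data frame are all made up for illustration, and the sketch assumes every answer starts with one of the phrases.

```r
# Hypothetical helper (not part of dplyr): map each answer to a numeric code
# based on the phrase it starts with.
recode_by_prefix <- function(x, prefix_codes) {
  x <- as.character(x)  # also accepts factor columns
  # Try longer phrases first, so "I am certain that it is not possible that"
  # is tested before the shorter "I am certain that".
  ordered <- prefix_codes[order(nchar(names(prefix_codes)), decreasing = TRUE)]
  out <- rep(NA_real_, length(x))
  for (prefix in names(ordered)) {
    # Only fill positions that have not been matched by a longer phrase.
    hit <- is.na(out) & !is.na(x) & startsWith(x, prefix)
    out[hit] <- ordered[[prefix]]
  }
  out  # answers that match no phrase stay NA
}

# Lookup vector: names are the leading phrases, values are the codes.
prefix_codes <- c(
  "I am certain that" = 1,
  "I think it is possible that" = 2,
  "I don't know if is possible that" = 3,
  "I think it is not possible that" = 4,
  "I am certain that it is not possible that" = 5,
  "It is impossible for me to know if" = 6
)

# Small stand-in for the survey data frame (assumed, for illustration only).
survey_small <- data.frame(
  Q1 = c("I am certain that topic 1",
         "I am certain that it is not possible that topic 1"),
  Q2 = c("I think it is not possible that topic 2",
         "Some unexpected answer"),
  stringsAsFactors = FALSE
)

# Apply the helper to every column whose name starts with "Q".
q_cols <- grep("^Q", names(survey_small), value = TRUE)
survey_small[q_cols] <- lapply(survey_small[q_cols], recode_by_prefix,
                               prefix_codes = prefix_codes)
```

The same helper should also slot into the dplyr pipeline as mutate(across(starts_with("Q"), ~ recode_by_prefix(.x, prefix_codes))). Matching with startsWith() rather than grepl() treats the phrases as literal text anchored at the start of the answer, which sidesteps regular-expression metacharacters entirely.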