Recode subset of variables using case when in R

I am trying to recode some survey data in R. Here is some data similar to what I actually have.

df <- data.frame(
  A = rep("Y", 5),
  B = seq(as.POSIXct("2014-01-13"), as.POSIXct("2014-01-17"), by = "days"),
  C = c("Neither agree nor disagree",
        "Somewhat agree",
        "Somewhat disagree",
        "Strongly agree",
        "Strongly disagree"),
  D = c("Neither agree nor disagree",
        "Somewhat agree",
        "Somewhat disagree",
        "Strongly agree",
        "Strongly disagree")
)



I looked up some other posts and wrote the code below:

init2 <- df %>%
  mutate_at(vars(c(1:4)), function(x) case_when(
    x == "Neither agree nor disagree" ~ 3,
    x == "Somewhat agree"             ~ 4,
    x == "Somewhat disagree"          ~ 2,
    x == "Strongly agree"             ~ 5,
    x == "Strongly disagree"          ~ 1
  ))

But this throws the error

Error: Problem with `mutate()` column `B`.
i `B = (function (x) ...`.
x character string is not in a standard unambiguous format

Run `rlang::last_error()` to see where the error occurred. 

My input dates are POSIXct. Should I change their format? What is the fix for this issue? Thanks.

Solution:

It does not make sense to recode a POSIXt column to your Likert scale, and the same goes for the "Y" column, even though that one does not raise an error.
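
To see where the error likely comes from, here is a minimal base R sketch (the names `dates` and `result` are only for illustration and are not from the question). Comparing a POSIXct value with a character string makes R try to parse the string as a date-time, which fails for a Likert label. Column A does not error because "Y" is simply never equal to any label, so case_when() would quietly return NA for every row.

```r
# A POSIXct value, like column B in the question (tz fixed for reproducibility).
dates <- as.POSIXct("2014-01-13", tz = "UTC")

# The comparison converts the string to POSIXct first; a Likert label
# cannot be parsed as a date, so an error is raised. Capture its message.
result <- tryCatch(
  dates == "Neither agree nor disagree",
  error = function(e) conditionMessage(e)
)
print(result)

# A plain character column compares without error; nothing matches.
print("Y" == "Neither agree nor disagree")
```

So the dates do not need a different format; the date column just should not be passed through the recoding function at all.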

I suggest you either:

  1. Explicitly mutate the columns you want:

    df %>%
      mutate(across(c(C, D), ~ case_when(
        . == "Neither agree nor disagree" ~ 3,
        . == "Somewhat agree"             ~ 4,
        . == "Somewhat disagree"          ~ 2,
        . == "Strongly agree"             ~ 5,
        . == "Strongly disagree"          ~ 1
      )))
    #   A          B C D
    # 1 Y 2014-01-13 3 3
    # 2 Y 2014-01-14 4 4
    # 3 Y 2014-01-15 2 2
    # 4 Y 2014-01-16 5 5
    # 5 Y 2014-01-17 1 1
    
  2. Explicitly exclude columns you don’t want:

    df %>%
      mutate(across(-c(A, B), ~ case_when(
        . == "Neither agree nor disagree" ~ 3,
        . == "Somewhat agree"             ~ 4,
        . == "Somewhat disagree"          ~ 2,
        . == "Strongly agree"             ~ 5,
        . == "Strongly disagree"          ~ 1
      )))
    
  3. Conditionally process them via some filter (though this is not infallible):

    df %>%
      mutate(across(where(~ all(grepl("agree", .))), ~ case_when(
        . == "Neither agree nor disagree" ~ 3,
        . == "Somewhat agree"             ~ 4,
        . == "Somewhat disagree"          ~ 2,
        . == "Strongly agree"             ~ 5,
        . == "Strongly disagree"          ~ 1
      )))
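
Since all three options repeat the same case_when() mapping, one possible variation is to keep the mapping in a single place and pass a small function to across(). The sketch below is not part of the original answer: `likert_scores` and `recode_likert` are hypothetical names, and it swaps case_when() for a named lookup vector, which should give the same numbers for these labels.

```r
library(dplyr)

# Hypothetical lookup table: names are the survey labels, values the scores.
likert_scores <- c(
  "Strongly disagree"          = 1,
  "Somewhat disagree"          = 2,
  "Neither agree nor disagree" = 3,
  "Somewhat agree"             = 4,
  "Strongly agree"             = 5
)

# Hypothetical helper: look each label up by name; unknown labels become NA.
recode_likert <- function(x) unname(likert_scores[as.character(x)])

# Cut-down version of the question's data (date column omitted for brevity).
df <- data.frame(
  A = rep("Y", 5),
  C = c("Neither agree nor disagree", "Somewhat agree", "Somewhat disagree",
        "Strongly agree", "Strongly disagree"),
  D = c("Neither agree nor disagree", "Somewhat agree", "Somewhat disagree",
        "Strongly agree", "Strongly disagree")
)

# Recode only the named survey columns; A is left untouched.
recoded <- df %>%
  mutate(across(c(C, D), recode_likert))
print(recoded)
```

A label that is misspelled in the data, or missing from the lookup, ends up as NA rather than raising an error, so it is worth checking the result for unexpected NA values.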
    

FYI, according to https://dplyr.tidyverse.org/reference/mutate_all.html (on 2021 Nov 7):

Scoped verbs (_if, _at, _all) have been superseded by the use of across() in an existing verb. See vignette("colwise") for details.

across() pairs nicely with where(), which is provided (surreptitiously) by the tidyselect package.
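
On the caveat that the where() filter in option 3 is not infallible: a somewhat stricter predicate could require a character column whose non-missing values all belong to the known label set. This is only a sketch under that assumption. `likert_labels` and `is_likert` are hypothetical names, and match() is used here in place of case_when(), which works because the labels are listed in score order (it returns integers).

```r
library(dplyr)

# Hypothetical label set, ordered so that position equals score (1 to 5).
likert_labels <- c(
  "Strongly disagree",
  "Somewhat disagree",
  "Neither agree nor disagree",
  "Somewhat agree",
  "Strongly agree"
)

# Hypothetical predicate: TRUE only for character columns made up
# entirely of known labels (missing values are ignored).
is_likert <- function(x) {
  is.character(x) && all(x[!is.na(x)] %in% likert_labels)
}

df <- data.frame(
  A = rep("Y", 5),
  B = seq(as.POSIXct("2014-01-13", tz = "UTC"), by = "day", length.out = 5),
  C = c("Neither agree nor disagree", "Somewhat agree", "Somewhat disagree",
        "Strongly agree", "Strongly disagree"),
  D = c("Neither agree nor disagree", "Somewhat agree", "Somewhat disagree",
        "Strongly agree", "Strongly disagree"),
  stringsAsFactors = FALSE
)

# Only C and D satisfy the predicate; A and B pass through unchanged.
recoded <- df %>%
  mutate(across(where(is_likert), ~ match(.x, likert_labels)))
print(recoded)
```

Because is.character() is checked first, the POSIXct column is never compared against a string, so the original error cannot occur here.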
