Recode only certain values and keep others as they are in R

I am trying to recode a set of columns, var1:var8, in the data frame "sampledf", changing the values "B" and "D" to "0" while keeping all other values as they are.

sampledf <- data.frame(
  var1 = c(1,4,2,1,1,0,0,1,0,0,0),
  var2 = c(1,1,"D",1,0,0,1,"B",0,"D",0),
  var3 = c(1,5,2,1,"B",0,1,1,1,0,0),
  var4 = c(1,1,0,1,2,0,1,1,5,1,1),
  var5 = c(0,4,"D",1,0,0,0,1,1,1,1),
  var6 = c(1,"D",0,1,0,2,1,1,0,1,0),
  var7 = c(1,1,0,0,1,"E",1,0,"D",1,1),
  var8 = c(1,1,0,0,2,5,1,"D",0,3,1))

This is what I tried, but it did not work. Compared to this example, the set of other values in my real dataset is very long, so I cannot supply all of them manually. All I want is to change these two values and keep the others as they are.

sampledfnew <- sampledf %>% mutate(across(var1:var2, ~recode(
  .x,
  'B'=0L,
  'D'=0L,
  TRUE ~ X,
)))

Can anyone help me fix the error here?
Thank you

Solution:

The approaches below use if_else and case_when from dplyr rather than base ifelse, since ifelse is prone to at least two significant issues: class-dropping and class ambiguity (the latter is discussed in the notes below).

library(dplyr)
sampledf %>%
  mutate(
    across(var1:var8, ~ if_else(
      . %in% c("B", "D"),
      if (is.character(.)) "0" else 0, # could also be maybechar(0, .) from below
      .)
    )
  )
#    var1 var2 var3 var4 var5 var6 var7 var8
# 1     1    1    1    1    0    1    1    1
# 2     4    1    5    1    4    0    1    1
# 3     2    0    2    0    0    0    0    0
# 4     1    1    1    1    1    1    0    0
# 5     1    0    0    2    0    0    1    2
# 6     0    0    0    0    0    2    E    5
# 7     0    1    1    1    0    1    1    1
# 8     1    0    1    1    1    1    0    0
# 9     0    0    1    5    1    0    0    0
# 10    0    0    0    1    1    1    1    3
# 11    0    0    0    1    1    0    1    1

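If you don't always want B and D to be replaced with the same value, case_when lets you give each one its own replacement (both map to 0 here, but each line can use a different value):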

maybechar <- function(val, src) if (is.character(src)) as.character(val) else val
sampledf %>%
  mutate(
    across(var1:var8, ~ case_when(
      . == "B" ~ maybechar(0, .),
      . == "D" ~ maybechar(0, .),
      TRUE ~ .)
    )
  )
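As a minimal sketch of how the case_when form might look when the two codes really do get different replacements, the example below maps "B" to 0 and "D" to 9. The data frame `toy` and the value 9 are made up for illustration and are not part of the question's data; `maybechar` is the same helper defined above.

```r
library(dplyr)

# Same helper as in the answer: return the replacement as character
# when the source column is character, otherwise leave it as given.
maybechar <- function(val, src) if (is.character(src)) as.character(val) else val

# Hypothetical toy data: one character column and one numeric column.
toy <- data.frame(
  a = c("1", "B", "D", "E"),
  b = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

recoded <- toy %>%
  mutate(
    across(a:b, ~ case_when(
      . == "B" ~ maybechar(0, .),  # "B" becomes 0 (or "0" in a character column)
      . == "D" ~ maybechar(9, .),  # "D" becomes 9 (arbitrary value, for illustration)
      TRUE ~ .                     # every other value is kept unchanged
    ))
  )

print(recoded)
```

The character column keeps its type because every replacement is converted to a string first, and the numeric column passes through untouched because none of its values match "B" or "D".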

Notes:

  • Most of the replacements done here substitute the string "0" rather than the number 0, because most of your columns are character.

  • I often recommend against using ifelse by itself because of class ambiguity: it can change the class of the return value without you realizing it. Compare ifelse(c(T,T), 1:2, c("A","B")) with ifelse(c(T,F), 1:2, c("A","B")) to see what I mean. This is risky, and it is something that if_else explicitly guards against. (case_when, used in my second code block, enforces this as well.)

  • Because of the previous bullet, I suggest using something like maybechar. It may look a little sloppy, but it is at least more declarative and intentional about the type of the replacement. I give two ways to do it: the first, shown in the if_else example above, does it inline without a helper function; the second uses the helper function. The helper seems more prudent with case_when, since the operation is done multiple times and the code is a little easier to read (imo).
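The class issues mentioned in these notes can be seen directly in base R. The snippet below is a small sketch: the first two calls are the ones from the note on class ambiguity, and the Date case is an added illustration of class-dropping (the variable names are arbitrary).

```r
# Every condition is TRUE, so only values from `yes` are used:
# the result is an integer vector.
all_true <- ifelse(c(TRUE, TRUE), 1:2, c("A", "B"))
print(class(all_true))

# One condition is FALSE, so a value is also taken from `no`:
# the whole result is silently coerced to character.
mixed <- ifelse(c(TRUE, FALSE), 1:2, c("A", "B"))
print(class(mixed))

# Class-dropping: ifelse() does not keep the Date class,
# so a Date comes back as a bare number.
d <- as.Date("2024-01-01")
dropped <- ifelse(TRUE, d, d)
print(class(dropped))
```

The same call shape returns a different class depending on the data in the condition, which is the kind of surprise that if_else and case_when are designed to turn into an explicit error instead.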
