Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

dplyr mutate based on condition across many columns

I have a data frame df which looks something like what you see below.

df <- data.frame(id     = c("abc", "def", "ghi", "jkl", "mno"),
                 flag_V1 = c(2, 1, 1, 0, 1),
                 V1      = c(600, 1000, 500, NA, 700),
                 flag_V2 = c(1, 1, 0, 2, 0),
                 V2      = c(400, 600, 100, 700, NA),
                 flag_V3 = c(1, 2, 1, 0, 1),
                 V3      = c(600, 300, 600, NA, 700))

> df
   id flag_V1   V1 flag_V2  V2 flag_V3  V3
1 abc       2  600       1 400       1 600
2 def       1 1000       1 600       2 300
3 ghi       1  500       0 100       1 600
4 jkl       0   NA       2 700       0  NA
5 mno       1  700       0  NA       1 700

Essentially, each column V1, V2, and V3 comes with a flag variable flag_V1, flag_V2, and flag_V3 which contain information about how the values in V1, V2, and V3 where collected.

As you can see, whenever the flag variable reports 0, the value in the associated column is NA. I want to replace these missing values with zeroes if (and only if) the flag variable reports 0. I want something like this

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df_modified <- df %>% mutate(V1 = ifelse(flag_V1 == 0, 0, V1))

for every column that comes with a flag variable. The actual data that I’m working with has hundreds of columns, and not all of them come with a flag variable.

Is there a simple way to perform the above operation on many columns without having to do it one by one, preferably using dplyr-like syntax and avoiding traditional loops? I want to highlight again that I don’t want to change all missing values to zeroes; it is very important that this is only done when the flag variable reports 0. Thank you!

>Solution :

Use across twice with replace:

library(dplyr)
df %>% 
  mutate(replace(across(matches("^V")), across(matches("^flag")) == 0, 0))

   id flag_V1   V1 flag_V2  V2 flag_V3  V3
1 abc       2  600       1 400       1 600
2 def       1 1000       1 600       2 300
3 ghi       1  500       0   0       1 600
4 jkl       0    0       2 700       0   0
5 mno       1  700       0   0       1 700
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading