Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Condition on multiple columns and change the values using R

I simplified the dataset to demonstrate what I want to do. I’m not used to dealing with multiple columns. Here I made a simple data

data<-data.frame(id=c(1,1,1,2,2,2,2),
                 title_1=c(65,58,47,NA,25,27,43),
                 title_2=c(NA,NA,32,35,12,NA,1))

In my actual dataset there are so many columns, but for now I just named as above. My goal is to change the values of title_1 and title_2 by the following rule. If there is a number , change it to 1. If there is an NA value, change it to 0. But in my actual dataset, there are hundreds of columns named as title_1, title_2, ... , title_100, ... So, I cannot type all the column names. So for my simple data, I want to use the code that doesn’t type the column names explicitly. My expected output is

data<-data.frame(id=c(1,1,1,2,2,2,2),
                       title_1=c(1,1,1,0,1,1,1),
                       title_2=c(0,0,1,1,1,0,1))

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

With dplyr we can use tidyselect syntax inside across() to select all variables starting with "title_" and then apply a function on all selected columns inside across():

data<-data.frame(id=c(1,1,1,2,2,2,2),
                 title_1=c(65,58,47,NA,25,27,43),
                 title_2=c(NA,NA,32,35,12,NA,1))

library(dplyr)

data %>% 
  mutate(across(starts_with("title_"), ~ ifelse(is.na(.x), 0, 1)))
#>   id title_1 title_2
#> 1  1       1       0
#> 2  1       1       0
#> 3  1       1       1
#> 4  2       0       1
#> 5  2       1       1
#> 6  2       1       0
#> 7  2       1       1

In base R we would use grepl to select the column names, then assign those columns new values with lapply:

data<-data.frame(id=c(1,1,1,2,2,2,2),
                 title_1=c(65,58,47,NA,25,27,43),
                 title_2=c(NA,NA,32,35,12,NA,1))

mycols <- grepl("^title_", names(data))

data[mycols] <- lapply(data[mycols], \(x) ifelse(is.na(x), 0, 1))
data

#>   id title_1 title_2
#> 1  1       1       0
#> 2  1       1       0
#> 3  1       1       1
#> 4  2       0       1
#> 5  2       1       1
#> 6  2       1       0
#> 7  2       1       1

Finally, we would select the columns with data.table similary, but here we’d prefer the actual names with grep(value = TRUE):

mycols <- grep("^title_", names(data), value = TRUE)

library(data.table)

data_tb <- as.data.table(data)

data_tb[,
        get("mycols") := lapply(.SD, \(x) ifelse(is.na(x), 0, 1)),
       .SDcols = mycols]

data_tb
#>    id title_1 title_2
#> 1:  1       1       0
#> 2:  1       1       0
#> 3:  1       1       1
#> 4:  2       0       1
#> 5:  2       1       1
#> 6:  2       1       0
#> 7:  2       1       1

Created on 2022-07-26 by the reprex package (v2.0.1)

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading