Create a new variable that prints the first value in a series of columns only if the condition is met in R

I am trying to create a new variable that prints the first value in a series of columns, only if a certain condition is met.

To clarify, my database looks something like this:

var1    var2    var3  var4
C7931   C3490   R0781 I10
R079    R0600   I10   C3490
S270XXA S225XXA C3490 C7931

I want to create a variable (main) that holds the value from the first var column, provided that value does not start with C00 to C99 (that is, C followed by two digits). If the value does start with such a code, I would like to test the next column, and so on, until a value that does not start with C00 to C99 is found and printed.

Therefore, the newly created variable (main) should look something like this for the table above:

var1    var2    var3  var4  main
C7931   C3490   R0781 I10   R0781
R079    R0600   I10   C3490 R079
S270XXA S225XXA C3490 C7931 S270XXA

I am not sure where to start, but I suspect this might involve mutate() and ifelse().

Solution:

We can loop over each row with apply and use grepl to build a logical vector marking the values that start with C followed by one or more digits (^C\\d+). Negating that vector (!) keeps only the values that do not match, and [1] returns the first of them. If every value in a row matches, the result for that row is NA.

# For each row, drop the values starting with "C" + digits and keep the first one left
df1$main <- apply(df1[startsWith(names(df1), "var")], 1,
                  function(x) x[!grepl("^C\\d+", x)][1])
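To make the subsetting step easier to follow, here is a minimal sketch of what the anonymous function does for a single row. The first vector is row 1 of the example data; the second vector (all_c) is a hypothetical row, not in the original data, in which every value matches the pattern. The variable names are only for illustration.

```r
# Row 1 of the example data (values copied from the question).
x <- c("C7931", "C3490", "R0781", "I10")

# TRUE where the value starts with "C" followed by at least one digit.
is_c_code <- grepl("^C\\d+", x)
print(is_c_code)   # TRUE TRUE FALSE FALSE

# Keep the non-matching values, then take the first of them.
remaining <- x[!is_c_code]
first_kept <- remaining[1]
print(first_kept)  # "R0781"

# Hypothetical row where every value matches: nothing is left,
# and indexing an empty character vector with [1] gives NA.
all_c <- c("C7931", "C3490")
none_left <- all_c[!grepl("^C\\d+", all_c)][1]
print(none_left)   # NA
```

The NA case is worth checking in real data, since a row made up entirely of C codes would otherwise produce a missing value in main without any warning.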

With tidyverse, we can use rowwise() with str_subset():

library(dplyr)
library(stringr)
df1 %>%
  rowwise() %>%
  mutate(main = first(str_subset(c_across(starts_with("var")),
                                 regex("^C\\d+"), negate = TRUE))) %>%
  ungroup()
# A tibble: 3 × 5
  var1    var2    var3  var4  main   
  <chr>   <chr>   <chr> <chr> <chr>  
1 C7931   C3490   R0781 I10   R0781  
2 R079    R0600   I10   C3490 R079   
3 S270XXA S225XXA C3490 C7931 S270XXA
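For larger tables, looping over rows can be slow. As a possible alternative (not part of the original answer), the sketch below uses base R's max.col() to find the first non-matching column in every row at once. The names vars, keep and first_idx are made up for this example, and the data frame is rebuilt here so the block runs on its own.

```r
# Example data, with the same values as the data used in this answer.
df1 <- data.frame(
  var1 = c("C7931", "R079", "S270XXA"),
  var2 = c("C3490", "R0600", "S225XXA"),
  var3 = c("R0781", "I10", "C3490"),
  var4 = c("I10", "C3490", "C7931"),
  stringsAsFactors = FALSE
)

# Character matrix of the var columns only.
vars <- as.matrix(df1[startsWith(names(df1), "var")])

# keep[i, j] is TRUE when vars[i, j] does NOT start with "C" + digits.
# grepl() returns a plain vector in column-major order, so it is reshaped.
keep <- matrix(!grepl("^C\\d+", vars), nrow = nrow(vars))

# Column index of the first TRUE in each row.
first_idx <- max.col(keep * 1, ties.method = "first")

# Pick one cell per row with matrix indexing; rows without any TRUE get NA.
main <- unname(vars[cbind(seq_len(nrow(vars)), first_idx)])
main[rowSums(keep) == 0] <- NA

df1$main <- main
print(df1)
```

On the three example rows this gives the same main column as the two approaches above (R0781, R079, S270XXA). The explicit rowSums() check is needed because max.col() would otherwise report column 1 for a row in which every value matches.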

data

df1 <- structure(list(var1 = c("C7931", "R079", "S270XXA"), var2 = c("C3490", 
"R0600", "S225XXA"), var3 = c("R0781", "I10", "C3490"), var4 = c("I10", 
"C3490", "C7931")), class = "data.frame", row.names = c(NA, -3L
))