I have a large data frame that looks like df1 below.
For each row, I want to check whether any of the columns in the range col2:col4 (col2, col3, col4) contains the character S.
library(tidyverse)
df <- tibble(position = c(100, 200, 300),
             correction = c("62M89S",
                            "8M1D55M88S",
                            "1S25M1S36M89S"))
df1 <- df %>%
  separate(correction, into = str_c("col", 1:5),
           sep = "(?<=\\D)(?=\\d)", fill = "left", remove = FALSE)
df1
#> # A tibble: 3 × 7
#>   position correction    col1  col2  col3  col4  col5
#>      <dbl> <chr>         <chr> <chr> <chr> <chr> <chr>
#> 1      100 62M89S        <NA>  <NA>  <NA>  62M   89S
#> 2      200 8M1D55M88S    <NA>  8M    1D    55M   88S
#> 3      300 1S25M1S36M89S 1S    25M   1S    36M   89S
Created on 2022-03-12 by the reprex package (v2.0.1)
I want my data to look like this:
df1
#>   position correction    col1  col2  col3  col4  col5  inner_S
#>      <dbl> <chr>         <chr> <chr> <chr> <chr> <chr> <chr>
#> 1      100 62M89S        <NA>  <NA>  <NA>  62M   89S   NO
#> 2      200 8M1D55M88S    <NA>  8M    1D    55M   88S   NO
#> 3      300 1S25M1S36M89S 1S    25M   1S    36M   89S   YES
The third row is YES because col3 contains the value 1S.
I tried the following, but I am sure I am misusing something:
df1 %>%
  mutate_at(vars(col2:col4),
            ~inner_S=case_when(grepl("S",.) ~ "Yes",
                               TRUE ~ "No"
            ))
>Solution :
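dplyr's c_across() is very handy for row-wise operations like this. Since you only want to look at col2:col4, select exactly that range (grepl() returns FALSE for NA, so the missing values are harmless):
df1 %>%
  rowwise() %>%
  mutate(inner_S = ifelse(any(grepl('S', c_across(col2:col4))), 'YES', 'NO'))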
  position correction    col1  col2  col3  col4  col5  inner_S
     <dbl> <chr>         <chr> <chr> <chr> <chr> <chr> <chr>
1      100 62M89S        NA    NA    NA    62M   89S   NO
2      200 8M1D55M88S    NA    8M    1D    55M   88S   NO
3      300 1S25M1S36M89S 1S    25M   1S    36M   89S   YES
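
Since the data frame is large, rowwise() may turn out to be slow. As a possible alternative (not part of the approach above), here is a minimal sketch using if_any(), assuming dplyr 1.0.4 or later is installed. The name result is just an illustrative variable. grepl() is used instead of str_detect() because it returns FALSE rather than NA for missing values, so rows padded with NA still come out as "NO".

```r
library(dplyr)
library(tidyr)
library(stringr)

# Rebuild the example data from the question
df1 <- tibble(position = c(100, 200, 300),
              correction = c("62M89S", "8M1D55M88S", "1S25M1S36M89S")) %>%
  separate(correction, into = str_c("col", 1:5),
           sep = "(?<=\\D)(?=\\d)", fill = "left", remove = FALSE)

# if_any() applies the predicate to each selected column and combines the
# results with OR, giving one logical value per row without rowwise()
result <- df1 %>%
  mutate(inner_S = if_else(if_any(col2:col4, ~ grepl("S", .x)), "YES", "NO"))

result
```

Unlike rowwise(), this returns an ordinary (ungrouped) tibble, so no ungroup() is needed before further processing.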