Here is my problem as a reproducible example, together with my partial attempt at a solution.
# input
mydf_in <- data.frame(a = letters[6:10],
                      b = c("<0.5", "2", "<0.5", "9", "10"),
                      c = 1:5,
                      d = 6:10,
                      e = c("<0.8", "12", "<0.8", "<0.8", "<0.8"))
mydf_in
# output
# the desired final result
mydf_out <- data.frame(a = letters[6:10],
                       b = c(0.5, 2, 0.5, 9, 10),
                       b_flag = c(1, 0, 1, 0, 0),
                       c = 1:5,
                       d = 6:10,
                       e = c(0.8, 12, 0.8, 0.8, 0.8),
                       e_flag = c(1, 0, 1, 1, 1))
mydf_out
library(tidyverse)
mydf_in %>%
  select(where(~ is.character(.x) & any(str_detect(.x, "<")))) %>%
  # in between here is missing the creation and
  # the population of the flagging columns, i.e. "b_flag" and "e_flag"
  mutate(across(everything(), ~ as.numeric(str_replace(.x, "<", ""))))
In short, what is missing in the middle of the snippet above is, for each selected column:
- create a corresponding flagging column
- populate the rows of the flagging column with 1 or 0 depending on the presence of the sign "<" (see desired output)
>Solution :
To apply the condition explicitly, use mutate with where() instead of select, so that the other columns are kept. One across() call loops over the matching columns and creates the flag columns (named through .names), and a second across() call converts those same columns to numeric.
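library(dplyr)
library(stringr)
# TRUE for character columns that contain at least one "<"
has_lt <- function(x) is.character(x) && any(str_detect(x, fixed("<")))
mydf_in %>%
  mutate(
    # flag columns: 1 where the value carries "<", 0 otherwise
    across(where(has_lt), ~ +str_detect(.x, fixed("<")), .names = "{.col}_flag"),
    # drop the "<" and convert the original columns to numeric
    across(where(has_lt), ~ readr::parse_number(.x))
  )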
-output
  a    b c  d    e b_flag e_flag
1 f  0.5 1  6  0.8      1      1
2 g  2.0 2  7 12.0      0      0
3 h  0.5 3  8  0.8      1      1
4 i  9.0 4  9  0.8      0      1
5 j 10.0 5 10  0.8      0      1
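Note that the output above puts b_flag and e_flag at the end, while the desired mydf_out has each flag directly after its source column. If that order matters, one possible approach is to build the result column by column. The following is only a minimal base R sketch, not part of the answer above: add_censor_flags is a hypothetical helper name, and it assumes the censoring sign is a literal prefix such as "<" and that the remaining text parses with as.numeric.

```r
# Input from the question (stringsAsFactors = FALSE keeps b and e as character
# on R versions older than 4.0 as well)
mydf_in <- data.frame(
  a = letters[6:10],
  b = c("<0.5", "2", "<0.5", "9", "10"),
  c = 1:5,
  d = 6:10,
  e = c("<0.8", "12", "<0.8", "<0.8", "<0.8"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: every character column containing `sign` becomes
# numeric and is immediately followed by a 1/0 flag column.
add_censor_flags <- function(df, sign = "<") {
  pieces <- lapply(names(df), function(nm) {
    x <- df[[nm]]
    censored <- is.character(x) && any(grepl(sign, x, fixed = TRUE))
    if (!censored) {
      return(df[nm])  # untouched column, kept as a one-column data frame
    }
    out <- data.frame(
      as.numeric(sub(sign, "", x, fixed = TRUE)),  # value without the sign
      as.numeric(grepl(sign, x, fixed = TRUE))     # 1 if the sign was present
    )
    names(out) <- c(nm, paste0(nm, "_flag"))
    out
  })
  do.call(cbind, pieces)
}

result <- add_censor_flags(mydf_in)
result
```

With the example data, the columns of result come out in the order a, b, b_flag, c, d, e, e_flag. Staying within dplyr, relocate() on the pipeline output would be another way to move each flag next to its source column.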