Select columns of specific types and manipulate their content

Here is my problem as a reproducible example, along with my partial attempt at a solution:

# input
mydf_in<-data.frame(a=letters[6:10], 
                 b=c("<0.5","2","<0.5", "9", "10"), 
                 c=1:5, 
                 d=6:10, 
                 e=c("<0.8","12","<0.8", "<0.8", "<0.8"))

mydf_in

# output 
# the desired final result

mydf_out<-data.frame(a=letters[6:10],
                     b=c(0.5,2,0.5,9,10),
                     b_flag=c(1,0,1,0,0),
                     c=1:5, 
                     d=6:10, 
                     e=c(0.8,12,0.8,0.8,0.8),
                     e_flag=c(1,0,1,1,1)
                     )

mydf_out

library(tidyverse)

mydf_in %>%
  select(where(~ is.character(.x) & any(str_detect(.x, "<")))) %>%

  # in between here is missing the creation and
  # the population of the flagging columns, i.e. "b_flag" and "e_flag"

  mutate(across(everything(), ~ as.numeric(str_replace(.x, "<", ""))))

In short, what is missing between the two steps of the snippet above is, for each selected column, to:

  • create a corresponding flag column
  • fill each row of the flag column with 1 if the value contains the sign "<" and 0 otherwise (see the desired output)
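For a single column, the two steps listed above can be sketched in base R as follows. This is a minimal illustration assuming the column is a character vector like b in mydf_in; grepl() and sub() are called with fixed = TRUE so that "<" is matched literally, and the flag is computed before the sign is stripped.

```r
# Column b from mydf_in, as a plain character vector
x <- c("<0.5", "2", "<0.5", "9", "10")

# Step 1: flag, 1 if the value contains "<", else 0
x_flag <- as.integer(grepl("<", x, fixed = TRUE))

# Step 2: remove "<" and convert the remaining text to numbers
x_value <- as.numeric(sub("<", "", x, fixed = TRUE))

x_flag   # -> 1 0 1 0 0
x_value  # -> 0.5 2.0 0.5 9.0 10.0
```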


Solution:

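To keep all the other columns, use mutate() instead of select() and move the where() condition inside across(). The first across() loops over the matching columns and creates the flag columns (named with .names = "{.col}_flag"); the second across() then converts those same columns to numeric. The order matters: the flags must be computed while the columns are still character and still contain "<", otherwise the second condition would no longer match them.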

library(dplyr)
library(stringr)

mydf_in %>%
  mutate(
    # 1) while the columns are still character, add a 0/1 flag for "<"
    across(where(~ is.character(.x) && any(str_detect(.x, fixed("<")))),
           ~ +(str_detect(.x, fixed("<"))),
           .names = "{.col}_flag"),
    # 2) then drop the "<" and convert the same columns to numeric
    across(where(~ is.character(.x) && any(str_detect(.x, fixed("<")))),
           ~ readr::parse_number(.x))
  )

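Output (note that the flag columns are appended at the end, not placed next to their source columns as in mydf_out):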

  a    b c  d    e b_flag e_flag
1 f  0.5 1  6  0.8      1      1
2 g  2.0 2  7 12.0      0      0
3 h  0.5 3  8  0.8      1      1
4 i  9.0 4  9  0.8      0      1
5 j 10.0 5 10  0.8      0      1
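The answer's output places b_flag and e_flag at the end of the data frame, whereas mydf_out puts each flag right after its source column. If that exact layout matters, the following base R sketch builds the columns in order. The helper name add_lt_flags is hypothetical (it is not part of the original answer), and it assumes the affected columns are character vectors, which is the default for data.frame() since R 4.0.0; the flags are stored as doubles to match mydf_out.

```r
# Input data from the question
mydf_in <- data.frame(a = letters[6:10],
                      b = c("<0.5", "2", "<0.5", "9", "10"),
                      c = 1:5,
                      d = 6:10,
                      e = c("<0.8", "12", "<0.8", "<0.8", "<0.8"))

# Hypothetical helper: every character column containing "<" is converted
# to numeric and followed immediately by a 0/1 "<col>_flag" column
add_lt_flags <- function(df) {
  out <- list()
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.character(col) && any(grepl("<", col, fixed = TRUE))) {
      has_lt <- grepl("<", col, fixed = TRUE)           # computed before stripping "<"
      out[[nm]] <- as.numeric(sub("<", "", col, fixed = TRUE))
      out[[paste0(nm, "_flag")]] <- as.numeric(has_lt)  # 1 or 0, as in mydf_out
    } else {
      out[[nm]] <- col                                  # other columns unchanged
    }
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

res <- add_lt_flags(mydf_in)
res
```

Building the result column by column keeps the original order without a separate relocation step; the trade-off is a little more code than the dplyr version.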