How to remove only a specific group of characters from both the names and the values of a data frame in R

Assume this is my df:

library(tibble)
df <- tibble(`a*`=c("_x__", "*y", "z+-"),
             b=c("_x__", "*y", "z+-"))
> df
# A tibble: 3 x 2
  `a*`  b    
  <chr> <chr>
1 _x__  _x__ 
2 *y    *y   
3 z+-   z+-  

I want to remove the characters *, _ and + from both the column names and the values, wherever they occur, to get

# A tibble: 3 x 2
  a     b    
  <chr> <chr>
1 x     x    
2 y     y    
3 z-    z-  

I am using gsub(), as in the attempt below, but it only removes the first character. I am looking for a neat way to make both changes using dplyr pipes. Any hint or idea is appreciated.

df %>%
  mutate_all(funs(gsub(c("_","[*]","+"),"",.))) 


names(df) <- str_remove_all("[*]")

Solution:

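We can match several characters at once by listing them inside a character class, [], in str_remove_all() or gsub(). A vector of patterns does not work in gsub(), because its pattern argument is not vectorized: only the first element is used (with a warning), which is why only the first character was removed.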
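To see this in base R before the dplyr version, here is a small sketch. The vector `x` and the result names `one_pass` and `first_only` are made up for illustration; the warning is suppressed only to keep the output quiet.

```r
x <- c("_x__", "*y", "z+-")  # hypothetical example vector

# One character class removes *, _ and + in a single call
one_pass <- gsub("[*_+]", "", x)
one_pass
# "x"  "y"  "z-"

# A vector of patterns is not applied element by element:
# gsub() uses only the first pattern ("_") and warns about the rest
first_only <- suppressWarnings(gsub(c("_", "[*]", "[+]"), "", x))
first_only
# "x"   "*y"  "z+-"
```

Inside a character class, `*` and `+` are literal characters, so they need no escaping there.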

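library(dplyr)
library(stringr)
# Strip *, _ and + from every value; .names applies the same cleanup to the column names.
# A lambda is used because passing extra arguments through across() is deprecated in dplyr >= 1.1.0.
df <- df %>% 
   transmute(across(everything(), ~ str_remove_all(.x, "[*_+]"),
    .names = "{str_remove_all(.col, '[*_+]')}"))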

-output

df
# A tibble: 3 × 2
  a     b    
  <chr> <chr>
1 x     x    
2 y     y    
3 z-    z-   
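
An alternative sketch, not part of the original answer, splits the task into two explicit steps: rename_with() for the column names and mutate(across()) for the values. The names `pat` and `cleaned` are introduced here for illustration only.

```r
library(dplyr)
library(stringr)
library(tibble)

df <- tibble(`a*` = c("_x__", "*y", "z+-"),
             b    = c("_x__", "*y", "z+-"))

# Character class matching any one of *, _ or +
pat <- "[*_+]"

cleaned <- df %>%
  # Step 1: clean the column names (rename_with() targets all columns by default)
  rename_with(~ str_remove_all(.x, pat)) %>%
  # Step 2: clean the values in every column
  mutate(across(everything(), ~ str_remove_all(.x, pat)))

cleaned
```

Keeping the two steps separate may be easier to read than encoding the renaming in a `.names` glue string, at the cost of writing the pattern twice (here shared through `pat`).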