Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

how to gsub "/" and "\" from mutiple cols

I have a df like this:

enter image description here

I want to clean it up by two method:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  1. gsub the subject 1-4 if it started with \ or / to "";
    or
  2. change all / to \, and add \ to the one that is not start with \.

Is it a way to do this using mutate(across(everything(),...) or any other way?

I would like to know how to achieve both methods if it is possible. Thanks.

The ideal output will looks like this:

enter image description here

sample data:

df<- structure(list(ID = c("Tom", "Jerry"), Subject1 = c("/Art", "/ELA"
), Subject2 = c("\\Math", "/Math"), Subject3 = c("PE", "\\Bio\\2"
), Subject4 = c(NA, "\\Music\\1")), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"))

>Solution :

We may match any character that are not a letter from the start (^) and remove it with str_remove

library(dplyr)
library(stringr)
df %>%
     mutate(across(starts_with("Subject"), 
   ~ str_remove(.x, "^[^[:alpha:]]+")))

-output

# A tibble: 2 × 5
  ID    Subject1 Subject2 Subject3 Subject4  
  <chr> <chr>    <chr>    <chr>    <chr>     
1 Tom   Art      Math     "PE"      <NA>     
2 Jerry ELA      Math     "Bio\\2" "Music\\1"

Or escape the \\

df %>%
   mutate(across(starts_with("Subject"), ~ str_remove(.x, "^(/|\\\\)")))
# A tibble: 2 × 5
  ID    Subject1 Subject2 Subject3 Subject4  
  <chr> <chr>    <chr>    <chr>    <chr>     
1 Tom   Art      Math     "PE"      <NA>     
2 Jerry ELA      Math     "Bio\\2" "Music\\1"

or can also be

df %>%
   mutate(across(starts_with("Subject"), ~ str_remove(.x, "^[/\\\\]")))
# A tibble: 2 × 5
  ID    Subject1 Subject2 Subject3 Subject4  
  <chr> <chr>    <chr>    <chr>    <chr>     
1 Tom   Art      Math     "PE"      <NA>     
2 Jerry ELA      Math     "Bio\\2" "Music\\1"

For the second case, it would be to use str_replace

df %>%
  mutate(across(starts_with("Subject"), 
  ~ str_replace(str_replace(.x, "^\\\\", "/"), "^([A-Z])", "/\\1")))

# A tibble: 2 × 5
  ID    Subject1 Subject2 Subject3  Subject4   
  <chr> <chr>    <chr>    <chr>     <chr>      
1 Tom   /Art     /Math    "/PE"      <NA>      
2 Jerry /ELA     /Math    "/Bio\\2" "/Music\\1"
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading