R using a list of column names to remove unwanted character from data

I have read several CSVs into a tibble and am working to clean up the data. I have several columns where I need to remove the ‘%’ character from the data. Every column with entries containing ‘%’ has a similar name (for simplicity’s sake, let’s say they all end with ‘A’), so I made a list of all column names, then used a for loop to create a list of the column names that need to be changed.

list_col <- colnames(df1)
change_col <- list()

for (i in seq_along(list_col)){
  # last character of the current column name
  last_char <- substr(list_col[i], nchar(list_col[i]), nchar(list_col[i]))
  if (last_char == 'A') change_col <- c(change_col, list_col[i])
}

To get rid of the ‘%’, my next step is to loop through the list I just created and mutate the df for each entry on the list. I figured out you can’t pass a variable as the column name in the mutate function, so I tried passing parse_character(change_col[[i]]) instead, as shown below:

for (i in seq_along(change_col)){
  df1 <- df1 %>%
    mutate(parse_character(change_col[[i]])
           = gsub('\\%',"", parse_character(change_col[[i]])))
}

When I run this code, I get Error: unexpected ‘}’ in "}". I’m presuming it’s because it does not like the arguments I’ve given the mutate function, but I’ve run out of ideas for a different way to run the loop.

Solution:

With dplyr, you can use across() inside mutate to pick which columns to work on. You haven’t provided any sample data so this is untested, but should give you the idea:

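df1 <- df1 %>% ## assign the result back, otherwise df1 is unchanged
  mutate(across(
    ends_with("A"), ## all columns that end with "A"
    ## lambda form; passing extra arguments through across() is deprecated in dplyr >= 1.1.0
    ~ gsub("%", "", .x, fixed = TRUE)
  ))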

Setting fixed = TRUE means we will match the pattern exactly, not with regex, so we don’t have to worry about escapes. It will also be faster (not that that should matter unless you have tons of data).
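Since the question has no sample data, here is a small sketch on made-up data to show the idea end to end. The tibble, its column names, and its values are all hypothetical. One thing to be aware of: ends_with() ignores case by default, so ends_with("A") would also pick up a column named, say, "data"; pass ignore.case = FALSE if you need the same case-sensitive match as the loop in the question.

```r
library(dplyr)

# Hypothetical sample data; column names and values are made up
df1 <- tibble(
  id     = 1:3,
  rateA  = c("10%", "25%", "7.5%"),
  shareA = c("50%", "0%", "100%"),
  note   = c("5% off", "none", "n/a")  # does not end in "A", so it is left alone
)

# Strip the literal "%" from every column whose name ends in "A"
cleaned <- df1 %>%
  mutate(across(
    ends_with("A", ignore.case = FALSE),
    ~ gsub("%", "", .x, fixed = TRUE)
  ))

cleaned$rateA
# still character at this point: "10" "25" "7.5"

# Optional follow-up: convert the cleaned columns to numeric
cleaned <- cleaned %>%
  mutate(across(ends_with("A", ignore.case = FALSE), as.numeric))
```

The columns stay character after gsub(), so the as.numeric step is only needed if you want to do arithmetic on them afterwards.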

See the ?across help page for more explanation and examples.
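If you would rather keep the loop-over-a-list-of-names approach from the question, a base R version might look like the sketch below. The data frame here is hypothetical. The key point is that [[ ]] accepts a column name stored in a variable, which is what the mutate() attempt was trying to achieve, and endsWith() replaces the name-collecting loop.

```r
# Hypothetical sample data; column names and values are made up
df1 <- data.frame(
  id     = 1:2,
  rateA  = c("10%", "25%"),
  shareA = c("50%", "0%"),
  stringsAsFactors = FALSE
)

# endsWith() is vectorised and case-sensitive, so no loop is needed here
change_col <- names(df1)[endsWith(names(df1), "A")]

# [[ ]] indexes a column by a name held in a variable
for (col in change_col) {
  df1[[col]] <- gsub("%", "", df1[[col]], fixed = TRUE)
}
```

This avoids dplyr entirely, which can be handy in a script that otherwise does not need it.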
