R: using a list of column names to remove an unwanted character from data

February 9, 2022

I have read several CSVs into a tibble and am working to clean up the data. I have several columns where I need to remove the ‘%’ character from the data. Every column with entries containing ‘%’ has a similar name (for simplicity’s sake, let’s say they all end with ‘A’), so I made a list of all column names, then used a for loop to create a list of the column names that need to be changed.

list_col <- colnames(df1)
change_col <- list()

for (i in seq_along(list_col)) {
  # last character of the i-th column name
  last_char <- substr(list_col[i], nchar(list_col[i]), nchar(list_col[i]))
  # keep the name if it ends with 'A'
  if (last_char == 'A') change_col <- c(change_col, list_col[i])
}

To get rid of the ‘%’, my next step is to loop through the list I just created and mutate the df for each entry on the list. I figured out you can’t pass variables as an argument to the mutate function, so I tried passing parse_character(change_col[[i]]) as an argument as shown below:

for (i in seq_along(change_col)){
  df1 <- df1 %>%
    mutate(parse_character(change_col[[i]])
           = gsub('\\%',"", parse_character(change_col[[i]])))
}

When I run this code, I get Error: unexpected ‘}’ in "}". I’m presuming it’s because it does not like the arguments I’ve given the mutate function, but I’ve run out of ideas for a different way to run the loop.

>Solution :

With dplyr, you can use across() inside mutate to pick which columns to work on. You haven’t provided any sample data so this is untested, but should give you the idea:

df1 <- df1 %>%
  mutate(across(
    ends_with("A"), ## all columns that end with "A"
    ~ gsub("%", "", .x, fixed = TRUE) ## lambda form; extra args via ... are deprecated since dplyr 1.1.0
  ))

Setting fixed = TRUE means we will match the pattern exactly, not with regex, so we don’t have to worry about escapes. It will also be faster (not that that should matter unless you have tons of data).
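To make the effect of fixed = TRUE concrete, here is a small base R sketch. The vector pct is made-up illustration data, not taken from the question. The second pair of calls uses "." as the pattern because it shows the difference more sharply than "%" does: as a regex it matches any character, as a fixed string it matches only a literal dot.

```r
# Hypothetical sample values, standing in for one of the '%' columns
pct <- c("12%", "3.5%", "100%")

# Literal match: only the '%' character is removed
literal <- gsub("%", "", pct, fixed = TRUE)
print(literal)

# Contrast with a regex metacharacter:
# as a regex, "." matches every character, so everything is removed
regex_dot <- gsub(".", "", "3.5%")
# with fixed = TRUE, "." only matches a literal dot
fixed_dot <- gsub(".", "", "3.5%", fixed = TRUE)
print(regex_dot)
print(fixed_dot)
```

For "%" specifically, the regex and fixed versions give the same result, since "%" is not a regex metacharacter; fixed = TRUE mainly saves you from having to think about it.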

See the ?across help page for more explanation and examples.
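If you would rather keep an explicit vector of column names, as in the question, all_of() lets across() take a character vector. The sketch below is untested against your data: the tibble and its column names (id, rateA, shareA, note) are invented for illustration, and the name vector is built with base R's endsWith() instead of a loop. Note that ends_with() ignores case by default, whereas endsWith() is case-sensitive.

```r
library(dplyr)

# Hypothetical stand-in for the real data
df1 <- tibble(
  id     = 1:2,
  rateA  = c("10%", "25%"),
  shareA = c("5%", "50%"),
  note   = c("a%", "b")   # does not end in "A", so it is left alone
)

# Names of the columns to clean, built without a for loop
change_col <- names(df1)[endsWith(names(df1), "A")]

# all_of() tells across() to treat change_col as a vector of column names
cleaned <- df1 %>%
  mutate(across(all_of(change_col), ~ gsub("%", "", .x, fixed = TRUE)))

print(cleaned)
```

all_of() throws an error if any name in change_col is missing from the data frame, which is usually what you want when the list was built by hand.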