Rename columns in R using str_replace_all for more than two string types

I have a dataset (dataraw) with column labels such as

condition1_men, condition1_women, condition2_men, condition3_women (etc)

I want to replace the strings ‘condition1’, ‘condition2’, etc. with descriptive names, for example:

condition1_women = related_women;

condition2_men = unrelated_men;

condition3_men = filler_men;

Current code:

data <- dataraw %>%
 rename_all(~ str_replace_all(str_replace(., 'condition1', "related"), 'condition2', "unrelated"))

This works for up to two strings, but every way I try to add a third, I get "unexpected symbol" errors.

 data <- dataraw %>%
rename_all(~ str_replace_all(str_replace((., 'condition1', "related"), 'condition2', "unrelated"), 'condition3', "filler")))

I’m sure this must be simple, but whichever combination I try, I get errors.
Would anyone be able to point me towards the simple mistake I’m making?
Thanks.

Solution:

rename_all() was superseded by rename_with() in dplyr 1.0.0, so I’ll use that:

library(dplyr)
dataraw <- data.frame(condition1_men=1, condition1_women=2, condition2_men=3, condition2_women=4, condition3_men=5)
dataraw
#   condition1_men condition1_women condition2_men condition2_women condition3_men
# 1              1                2              3                4              5
dataraw |>
  rename_with(.fn = ~ sub("^condition1_", "related_", sub("^condition2_", "unrelated_", .)))
#   related_men related_women unrelated_men unrelated_women condition3_men
# 1           1             2             3               4              5
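As for the error in the question: the failing attempt appears to have a doubled opening parenthesis in `str_replace((., ...` and an extra closing parenthesis at the end, which is easy to miss in deeply nested calls. Below is a minimal sketch of the three-condition version, written as a chain of steps instead of nested calls so the parentheses are easier to balance. It assumes stringr is installed and reuses the example data from above; the "filler" name for condition3 is taken from the question.

```r
library(dplyr)
library(stringr)

# Same toy data as in the example above.
dataraw <- data.frame(condition1_men = 1, condition1_women = 2,
                      condition2_men = 3, condition2_women = 4,
                      condition3_men = 5)

# Apply the three replacements one after another; each step works on
# the output of the previous one, so no nesting is required.
data <- dataraw |>
  rename_with(function(nm) {
    nm |>
      str_replace_all("condition1", "related") |>
      str_replace_all("condition2", "unrelated") |>
      str_replace_all("condition3", "filler")
  })

names(data)
# [1] "related_men"     "related_women"   "unrelated_men"   "unrelated_women" "filler_men"
```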

If you have a (named) vector of "from=to" assignments, we can also do it like this to be a little more general:

conds <- c(condition1="related", condition2="unrelated")
dataraw |>
  rename_with(.fn = ~ Reduce(function(st, i) sub(names(conds)[i], conds[i], st), seq_along(conds), init = .x))
#   related_men related_women unrelated_men unrelated_women condition3_men
# 1           1             2             3               4              5

We need Reduce because each substitution has to be applied to the result of the previous one; otherwise only the last mapping would survive.
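If stringr is available, `str_replace_all()` also accepts a named vector of pattern = replacement pairs and applies them in turn, which does the same job as the Reduce call. The sketch below additionally anchors each pattern at the start of the name and includes the trailing underscore, so that "condition1" cannot match inside a longer label. The `condition10_men` column and the `patterns` helper are hypothetical and only there to illustrate that point.

```r
library(stringr)

# Mapping from the question, extended with the third condition.
conds <- c(condition1 = "related", condition2 = "unrelated", condition3 = "filler")

# Build anchored regex patterns such as "^condition1_" -> "related_".
patterns <- setNames(paste0(conds, "_"), paste0("^", names(conds), "_"))

# "condition10_men" is a made-up column name used to show the anchoring.
old_names <- c("condition1_men", "condition2_women", "condition3_men", "condition10_men")
new_names <- str_replace_all(old_names, patterns)

new_names
# [1] "related_men"     "unrelated_women" "filler_men"      "condition10_men"
```

The same call can be passed to rename_with, e.g. `rename_with(dataraw, ~ str_replace_all(.x, patterns))`.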

I often find data like this does better (in later data-munging/analysis) in a long format (as Limey suggested). For that, we can also do:

dataraw |>
  tidyr::pivot_longer(cols = everything(), names_pattern = "(.*)_(.*)",
                      names_to = c("cond", ".value")) |>
  mutate(cond2 = conds[match(sub("_.*", "", cond), names(conds))])
# # A tibble: 3 × 4
#   cond         men women cond2    
#   <chr>      <dbl> <dbl> <chr>    
# 1 condition1     1     2 related  
# 2 condition2     3     4 unrelated
# 3 condition3     5    NA NA       

It might be simpler (for data management, visualizing, updating, etc.) to keep the mapping in a separate data frame, which we can then join onto the original data:

cond_df <- tribble(
  ~ cond, ~ cond2
  , "condition1", "related"
  , "condition2", "unrelated"
)
dataraw |>
  tidyr::pivot_longer(cols = everything(), names_pattern = "(.*)_(.*)",
                      names_to = c("cond", ".value")) |>
  left_join(cond_df, by = "cond")
# # A tibble: 3 × 4
#   cond         men women cond2    
#   <chr>      <dbl> <dbl> <chr>    
# 1 condition1     1     2 related  
# 2 condition2     3     4 unrelated
# 3 condition3     5    NA NA       
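The NA in cond2 above is only there because the lookup table has no row for condition3. As a small sketch, assuming the "filler" label from the question, adding that row to the lookup fills the gap without touching the reshaping code:

```r
library(dplyr)

# Same toy data as in the example above.
dataraw <- data.frame(condition1_men = 1, condition1_women = 2,
                      condition2_men = 3, condition2_women = 4,
                      condition3_men = 5)

# Lookup table with one row per condition, including the third one.
cond_df <- tibble::tribble(
  ~cond,        ~cond2,
  "condition1", "related",
  "condition2", "unrelated",
  "condition3", "filler"
)

# Reshape to one row per condition, then attach the readable label.
long <- dataraw |>
  tidyr::pivot_longer(cols = everything(), names_pattern = "(.*)_(.*)",
                      names_to = c("cond", ".value")) |>
  left_join(cond_df, by = "cond")

long$cond2
# [1] "related"   "unrelated" "filler"
```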