Standardize Rename with a Dictionary

I want to standardize some data-cleaning code for datasets whose column names have changed over time. The idea is to create a dictionary, along with a function that checks whether a given dataset contains any of the names in the dictionary and replaces them with the correct name (stored in the dictionary).

In the example below, ‘Sepal.Length’ would be converted to ‘sepal_length’.

column_dict <- tibble(
  from = c('Sepal.Length', 'length_of_sepal', 'sepal.lgth'),
  to = c('sepal_length', 'sepal_length', 'sepal_length')
)

iris %>%
  as_tibble %>%
  map2(., column_dict, rename)

Solution:

You can pass a named vector to dplyr::rename() as your dictionary, where the names are the new column names and the values are the old ones. Wrap the vector in any_of() so that dictionary entries missing from the data are skipped rather than causing an error.

library(tidyverse)

old_names <- c('Sepal.Length', 'length_of_sepal', 'sepal.lgth')
new_names <- c('sepal_length', 'sepal_length', 'sepal_length')

# create named vector as dictionary
naming_key <- setNames(object = old_names, nm = new_names)

# rename according to naming key with any_of() in case there are missing columns in data
iris %>%
  tibble() %>% 
  rename(any_of(naming_key))
#> # A tibble: 150 x 5
#>    sepal_length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ... with 140 more rows

Created on 2022-02-18 by the reprex package (v2.0.1)
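
Since the question stores the dictionary as a tibble with from and to columns, the same approach can be wrapped in a small function. The following is a minimal sketch, not a definitive implementation: standardize_names() is a hypothetical helper name, and it assumes the dictionary always has columns called from and to.

```r
library(dplyr)
library(tibble)

# Dictionary in the shape used in the question: one row per known old name
column_dict <- tibble(
  from = c("Sepal.Length", "length_of_sepal", "sepal.lgth"),
  to   = c("sepal_length", "sepal_length", "sepal_length")
)

# Hypothetical helper (name and signature are assumptions, not part of dplyr)
standardize_names <- function(data, dict) {
  # rename() expects new_name = old_name, so `to` supplies the names
  # and `from` supplies the values of the lookup vector
  naming_key <- setNames(dict$from, dict$to)
  # any_of() skips dictionary entries that are not columns in `data`
  rename(data, any_of(naming_key))
}

iris %>%
  as_tibble() %>%
  standardize_names(column_dict) %>%
  names()
```

A dataset with none of the dictionary's old names should pass through unchanged. One caveat: if a single dataset contained two different old names that map to the same new name, the rename would likely fail because the resulting column names would no longer be unique, so the dictionary works best when each dataset uses only one variant.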
