Is there a way to set the variable labels of a data frame using the map family?

November 4, 2021

I have a data set and an accompanying data dictionary, and I would like to use the dictionary to set the variable labels of the data set. I tried an explicit for loop, but it appears to be quite slow. Is there a way to use the map family from the tidyverse to achieve the same goal?

library(tidyverse)

mydata <- tibble(
  a_1 = c(20, 22, 13, 14, 44),
  a_2 = c(42, 13, 32, 31, 14),
  b = c("male", "female", "male", "female", "male"),
  c = c("Primary", "secondary", "Tertiary", "Primary", "Secondary")
)

dictionary <- tibble(
  variable = c("a", "b", "c"),
  label = c("Age", "Gender", "Education"),
  type = c("mselect", "select", "select")
)


variables <- names(mydata)


for (var in variables) {
  vm <- str_remove_all(var, "_.*") # Strip the suffix from variables containing "_"

  varlbl <- dictionary %>%
    filter(variable == vm) %>%
    pull(label)

  attr(mydata[[var]], "label") <- varlbl
}


#---- Map the variable labels using map
#

>Solution :

base R

mydata[] <- Map(
  function(x, lbl) if (!is.na(lbl)) `attr<-`(x, "label", lbl) else x,
  mydata,
  dictionary$label[match(gsub("_.*", "", names(mydata)), dictionary$variable)]
)
str(mydata)
# tibble [5 x 4] (S3: tbl_df/tbl/data.frame)
#  $ a_1: num [1:5] 20 22 13 14 44
#   ..- attr(*, "label")= chr "Age"
#  $ a_2: num [1:5] 42 13 32 31 14
#   ..- attr(*, "label")= chr "Age"
#  $ b  : chr [1:5] "male" "female" "male" "female" ...
#   ..- attr(*, "label")= chr "Gender"
#  $ c  : chr [1:5] "Primary" "secondary" "Tertiary" "Primary" ...
#   ..- attr(*, "label")= chr "Education"

The mydata[] <- assignment is intentional, and a small hack. With a plain mydata <- (no brackets), Map returns a list and mydata loses its data-frame properties. With mydata[] <-, only the contents (the columns) are replaced by the elements of that list, so the object keeps its class and other frame-like properties.
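To make the difference concrete, here is a minimal sketch in base R. The toy frame `df` and the do-nothing mapping function are made up for illustration; only the two assignment forms matter.

```r
# Hypothetical toy data, not the data from the question
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# Plain assignment: the result of Map() is an ordinary named list
as_list <- Map(function(col) col, df)
class(as_list)  # "list"

# Bracket assignment: only the columns are replaced, the frame survives
df[] <- Map(function(col) col, df)
class(df)       # "data.frame"
```

The same pattern works with lapply, since it also returns one list element per column.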

I use this frequently when I want to (for example) convert a subset of columns to something else. I might do somedata[3:6] <- lapply(somedata[3:6], as.numeric), and I think it is much more readable than other methods to get the same effect.
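A small sketch of that subset conversion, assuming a hypothetical `somedata` whose columns 3 to 6 were read in as character:

```r
# Hypothetical frame: columns 3:6 hold numbers stored as strings
somedata <- data.frame(
  id = 1:2,
  name = c("u", "v"),
  q1 = c("1", "2"), q2 = c("3", "4"), q3 = c("5", "6"), q4 = c("7", "8"),
  stringsAsFactors = FALSE
)

# Convert only columns 3:6; the other columns and the frame itself are untouched
somedata[3:6] <- lapply(somedata[3:6], as.numeric)
str(somedata)
```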

purrr

library(dplyr) # map2_dfc() relies on dplyr::bind_cols()
library(purrr)
mydata <- map2_dfc(
  mydata,
  dictionary$label[ match(gsub("_.*", "", names(mydata)), dictionary$variable) ],
  ~ `attr<-`(.x, "label", .y))
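Note that the purrr call above has no NA guard, so a column without a dictionary entry would receive a label of NA. If that matters, one possible variant combines purrr::map2 with the bracket assignment from the base R version. This is a sketch on made-up toy data (the unmatched column `d` is hypothetical), not a drop-in replacement:

```r
library(purrr)

# Hypothetical toy data: column `d` has no entry in the dictionary
mydata <- data.frame(a_1 = c(20, 22), b = c("male", "female"), d = c(1, 2),
                     stringsAsFactors = FALSE)
dictionary <- data.frame(variable = c("a", "b"), label = c("Age", "Gender"),
                         stringsAsFactors = FALSE)

# One label per column, NA where the dictionary has no match
labels <- dictionary$label[match(gsub("_.*", "", names(mydata)), dictionary$variable)]

# Label only the matched columns; leave the others as they are
mydata[] <- map2(mydata, labels,
                 ~ if (!is.na(.y)) `attr<-`(.x, "label", .y) else .x)

attr(mydata$a_1, "label")  # "Age"
attr(mydata$d, "label")    # NULL
```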

Both versions use a small shortcut ("cheat"). These two blocks are equivalent:

{
  attr(x, "label") <- "something"
  x
}

## is equivalent to

{
  `attr<-`(x, "label", "something")
}

in that they both return the updated x. It's partly code golf and partly aesthetics (fewer braces and semicolons), but you can easily switch to the more traditional first form if you prefer.
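As a quick check of that equivalence, the two forms can be wrapped in functions and compared. The function names here are made up for illustration:

```r
# Traditional form: assign the attribute, then return the object
set_label_traditional <- function(x, lbl) {
  attr(x, "label") <- lbl
  x
}

# Functional form: `attr<-` returns the updated object directly
set_label_functional <- function(x, lbl) `attr<-`(x, "label", lbl)

x <- c(20, 22, 13)
identical(set_label_traditional(x, "Age"), set_label_functional(x, "Age"))  # TRUE
```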