Is there a way to set the variable labels of a data frame using the map family?

I have a data set and an accompanying data dictionary, and I would like to use the dictionary to set the variable labels of the data set. I tried an explicit for loop, but it appears to be quite slow. Is there a way to use the map family from the tidyverse to achieve the same goal?

library(tidyverse)

mydata <- tibble(
  a_1 = c(20,22, 13,14,44),
  a_2 = c(42, 13, 32, 31, 14),
  b = c("male", "female", "male", "female", "male"),
  c = c("Primary", "secondary", "Tertiary", "Primary", "Secondary")
)

dictionary <- tibble(
  variable = c("a", "b", "c"),
  label = c("Age", "Gender", "Education"),
  type = c("mselect", "select", "select")

)


variables <- names(mydata)

for (var in variables) {
  # Strip the "_<suffix>" so that a_1 and a_2 both look up "a"
  vm <- str_remove_all(var, "_.*")

  varlbl <- dictionary %>%
    filter(variable == vm) %>%
    pull(label)

  attr(mydata[[var]], "label") <- varlbl
}


#---- Map the variable labels using map
#

>Solution :

base R

mydata[] <- Map(
  function(x, lbl) if (!is.na(lbl)) `attr<-`(x, "label", lbl) else x,
  mydata,
  dictionary$label[match(gsub("_.*", "", names(mydata)), dictionary$variable)]
)
str(mydata)
# tibble [5 x 4] (S3: tbl_df/tbl/data.frame)
#  $ a_1: num [1:5] 20 22 13 14 44
#   ..- attr(*, "label")= chr "Age"
#  $ a_2: num [1:5] 42 13 32 31 14
#   ..- attr(*, "label")= chr "Age"
#  $ b  : chr [1:5] "male" "female" "male" "female" ...
#   ..- attr(*, "label")= chr "Gender"
#  $ c  : chr [1:5] "Primary" "secondary" "Tertiary" "Primary" ...
#   ..- attr(*, "label")= chr "Education"

The mydata[] <- assignment is intentional and a small hack. With a plain mydata <- (no brackets), mydata would be overwritten by the bare list that Map returns, and its "frame" properties would be lost. With mydata[] <-, only the contents (the columns) are replaced by the elements of that list, so the class and other frame-like properties of mydata are preserved.

I use this frequently when I want to (for example) convert a subset of columns to something else. I might do somedata[3:6] <- lapply(somedata[3:6], as.numeric), and I think it is much more readable than other methods to get the same effect.
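
To see the difference concretely, here is a minimal sketch on a hypothetical two-column frame (df and as_list are names made up for this illustration, not part of the question's data):

```r
# Hypothetical toy frame whose columns are stored as character
df <- data.frame(x = c("1", "2"), y = c("3", "4"))

# lapply() on its own returns a bare list, not a data frame
as_list <- lapply(df, as.numeric)
class(as_list)

# Assigning into df[] replaces the columns but keeps the container
df[] <- lapply(df, as.numeric)
class(df)
```

The first class() call should report a list and the second a data.frame, which is the whole point of the empty brackets.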

purrr

library(dplyr) # map2_dfc() binds the columns with dplyr::bind_cols(), so dplyr must be installed
library(purrr)
mydata <- map2_dfc(
  mydata,
  dictionary$label[ match(gsub("_.*", "", names(mydata)), dictionary$variable) ],
  ~ `attr<-`(.x, "label", .y))
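
As far as I know, map2_dfc() is marked as superseded from purrr 1.0.0 onward, so a variant that combines map2() with the bracket assignment from the base R answer may age better. The following is only a sketch under that assumption: it rebuilds a reduced copy of the question's data as plain data frames so it runs on its own, and the name labels is introduced here for illustration. Unlike the map2_dfc() call above, it also skips columns that have no dictionary entry.

```r
library(purrr)

# Reduced copy of the question's data, rebuilt so this runs on its own
mydata <- data.frame(
  a_1 = c(20, 22, 13, 14, 44),
  a_2 = c(42, 13, 32, 31, 14),
  b   = c("male", "female", "male", "female", "male")
)
dictionary <- data.frame(
  variable = c("a", "b"),
  label    = c("Age", "Gender")
)

# One label per column, looked up by the column name without its "_suffix"
labels <- dictionary$label[match(sub("_.*", "", names(mydata)), dictionary$variable)]

# map2() returns a list; assigning into mydata[] keeps the data frame
mydata[] <- map2(mydata, labels, function(col, lbl) {
  if (!is.na(lbl)) attr(col, "label") <- lbl  # leave unmatched columns alone
  col
})

attr(mydata$a_2, "label")
```

This version does not go through dplyr::bind_cols(), so purrr is the only package it needs.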

For both, I’m using a shortcut "cheat": these two are equivalent:

{
  attr(x, "label") <- "something"
  x
}

## is equivalent to

{
  `attr<-`(x, "label", "something")
}

in that they both return the updated x. It’s a little code-golf, a little aesthetics (reduced requirement for semicolons and braces), but you can easily shift to the more traditional (first) method if you prefer.
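
If you want to convince yourself of the equivalence, a quick check along these lines should do (x, labelled_a and labelled_b are throwaway names for this sketch):

```r
x <- c(20, 22, 13)

# Traditional form: set the attribute on a copy, then return the copy
labelled_a <- {
  y <- x
  attr(y, "label") <- "Age"
  y
}

# Functional form: call the replacement function `attr<-` directly
labelled_b <- `attr<-`(x, "label", "Age")

# identical() compares attributes as well as values
identical(labelled_a, labelled_b)
```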
