Using purrr and select to create dichotomous variables

I’m trying to create columns of dichotomous variables based on presence (or absence) of selected continuous variables.

Example:

library(tidyverse)

df <- tibble(z = c(0, 0), a_1 = c(.1, NA), a_2 = c(NA, .1))

out <- tibble(z = c(0, 0),
              a_1 = c(.1, NA), 
              a_2 = c(NA, .1), 
              a_1_d = c(1, 0), 
              a_2_d = c(0, 1))

I can do this on an ad hoc basis using mutate:

out <- df %>% 
  mutate(a_1_d = if_else(is.na(a_1), 0, 1)) %>% 
  mutate(a_2_d = if_else(is.na(a_2), 0, 1))

But my real use case involves a lot of variables, so I’d like to use purrr and dplyr::select. I’ve tried a bunch of approaches, such as:

out <- df %>% 
  select(starts_with("a_")) %>% 
  map(.x, .f = mutate({{.x}}_d = 
                        if_else(is.na(.x), 0, 1)))

But I think I’m missing something fundamental about some combination of name assignment within map and passing variables to map. What is the most efficient way to get from df to out using a purrr function and dplyr::select?

Solution:

How do you feel about mutate() with across()? That seems like a good tool for this sort of problem.

You can choose which columns to work "across" with tidy-select functions, just as in select(). You then supply the function to apply to each column. Here I used as.numeric() to convert the logical output of "not NA" (!is.na()) to 0/1, but you could equally use if_else(). The function is written as a purrr-style lambda (i.e., with ~).
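To see that conversion in isolation, here is a minimal sketch on a bare vector. The vector x and the names present and flag are made up for illustration and do not come from the question.

```r
# Hypothetical vector with one missing value (illustration only)
x <- c(0.1, NA)

present <- !is.na(x)         # logical: TRUE where a value is present
flag <- as.numeric(present)  # TRUE becomes 1, FALSE becomes 0
flag
```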

To add a suffix to the new columns, I pass a named list to .fns. With a named list, across() names each output column {.col}_{.fn} by default, which gives a_1_d and a_2_d here.

mutate(df, across(.cols = starts_with("a"),
                  .fns = list(d = ~as.numeric(!is.na(.x)))))
#> # A tibble: 2 x 5
#>       z   a_1   a_2 a_1_d a_2_d
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     0   0.1  NA       1     0
#> 2     0  NA     0.1     0     1

Created on 2021-11-03 by the reprex package (v2.0.0)
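
Since the question asked specifically about purrr and select(), the sketch below puts two variants side by side. It is only one possible way to do this. The first is the same across() idea, written with if_else() and an explicit .names template instead of a named list. The second maps over the selected columns with purrr::map() and binds the result back onto df. The object names out_across, flags and out_map are assumptions introduced for illustration.

```r
library(dplyr)
library(purrr)
library(tibble)

df <- tibble(z = c(0, 0), a_1 = c(.1, NA), a_2 = c(NA, .1))

# Variant 1: across() with if_else() and an explicit naming template
out_across <- df %>%
  mutate(across(.cols = starts_with("a_"),
                .fns = ~ if_else(is.na(.x), 0, 1),
                .names = "{.col}_d"))

# Variant 2: map over the selected columns, rename, then bind back to df
flags <- df %>%
  select(starts_with("a_")) %>%
  map(~ if_else(is.na(.x), 0, 1)) %>%   # named list, one 0/1 vector per column
  as_tibble() %>%
  rename_with(~ paste0(.x, "_d"))       # a_1 -> a_1_d, a_2 -> a_2_d

out_map <- bind_cols(df, flags)

# Compare the two results
all.equal(out_across, out_map)
```

The across() version keeps everything inside a single mutate() call, which is probably why it tends to be the more convenient choice. The map() version makes the intermediate steps explicit, at the cost of a separate rename and bind.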
