Using purrr and select to create dichotomous variables


I’m trying to create columns of dichotomous (0/1) variables based on the presence (or absence) of values in selected continuous variables.



library(dplyr)

df <- tibble(z = c(0, 0), a_1 = c(.1, NA), a_2 = c(NA, .1))

# Desired result: a 0/1 indicator column for each a_* column
out <- tibble(z = c(0, 0),
              a_1 = c(.1, NA), 
              a_2 = c(NA, .1), 
              a_1_d = c(1, 0), 
              a_2_d = c(0, 1))

I can do this on an ad hoc basis using mutate:

out <- df %>% 
  mutate(a_1_d = if_else(is.na(a_1), 0, 1)) %>% 
  mutate(a_2_d = if_else(is.na(a_2), 0, 1))

But my real use case involves a lot of variables, so I’d like to use purrr and dplyr::select. I’ve tried a bunch of approaches, such as:

out <- df %>% 
  select(starts_with("a_")) %>% 
  map(.x, .f = mutate({{.x}}_d = 
                        if_else(, 0, 1)))

But I think I’m missing something fundamental about some combination of name assignment within map and passing variables to map. What is the most efficient way to get from df to out using a purrr function and dplyr::select?

Solution:

How do you feel about mutate() with across()? That seems like a good tool for this sort of problem.

You can choose which columns to work "across" with tidy-select helpers, just as in select(). We then give the function to apply to each column. I used as.numeric() to convert the logical output of "not NA" (!is.na(.x)) to 0/1, but you could just as well use if_else() here. I use a purrr-style lambda (i.e., ~) for the function.

To add a suffix to the new columns, I use a named list for .fns: when .fns is a list, across() names each new column {.col}_{.fn} by default, so the name d produces a_1_d and a_2_d.

mutate(df, across(.cols = starts_with("a"),
                  .fns = list(d = ~as.numeric(!is.na(.x)))))
#> # A tibble: 2 x 5
#>       z   a_1   a_2 a_1_d a_2_d
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     0   0.1  NA       1     0
#> 2     0  NA     0.1     0     1

Created on 2021-11-03 by the reprex package (v2.0.0)
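A few alternative sketches, assuming dplyr (>= 1.0.0, for across() and rename_with()) and purrr are installed. The first is one possible purrr-plus-select() route like the one the question attempted; the other two vary the across() call described above (if_else() in place of as.numeric(), and an explicit .names template in place of the named list). The object names indicators, out_purrr, out_if_else and out_names are only illustrative.

```r
library(dplyr)
library(purrr)

df <- tibble(z = c(0, 0), a_1 = c(.1, NA), a_2 = c(NA, .1))

# Route 1: purrr + select().
# map() returns a named list of 0/1 vectors, one per selected column;
# as_tibble() turns that list back into columns and rename_with() adds "_d".
indicators <- df %>%
  select(starts_with("a_")) %>%
  map(~ as.numeric(!is.na(.x))) %>%
  as_tibble() %>%
  rename_with(~ paste0(.x, "_d"))
out_purrr <- bind_cols(df, indicators)

# Route 2: across() with if_else() instead of as.numeric();
# the named list still supplies the "d" suffix via "{.col}_{.fn}"
out_if_else <- mutate(df, across(.cols = starts_with("a"),
                                 .fns = list(d = ~ if_else(is.na(.x), 0, 1))))

# Route 3: a single unnamed function plus an explicit .names template
out_names <- mutate(df, across(.cols = starts_with("a"),
                               .fns = ~ as.numeric(!is.na(.x)),
                               .names = "{.col}_d"))
```

All three should yield the same z, a_1, a_2, a_1_d, a_2_d columns as the reprex output above; the across() forms avoid the separate bind_cols() step.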
