Home Use `across()` to create new columns with a function of multiple other columns

Questions

Use `across()` to create new columns with a function of multiple other columns

August 30, 2023

Is it possible to create multiple new columns based on a pattern of column names, and a function of multiple column values?

For example

set.seed(123)
tibble(x = seq(1:10), bm_a = runif(10), val_a = runif(10), bm_b = runif(10), val_b = runif(10)) |>
  mutate(rel_a = val_a - bm_a, rel_b = val_b - bm_b)

# A tibble: 10 × 7
       x   bm_a  val_a  bm_b  val_b  rel_a   rel_b
   <int>  <dbl>  <dbl> <dbl>  <dbl>  <dbl>   <dbl>
 1     1 0.288  0.957  0.890 0.963   0.669  0.0735
 2     2 0.788  0.453  0.693 0.902  -0.335  0.209 
 3     3 0.409  0.678  0.641 0.691   0.269  0.0502
 4     4 0.883  0.573  0.994 0.795  -0.310 -0.199 
 5     5 0.940  0.103  0.656 0.0246 -0.838 -0.631 
 6     6 0.0456 0.900  0.709 0.478   0.854 -0.231 
 7     7 0.528  0.246  0.544 0.758  -0.282  0.214 
 8     8 0.892  0.0421 0.594 0.216  -0.850 -0.378 
 9     9 0.551  0.328  0.289 0.318  -0.224  0.0290
10    10 0.457  0.955  0.147 0.232   0.498  0.0845

But imagine I have this for many different dynamically named columns all starting with bm_* and val_*. Is it possible to use some combination of mutate(), across(), pick(), starts_with() etc to get this to work for all possible column names?

>Solution :

Here’s how you can do it with across and pick:

library(tidyverse)
df |> mutate(across(starts_with("val"), ~ .x - pick(str_replace(cur_column(), "val", "bm")), .names = "{str_replace(.col, 'val', 'rel')}"))

But it seems probable that using pivot_longer() is a faster and more robust approach:

df |> 
  pivot_longer(-x, names_to = c(".value", "group"), names_pattern = "(bm|val)_(.+)") |>
  mutate(rel = val - bm)

More details:

cur_column() gets the current column name. So str_replace(cur_column(), "val", "bm") gets the current column name with val replaced by bm.
.names sets the names of the output columns. If we didn’t do this, they’d overwrite the val columns!
the reason for the curly brackets is because of the glue syntax. across needs strings for names. Check out the glue package for more information on how this works.