Is it possible to create multiple new columns based on a pattern of column names, and a function of multiple column values?
For example
set.seed(123)
tibble(x = seq(1:10), bm_a = runif(10), val_a = runif(10), bm_b = runif(10), val_b = runif(10)) |>
mutate(rel_a = val_a - bm_a, rel_b = val_b - bm_b)
# A tibble: 10 Ă— 7
x bm_a val_a bm_b val_b rel_a rel_b
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0.288 0.957 0.890 0.963 0.669 0.0735
2 2 0.788 0.453 0.693 0.902 -0.335 0.209
3 3 0.409 0.678 0.641 0.691 0.269 0.0502
4 4 0.883 0.573 0.994 0.795 -0.310 -0.199
5 5 0.940 0.103 0.656 0.0246 -0.838 -0.631
6 6 0.0456 0.900 0.709 0.478 0.854 -0.231
7 7 0.528 0.246 0.544 0.758 -0.282 0.214
8 8 0.892 0.0421 0.594 0.216 -0.850 -0.378
9 9 0.551 0.328 0.289 0.318 -0.224 0.0290
10 10 0.457 0.955 0.147 0.232 0.498 0.0845
But imagine I have this for many different dynamically named columns all starting with bm_* and val_*. Is it possible to use some combination of mutate(), across(), pick(), starts_with() etc to get this to work for all possible column names?
>Solution :
Here’s how you can do it with across and pick:
library(tidyverse)
df |> mutate(across(starts_with("val"), ~ .x - pick(str_replace(cur_column(), "val", "bm")), .names = "{str_replace(.col, 'val', 'rel')}"))
But it seems probable that using pivot_longer() is a faster and more robust approach:
df |>
pivot_longer(-x, names_to = c(".value", "group"), names_pattern = "(bm|val)_(.+)") |>
mutate(rel = val - bm)
More details:
- cur_column() gets the current column name. So
str_replace(cur_column(), "val", "bm")gets the current column name with val replaced by bm. - .names sets the names of the output columns. If we didn’t do this, they’d overwrite the val columns!
- the reason for the curly brackets is because of the glue syntax. across needs strings for names. Check out the glue package for more information on how this works.