Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Use `across()` to create new columns with a function of multiple other columns

Is it possible to create multiple new columns based on a pattern of column names, and a function of multiple column values?

For example

set.seed(123)
tibble(x = seq(1:10), bm_a = runif(10), val_a = runif(10), bm_b = runif(10), val_b = runif(10)) |>
  mutate(rel_a = val_a - bm_a, rel_b = val_b - bm_b)
# A tibble: 10 Ă— 7
       x   bm_a  val_a  bm_b  val_b  rel_a   rel_b
   <int>  <dbl>  <dbl> <dbl>  <dbl>  <dbl>   <dbl>
 1     1 0.288  0.957  0.890 0.963   0.669  0.0735
 2     2 0.788  0.453  0.693 0.902  -0.335  0.209 
 3     3 0.409  0.678  0.641 0.691   0.269  0.0502
 4     4 0.883  0.573  0.994 0.795  -0.310 -0.199 
 5     5 0.940  0.103  0.656 0.0246 -0.838 -0.631 
 6     6 0.0456 0.900  0.709 0.478   0.854 -0.231 
 7     7 0.528  0.246  0.544 0.758  -0.282  0.214 
 8     8 0.892  0.0421 0.594 0.216  -0.850 -0.378 
 9     9 0.551  0.328  0.289 0.318  -0.224  0.0290
10    10 0.457  0.955  0.147 0.232   0.498  0.0845

But imagine I have this for many different dynamically named columns all starting with bm_* and val_*. Is it possible to use some combination of mutate(), across(), pick(), starts_with() etc to get this to work for all possible column names?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Here’s how you can do it with across and pick:

library(tidyverse)
df |> mutate(across(starts_with("val"), ~ .x - pick(str_replace(cur_column(), "val", "bm")), .names = "{str_replace(.col, 'val', 'rel')}")) 

But it seems probable that using pivot_longer() is a faster and more robust approach:

df |> 
  pivot_longer(-x, names_to = c(".value", "group"), names_pattern = "(bm|val)_(.+)") |>
  mutate(rel = val - bm)

More details:

  1. cur_column() gets the current column name. So str_replace(cur_column(), "val", "bm") gets the current column name with val replaced by bm.
  2. .names sets the names of the output columns. If we didn’t do this, they’d overwrite the val columns!
  3. the reason for the curly brackets is because of the glue syntax. across needs strings for names. Check out the glue package for more information on how this works.
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading