Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Applying conditional functions on multiple columns of a dataframe, based on values of a variable

I have a dataframe with a factor variable identifying my groups (here y), and multiple numerical variables (to simplify, here I only show two x and z):

df = tribble(
  ~x,     ~y,     ~z,
  1,     "a",     5,   
  2,     "b",     6,   
  3,     "a",     7,    
  4,     "b",     8,  
  5,     "a",     9,  
  6,     "b",     10
)

I want to add new columns to my dataframe in which I apply different mathematical functions on those numerical variables (x and z), based on the values of the factor variable (y). For the example dataframe above: all observations with y == "a" are added with 1, and the ones with y == "b" are added with 2.

This is my code to do it:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df %>% mutate(x_new = case_when(grepl("a", y) ~ x + 1,
                                grepl("b", y) ~ x + 2))

### Output
# A tibble: 6 × 4
      x y         z     x_new
  <dbl> <chr> <dbl> <dbl>
1     1 a         5     2
2     2 b         6     4
3     3 a         7     4
4     4 b         8     6
5     5 a         9     6
6     6 b        10     8

My code works OK for adding one variable, but I want to apply the same functions for ALL the numerical variables, so in the example I want to apply the functions to the "z" variable as well and store the values in another new column. Since I have many numerical columns I don’t want to manually mutate them one by one with the approach above. Any advice on how to do this? (specially tidyverse solutions but any help is very appreciated)

>Solution :

Make a lookup to match a, b with 1,2. Then add to df excluding y column. Finally, suffix with "_new" and column bind back to original dataframe:

ll <- setNames(c(1, 2), c("a", "b"))

x <- df[, -2 ] + ll[ df$y ]
colnames(x) <- paste0(colnames(x), "_new")

cbind(df, x)
#   x y  z x_new z_new
# 1 2 a  6     3     7
# 2 4 b  8     6    10
# 3 4 a  8     5     9
# 4 6 b 10     8    12
# 5 6 a 10     7    11
# 6 8 b 12    10    14
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading