Subtracting column values by a named vector in R

I found a post very similar to my problem (subtract a constant vector from each row in a matrix in r), but I was hoping I could solve this using dplyr.

I have a data.frame that looks like this:

set.seed(1)
toy_df <- data.frame(Patient.ID = letters[1:5],
                     Patient.Age = rnorm(5,35,4),
                     Protein.A = rnorm(5,100,10),
                     Protein.B = rnorm(5,100,10),
                     Protein.D = rnorm(5,100,10),
                     Protein.E = rnorm(5,100,10))

I calculated a per-protein value, the median plus twice the median absolute deviation (MAD), using this approach:

medianDeviation <- apply(X = toy_df[,grepl("^Protein\\.", names(toy_df))], MARGIN = 2, FUN = function(x) median(x) + (2*mad(x)))

This creates a named vector with one value per protein. Now I want to subtract each value from the corresponding protein column in toy_df.

I asked ChatGPT for a solution, and it suggested this:

result <- toy_df %>% mutate(across(names(medianDeviation), ~ . - medianDeviation[.col]))

It looks promising, but for some reason, it is not working. I think the problem lies in the "medianDeviation[.col]"; however, I can’t find any alternative. Any suggestions?

Solution:

You could compute the adjustment directly inside across(), without the precomputed vector:

mutate(toy_df, across(starts_with('Protein'), ~.x - median(.x) - 2*mad(.x)))

  Patient.ID Patient.Age  Protein.A Protein.B  Protein.D  Protein.E
1          a    32.49418 -20.518532 -18.76128 -16.764619  -5.878928
2          b    35.73457  -7.439558 -29.98066 -16.477185  -7.247338
3          c    31.65749  -4.930601 -40.09150  -6.876920 -14.323052
4          d    41.38112  -6.556035 -56.02609  -8.103071 -34.962218
5          e    36.31803 -15.367732 -22.62978 -10.376269  -8.870444

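or keep the precomputed vector and replace .col with cur_column(), which returns the name of the column currently being processed inside across(). (.col is only available in the .names glue specification of across(), not inside the function itself.) The lambda then becomes:

~ . - medianDeviation[cur_column()]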
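
For completeness, here is a minimal end-to-end sketch of the cur_column() variant, checked against the direct approach and against a base R sweep() call. It assumes dplyr 1.0.0 or later is installed. The names protein_cols, via_lookup, via_direct and via_base are only illustrative, and using all_of() and [[ ]] instead of a bare names() call and [ ] is my own choice rather than part of the answer above.

```r
library(dplyr)

set.seed(1)
toy_df <- data.frame(Patient.ID = letters[1:5],
                     Patient.Age = rnorm(5, 35, 4),
                     Protein.A = rnorm(5, 100, 10),
                     Protein.B = rnorm(5, 100, 10),
                     Protein.D = rnorm(5, 100, 10),
                     Protein.E = rnorm(5, 100, 10))

# Named vector: median + 2 * MAD for every Protein.* column
protein_cols <- grepl("^Protein\\.", names(toy_df))
medianDeviation <- sapply(toy_df[protein_cols],
                          function(x) median(x) + 2 * mad(x))

# Look up each column's value by name; [[ ]] drops the element name
via_lookup <- toy_df %>%
  mutate(across(all_of(names(medianDeviation)),
                ~ .x - medianDeviation[[cur_column()]]))

# Direct approach from the answer, for comparison
via_direct <- mutate(toy_df,
                     across(starts_with("Protein"),
                            ~ .x - median(.x) - 2 * mad(.x)))

# Base R alternative: sweep() subtracts the vector column-wise (FUN defaults to "-")
via_base <- toy_df
via_base[names(medianDeviation)] <-
  sweep(as.matrix(toy_df[names(medianDeviation)]), 2, medianDeviation)

# Both comparisons should report TRUE if the approaches agree
print(all.equal(via_lookup, via_direct))
print(all.equal(via_base, via_direct))
```

The lookup version is mainly useful when the vector comes from somewhere else (for example, reference values computed on a different cohort); if the values are always derived from the same columns, the direct version is shorter.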
