I found a post very similar to my problem (subtract a constant vector from each row in a matrix in r), but I was hoping I could solve this using dplyr.
I have a data.frame that looks like this:
set.seed(1)
toy_df <- data.frame(Patient.ID = letters[1:5],
Patient.Age = rnorm(5,35,4),
Protein.A = rnorm(5,100,10),
Protein.B = rnorm(5,100,10),
Protein.D = rnorm(5,100,10),
Protein.E = rnorm(5,100,10))
I calculated a per-protein threshold (the median plus twice the median absolute deviation) using this approach:
medianDeviation <- apply(X = toy_df[,grepl("^Protein\\.", names(toy_df))], MARGIN = 2, FUN = function(x) median(x) + (2*mad(x)))
It created a named vector with the median deviation for each protein. Now, I want to subtract the median deviation for each corresponding protein from "toy_df".
I asked ChatGPT for a solution, and it suggested this:
result <- toy_df %>% mutate(across(names(medianDeviation), ~ . - medianDeviation[.col]))
It looks promising, but for some reason, it is not working. I think the problem lies in the "medianDeviation[.col]"; however, I can’t find any alternative. Any suggestions?
>Solution:
You could directly use:
mutate(toy_df, across(starts_with('Protein'), ~.x - median(.x) - 2*mad(.x)))
  Patient.ID Patient.Age  Protein.A Protein.B  Protein.D  Protein.E
1          a    32.49418 -20.518532 -18.76128 -16.764619  -5.878928
2          b    35.73457  -7.439558 -29.98066 -16.477185  -7.247338
3          c    31.65749  -4.930601 -40.09150  -6.876920 -14.323052
4          d    41.38112  -6.556035 -56.02609  -8.103071 -34.962218
5          e    36.31803 -15.367732 -22.62978 -10.376269  -8.870444
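Alternatively, keep your precomputed vector and look up each value with cur_column(), which returns the name of the column that across() is currently processing. (.col is only defined inside the .names glue specification, not inside the function itself, which is why medianDeviation[.col] fails.)
mutate(toy_df, across(names(medianDeviation), ~ .x - medianDeviation[cur_column()]))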
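To check that the two approaches above agree, here is a minimal sketch that rebuilds toy_df from the question and compares the results. The names res_inline, res_lookup, res_sweep and protein_cols are made up for illustration. The sapply() call and the [[ ]] lookup are small substitutions for the question's apply() and [ ], and the sweep() version is only a base R cross-check, not part of the original answer.

```r
library(dplyr)

# Rebuild the example data from the question
set.seed(1)
toy_df <- data.frame(Patient.ID  = letters[1:5],
                     Patient.Age = rnorm(5, 35, 4),
                     Protein.A   = rnorm(5, 100, 10),
                     Protein.B   = rnorm(5, 100, 10),
                     Protein.D   = rnorm(5, 100, 10),
                     Protein.E   = rnorm(5, 100, 10))

# Per-protein threshold, same formula as in the question: median + 2 * MAD
protein_cols <- grep("^Protein\\.", names(toy_df), value = TRUE)
medianDeviation <- sapply(toy_df[protein_cols],
                          function(x) median(x) + 2 * mad(x))

# Approach 1: compute the threshold inside across()
res_inline <- mutate(toy_df,
                     across(starts_with("Protein"),
                            ~ .x - median(.x) - 2 * mad(.x)))

# Approach 2: look up the precomputed threshold by the current column name
res_lookup <- mutate(toy_df,
                     across(all_of(names(medianDeviation)),
                            ~ .x - medianDeviation[[cur_column()]]))

# Base R cross-check: subtract the vector column-wise (MARGIN = 2)
res_sweep <- toy_df
res_sweep[protein_cols] <- sweep(as.matrix(toy_df[protein_cols]),
                                 2, medianDeviation, "-")

# Both comparisons should return TRUE (up to floating-point tolerance)
all.equal(res_inline[protein_cols], res_lookup[protein_cols],
          check.attributes = FALSE)
all.equal(res_inline[protein_cols], res_sweep[protein_cols],
          check.attributes = FALSE)
```

The inline version is shorter when the statistic is only needed once. The lookup version is probably the better fit if medianDeviation was computed on different data (for example, a reference cohort) and has to be reused as-is.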