rowMeans: ignore certain values when calculating the mean, but keep na.rm=F

Here is a sample dataset:


data <- data.frame(x=c(4,3,4,4,99),
                  y=c(4,NA,3,2,4),
                  z = c(88,NA,4,4,5),
                  w = c(4,5,2,3,4))

I would like to create a new column of row means using rowMeans. I would like to keep na.rm=F, because if a value is truly NA I want the mean for that row to be NA as well.
But if a value is either 88 or 99, I would like R to ignore it when calculating the mean and still use the remaining valid values. So far I have the following.

data$mean <- rowMeans(subset(data, select = c(`x`,`y`,`z`,`w`)), na.rm = T)

But I am not sure how to make it ignore only the 88 and 99 values in the calculation.

This is what I am hoping to get:

data <- data.frame(x=c(4,3,4,4,99),
                   y=c(4,NA,3,2,4),
                   z = c(88,NA,4,4,5),
                   w = c(4,5,2,3,4),
                   mean=c(4,NA,3.25,3.25,4.3))

Any help is appreciated – thank you!

Solution:

You can still use rowMeans with na.rm=TRUE, but only on the subset of rows that contain no true NA, and after temporarily replacing 88 and 99 with NA.

s <- rowSums(is.na(data)) == 0  ## rows without any true NA
v <- c("x", "y", "z", "w")  ## columns used to calculate the mean
data$mean <- NA  ## initialize the new column
m <- as.matrix(data[v])  ## we'll need a matrix
data$mean[s] <- rowMeans(replace(m[s, v], m[s, v] %in% c(88, 99), NA), na.rm=TRUE)
data
#    x  y  z w     mean
# 1  4  4 88 4 4.000000
# 2  3 NA NA 5       NA
# 3  4  3  4 2 3.250000
# 4  4  2  4 3 3.250000
# 5 99  4  5 4 4.333333
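
The same idea can be wrapped in a small reusable function. This is only a sketch: the name row_means_ignoring and its arguments are hypothetical (not part of base R), and it assumes the sentinel codes are plain numeric values such as 88 and 99. It first records which rows contain a true NA, then turns the sentinel codes into NA, takes rowMeans with na.rm=TRUE, and finally resets the recorded rows to NA.

```r
data <- data.frame(x = c(4, 3, 4, 4, 99),
                   y = c(4, NA, 3, 2, 4),
                   z = c(88, NA, 4, 4, 5),
                   w = c(4, 5, 2, 3, 4))

## Hypothetical helper, for illustration only
row_means_ignoring <- function(df, cols, ignore = c(88, 99)) {
  m <- as.matrix(df[cols])
  has_na <- rowSums(is.na(m)) > 0   ## rows with a true NA
  m[m %in% ignore] <- NA            ## sentinel codes become NA
  out <- rowMeans(m, na.rm = TRUE)  ## mean of the remaining values
  out[has_na] <- NA                 ## rows with a true NA stay NA
  out
}

data$mean <- row_means_ignoring(data, c("x", "y", "z", "w"))
```

Note that a row consisting only of 88/99 values would come out as NaN, because no valid values remain to average.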

Or simply use apply, although it is much slower.

f <- \(x) if (any(is.na(x))) NA else mean(x[!x %in% c(88, 99)])
cbind(data, mean=apply(data, 1, f))
#    x  y  z w     mean
# 1  4  4 88 4 4.000000
# 2  3 NA NA 5       NA
# 3  4  3  4 2 3.250000
# 4  4  2  4 3 3.250000
# 5 99  4  5 4 4.333333

Timings from microbenchmark:

# Unit: milliseconds
#     expr       min        lq      mean    median        uq        max neval cld
#    apply 35.018162 35.941815 38.834333 36.394632 36.960161 212.469412   100   b
# rowMeans  1.097393  1.119396  1.493563  1.193787  1.226691   9.352118   100  a 
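
The code and data size behind these timings are not shown, so the following is only a rough sketch of how a comparison could be set up. The simulated data frame big, its size n, and the wrapper names use_rowmeans and use_apply are all assumptions made for illustration. It uses base system.time rather than microbenchmark so that it runs without extra packages, and it first checks that both approaches return the same result.

```r
set.seed(42)  ## arbitrary seed, for reproducibility only
n <- 1e4      ## assumed number of rows; the original size is not stated
vals <- c(1:5, 88, 99, NA)
big <- as.data.frame(matrix(sample(vals, n * 4, replace = TRUE), ncol = 4,
                            dimnames = list(NULL, c("x", "y", "z", "w"))))

## rowMeans approach: work on rows without a true NA only
use_rowmeans <- function(d) {
  m <- as.matrix(d)
  s <- rowSums(is.na(m)) == 0
  out <- rep(NA_real_, nrow(m))
  ms <- m[s, , drop = FALSE]  ## keep matrix shape even for a single row
  out[s] <- rowMeans(replace(ms, ms %in% c(88, 99), NA), na.rm = TRUE)
  out
}

## apply approach: one function call per row
use_apply <- function(d) {
  f <- function(x) if (any(is.na(x))) NA else mean(x[!x %in% c(88, 99)])
  unname(apply(d, 1, f))
}

## both approaches should agree before their speed is compared
stopifnot(isTRUE(all.equal(use_rowmeans(big), use_apply(big))))

system.time(use_rowmeans(big))
system.time(use_apply(big))
```

The absolute timings depend on the machine and on n, so they will not match the figures above.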