How to prevent pmax/pmin from taking non-numerical values into consideration?

I am using pmax and pmin to extract the maximum and minimum values from each row. Some values are not statistically significant, and these are surrounded by exclamation marks (e.g. !5!). pmax and pmin still take these values into account, so I cannot calculate the difference between the significant values only. Below is an example:

ID Var1 Var2 Var3 Var4
A 1 !5! NA 10
B 20 NA NA 3
C !20! 10 NA NA
D NA NA 30 NA
E !10! NA NA NA

I want the !xx! values to be excluded when I run the following:

DF1 = data.frame(ID=c("A","B","C","D","E"), 
                 Var1=c("1","20","!20!","NA","!10!"), 
                 Var2=c("!5!","NA","10","NA","NA"), 
                 Var3=c("NA","NA","NA","30","NA"), 
                 Var4=c("10","NA","NA","NA","NA"),
                 Var5=c("NA","!50!","20","NA","NA"))
DF1$max <- pmax(DF1$Var1,DF1$Var2,DF1$Var3,DF1$Var4,na.rm = TRUE)
DF1$min <- pmin(DF1$Var1,DF1$Var2,DF1$Var3,DF1$Var4,na.rm = TRUE)

With this code, the max and min columns still take the !xx! values into account, which is not what I want.

How can I stop pmax and pmin from picking up the !xx! values? Any help is appreciated!
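A likely cause, shown as a small sketch: every value in the data.frame() call is quoted, so the columns are character vectors and pmax and pmin compare them as text, not as numbers. The snippet rebuilds two of the question's columns to check this; stringsAsFactors = FALSE is an addition so it behaves the same on R versions before 4.0.

```r
# Two of the question's columns, built the same way (all values quoted)
DF1 <- data.frame(Var1 = c("1", "20", "!20!", "NA", "!10!"),
                  Var4 = c("10", "NA", "NA", "NA", "NA"),
                  stringsAsFactors = FALSE)

# Both columns are character, not numeric
col_classes <- sapply(DF1, class)
print(col_classes)

# Text comparison goes character by character, so "9" ranks above "10"
text_max <- pmax("9", "10")
print(text_max)
```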

Solution:

The "NA" entries in your data are string literals, not real NA values, so first convert them:

DF1[-1] <- lapply(DF1[-1], function(z) replace(z, z=="NA", NA))
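To see why this conversion matters: is.na() only recognizes real missing values, so before the replacement the text "NA" is treated like any other string and na.rm = TRUE cannot skip it. A minimal check on one column (Var3 from the question):

```r
Var3 <- c("NA", "NA", "NA", "30", "NA")

# The string "NA" is not a missing value
before <- is.na(Var3)
print(before)   # FALSE FALSE FALSE FALSE FALSE

# Same replacement as above: turn the text "NA" into a real NA
Var3 <- replace(Var3, Var3 == "NA", NA)
after <- is.na(Var3)
print(after)    # TRUE TRUE TRUE FALSE TRUE
```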

Then we can mask every value containing "!" as NA and let na.rm = TRUE skip it:

do.call(pmax, c(lapply(DF1[-1], function(z) replace(z, grepl("!", z), NA)), list(na.rm = TRUE)))
# [1] "10" "20" "20" "30" NA  
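The inner replace() call is what does the filtering. Applied to Var1 after the "NA" conversion, it leaves only the significant values. The stricter pattern at the end is an assumption, not part of the answer: it only flags values that both start and end with "!", in case "!" could appear elsewhere in real data.

```r
# Var1 after the "NA" strings have been turned into real NA
Var1 <- c("1", "20", "!20!", NA, "!10!")

# Same masking as in the answer: anything containing "!" becomes NA
masked <- replace(Var1, grepl("!", Var1), NA)
print(masked)   # "1" "20" NA NA NA

# Hypothetical stricter check: only values wrapped in "!" on both sides
strict <- grepl("^!.*!$", Var1)
print(strict)   # FALSE FALSE TRUE FALSE TRUE
```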

The results can be stored in DF1 with:

DF1$max <- do.call(pmax, c(lapply(DF1[-1], function(z) replace(z, grepl("!", z), NA)), list(na.rm = TRUE)))
DF1$min <- do.call(pmin, c(lapply(DF1[-1], function(z) replace(z, grepl("!", z), NA)), list(na.rm = TRUE)))
DF1
#   ID Var1 Var2 Var3 Var4 Var5  max  min
# 1  A    1  !5! <NA>   10 <NA>   10    1
# 2  B   20 <NA> <NA> <NA> !50!   20   20
# 3  C !20!   10 <NA> <NA>   20   20   10
# 4  D <NA> <NA>   30 <NA> <NA>   30   30
# 5  E !10! <NA> <NA> <NA> <NA> <NA> <NA>
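One caveat: the columns are still character here, so pmax and pmin compare text. That gives the right answer for these particular values, but it would rank "9" above "10". A sketch that converts to numbers before comparing (the list name clean is illustrative, not from the original answer):

```r
DF1 <- data.frame(ID = c("A", "B", "C", "D", "E"),
                  Var1 = c("1", "20", "!20!", "NA", "!10!"),
                  Var2 = c("!5!", "NA", "10", "NA", "NA"),
                  Var3 = c("NA", "NA", "NA", "30", "NA"),
                  Var4 = c("10", "NA", "NA", "NA", "NA"),
                  Var5 = c("NA", "!50!", "20", "NA", "NA"),
                  stringsAsFactors = FALSE)

# Mask "!xx!" values and "NA" strings, then convert each column to numeric
clean <- lapply(DF1[-1], function(z) {
  z[grepl("!", z) | z %in% "NA"] <- NA
  as.numeric(z)
})

# Numeric comparison; na.rm = TRUE skips the masked values
DF1$max <- do.call(pmax, c(clean, na.rm = TRUE))
DF1$min <- do.call(pmin, c(clean, na.rm = TRUE))
DF1[c("ID", "max", "min")]
```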

Note that na.rm = TRUE is still required: without it, pmax and pmin return NA for every row that contains at least one NA.
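A small illustration of the na.rm behaviour on plain numeric vectors. Note that a position where every input is NA stays NA even with na.rm = TRUE, which is why row E ends up as NA.

```r
x <- c(1, NA, NA)
y <- c(2, 3, NA)

# Without na.rm, any NA in a position makes that result NA
without <- pmax(x, y)                 # 2 NA NA

# With na.rm = TRUE, NAs are skipped; an all-NA position stays NA
with_rm <- pmax(x, y, na.rm = TRUE)   # 2 3 NA
```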
