Double loop to iterate in many columns to find outliers in R

I have a data frame with the "id" of each individual and two traits ("x" and "y"), like the following:

id = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10","A11","A12","A13","A14","A15","A16","A17","A18","A19","A20","A21","A22","A23","A24")
x = c(10,4,6,8,9,8,7,6,12,14,11,9,8,4,5,10,14,12,15,7,10,14,24,28)
y = c(1.5,1.2,5,2,0.8,4,1,1.1,1.2,1.4,1.3,1.6,0.9,0.8,1,1.1,1.3,1.5,1.2,1.1,1,1.2,1.1,1)
a = data.frame(id,x,y)

I want a loop that iterates over each trait and each individual so that I can create a new data frame (or new columns of a) in which an individual gets a 1 if it is an outlier and a 0 if it is not. An outlier is any point that deviates from the mean of the trait by 3 standard deviations (sd) or more, in either direction.

In this example, the outlier for "x" is 28 and the outlier for "y" is 5. The required result could then look something like:

id = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10","A11","A12","A13","A14","A15","A16","A17","A18","A19","A20","A21","A22","A23","A24")
x_out = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1)
y_out = c(0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
a_out = data.frame(id, x_out, y_out)

Any idea how to do it in a loop? The idea is that if I include new traits or individuals, I don’t need to change the loop. Thanks!

>Solution :

No need for loops. You can test whether the absolute z-score, abs(scale()), is >= 3 for all columns at once:

a_out <- a
# scale() z-scores every column except id; as.integer() turns TRUE/FALSE into 1/0
a_out[, -1] <- as.integer(abs(scale(a[, -1])) >= 3)
#> a_out
    id x y
1   A1 0 0
2   A2 0 0
3   A3 0 1
4   A4 0 0
5   A5 0 0
6   A6 0 0
7   A7 0 0
8   A8 0 0
9   A9 0 0
10 A10 0 0
11 A11 0 0
12 A12 0 0
13 A13 0 0
14 A14 0 0
15 A15 0 0
16 A16 0 0
17 A17 0 0
18 A18 0 0
19 A19 0 0
20 A20 0 0
21 A21 0 0
22 A22 0 0
23 A23 0 0
24 A24 1 0
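
To see why only 28 and 5 are flagged, it may help to spell out what scale() computes for a single column: each value minus the column mean, divided by the sample standard deviation. The following is a minimal sketch on the question's data; the names z_manual and z_scale are illustrative only.

```r
# Example data from the question, repeated so the snippet runs on its own
x <- c(10,4,6,8,9,8,7,6,12,14,11,9,8,4,5,10,14,12,15,7,10,14,24,28)
y <- c(1.5,1.2,5,2,0.8,4,1,1.1,1.2,1.4,1.3,1.6,0.9,0.8,1,1.1,1.3,1.5,1.2,1.1,1,1.2,1.1,1)

# A z-score by hand: center on the mean, divide by the sample sd (n - 1)
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix; as.vector() drops its attributes
z_scale <- as.vector(scale(x))
all.equal(z_manual, z_scale)  # TRUE

# Values whose absolute z-score reaches the cutoff of 3
x[abs(z_manual) >= 3]                 # 28
y[abs((y - mean(y)) / sd(y)) >= 3]    # 5
```

Note that 24 in "x" and 4 in "y" look large but stay below the cutoff, with absolute z-scores of roughly 2.3 and 2.6.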

Or using dplyr:

library(dplyr)

a_out <- a %>% 
  mutate(across(!id, \(x) as.integer(abs(scale(x)) >= 3)))
# same output as above
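
If an explicit loop is still preferred, as the question asked, the same test can be written as a for loop over the trait columns. This is only a sketch, not part of the original answer: flag_outliers is a hypothetical helper, and the "_out" suffix follows the column names requested in the question.

```r
# Example data from the question, repeated so the snippet runs on its own
id <- paste0("A", 1:24)
x <- c(10,4,6,8,9,8,7,6,12,14,11,9,8,4,5,10,14,12,15,7,10,14,24,28)
y <- c(1.5,1.2,5,2,0.8,4,1,1.1,1.2,1.4,1.3,1.6,0.9,0.8,1,1.1,1.3,1.5,1.2,1.1,1,1.2,1.1,1)
a <- data.frame(id, x, y)

# Hypothetical helper: 1 if a value lies `cutoff` or more sample standard
# deviations from the column mean, 0 otherwise (missing values stay NA)
flag_outliers <- function(v, cutoff = 3) {
  as.integer(abs(v - mean(v, na.rm = TRUE)) >= cutoff * sd(v, na.rm = TRUE))
}

# Every column except the identifier is treated as a trait
traits <- setdiff(names(a), "id")

# Start from the id column and append one "<trait>_out" column per trait
a_out <- a["id"]
for (trait in traits) {
  a_out[[paste0(trait, "_out")]] <- flag_outliers(a[[trait]])
}

a_out[a_out$x_out == 1 | a_out$y_out == 1, ]  # rows A3 and A24
```

Because the trait names are read from names(a), adding new traits or individuals to a requires no change to the loop.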