Subsetting a named numeric for top N values in R

I have a large numeric object that is the output of an isolation forest model.

I want to subset the model output to find the top N outliers. Using the example code from here I can find the single top outlier, but I want to find more than one.

My data looks as follows:

if (!require("pacman")) install.packages("pacman")
pacman::p_load(isotree)

set.seed(1)

m <- 100  # number of rows
n <- 2    # number of columns

X <- matrix(rnorm(m * n), nrow = m)

# ADD CLEAR OUTLIER TO THE DATA
X <- rbind(X, c(3, 3))

# TRAIN AN ISOLATION FOREST MODEL
iso <- isolation.forest(X, ntrees = 10, nthreads = 1)

# MAKE A PREDICTION TO SCORE EACH ROW
pred <- predict(iso, X)

The row with the highest outlier score can be extracted as follows:

X[which.max(pred), ]

dplyr::slice_max doesn’t appear to be compatible with my large numeric object.

Any suggestions that would allow me to subset my data to find the top N outliers would be greatly appreciated.

Solution:

Does this solve your problem?

library(tidyverse)
#install.packages("isotree")
library(isotree)

set.seed(1)

m <- 100  # number of rows
n <- 2    # number of columns

X <- matrix(rnorm(m * n), nrow = m)

# ADD CLEAR OUTLIER TO THE DATA
X <- rbind(X, c(3, 3))

# TRAIN AN ISOLATION FOREST MODEL
iso <- isolation.forest(X, ntrees = 10, nthreads = 1)

# MAKE A PREDICTION TO SCORE EACH ROW
pred <- predict(iso, X)

X[which.max(pred), ]
#> [1] 3 3

# Perhaps this? Top 3 rows ranked by outlier score (pred)
data.frame(X, "pred" = pred) %>%
  slice_max(order_by = pred, n = 3)
#>          X1         X2      pred
#> 1  3.000000  3.0000000 0.7306871
#> 2 -1.523567 -1.4672500 0.6496666
#> 3 -2.214700 -0.6506964 0.5982211

# Or maybe this? Top 3 rows ranked by the first column (X1) rather than by score
data.frame(X, "pred" = pred) %>%
  slice_max(order_by = X1, n = 3)
#>         X1        X2      pred
#> 1 3.000000 3.0000000 0.7306871
#> 2 2.401618 0.4251004 0.5014570
#> 3 2.172612 0.2075383 0.4811756

Created on 2022-04-06 by the reprex package (v2.0.1)
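
If you would rather not convert to a data frame, a base R approach with order() may also work, since slice_max operates on data frames rather than on bare numeric vectors. The following is only a minimal sketch: `X` and `scores` here are small made-up stand-ins for the question's matrix and `pred` vector, and `top_idx` is a name chosen for illustration.

```r
# Hypothetical stand-in for the question's data: 6 rows, 2 columns
# (matrix() fills column by column).
X <- matrix(c(0.1, -0.4, 3.0, 1.2, -2.2, 0.3,
              0.2,  0.5, 3.0, 0.1, -0.7, 0.0), ncol = 2)

# Hypothetical outlier scores, one per row of X (stand-in for `pred`).
scores <- c(0.41, 0.45, 0.73, 0.50, 0.60, 0.39)

N <- 3

# order() returns the row indices sorted by score, highest first;
# head() keeps the first N and does not fail if N exceeds length(scores).
top_idx <- head(order(scores, decreasing = TRUE), N)

top_idx                      # indices of the N highest-scoring rows
X[top_idx, , drop = FALSE]   # the corresponding rows of X
scores[top_idx]              # their scores, in descending order
```

With the real objects, replacing `scores` with `pred` should give the N most outlying rows of `X`. The `drop = FALSE` keeps the result a matrix even when N is 1.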
