Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filter groups based on difference two highest values

I have the following dataframe called df (dput below):

> df
   group value
1      A     5
2      A     1
3      A     1
4      A     5
5      B     8
6      B     2
7      B     2
8      B     3
9      C    10
10     C     1
11     C     1
12     C     8

I would like to filter groups based on the difference between their highest value (max) and second highest value. The difference should be smaller equal than 2 (<=2), this means that group B should be removed because the highest value is 8 and the second highest value is 3 which is a difference of 5. The desired output should look like this:

  group value
1     A     5
2     A     1
3     A     1
4     A     5
5     C    10
6     C     1
7     C     1
8     C     8

So I was wondering if anyone knows how to filter groups based on the difference between their highest and second-highest value?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel


dput of df:

df<-structure(list(group = c("A", "A", "A", "A", "B", "B", "B", "B", 
"C", "C", "C", "C"), value = c(5, 1, 1, 5, 8, 2, 2, 3, 10, 1, 
1, 8)), class = "data.frame", row.names = c(NA, -12L))

>Solution :

Using dplyr

library(dplyr)

df %>% 
  group_by(group) %>% 
  filter(abs(diff(sort(value, decreasing=T)[1:2])) <= 2) %>%
  ungroup()
# A tibble: 8 × 2
  group value
  <chr> <int>
1 A         5
2 A         1
3 A         1
4 A         5
5 C        10
6 C         1
7 C         1
8 C         8

A base R alternative

grp <- aggregate(. ~ group, df, function(x) 
  abs(diff(sort(x, decreasing=T)[1:2])) <= 2)

do.call(rbind, c(mapply(function(g, v) 
  list(df[df$group == g & v,]), grp$group, grp$value), make.row.names=F))
  group value
1     A     5
2     A     1
3     A     1
4     A     5
5     C    10
6     C     1
7     C     1
8     C     8
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading