Filter groups based on difference two highest values

Advertisements

I have the following dataframe called df (dput below):

> df
   group value
1      A     5
2      A     1
3      A     1
4      A     5
5      B     8
6      B     2
7      B     2
8      B     3
9      C    10
10     C     1
11     C     1
12     C     8

I would like to filter groups based on the difference between their highest value (max) and second highest value. The difference should be smaller equal than 2 (<=2), this means that group B should be removed because the highest value is 8 and the second highest value is 3 which is a difference of 5. The desired output should look like this:

  group value
1     A     5
2     A     1
3     A     1
4     A     5
5     C    10
6     C     1
7     C     1
8     C     8

So I was wondering if anyone knows how to filter groups based on the difference between their highest and second-highest value?


dput of df:

df<-structure(list(group = c("A", "A", "A", "A", "B", "B", "B", "B", 
"C", "C", "C", "C"), value = c(5, 1, 1, 5, 8, 2, 2, 3, 10, 1, 
1, 8)), class = "data.frame", row.names = c(NA, -12L))

>Solution :

Using dplyr

library(dplyr)

df %>% 
  group_by(group) %>% 
  filter(abs(diff(sort(value, decreasing=T)[1:2])) <= 2) %>%
  ungroup()
# A tibble: 8 × 2
  group value
  <chr> <int>
1 A         5
2 A         1
3 A         1
4 A         5
5 C        10
6 C         1
7 C         1
8 C         8

A base R alternative

grp <- aggregate(. ~ group, df, function(x) 
  abs(diff(sort(x, decreasing=T)[1:2])) <= 2)

do.call(rbind, c(mapply(function(g, v) 
  list(df[df$group == g & v,]), grp$group, grp$value), make.row.names=F))
  group value
1     A     5
2     A     1
3     A     1
4     A     5
5     C    10
6     C     1
7     C     1
8     C     8

Leave a ReplyCancel reply