data.table, filter >= median per group and keep two lowest

May 2, 2022

Situation & Goal

I have a large table that looks like this (simplified):

|MainCat |SubCat | Value|
|:-------|:------|-----:|
|A       |Y      |    50|
|A       |Z      |    60|
|A       |ZZZZ   |    80|
|A       |XX     |    90|
|A       |X      |   100|
|B       |XYXY   |    15|
|B       |XXX    |    50|
|B       |YY     |    60|
|B       |ZZZ    |   150|
|B       |ZZ     |   400|

Within each group (MainCat), I want to keep only the two lowest values (Value) that are greater than or equal to the group median:

|MainCat |SubCat | Value|Comment               |
|:-------|:------|-----:|:---------------------|
|A       |Y      |    50|-                     |
|A       |Z      |    60|-                     |
|A       |ZZZZ   |    80|Median, First to keep |
|A       |XX     |    90|Second to keep        |
|A       |X      |   100|-                     |
|B       |XYXY   |    15|-                     |
|B       |XXX    |    50|-                     |
|B       |YY     |    60|Median, First to keep |
|B       |ZZZ    |   150|Second to keep        |
|B       |ZZ     |   400|-                     |

Expected result:

|MainCat |SubCat | Value|
|:-------|:------|-----:|
|A       |ZZZZ   |    80|
|A       |XX     |    90|
|B       |YY     |    60|
|B       |ZZZ    |   150|

My (failed) attempt

I tried df2[Value >= df2[MainCat==MainCat, median(Value, na.rm=TRUE)]], but this calculates a single median over all values, without grouping. Can somebody help? As performance is key, I would prefer a data.table solution if possible. Thank you very much.
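For readers wondering why the attempt ignores the groups: inside `i`, `MainCat == MainCat` compares the column with itself, so it is TRUE on every row and the inner call returns one median for the whole table. A minimal sketch on the sample data from this post (the names `global_med` and `group_med` are only illustrative):

```r
library(data.table)

# Sample data from the question
df2 <- data.table(
  MainCat = rep(c("A", "B"), each = 5),
  SubCat  = c("Y", "Z", "ZZZZ", "XX", "X", "XYXY", "XXX", "YY", "ZZZ", "ZZ"),
  Value   = c(50, 60, 80, 90, 100, 15, 50, 60, 150, 400)
)

# MainCat == MainCat holds for every row, so this is the overall median
global_med <- df2[MainCat == MainCat, median(Value, na.rm = TRUE)]

# Grouping with `by` yields one median per MainCat instead
group_med <- df2[, .(med = median(Value, na.rm = TRUE)), by = MainCat]

print(global_med)  # 70
print(group_med)   # A: 80, B: 60
```

With the overall median of 70 as the threshold, group B would lose its own median row (60), which is why the comparison has to happen per group.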

MWE

Base data:

df2 = structure(list(
  MainCat = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
  SubCat = c("Y", "Z", "ZZZZ", "XX", "X", "XYXY", "XXX", "YY", "ZZZ", "ZZ"),
  Value = c(50, 60, 80, 90, 100, 15, 50, 60, 150, 400)
), row.names = c(NA, -10L), class = c("data.table", "data.frame"))

Result:

data.table(MainCat = c("A", "A", "B", "B"),
           SubCat = c("ZZZZ", "XX", "YY", "ZZZ"),
           Value = c(80, 90, 60, 150))

Solution:

Group by ‘MainCat’ and use the row index (.I) to collect the rows where ‘Value’ is at least the group median. Extract those indices ($V1) and use them to subset the data. Then order by ‘MainCat’ and ‘Value’, and take the first two rows of each ‘MainCat’ group with head.

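library(data.table)
# Inner call: per-group row indices where Value >= the group median ($V1).
# Outer chain: sort by MainCat and Value, then keep the first two rows per group.
df2[df2[, .I[Value >= median(Value, na.rm = TRUE)], .(MainCat)]$V1
    ][order(MainCat, Value), head(.SD, 2), MainCat]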

Output:

   MainCat SubCat Value
    <char> <char> <num>
1:       A   ZZZZ    80
2:       A     XX    90
3:       B     YY    60
4:       B    ZZZ   150
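As a possible variation (a sketch, not part of the answer above), the same filter can be written as a single grouped call: sort first in `i`, then subset `.SD` by the group median and take two rows. The name `res` is only illustrative, and the data is rebuilt here so the block runs on its own:

```r
library(data.table)

# Sample data from the question
df2 <- data.table(
  MainCat = rep(c("A", "B"), each = 5),
  SubCat  = c("Y", "Z", "ZZZZ", "XX", "X", "XYXY", "XXX", "YY", "ZZZ", "ZZ"),
  Value   = c(50, 60, 80, 90, 100, 15, 50, 60, 150, 400)
)

# i: sort by group and value; j: per group, keep rows at or above the
# group median and take the first two (the two lowest, thanks to the sort)
res <- df2[order(MainCat, Value),
           head(.SD[Value >= median(Value, na.rm = TRUE)], 2),
           by = MainCat]

print(res)
```

Subsetting `.SD` inside each group tends to be slower than the `.I` approach when there are many groups, so on a large table this version is mainly a readability trade-off. Groups with fewer than two qualifying rows simply return fewer rows, since `head` does not pad.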

data.table

byMR

Published May 02, 2022
