Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to select row with max value from a group of groups in R?

I have a dataframe with a number of groupings. I want to create a dataframe populated by rows that are the maximum of a given group of groups. Given the dataframe

group unit treatment value         etc
     1    A         w     8       apple
     1    A         x     9        pear
     1    A         y     7      orange
     1    A         z     2        pear
     1    B         w     4  strawberry
     1    B         x     3 dragonfruit
     1    B         y     6   raspberry
     1    B         z     5       apple
     1    C         w    32      banana
     1    C         x    27       peach
     1    C         y    15        plum
     1    C         z    28      orange
     2    A         w    12     apricot
     2    A         x    11  blackberry
     2    A         y    10      banana
     2    A         z     9  raspeberry
     2    B         w     1        plum
     2    B         x     2       lemon
     2    B         y     3  grapefruit
     2    B         z     4       apple
     2    C         w    51         fig
     2    C         x    47     avocado
     2    C         y    68  blackberry
     2    C         z    53 dragonfruit

for each group, for each unit, I would like to select the row with the highest value, such that I would end up with:

group unit treatment value        etc
     1    A         x     9       pear
     1    B         y     6  raspberry
     1    C         w    32     banana
     2    A         w    12    apricot
     2    B         z     4      apple
     2    C         y    68 blackberry

the etc column is just to highlight that I’d like to select the whole row.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I could write a series of nested loops, but there feels like there has to be something more elegant. Happy for base or tidyverse suggestions.

>Solution :

You can do as follows:

library(dplyr)
filter(dt, value==max(value), .by=group:unit)

or (as @Limey suggests)

library(dplyr)
slice_max(dt, order_by= value, by=group:unit)

or

library(data.table)
setDT(dt)[, .SD[value==max(value)], .(group, unit)]

Output:

   group   unit treatment value        etc
   <int> <char>    <char> <int>     <char>
1:     1      A         x     9       pear
2:     1      B         y     6  raspberry
3:     1      C         w    32     banana
4:     2      A         w    12    apricot
5:     2      B         z     4      apple
6:     2      C         y    68 blackberry
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading