Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Select the row with the maximum value in each group based on multiple columns in R dplyr

My data frame looks like this one

library(tidyverse)

df1 <- tibble(col1= c("apple","apple","banana","banana"), 
              col2 = c("appl","aple","banan","bananb"),
              count_col1=c(1,1,4,4), count_col2=c(3,4,1,1))
df1
#> # A tibble: 4 × 4
#>   col1   col2   count_col1 count_col2
#>   <chr>  <chr>       <dbl>      <dbl>
#> 1 apple  appl            1          3
#> 2 apple  aple            1          4
#> 3 banana banan           4          1
#> 4 banana bananb          4          1

Created on 2022-02-17 by the reprex package (v2.0.1)

I want to select after grouping_by col1 the row that has the maximum value based on count_col1 and count_col2.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I want my data to look like this

      col1   col2   count_col1 count_col2
      apple  aple            1          4
      banana banan           4          1
      banana bananb          4          1

for one column you can write something

df1 %>% 
  slice(which.max(count_col1))

but not for two

>Solution :

We may get rowwise max of the ‘count’ columns with pmax, grouped by ‘col1’, filter the rows where the max value of ‘Max’ column is.

library(dplyr)
df1 %>% 
 mutate(Max = pmax(count_col1, count_col2) ) %>%
 group_by(col1) %>%
 filter(Max == max(Max)) %>%
 ungroup %>%
 select(-Max)

-output

# A tibble: 3 × 4
  col1   col2   count_col1 count_col2
  <chr>  <chr>       <dbl>      <dbl>
1 apple  aple            1          4
2 banana banan           4          1
3 banana bananb          4          1

We may also use slice_max

library(purrr)
df1 %>%
  group_by(col1) %>%
  slice_max(invoke(pmax, across(starts_with("count")))) %>%
  ungroup
# A tibble: 3 × 4
  col1   col2   count_col1 count_col2
  <chr>  <chr>       <dbl>      <dbl>
1 apple  aple            1          4
2 banana banan           4          1
3 banana bananb          4          1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading