R – dplyr keep 2 most recent (date) rows

I have a dataset with groups and dates like this:

> df
   Group  Date 
   1      01-01-2016
   1      01-02-2016
   1      01-03-2016
   2      01-04-2016
   2      01-05-2016
   2      01-06-2016

I would like to keep only the two most recent rows in each group, so that I end up with this:

> df
       Group  Date
       1      01-02-2016
       1      01-03-2016
       2      01-05-2016
       2      01-06-2016

So far I have sorted it by group and date like this:

sorted_data <- df %>% arrange(Group,Date)

I also found that I can get just the most recent row per group like this:

df %>% 
  group_by(Group) %>%
  slice(which.max(as.Date(Date, '%d-%m-%Y')))

But I’m not sure how to keep the 2 most recent rows per group. Does someone know how?

Solution:

Does this work?

library(dplyr)

df %>%
  mutate(Date = lubridate::dmy(Date)) %>%
  group_by(Group) %>%
  slice_max(Date, n = 2)
# A tibble: 4 × 2
# Groups:   Group [2]
  Group Date      
  <dbl> <date>    
1     1 2016-03-01
2     1 2016-02-01
3     2 2016-06-01
4     2 2016-05-01
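
Note that slice_max() returns the rows newest first, whereas the desired output in the question lists them oldest first. If the order matters, a minimal sketch along these lines may help. It assumes the dates are day-month-year strings as in the question, rebuilds the example df so it runs on its own, and uses base as.Date() instead of lubridate. The name result and the with_ties = FALSE setting (so a group never returns more than 2 rows when dates are tied) are my additions, not part of the original answer.

```r
library(dplyr)

# Example data from the question; dates are day-month-year strings.
df <- data.frame(
  Group = c(1, 1, 1, 2, 2, 2),
  Date  = c("01-01-2016", "01-02-2016", "01-03-2016",
            "01-04-2016", "01-05-2016", "01-06-2016")
)

result <- df %>%
  mutate(Date = as.Date(Date, format = "%d-%m-%Y")) %>%  # parse strings as real dates
  group_by(Group) %>%
  slice_max(Date, n = 2, with_ties = FALSE) %>%          # at most 2 rows per group
  ungroup() %>%
  arrange(Group, Date)                                   # oldest first within each group

result
```

Parsing the dates first matters: sorting or comparing the raw "dd-mm-yyyy" strings would go by day of month rather than chronologically.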