How to filter all rows with last and second last observation in R (dplyr)

July 31, 2023

I have the following dataframe in R:

data <- structure(list(Version = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L), levels = c("Nov 2022", "Jan 2023", "Mar 2023", "May 2023", 
"Jul 2023"), class = "factor")), row.names = c(NA, -40L), class = c("tbl_df", 
"tbl", "data.frame"))

I would like dynamic code that filters all rows belonging to the last and second-to-last month in my dataframe (i.e., all rows with May 2023 and Jul 2023 in my case). I can filter all rows with Jul 2023 using last(), but is there a way to tweak my code so that it also keeps the second-to-last month (May 2023)?

library(tidyverse)

data %>% 
  filter(Version == last(Version))

Solution:

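Version is a factor, so we can take the last two of its levels with tail(..., n = 2) and use them in filter. Since we are now comparing against two values rather than one, we need the %in% operator instead of ==. This relies on the factor levels being in chronological order, which they are in your data. Note that levels() also returns levels that have no rows; if that can happen, wrap the column in droplevels() first.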

library(dplyr)

data |> filter(Version %in% tail(levels(Version), n = 2))

# A tibble: 14 x 1
   Version 
   <fct>   
 1 May 2023
 2 May 2023
 3 May 2023
 4 May 2023
 5 May 2023
 6 May 2023
 7 May 2023
 8 Jul 2023
 9 Jul 2023
10 Jul 2023
11 Jul 2023
12 Jul 2023
13 Jul 2023
14 Jul 2023
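
If Version were a plain character column, or a factor whose levels are not in chronological order, the levels-based approach above would not apply. The following is only a sketch for that situation, not part of the original answer: the tibble data_chr and the helper month_key() are hypothetical names made up for illustration, and the code assumes every value has the form "Mon YYYY" with an English three-letter month.

```r
library(dplyr)

# Hypothetical variant of the data: Version stored as plain character,
# with rows not in chronological order
data_chr <- tibble(
  Version = c("Jul 2023", "Nov 2022", "May 2023",
              "Jan 2023", "May 2023", "Jul 2023")
)

# Hypothetical helper: turn "Mon YYYY" into a sortable integer,
# e.g. "May 2023" -> 202305. month.abb is base R's vector of English
# month abbreviations, so this does not depend on the session locale.
month_key <- function(x) {
  x <- as.character(x)
  as.integer(substr(x, 5, 8)) * 100L + match(substr(x, 1, 3), month.abb)
}

# The two largest distinct keys are the last and second-to-last months
last_two <- tail(sort(unique(month_key(data_chr$Version))), n = 2)

result <- data_chr |> filter(month_key(Version) %in% last_two)
```

Here result keeps only the May 2023 and Jul 2023 rows, in their original row order. Sorting on a numeric key rather than on the strings matters because the month names would otherwise sort alphabetically.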