I have a data frame containing dates and associated values for three groups as follows.
library(lubridate)
library(magrittr)
library(dplyr)
data <-
data.frame(group = rep(c("a", "b", "c"), each = 5),
date = rep(seq(ymd(20200101), ymd(20200105), by = 1),
times = 3),
value = runif(15))
I would like to perform a grouped filter operation in order to extract subsets of the data by date, where the start date varies by group.
start_date <- list(a = ymd(20200102), b = ymd(20200104), c = ymd(20200103))
I’d like to use the group name to index into the start date list. I try to do the operation as follows, but I get an error message.
data %>%
group_by(group) %>%
filter(date >= start_date[[group]])
Error in app$vspace(new_style$`margin-top` %||% 0) :
attempt to apply non-function
What am I doing wrong?
(The alternative to the above procedure is to set the start dates up as a data frame, join the data frame to data, and perform an ungrouped filter. This works as intended, but it’s less elegant (in my opinion) and I’d prefer to keep the start dates as a list for other reasons.)
start_date_2 <-
data.frame(group = c("a", "b", "c"),
start_date = c(ymd(20200102), ymd(20200104), c = ymd(20200103)))
data %>%
left_join(start_date_2, by = "group") %>%
filter(date >= start_date) %>%
select(-start_date)
>Solution :
We can access the name of the current group(s) by using cur_group(). This gives us a tibble with every group being a column containing the current group name, so we need to subset it with $group.
library(dplyr)
data %>%
group_by(group) %>%
filter(date >= start_date[[cur_group()$group]])
#> # A tibble: 9 x 3
#> # Groups: group [3]
#> group date value
#> <chr> <date> <dbl>
#> 1 a 2020-01-02 0.225
#> 2 a 2020-01-03 0.345
#> 3 a 2020-01-04 0.110
#> 4 a 2020-01-05 0.0951
#> 5 b 2020-01-04 0.356
#> 6 b 2020-01-05 0.345
#> 7 c 2020-01-03 0.0973
#> 8 c 2020-01-04 0.344
#> 9 c 2020-01-05 0.418
Created on 2022-03-29 by the reprex package (v0.3.0)