Calculate means beyond a given date R

July 25, 2022

I have a large dataframe (Merged) of several thousand lines and 20 columns. Each line is a record of a site visit. There are 400 sites in the dataframe and each has multiple lines for visits on different months. . Example:

Site	Date	Method	Result	ChangeDate
2200	01/07/2021	24	0.22	01/07/2021
2201	03/07/2021	21	0.01	01/01/2021
2203	03/07/2021	21	0.02	01/01/2021
2200	01/08/2021	21	0.05	01/07/2021
2201	03/08/2021	21	0.01	01/01/2021
2203	03/08/2021	21	0.02	01/01/2021
2200	01/09/2021	21	0.05	01/07/2021

And so on for all 400 sites over two years.

I need to calculate the mean result for each site, which I can do:

OPmean <- aggregate(Merged$Result, list(Merged$Site), FUN=mean, na.rm = TRUE

But I only want to take the means after the change date. So on site 2200 I would want to take the mean on these lines only:

Site	Date	Method	Result	ChangeDate
2200	01/08/2021	21	0.05	01/07/2021
2200	01/09/2021	21	0.05	01/07/2021

Leaving the first line for 2200 off because the date occurs on (or before) the change date.

For an output I would want a new dataframe something like this:

Site	Mean Result
2200	0.05
2201	0.01
2203	0.02

>Solution :

@user2974951 gives a great base R answer. A tidyverse approach might be

library(dplyr) 
Merged %>% 
     group_by(Site) %>% 
     filter(Date > ChangeDate) %>% 
     summarize(mr = mean(Result) # or optionally, mean(Result, na.rm = TRUE)