How do obtain max and mean values from different columns with a Date column condition in R?

I am trying to get the daily mean and maximum values for each pollutant. For each of the monitoring stations, the daily mean and maximum values of each pollutant shall be calculated.

Here is a sample of my dataframe:

df <- data.frame (Station  = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 10),
                  Date = c("31/12/2018", "31/12/2018", "30/12/2018", "1/12/2018", "4/3/2018", "16/3/2018", "16/3/2018", "5/2/2018", "31/12/2018", "6/2/2018", "6/2/2018", "4/3/2018"),
                  NO2 = c(40,55,52,58,76,98,12,14,23,54,53,76),
                  O3 = c(13,12,45,87,72,54,88,102,63,52,64,53))

This is the expected dataframe:

df2 <- data.frame (Station  = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 10),
                    Date = c("31/12/2018",  "30/12/2018", "1/12/2018", "4/3/2018", "16/3/2018", "16/3/2018", "5/2/2018", "31/12/2018", "6/2/2018", "4/3/2018"),
                    MAX_NO2 = c(55,52,58,76,98,12,14,23,54,76),
                    MEAN_NO2 = c(47.5,52,58,76,98,12,14,23,53.5,76),
                    MAX_O3 = c(13,45,87,72,54,88,102,63,64,53),
                    MEAN_O3 = c(12.5,45,87,72,54,88,102,63,58,53))

This is the code I found in S.O.. I tried that but an error occurred :

library(dplyr)

df %>%
group_by(Station,Date) %>%
mutate(max.O3 = max(O3), max.NO2 = max(NO2)) %>%
ungroup() %>%
mutate(max.O3 = case_when(O3 == max.O3 ~ max.O3, TRUE ~ NA_real_), max.NO2 = case_when(NO2 == max.NO2 ~ max.NO2, TRUE ~ NA_real_)) %>%
filter(!is.na(max.O3) | !is.na(max.NO2)) %>%
select(-O3,-NO2)

How can I get the desired output? I first thought of using a for loop for the iteration of values, but I don’t know if this is the right thing to do. Something like:

for i in df$Station{
   for date in df$date{
      Max_NO2 = max(a)
      Mean_NO2 = mean(a)
      ...
   }
}

Thanks in advance!

>Solution :

We can first convert to date format, then we can summarise across the 2 columns to get the mean and max for each group and column. Then, we can use the .names argument to define the column names.

library(tidyverse)
    
df %>%
  mutate(Date = as.Date(Date, "%d/%m/%Y")) %>% 
  group_by(Station, Date) %>%
  summarise(across(NO2:O3, list(MAX = max, MEAN = mean), .names = "{.fn}_{.col}")) %>% 
  ungroup

Output

   Station Date       MAX_NO2 MEAN_NO2 MAX_O3 MEAN_O3
     <dbl> <date>       <dbl>    <dbl>  <dbl>   <dbl>
 1       1 2018-12-30      52     52       45    45  
 2       1 2018-12-31      55     47.5     13    12.5
 3       2 2018-03-04      76     76       72    72  
 4       2 2018-03-16      98     98       54    54  
 5       2 2018-12-01      58     58       87    87  
 6       3 2018-02-05      14     14      102   102  
 7       3 2018-03-16      12     12       88    88  
 8       3 2018-12-31      23     23       63    63  
 9       4 2018-02-06      54     53.5     64    58  
10      10 2018-03-04      76     76       53    53 

Leave a Reply