Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How do obtain max and mean values from different columns with a Date column condition in R?

I am trying to get the daily mean and maximum values for each pollutant. For each of the monitoring stations, the daily mean and maximum values of each pollutant shall be calculated.

Here is a sample of my dataframe:

df <- data.frame (Station  = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 10),
                  Date = c("31/12/2018", "31/12/2018", "30/12/2018", "1/12/2018", "4/3/2018", "16/3/2018", "16/3/2018", "5/2/2018", "31/12/2018", "6/2/2018", "6/2/2018", "4/3/2018"),
                  NO2 = c(40,55,52,58,76,98,12,14,23,54,53,76),
                  O3 = c(13,12,45,87,72,54,88,102,63,52,64,53))

This is the expected dataframe:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df2 <- data.frame (Station  = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 10),
                    Date = c("31/12/2018",  "30/12/2018", "1/12/2018", "4/3/2018", "16/3/2018", "16/3/2018", "5/2/2018", "31/12/2018", "6/2/2018", "4/3/2018"),
                    MAX_NO2 = c(55,52,58,76,98,12,14,23,54,76),
                    MEAN_NO2 = c(47.5,52,58,76,98,12,14,23,53.5,76),
                    MAX_O3 = c(13,45,87,72,54,88,102,63,64,53),
                    MEAN_O3 = c(12.5,45,87,72,54,88,102,63,58,53))

This is the code I found in S.O.. I tried that but an error occurred :

library(dplyr)

df %>%
group_by(Station,Date) %>%
mutate(max.O3 = max(O3), max.NO2 = max(NO2)) %>%
ungroup() %>%
mutate(max.O3 = case_when(O3 == max.O3 ~ max.O3, TRUE ~ NA_real_), max.NO2 = case_when(NO2 == max.NO2 ~ max.NO2, TRUE ~ NA_real_)) %>%
filter(!is.na(max.O3) | !is.na(max.NO2)) %>%
select(-O3,-NO2)

How can I get the desired output? I first thought of using a for loop for the iteration of values, but I don’t know if this is the right thing to do. Something like:

for i in df$Station{
   for date in df$date{
      Max_NO2 = max(a)
      Mean_NO2 = mean(a)
      ...
   }
}

Thanks in advance!

>Solution :

We can first convert to date format, then we can summarise across the 2 columns to get the mean and max for each group and column. Then, we can use the .names argument to define the column names.

library(tidyverse)
    
df %>%
  mutate(Date = as.Date(Date, "%d/%m/%Y")) %>% 
  group_by(Station, Date) %>%
  summarise(across(NO2:O3, list(MAX = max, MEAN = mean), .names = "{.fn}_{.col}")) %>% 
  ungroup

Output

   Station Date       MAX_NO2 MEAN_NO2 MAX_O3 MEAN_O3
     <dbl> <date>       <dbl>    <dbl>  <dbl>   <dbl>
 1       1 2018-12-30      52     52       45    45  
 2       1 2018-12-31      55     47.5     13    12.5
 3       2 2018-03-04      76     76       72    72  
 4       2 2018-03-16      98     98       54    54  
 5       2 2018-12-01      58     58       87    87  
 6       3 2018-02-05      14     14      102   102  
 7       3 2018-03-16      12     12       88    88  
 8       3 2018-12-31      23     23       63    63  
 9       4 2018-02-06      54     53.5     64    58  
10      10 2018-03-04      76     76       53    53 
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading