combinining multiple summarize calls dplyr

May 20, 2022

Given the df

ww <- data.frame(
    GM = c("A", "A", "A", "A", "A", "A",
            "B", "B", "B", "B", "B", "B",
            "C", "C", "C", "C", "C", "C"),
    stanza = rep(c("Past", "Mid", "End"), 6),   
    change = c(1, 1.1, 1.4, 1, 1.3, 1.5, 1, 1.2, 1.4,
               1.1, 1.2, 1.3, .9, 1.2, 1.3, .9, 1.3, 1.5))

I would like to calculate the mean for Past for each GM and the divide each value in ‘change’ by the GM specific mean. I can do this with two dplyr calls and a join function as follows:

past <- ww %>%
    group_by(GM) %>%
    filter(stanza == "Past") %>%
    summarize(past.mean = mean(change))

ww <- left_join(ww, past, by = "GM")

ww %>%
    group_by(GM, stanza) %>%
    summarize(pr.change = change/past.mean)

But there must be a way to do this in one dplyr call.

>Solution :

No need to join, you can compute this directly in one pipe chain:

ww %>% 
  group_by(GM) %>% 
  mutate(pr.change = change / mean(change[stanza == "Past"])) %>%
  ungroup()

Output

   GM    stanza change pr.change
   <chr> <chr>   <dbl>     <dbl>
 1 A     Past      1       1    
 2 A     Mid       1.1     1.1  
 3 A     End       1.4     1.4  
 4 A     Past      1       1    
 5 A     Mid       1.3     1.3  
 6 A     End       1.5     1.5  
 7 B     Past      1       0.952
 8 B     Mid       1.2     1.14 
 9 B     End       1.4     1.33 
10 B     Past      1.1     1.05 
11 B     Mid       1.2     1.14 
12 B     End       1.3     1.24 
13 C     Past      0.9     1    
14 C     Mid       1.2     1.33 
15 C     End       1.3     1.44 
16 C     Past      0.9     1    
17 C     Mid       1.3     1.44 
18 C     End       1.5     1.67