Given a df with many individuals pear year calculate the mean

Advertisements

I would like to calculate the mean value of the ratio [op/(tr - lag(tr))] for each year for the following data

structure(list(year = c("1984", "1985", "1986", "1987", "1988", 
"1985", "1986", "1987", "1988", "1985", "1986", "1986", "1987", 
"1988"), id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4), op = c(10, 
20, 30, 40, 50, 15, 17, 18, 19, 20, 22, 10, 20, 40), tr = c(10, 
20, 30, 40, 50, 15, 17, 18, 19, 20, 22, 10, 20, 40)), class = "data.frame", row.names = c(NA, 
-14L))

I would like an answer using dplyr

>Solution :

Interpretation:

  • tr(-1) is a lag, ordered by year, grouped by id
  • [op/tr - tr(-1)] is actually op/(tr - lag(tr)) (grouping and naming ‘lag’)

This may be broken, but it’s a first attempt:

library(dplyr)
quux %>%
  group_by(id) %>%
  mutate(lagtr = lag(tr)) %>%
  group_by(year) %>%
  summarize(ans = mean(op / (tr - lagtr), na.rm = TRUE), .groups = "drop")
# # A tibble: 5 × 2
#   year     ans
#   <chr>  <dbl>
# 1 1984  NaN   
# 2 1985    2   
# 3 1986    7.5 
# 4 1987    8   
# 5 1988    8.67

I’m inferring that the lag(tr) you reference should be grouped by id, so I group and calculate that lag first. After that, I group by year and summarize on the ratio.


Data

quux <- structure(list(year = c("1984", "1985", "1986", "1987", "1988", "1985", "1986", "1987", "1988", "1985", "1986", "1986", "1987", "1988"), id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4), op = c(10, 20, 30, 40, 50, 15, 17, 18, 19, 20, 22, 10, 20, 40), tr = c(10, 20, 30, 40, 50, 15, 17, 18, 19, 20, 22, 10, 20, 40)), class = "data.frame", row.names = c(NA, -14L))

Leave a Reply Cancel reply