Given a df with many individuals pear year calculate the mean

Advertisements

I would like to calculate the mean value of the ratio `[op/(tr - lag(tr))]` for each year for the following data

``````structure(list(year = c("1984", "1985", "1986", "1987", "1988",
"1985", "1986", "1987", "1988", "1985", "1986", "1986", "1987",
"1988"), id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4), op = c(10,
20, 30, 40, 50, 15, 17, 18, 19, 20, 22, 10, 20, 40), tr = c(10,
20, 30, 40, 50, 15, 17, 18, 19, 20, 22, 10, 20, 40)), class = "data.frame", row.names = c(NA,
-14L))
``````

I would like an answer using dplyr

>Solution :

Interpretation:

• `tr(-1)` is a lag, ordered by year, grouped by `id`
• `[op/tr - tr(-1)]` is actually `op/(tr - lag(tr))` (grouping and naming ‘lag’)

This may be broken, but it’s a first attempt:

``````library(dplyr)
quux %>%
group_by(id) %>%
mutate(lagtr = lag(tr)) %>%
group_by(year) %>%
summarize(ans = mean(op / (tr - lagtr), na.rm = TRUE), .groups = "drop")
# # A tibble: 5 × 2
#   year     ans
#   <chr>  <dbl>
# 1 1984  NaN
# 2 1985    2
# 3 1986    7.5
# 4 1987    8
# 5 1988    8.67
``````

I’m inferring that the `lag(tr)` you reference should be grouped by `id`, so I group and calculate that lag first. After that, I group by year and summarize on the ratio.

Data

``````quux <- structure(list(year = c("1984", "1985", "1986", "1987", "1988", "1985", "1986", "1987", "1988", "1985", "1986", "1986", "1987", "1988"), id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4), op = c(10, 20, 30, 40, 50, 15, 17, 18, 19, 20, 22, 10, 20, 40), tr = c(10, 20, 30, 40, 50, 15, 17, 18, 19, 20, 22, 10, 20, 40)), class = "data.frame", row.names = c(NA, -14L))
``````