Advertisements
I would like to calculate the mean value of the ratio [op/(tr - lag(tr))]
for each year for the following data
structure(list(year = c("1984", "1985", "1986", "1987", "1988",
"1985", "1986", "1987", "1988", "1985", "1986", "1986", "1987",
"1988"), id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4), op = c(10,
20, 30, 40, 50, 15, 17, 18, 19, 20, 22, 10, 20, 40), tr = c(10,
20, 30, 40, 50, 15, 17, 18, 19, 20, 22, 10, 20, 40)), class = "data.frame", row.names = c(NA,
-14L))
I would like an answer using dplyr
>Solution :
Interpretation:
tr(-1)
is a lag, ordered by year, grouped byid
[op/tr - tr(-1)]
is actuallyop/(tr - lag(tr))
(grouping and naming ‘lag’)
This may be broken, but it’s a first attempt:
library(dplyr)
quux %>%
group_by(id) %>%
mutate(lagtr = lag(tr)) %>%
group_by(year) %>%
summarize(ans = mean(op / (tr - lagtr), na.rm = TRUE), .groups = "drop")
# # A tibble: 5 × 2
# year ans
# <chr> <dbl>
# 1 1984 NaN
# 2 1985 2
# 3 1986 7.5
# 4 1987 8
# 5 1988 8.67
I’m inferring that the lag(tr)
you reference should be grouped by id
, so I group and calculate that lag first. After that, I group by year and summarize on the ratio.
Data
quux <- structure(list(year = c("1984", "1985", "1986", "1987", "1988", "1985", "1986", "1987", "1988", "1985", "1986", "1986", "1987", "1988"), id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4), op = c(10, 20, 30, 40, 50, 15, 17, 18, 19, 20, 22, 10, 20, 40), tr = c(10, 20, 30, 40, 50, 15, 17, 18, 19, 20, 22, 10, 20, 40)), class = "data.frame", row.names = c(NA, -14L))