I’m trying to do a year-to-year comparison for each player in a dataframe so I can plot Year 1 vs Year 2 in a scatterplot on the x and y axis’ respectively. I think I need to use a for loop with a nested if_else statement but I feel like that may be overcomplicating things.
Here is what I have:
| player |season|average|
| ------ | ---- |-------|
| A.Lee | 2019 | 45 |
| A.Lee | 2020 | 40 |
| A.Lee | 2021 | 46 |
| A.Cole | 2020 | 39 |
| A.Cole | 2021 | 42 |
Here are the desired results:
| player | Y1 | Y2 |
| ------ | ---- | ------- |
| A.Lee | 45 | 40 |
| A.Lee | 40 | 46 |
| A.Cole | 39 | 42 |
>Solution :
Grouped by ‘player’, get the lead values of ‘average’ as ‘Y1’ and ‘average’ as ‘Y2’ within summarise and remove the NA elements (by default the default option in lead/lag are NA)
library(dplyr)
df %>%
arrange(player, desc(season)) %>%
group_by(player) %>%
summarise(Y1 = lead(average), Y2 = average, .groups = 'drop') %>%
na.omit
-output
# A tibble: 3 × 3
player Y1 Y2
<chr> <int> <int>
1 A.Cole 39 42
2 A.Lee 40 46
3 A.Lee 45 40
data
df <- structure(list(player = c("A.Lee", "A.Lee", "A.Lee", "A.Cole",
"A.Cole"), season = c(2019, 2020, 2021, 2020, 2021), average = c(45L,
40L, 46L, 39L, 42L)), row.names = c(NA, -5L), class = "data.frame")