I have a dataset in long-format called df containing four key variables: respondent_id, time (the time period), change (a variable indicating the last period before a change of income) and lost_income, a variable indicating when a person lost their income.
The lost_income column is currently empty, only in rows that follow a row where change == -1 does lost_income take on the value 0, indicating that an individual lost their income at that point.
I want to write R code that continues the count of lost_income into both directions, within each group of respondent_id. That means that for each respondent_id, I want the lost_income column, currently containing 0 or NA, to be filled into both directions so that the row preceding the 0 takes on the value -1, the one before that -2 etc. Similarly, the first row after lost_sen_timing==0 should take on the value 1, the next 2, and so on.
Example Data:
library(data.table)
df <- data.table(respondent_id = c(1,1,1,1,1,1,2,2,2,2,2),
time = c(1,2,3,4,5,6,1,2,3,4,5),
change = c(0,0,0,-1,0,0,0,-1,0,0,0),
lost_income= c(NA,NA,NA,NA,0,NA,NA,NA,0,NA,NA))
I have tried using fill()
function from the tidyr package but it only changes the first row after a non-missing value of lost_income. I want to adapt the code so that it changes, for example, the sequence of respondent 1 from NA,NA,NA,NA,0,NA
to -4,-3,-2,-1,0,1
instead of NA,NA,NA,NA,0,1
.
How can I fill in missing values in R based on previous rows within group to get the desired lost_income column as shown below?
Desired output:
respondent_id time change lost_income
1 1 0 -4
1 2 0 -3
1 3 0 -2
1 4 -1 -1
1 5 0 0
1 6 0 1
2 1 0 -2
2 2 -1 -1
2 3 0 0
2 4 0 1
2 5 0 2
Any suggestions on how to achieve this in R would be greatly appreciated. Thank you in advance!
>Solution :
library(dplyr)
df %>%
left_join(
df %>%
filter(lost_income == 0) %>%
select(respondent_id, time0 = time)
) %>%
mutate(lost_income = time - time0)
Result
Joining with `by = join_by(respondent_id)`
# A tibble: 11 × 5
respondent_id time change lost_income time0
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0 -4 5
2 1 2 0 -3 5
3 1 3 0 -2 5
4 1 4 -1 -1 5
5 1 5 0 0 5
6 1 6 0 1 5
7 2 1 0 -2 3
8 2 2 -1 -1 3
9 2 3 0 0 3
10 2 4 0 1 3
11 2 5 0 2 3
Or, using dplyr 1.1.0:
df %>%
mutate(lost_income = time - time[lost_income %in% 0], .by = respondent_id)
Result
# A tibble: 11 × 4
respondent_id time change lost_income
<dbl> <dbl> <dbl> <dbl>
1 1 1 0 -4
2 1 2 0 -3
3 1 3 0 -2
4 1 4 -1 -1
5 1 5 0 0
6 1 6 0 1
7 2 1 0 -2
8 2 2 -1 -1
9 2 3 0 0
10 2 4 0 1
11 2 5 0 2