Filling in Missing Values in R Based on Previous Rows within Group

Advertisements

I have a dataset in long-format called df containing four key variables: respondent_id, time (the time period), change (a variable indicating the last period before a change of income) and lost_income, a variable indicating when a person lost their income.

The lost_income column is currently empty, only in rows that follow a row where change == -1 does lost_income take on the value 0, indicating that an individual lost their income at that point.

I want to write R code that continues the count of lost_income into both directions, within each group of respondent_id. That means that for each respondent_id, I want the lost_income column, currently containing 0 or NA, to be filled into both directions so that the row preceding the 0 takes on the value -1, the one before that -2 etc. Similarly, the first row after lost_sen_timing==0 should take on the value 1, the next 2, and so on.

Example Data:

   library(data.table)
    df <- data.table(respondent_id = c(1,1,1,1,1,1,2,2,2,2,2),
                            time = c(1,2,3,4,5,6,1,2,3,4,5),
                            change = c(0,0,0,-1,0,0,0,-1,0,0,0),
                            lost_income= c(NA,NA,NA,NA,0,NA,NA,NA,0,NA,NA))

I have tried using fill() function from the tidyr package but it only changes the first row after a non-missing value of lost_income. I want to adapt the code so that it changes, for example, the sequence of respondent 1 from NA,NA,NA,NA,0,NA to -4,-3,-2,-1,0,1 instead of NA,NA,NA,NA,0,1.

How can I fill in missing values in R based on previous rows within group to get the desired lost_income column as shown below?

Desired output:

respondent_id time change   lost_income
             1    1      0            -4
             1    2      0            -3
             1    3      0            -2
             1    4     -1            -1
             1    5      0             0
             1    6      0             1
             2    1      0            -2
             2    2     -1            -1
             2    3      0             0
             2    4      0             1
             2    5      0             2

Any suggestions on how to achieve this in R would be greatly appreciated. Thank you in advance!

>Solution :

library(dplyr)
df %>%
  left_join(
    df %>%
      filter(lost_income == 0) %>%
      select(respondent_id, time0 = time)
  ) %>%
  mutate(lost_income = time - time0)

Result

Joining with `by = join_by(respondent_id)`
# A tibble: 11 × 5
   respondent_id  time change lost_income time0
           <dbl> <dbl>  <dbl>       <dbl> <dbl>
 1             1     1      0          -4     5
 2             1     2      0          -3     5
 3             1     3      0          -2     5
 4             1     4     -1          -1     5
 5             1     5      0           0     5
 6             1     6      0           1     5
 7             2     1      0          -2     3
 8             2     2     -1          -1     3
 9             2     3      0           0     3
10             2     4      0           1     3
11             2     5      0           2     3

Or, using dplyr 1.1.0:

df %>%
  mutate(lost_income = time - time[lost_income %in% 0], .by = respondent_id)

Result

# A tibble: 11 × 4
   respondent_id  time change lost_income
           <dbl> <dbl>  <dbl>       <dbl>
 1             1     1      0          -4
 2             1     2      0          -3
 3             1     3      0          -2
 4             1     4     -1          -1
 5             1     5      0           0
 6             1     6      0           1
 7             2     1      0          -2
 8             2     2     -1          -1
 9             2     3      0           0
10             2     4      0           1
11             2     5      0           2

Leave a Reply Cancel reply