Create Duration Variable In R

Imagine I have a dataset with observations on a number of individuals across multiple years. Each year, an individual is in one of two statuses, A or B. I know which status each individual was in each year, and I created a dummy variable Status_change that equals 1 if the status in the current year differs from the status in the previous year. My data currently looks something like this:

Individual| Year | Status | Status_change |
-------------------------------------------
    1     |  1   |   A    |      NA       |
    1     |  2   |   A    |      0        |
    1     |  3   |   A    |      0        |
    1     |  4   |   B    |      1        |
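For reference, a dummy like Status_change can be derived from Status by comparing each row with the previous year of the same individual. The sketch below is base R; the data frame `data` and the helper `flag_change()` are hypothetical, built only to mirror the table above.

```r
# Hypothetical example data matching the table above (one individual, four years)
data <- data.frame(
  Individual = c(1, 1, 1, 1),
  Year       = c(1, 2, 3, 4),
  Status     = c("A", "A", "A", "B")
)

# Sort so that each individual's years are consecutive and in order
data <- data[order(data$Individual, data$Year), ]

# flag_change() is a hypothetical helper: for one individual's statuses it
# returns NA for the first year and 1/0 for "changed"/"unchanged" afterwards
flag_change <- function(s) c(NA, as.integer(s[-1] != s[-length(s)]))

# Apply it per individual; ave() is called on row indices so the result
# stays integer instead of being coerced to character
data$Status_change <- ave(seq_len(nrow(data)), data$Individual,
                          FUN = function(i) flag_change(data$Status[i]))
data
```

The first year of every individual gets NA, matching the table, because there is no earlier year to compare against.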

What I want is a new variable, call it Duration, that measures how long the individual has remained in the same status. For the example above, it would look like this:

Individual| Year | Status | Status_change | Duration |
------------------------------------------------------
    1     |  1   |   A    |      NA       |     0    |
    1     |  2   |   A    |      0        |     1    |
    1     |  3   |   A    |      0        |     2    |
    1     |  4   |   B    |      1        |     0    | 

Essentially, I want a variable that is 0 for every individual in year 1 and grows by 1 each period as long as the status stays the same. When the status switches, the variable resets to 0 and the count starts over. So far I have tried:


```r
data %>%
  group_by(Individual) %>%
  arrange(Year, .by_group = TRUE) %>%
  mutate(Duration = ifelse(Year == 1, 0, ifelse(Status_Change == 1, 0, lag(Duration) + 1)))
```

But this gives me an error:

```
Error: Problem with `mutate()` column `Duration`.
i `Duration = ifelse(Year == 1, 0, ifelse(Status_Change == 1, 0, lag(Duration) + 1))`.
x could not find function "Duration"
i The error occurred in group 1: Individual = "1"
```

I would greatly appreciate any help you can give me! Thanks in advance!
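The rule described in the question is a recurrence: each year's Duration depends on the previous year's Duration. `mutate()` computes a new column over the whole group in one vectorised step, so `lag(Duration)` cannot see values of Duration that the same call is still producing. A minimal base R sketch of the recurrence as stated, using `Reduce(accumulate = TRUE)` to carry the previous value forward, is below; the helper `duration_one()` and the example data are hypothetical.

```r
# duration_one() is a hypothetical helper: Duration for one individual,
# given that individual's statuses sorted by Year
duration_one <- function(status) {
  # changed[t] is TRUE when status[t] differs from status[t - 1]
  changed <- c(FALSE, status[-1] != status[-length(status)])
  # Reduce() carries the previous year's Duration forward, which is what
  # lag(Duration) + 1 was meant to do: reset to 0 on a change, else add 1
  Reduce(function(prev, ch) if (ch) 0 else prev + 1,
         changed[-1], init = 0, accumulate = TRUE)
}

duration_one(c("A", "A", "A", "B"))  # 0 1 2 0

# Applying it per individual (hypothetical data, already sorted by Year)
data <- data.frame(
  Individual = c(1, 1, 1, 1, 2, 2),
  Year       = c(1, 2, 3, 4, 1, 2),
  Status     = c("A", "A", "A", "B", "B", "B")
)
data$Duration <- unsplit(lapply(split(data$Status, data$Individual), duration_one),
                         data$Individual)
data$Duration  # 0 1 2 0 0 1
```

This loops over years one at a time, which is fine for small panels; the vectorised approach in the answer avoids the explicit recurrence altogether.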

Solution:

This should do it:

```r
library(dplyr)

data |>
  group_by(Individual) |>
  arrange(Year, .by_group = TRUE) |>
  ungroup() |>
  mutate(
    ## Replace the initial NA in Status_Change; otherwise cumsum()
    ## would return NA for every later row.
    Status_Change = tidyr::replace_na(Status_Change, 0),
    ## Create a variable that increases by one every time
    ## the status changes, i.e. an identifier for each spell.
    Status_State  = cumsum(Status_Change)
  ) |>
  ## Duration counts rows within each spell of each individual,
  ## minus 1 so that the first year of a spell is 0.
  group_by(Individual, Status_State) |>
  mutate(Duration = row_number() - 1) |>
  ungroup()
```
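For readers without dplyr, here is a base R sketch of the same idea: a running sum of Status_Change serves as a spell identifier, and a counter starting at 0 runs within each spell. The example data is hypothetical. Here the running sum is taken within each individual; a global `cumsum()` would also work, because the second grouping includes Individual.

```r
# Hypothetical example data: two individuals, already containing Status_Change
data <- data.frame(
  Individual    = c(1, 1, 1, 1, 2, 2, 2),
  Year          = c(1, 2, 3, 4, 1, 2, 3),
  Status        = c("A", "A", "A", "B", "B", "A", "A"),
  Status_Change = c(NA, 0, 0, 1, NA, 1, 0)
)

# Sort, then treat each individual's leading NA as "no change"
data <- data[order(data$Individual, data$Year), ]
change <- ifelse(is.na(data$Status_Change), 0, data$Status_Change)

# Spell identifier: running count of changes within each individual
data$Status_State <- ave(change, data$Individual, FUN = cumsum)

# Position of each row within its (Individual, spell) group, starting at 0
data$Duration <- ave(change, data$Individual, data$Status_State,
                     FUN = seq_along) - 1
data$Duration  # 0 1 2 0 0 0 1
```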

Note that we can't simply group by Individual and Status: we need an intermediate variable that tracks changes in status, so that a transition from A to B and back to A is treated as three separate spells rather than two.
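A small illustration of that point, using a hypothetical status sequence A, A, B, A for one individual. Counting within each status value merges the two A spells, so the last A wrongly gets 2. As an alternative to the cumsum step (not what the answer uses), base R's `rle()` splits a vector into runs of identical consecutive values, and `sequence()` then counts within each run.

```r
status <- c("A", "A", "B", "A")  # hypothetical statuses for one individual

# Grouping by the status value alone treats both A spells as one group
by_value <- ave(seq_along(status), status, FUN = seq_along) - 1
by_value          # 0 1 0 2  (wrong: the last A starts a new spell)

# rle() finds the consecutive runs: A (2 years), B (1 year), A (1 year)
runs <- rle(status)
runs$lengths      # 2 1 1

# Count 0, 1, 2, ... within each run
by_spell <- sequence(runs$lengths) - 1
by_spell          # 0 1 0 0
```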
