Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How can I use the lag function to skip over rows and provide number of previous occurrences

I have a dataframe with the first three columns ‘Person’, ‘NoShow’, and ‘date’. I want to create the two additional columns ‘prior_noshow’ and ‘prior_noshow_f’, but I cannot figure out how to do this using the lag function.

   Person NoShow       date prior_noshow prior_noshow_f
1     123      0 2019-01-01           NA             NA
2     123      1 2019-04-01            0              0
3     123      0 2020-01-01            1              1
4     123      1 2021-01-01            1              1
5     123      0 2021-04-30            1              2
6     123      0 2022-01-05            1              2
7     334      0 2019-07-07           NA             NA
8     334      1 2019-10-11            0              0
9     334      1 2020-07-07            1              1
10    334      0 2021-01-04            1              2

Column ‘prior_noshow’ should show whether a person missed a previous appointment. So, person 123 had their first noshow on 2019-04-01, and I want every following row for person 123 to have prior_noshow==1.

Column ‘prior_noshow_f" should show the frequency of previously missed appointments. So, person 123 should have prior_noshow_f==1 for dates 2020-01-01 and 2021-01-01 and prior_noshow_f==2 for dates 2021-04-30 and 2022-01-05.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Code to create the dataset:

df <- data.frame(Person = c('123','123','123','123','123','123','334','334','334','334'), 
                   NoShow = c(0,1,0,1,0,0,0,1,1,0), 
                   date = c('2019-01-01','2019-04-01','2020-01-01','2021-01-01','2021-04-30','2022-01-05','2019-07-07','2019-10-11','2020-07-07','2021-01-04')
                   )

I’ve tried different approaches I saw in previous discussions here regarding the lag-function, but I haven’t able to figure this out.

Any help would be greatly appreciated!

>Solution :

You need to lag the cumsum of NoShow to get prior_noshow_f, and prior_noshow is just as.numeric(prior_noshow_f > 0), so you could do:

library(tidyverse)

df %>% 
  group_by(Person) %>%
  mutate(prior_noshow = lag(as.numeric(cumsum(NoShow) > 0)),
         prior_noshow_f = lag(cumsum(NoShow)))
#> # A tibble: 10 x 5
#> # Groups:   Person [2]
#>    Person NoShow date       prior_noshow prior_noshow_f
#>     <int>  <int> <chr>             <dbl>          <int>
#>  1    123      0 2019-01-01           NA             NA
#>  2    123      1 2019-04-01            0              0
#>  3    123      0 2020-01-01            1              1
#>  4    123      1 2021-01-01            1              1
#>  5    123      0 2021-04-30            1              2
#>  6    123      0 2022-01-05            1              2
#>  7    334      0 2019-07-07           NA             NA
#>  8    334      1 2019-10-11            0              0
#>  9    334      1 2020-07-07            1              1
#> 10    334      0 2021-01-04            1              2

Created on 2022-08-22 with reprex v2.0.2

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading