Repeating rows until a specific value is reached

I am doing survival analysis in R and need to repeat rows until a new value is seen.
Here is my data frame:

df <- data.frame(province = c(10,10,10,10,10,10,10,10,12,12,12,12,12,12,12,12),
                 year = c(2000,2000,2001,2001,2001,2002,2002,2002,2000,2000,2000,2001,2001,2002,2002,2002),
                 residence = c(1,1,1,1,2,1,1,2,1,2,1,1,2,1,2,1),
                 edu = c(1,2,1,2,3,1,2,3,2,1,3,2,1,2,1,3),
                 pro = c(0,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0))

What I want is to repeat rows, grouped by province, residence, and edu, until pro reaches 1. For groups in which pro never reaches 1, the row should be repeated for all years (in my case, 2000 to 2002). It seems I could do this with a while loop, but I do not know the procedure.
My expected output would look like this:

    province residence   edu   pro  year
      <dbl>     <dbl> <dbl> <dbl> <dbl>
 1       10         1     1     0  2000
 2       10         1     1     0  2001
 3       10         1     1     0  2002
 4       10         1     2     0  2000
 5       10         1     2     0  2001
 6       10         1     2     1  2002
 7       10         2     3     1  2001
 8       12         1     2     1  2000
 9       12         2     1     0  2000
10       12         2     1     0  2001
11       12         2     1     1  2002
12       12         1     3     0  2000
13       12         1     3     0  2001
14       12         1     3     0  2002

Thank you in advance.


Solution:

Perhaps I’m misinterpreting. If your first frame with 16 rows is truly the original data, and you’re trying to get to the second frame with 14 rows, then this method works.

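library(dplyr)

df %>%
  select(-pro) %>%
  group_by(province, residence, edu) %>%
  # years missing inside each group's observed range
  # (dplyr >= 1.1.0 warns about a multi-row summarize(); reframe() replaces it but drops the grouping)
  summarize(year = setdiff(min(year):max(year), year)) %>%
  # add the original rows back; the newly created rows have pro = NA
  bind_rows(df) %>%
  arrange(province, residence, edu, year) %>%
  # carry the last known pro forward into the added rows
  tidyr::fill(pro) %>%
  # drop every row that comes after the first pro == 1 in its group
  filter(!cumany(lag(pro == 1, default = FALSE))) %>%
  ungroup()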
# # A tibble: 14 x 5
#    province residence   edu  year   pro
#       <dbl>     <dbl> <dbl> <dbl> <dbl>
#  1       10         1     1  2000     0
#  2       10         1     1  2001     0
#  3       10         1     1  2002     0
#  4       10         1     2  2000     0
#  5       10         1     2  2001     0
#  6       10         1     2  2002     1
#  7       10         2     3  2001     1
#  8       12         1     2  2000     1
#  9       12         1     3  2000     0
# 10       12         1     3  2001     0
# 11       12         1     3  2002     0
# 12       12         2     1  2000     0
# 13       12         2     1  2001     0
# 14       12         2     1  2002     1

Data

df <- structure(list(province = c(10, 10, 10, 10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12, 12, 12), year = c(2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002, 2000, 2000, 2000, 2001, 2001, 2002, 2002, 2002), residence = c(1, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1), edu = c(1, 2, 1, 2, 3, 1, 2, 3, 2, 1, 3, 2, 1, 2, 1, 3), pro = c(0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0)), class = "data.frame", row.names = c(NA, -16L))
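As a possible alternative to the pipeline above, the missing years can be filled in with tidyr's complete() instead of the setdiff() and bind_rows() steps. This is only a sketch, assuming a reasonably recent tidyr in which complete() works within each group of a grouped data frame; the name result is just for illustration. The data frame is repeated here so the block runs on its own.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  province  = c(10,10,10,10,10,10,10,10,12,12,12,12,12,12,12,12),
  year      = c(2000,2000,2001,2001,2001,2002,2002,2002,2000,2000,2000,2001,2001,2002,2002,2002),
  residence = c(1,1,1,1,2,1,1,2,1,2,1,1,2,1,2,1),
  edu       = c(1,2,1,2,3,1,2,3,2,1,3,2,1,2,1,3),
  pro       = c(0,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0)
)

result <- df %>%
  group_by(province, residence, edu) %>%
  # add a row (with pro = NA) for every year missing between
  # the group's first and last observed year
  complete(year = full_seq(year, 1)) %>%
  arrange(year, .by_group = TRUE) %>%
  # carry the last known pro forward into the added rows
  fill(pro) %>%
  # keep rows up to and including the first pro == 1 in each group
  filter(!cumany(lag(pro == 1, default = FALSE))) %>%
  ungroup()

result
```

On the sample data this should give the same 14 rows as the pipeline above, although the column order may differ. Like the original answer, it only fills gaps between the first and last year observed in each group, so a group that starts after 2000 is not extended backwards.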