Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Creating a conditional dummy variable column in R

I’m working with a cross-country panel dataset, one of my variables (cc_dummy) takes the value of 1 & 0 (there are also missing values indicated by NAs). I want to create a new column such that if cc_dummy takes the value 1 for three consecutive years for each country, I want only the first year of each three-year window to take the value 1 whereas the other years take the value 0. A snapshot of my data is given below. Any help would be appreciated. If possible, I would like to create the column using dplyr.

structure(list(country = c("Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Argentina", "Argentina", 
"Argentina", "Argentina", "Argentina", "Brazil", "Brazil", "Brazil", 
"Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", 
"Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", 
"Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", 
"Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", 
"Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", 
"Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", 
"Brazil", "Brazil"), year = c(1975, 1976, 1977, 1978, 1979, 1980, 
1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 
1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 
2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 
2014, 2015, 2016, 2017, 2018, 2019, 2020, 1975, 1976, 1977, 1978, 
1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 
1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 
2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 
2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021), 
    cc_dummy = c(NA, NA, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 
    1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, NA, NA, 0, 0, 1, 0, 
    1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 
    0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
    0, 0, 0)), row.names = c(NA, -93L), class = c("tbl_df", "tbl", 
"data.frame"))

>Solution :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Here’s a way to do it using data.table::rleid and dplyr grammar:

library(dplyr)
df %>% 
  group_by(country, m = data.table::rleid(cc_dummy)) %>% 
  mutate(newcol = ifelse(n() >= 3 & cumsum(cc_dummy) == 1, 1, 0))

# A tibble: 93 x 5
# Groups:   country, m [32]
   country    year cc_dummy     m newcol
   <chr>     <dbl>    <dbl> <int>  <dbl>
 1 Argentina  1975       NA     1      0
 2 Argentina  1976       NA     1      0
 3 Argentina  1977        0     2      0
 4 Argentina  1978        0     2      0
 5 Argentina  1979        0     2      0
 6 Argentina  1980        0     2      0
 7 Argentina  1981        1     3      0
 8 Argentina  1982        1     3      0
 9 Argentina  1983        0     4      0
10 Argentina  1984        1     5      0
11 Argentina  1985        0     6      0
12 Argentina  1986        0     6      0
13 Argentina  1987        1     7      1
14 Argentina  1988        1     7      0
15 Argentina  1989        1     7      0
16 Argentina  1990        0     8      0
17 Argentina  1991        0     8      0
18 Argentina  1992        0     8      0
19 Argentina  1993        0     8      0
20 Argentina  1994        0     8      0
21 Argentina  1995        0     8      0
22 Argentina  1996        0     8      0
23 Argentina  1997        0     8      0
24 Argentina  1998        0     8      0
25 Argentina  1999        0     8      0
26 Argentina  2000        0     8      0
27 Argentina  2001        0     8      0
28 Argentina  2002        1     9      0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading