Home creating new column based on other grouped variables and multiple conditions

Questions

creating new column based on other grouped variables and multiple conditions

January 11, 2022

suppose I have following data:

df1<- data.frame(province= c(1,1,2,3,3,3,4,4,4,4,4,5,5,5),year= c(2001,2001,2001,2001,2001,2001,2002,2002,2003,2003,2003,2004,2005,2005),
                 residence= c(1,1,1,2,2,2,1,1,1,2,2,2,2,2),marriage= c(1,2,2,1,2,1,1,1,2,1,1,1,2,1),count=c(4,1,3,5,3,2,2,3,2,1,2,4,2,5))

in my data marriage = 1 is ever-married and marriage = 2 is never-married. the proportion of ever-married can be estimated by column count: ever-married / ever-married + never-married

what I want is estimating the proportion of ever-married based on columns province, year and residence and two conditions:
1- if there is no ever-married based on three columns, the proportion would be 0
2- if there is no never-married based on three columns, the proportion would be 100.

my expected output would be like this:

    province year  residence    sub
        1   2001         1    0.80
        2   2001         1    0.00
        3   2001         2    0.70
        4   2002         1  100.00
        4   2003         1    0.00
        4   2003         2  100.00
        5   2004         2  100.00
        5   2005         2    0.71

thank you in advance.

>Solution :

We group by ‘province’, ‘year’, ‘residence’, create a condition based on if/else when ‘marriage’ values 1, 2 are not present, then return 0, 100 respectively or else get the ‘count’ values that correspond to ‘marriage’ value of 1, divide by the sum of ‘count’ and then sum the proportions

library(dplyr)
df1 %>% 
   group_by(province, year, residence)  %>% 
   summarise(sub = if(!any(marriage == 1)) 0 
     else if(!any(marriage == 2)) 100 else 
      sum(count[marriage == 1]/sum(count)), .groups = 'drop')

-output

# A tibble: 8 × 4
  province  year residence     sub
     <dbl> <dbl>     <dbl>   <dbl>
1        1  2001         1   0.8  
2        2  2001         1   0    
3        3  2001         2   0.7  
4        4  2002         1 100    
5        4  2003         1   0    
6        4  2003         2 100    
7        5  2004         2 100    
8        5  2005         2   0.714