Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

creating new column based on other grouped variables and multiple conditions

suppose I have following data:

df1<- data.frame(province= c(1,1,2,3,3,3,4,4,4,4,4,5,5,5),year= c(2001,2001,2001,2001,2001,2001,2002,2002,2003,2003,2003,2004,2005,2005),
                 residence= c(1,1,1,2,2,2,1,1,1,2,2,2,2,2),marriage= c(1,2,2,1,2,1,1,1,2,1,1,1,2,1),count=c(4,1,3,5,3,2,2,3,2,1,2,4,2,5))

in my data marriage = 1 is ever-married and marriage = 2 is never-married. the proportion of ever-married can be estimated by column count: ever-married / ever-married + never-married

what I want is estimating the proportion of ever-married based on columns province, year and residence and two conditions:
1- if there is no ever-married based on three columns, the proportion would be 0
2- if there is no never-married based on three columns, the proportion would be 100.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

my expected output would be like this:

    province year  residence    sub
        1   2001         1    0.80
        2   2001         1    0.00
        3   2001         2    0.70
        4   2002         1  100.00
        4   2003         1    0.00
        4   2003         2  100.00
        5   2004         2  100.00
        5   2005         2    0.71

thank you in advance.

>Solution :

We group by ‘province’, ‘year’, ‘residence’, create a condition based on if/else when ‘marriage’ values 1, 2 are not present, then return 0, 100 respectively or else get the ‘count’ values that correspond to ‘marriage’ value of 1, divide by the sum of ‘count’ and then sum the proportions

library(dplyr)
df1 %>% 
   group_by(province, year, residence)  %>% 
   summarise(sub = if(!any(marriage == 1)) 0 
     else if(!any(marriage == 2)) 100 else 
      sum(count[marriage == 1]/sum(count)), .groups = 'drop')

-output

# A tibble: 8 × 4
  province  year residence     sub
     <dbl> <dbl>     <dbl>   <dbl>
1        1  2001         1   0.8  
2        2  2001         1   0    
3        3  2001         2   0.7  
4        4  2002         1 100    
5        4  2003         1   0    
6        4  2003         2 100    
7        5  2004         2 100    
8        5  2005         2   0.714
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading