Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

adding column to dataframe using groupby on Date column

Existing Dataframe :

Sr.No        Date         Tag    score
01         10-02-2022    pass     10
02         10-02-2022    fail     5
03         10-02-2022    pass     10
04         11-02-2022    grace    3
05         11-02-2022    pass     15
06         11-02-2022    pass     15

Expected Dataframe :

Sr.No   Date       Tag  score no_of_records pass_count fail_count grace_count pass_score_total 
01    10-02-2022  pass   10         3           2          1          0             20
02    10-02-2022  fail   5          
03    10-02-2022  pass   10
04    11-02-2022  grace  3          3           2          0          1             30
05    11-02-2022  pass   15
06    11-02-2022  pass   15

no_of_records implies the no. of records for particular day.
how to proceed with groupby on Date to get the desired output..?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

First get sums for pass in s Series by aggregate sum, then get counts by Series.value_counts, join with crosstab:

s = df['score'].where(df['Tag'].eq('pass'), 0).groupby(df['Date']).sum()
                .rename('pass_score_total'))
s1 = df['Date'].value_counts().rename('no_of_records')

df1 = pd.concat([s1, pd.crosstab(df['Date'], df['Tag']).add_suffix('_count'), s], axis=1)
df = df.join(df1, on='Date')
print (df)
   Sr.No        Date    Tag  score  no_of_records  fail_count  grace_count  \
0      1  10-02-2022   pass     10              3           1            0   
1      2  10-02-2022   fail      5              3           1            0   
2      3  10-02-2022   pass     10              3           1            0   
3      4  11-02-2022  grace      3              3           0            1   
4      5  11-02-2022   pass     15              3           0            1   
5      6  11-02-2022   pass     15              3           0            1   

   pass_count  pass_score_total  
0           2                20  
1           2                20  
2           2                20  
3           2                30  
4           2                30  
5           2                30  

If need empty strings:

s = df['score'].where(df['Tag'].eq('pass'), 0).groupby(df['Date']).sum().rename('pass_score_total')
s1 = df['Date'].value_counts().rename('no_of_records')

df1 = pd.concat([s1, pd.crosstab(df['Date'], df['Tag']).add_suffix('_count'), s], axis=1)
df = df.join(df1, on='Date')
df[df1.columns] = df[df1.columns].mask(df['Date'].duplicated(), '', axis=1)
print (df)
   Sr.No        Date    Tag  score no_of_records fail_count grace_count  \
0      1  10-02-2022   pass     10             3          1           0   
1      2  10-02-2022   fail      5                                        
2      3  10-02-2022   pass     10                                        
3      4  11-02-2022  grace      3             3          0           1   
4      5  11-02-2022   pass     15                                        
5      6  11-02-2022   pass     15                                        

  pass_count pass_score_total  
0          2               20  
1                              
2                              
3          2               30  
4                              
5                              
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading