Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Create and fill columns based on conditions

I got a contract DataFrame looking like this :

Customer       CONTRACT_ID      PHASING_DATE    SCHEDULED    ARTICLE_CODE
A              C0218            2021-01-01      21           001
A              COZ19            2021-01-01      23           001
A              IUD80            2021-01-01      43           001
A              PAZO1            2021-02-01      12           002
B              DZAP2            2021-01-01      3            003
B              DZAH8            2021-01-01      4            003
B              FGIG0            2021-03-01      5            003
C              SDFH4            2021-01-01      4            004
C              AZFE3            2021-04-01      54           005
C              DAZJ9            2021-04-01      32           005
..

I would like to create a DataFrame based on the number of active contracts like this :

e.g there’s 3 contracts with the same PHASING_DATE, I want 3 columns filled with the scheduled associated to each active contract

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

CUSTOMER  DATE        ARTICLE_CODE    SCHEDULED_CT_1   SCHEDULED_CT_2      SCHEDULED_CT_3    SCHEDULED_CT_4
A         2021-01-01  001             21               23                  43                0
B         2021-01-01  003             3                4                   0                 0
C         2021-01-01  004             4                0                   0                 0
A         2021-02-01  002             12               0                   0                 0
B         2021-03-01  003             5                0                   0                 0
C         2021-04-01  004             54               32                  0                 0       
                                                                                         

>Solution :

Use GroupBy.cumcount for counter, add column to DataFrame and pivoting by pivot with replace missing values, rename columns by DataFrame.add_prefix, some data cleaning and last sorting by both columns:

s = df.groupby(['Customer','PHASING_DATE','ARTICLE_CODE']).cumcount()

df = (df.assign(a=s)
        .pivot(['Customer','PHASING_DATE','ARTICLE_CODE'], 'a', 'SCHEDULED')
        .fillna(0)
        .add_prefix('SCHEDULED_CT_')
        .reset_index()
        .rename_axis(None, axis=1)
        .rename(columns={'PHASING_DATE':'DATE'})
        .sort_values(['DATE','Customer']))
print (df)
  Customer        DATE ARTICLE_CODE SCHEDULED_CT_0 SCHEDULED_CT_1  \
0        A  2021-01-01          001             21             23   
2        B  2021-01-01          003              3              4   
4        C  2021-01-01          004              4              0   
1        A  2021-02-01          002             12              0   
3        B  2021-03-01          003              5              0   
5        C  2021-04-01          005             54             32   

  SCHEDULED_CT_2  
0             43  
2              0  
4              0  
1              0  
3              0  
5              0  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading