Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas groupby comparing string value with previous row value and spot changes in new columns

I have this demo df:

info = {'customer': ['Jason', 'Jason', 'Jason', 'Jason',
                     'Molly', 'Molly', 'Molly', 'Molly'], 
'Good': ['Cookie', 'Cookie', 'Cookie', 'Cookie','Ice Cream', 
         'Ice Cream', 'Ice Cream', 'Ice Cream'],
'Date' :['2021-12-14','2022-01-04','2022-01-11','2022-01-18',
         '2022-01-12','2022-01-15','2022-01-19','2022-01-30'],
'Flavor' :['Chocolate','Vanilla','Vanilla','Strawberry',
           'Chocolate', 'Vanilla', 'Caramel', 'Caramel']}
df = pd.DataFrame(data=info)
df

gives:

   customer   Good      Date        Flavor
0   Jason   Cookie      2021-12-14  Chocolate
1   Jason   Cookie      2022-01-04  Vanilla
2   Jason   Cookie      2022-01-11  Vanilla
3   Jason   Cookie      2022-01-18  Strawberry
4   Molly   Ice Cream   2022-01-12  Chocolate
5   Molly   Ice Cream   2022-01-15  Vanilla
6   Molly   Ice Cream   2022-01-19  Caramel
7   Molly   Ice Cream   2022-01-30  Caramel

I am trying to track the change in flavor per customer per Goods in new columns FromTo. I did the grouping part:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

   df.sort_values(['Date']).groupby(['customer','Good','Date'])['Flavor'].sum()

I got:

 customer  Good       Date      
    Jason     Cookie     2021-12-14     Chocolate
                         2022-01-04       Vanilla
                         2022-01-11       Vanilla
                         2022-01-18    Strawberry
    Molly     Ice Cream  2022-01-12     Chocolate
                         2022-01-15       Vanilla
                         2022-01-19       Caramel
                         2022-01-30       Caramel
    Name: Flavor, dtype: object

The first row per group is the entry point and then I would like to compare the next change per group, if it is different then we track the change in new columns (from & to) and if similar values nothing happens.

I have tried multiple methods & codes but unfortunately I do not know the best way to do so.

Expected output considering reset_index():

  customer   Good      Date        Flavor           From         To
0   Jason   Cookie      2021-12-14  Chocolate    
1   Jason   Cookie      2022-01-04  Vanilla         Chocolate    Vanilla
2   Jason   Cookie      2022-01-11  Vanilla
3   Jason   Cookie      2022-01-18  Strawberry      Vanilla      Strawberry
4   Molly   Ice Cream   2022-01-12  Chocolate
5   Molly   Ice Cream   2022-01-15  Vanilla         Chocolate    Vanilla
6   Molly   Ice Cream   2022-01-19  Caramel         Vanilla      Caramel
7   Molly   Ice Cream   2022-01-30  Caramel

>Solution :

Building on the sum you’ve created (named g), we can groupby the first 2 levels of the index and shift it, then join it back to g. After rename-ing columns, mask "To" and "From" columns depending on if there was any change or if it’s NaN. Finally, join this back to the DataFrame:

g = df.sort_values(['Date']).groupby(['customer','Good','Date'])['Flavor'].sum()
joined = g.to_frame().assign(To=g).join(g.groupby(level=[0,1]).shift().to_frame(), lsuffix='', rsuffix='_').rename(columns={'Flavor_':'From'})
joined.update(joined[['To','From']].mask(joined['From'].isna() | joined['From'].eq(joined['To']), ''))
out = joined[['Flavor','From','To']].reset_index()

Output:

  customer       Good        Date      Flavor       From          To
0    Jason     Cookie  2021-12-14   Chocolate                       
1    Jason     Cookie  2022-01-04     Vanilla  Chocolate     Vanilla
2    Jason     Cookie  2022-01-11     Vanilla                       
3    Jason     Cookie  2022-01-18  Strawberry    Vanilla  Strawberry
4    Molly  Ice Cream  2022-01-12   Chocolate                       
5    Molly  Ice Cream  2022-01-15     Vanilla  Chocolate     Vanilla
6    Molly  Ice Cream  2022-01-19     Caramel    Vanilla     Caramel
7    Molly  Ice Cream  2022-01-30     Caramel                       
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading