Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to calculate the summation for values based on consecutive days and two other columns

How can I do summation just for consecutive days and for the same name and same supplier?
For instance, for A and Supplier Wal, I need to do summation for 2021-05-31 and 2021-06-01 and then do another summation for 2021-06-08 and 2021-06-09. I need to add a new column for summation. Please take a look at the example below:

enter image description here

Here is the Pandas DataFrame code for the table:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df = pd.DataFrame({'Name': ['A', 'A', 'A','A','B','B','C','C','C','C','C','C','C','C','C'],
  'Supplier': ['Wal', 'Wal', 'Wal', 'Wal', 'Co', 'Co', 'Mc', 'Mc', 'St', 'St', 'St', 'St', 'St', 'To', 'To'],
  'Date': ['2021-05-31', '2021-06-01', '2021-06-08', '2021-06-09', '2021-05-17', '2021-05-18'
          , '2021-04-07', '2021-04-08', '2021-05-11', '2021-05-12', '2021-05-13', '2021-05-18'
          , '2021-05-19', '2021-03-30', '2021-03-31'],
  'Amount': [27, 400, 410, 250, 100, 50, 22, 78, 60, 180, 100, 240, 140, 30, 110],
  'Summation': [427,427,660,660,150,150,100,100,340,340,340,380,380,140,140 ]})

>Solution :

Like this?

import pandas as pd

df = pd.DataFrame({'Name': ['A', 'A', 'A','A','B','B','C','C','C','C','C','C','C','C','C'],
  'Supplier': ['Wal', 'Wal', 'Wal', 'Wal', 'Co', 'Co', 'Mc', 'Mc', 'St', 'St', 'St', 'St', 'St', 'To', 'To'],
  'Date': ['2021-05-31', '2021-06-01', '2021-06-08', '2021-06-09', '2021-05-17', '2021-05-18'
          , '2021-04-07', '2021-04-08', '2021-05-11', '2021-05-12', '2021-05-13', '2021-05-18'
          , '2021-05-19', '2021-03-30', '2021-03-31'],
  'Amount': [27, 400, 410, 250, 100, 50, 22, 78, 60, 180, 100, 240, 140, 30, 110]})

df['Date'] = pd.to_datetime(df['Date'])
filt = df.loc[((df['Date'] - df['Date'].shift(-1)).abs() == pd.Timedelta('1d')) | (df['Date'].diff() == pd.Timedelta('1d'))]
breaks = filt['Date'].diff() != pd.Timedelta('1d')
df['Summation'] = df.groupby(['Name','Supplier',breaks.cumsum()])['Amount'].transform('sum')

print(df)

output:

   Name Supplier       Date  Amount  Summation
0     A      Wal 2021-05-31      27        427
1     A      Wal 2021-06-01     400        427
2     A      Wal 2021-06-08     410        660
3     A      Wal 2021-06-09     250        660
4     B       Co 2021-05-17     100        150
5     B       Co 2021-05-18      50        150
6     C       Mc 2021-04-07      22        100
7     C       Mc 2021-04-08      78        100
8     C       St 2021-05-11      60        340
9     C       St 2021-05-12     180        340
10    C       St 2021-05-13     100        340
11    C       St 2021-05-18     240        380
12    C       St 2021-05-19     140        380
13    C       To 2021-03-30      30        140
14    C       To 2021-03-31     110        140
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading