check if cumsum of the column is greater than range value than increment the element in list

I have a list

sample_dates = ["10/07/2021","11/07/2021","12/07/2021","13/07/2021",
                "14/07/2021","15/07/2021","16/07/2021","17/07/2021",
                "18/07/2021","19/07/2021","20/07/2021","21/07/2021",
                "22/07/2021","23/07/2021","24/07/2021"]

and dataframe like below

Truckid   Tripid   kms    
  1          1     700.3  
  1          1     608.9        
  1          1     400.2  
  1          2     100.2  
  1          2     140.8        
  1          3     1580.0 
  1          3     357.3        
  1          3     541.5  
  1          4     421.2   
  1          4     1694.4 
  1          4     1585.9 
  1          5     173.3  
  1          5     237.4   
  1          5     83.3   
  2          1     846.1  
  2          1     1167.6  
  2          2     388.8  
  2          2     70.5   
  2          2     127.1  
  2          3     126.7  
  2          3     262.4  

I want Date column by cumsum,if kms > 0 & < 2000 should have same date,if it increase 2000 than change the date, and than if it is > 2000 & < 3000 than do not change and than if its passes 3000 than again change the date. and so on

also if tripid changes than restart the counting from 0.

I want something like this

Truckid   Tripid   kms        Date
  1          1     700.3      10/07/2021
  1          1     608.9      10/07/2021      
  1          1     400.2      10/07/2021
  1          2     100.2      11/07/2021
  1          2     140.8      11/07/2021      
  1          3     1580.0     12/07/2021
  1          3     357.3      12/07/2021      
  1          3     541.5      13/07/2021
  1          4     421.2      14/07/2021 
  1          4     1694.4     15/07/2021
  1          4     1585.9     16/07/2021
  1          5     173.3      17/07/2021
  1          5     237.4      17/07/2021 
  1          5     83.3       17/07/2021
  2          1     846.1      18/07/2021
  2          1     1167.6     19/07/2021 
  2          2     388.8      20/07/2021
  2          2     70.5       20/07/2021
  2          2     127.1      20/07/2021
  2          3     126.7      21/07/2021
  2          3     262.4      21/07/2021

>Solution :

You can compute the cumsum per group and either cut is manually or use a mathematical trick to make groups.

Then map your dates:

# round to thousands, clip to get min 1000 km
kms = df.groupby(['Truckid', 'Tripid'])['kms'].cumsum().floordiv(1000).clip(1)

# OR use manual bins
kms = pd.cut(df.groupby(['Truckid', 'Tripid'])['kms'].cumsum(),
             bins=[0,2000,3000,4000]) # etc. up to max wanted value


df['Date'] = (df
              .groupby(['Truckid', 'Tripid', kms]).ngroup() # get group ID
              .map(dict(enumerate(sample_dates)))      # match to items in order
             )

alternative to use consecutive days from the starting point:

df['Date'] = pd.to_datetime(df.groupby(['Truckid', 'Tripid', kms]).ngroup(),
                            unit='d', origin='2021-07-10')

output:

    Truckid  Tripid     kms        Date
0         1       1   700.3  10/07/2021
1         1       1   608.9  10/07/2021
2         1       1   400.2  10/07/2021
3         1       2   100.2  11/07/2021
4         1       2   140.8  11/07/2021
5         1       3  1580.0  12/07/2021
6         1       3   357.3  12/07/2021
7         1       3   541.5  13/07/2021
8         1       4   421.2  14/07/2021
9         1       4  1694.4  15/07/2021
10        1       4  1585.9  16/07/2021
11        1       5   173.3  17/07/2021
12        1       5   237.4  17/07/2021
13        1       5    83.3  17/07/2021
14        2       1   846.1  18/07/2021
15        2       1  1167.6  19/07/2021
16        2       2   388.8  20/07/2021
17        2       2    70.5  20/07/2021
18        2       2   127.1  20/07/2021
19        2       3   126.7  21/07/2021
20        2       3   262.4  21/07/2021

Leave a Reply