Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to distribute pandas dataframe rows evenly across timestamps based on value of the column

E.g. DF which contains number of executions across timestamps.

             DateTime           Execution
0 2023-04-03 07:00:00                   4
1 2023-04-03 10:00:00                   1
2 2023-04-03 12:00:00                   1
3 2023-04-03 14:00:00                   1
4 2023-04-03 18:00:00                   1

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5080 entries, 0 to 5079

Below is the output I’m trying to achieve

             DateTime           Execution
0 2023-04-03 07:00:00                   1
1 2023-04-03 08:00:00                   1
2 2023-04-03 09:00:00                   1
3 2023-04-03 10:00:00                   1
4 2023-04-03 10:00:00                   1
5 2023-04-03 12:00:00                   1
6 2023-04-03 14:00:00                   1
7 2023-04-03 18:00:00                   1

So want to distribute the each executions more than 1 to evenly hourly timestamp.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I tried: How to divide up 'supply' evenly among rows in a dataframe by rank in Python? but this doesn’t give the desired output.

tried this but it is only for arranging evenly arrange dataframe rows based on the values in a given column

>Solution :

Use Index.repeat with DataFrame.loc for repeat rows, set 1 and add hours by to_timedelta with GroupBy.cumcount:

#if string repr of datetimes
df['DateTime'] = pd.to_datetime(df['DateTime'])

out = (df.loc[df.index.repeat(df['Execution'])]
         .assign(Execution=1, 
                 DateTime = lambda x:  x['DateTime'] + 
                            pd.to_timedelta(x.groupby(level=0).cumcount(), unit='H'))
         .reset_index(drop=True))
print (out)
             DateTime  Execution
0 2023-04-03 07:00:00          1
1 2023-04-03 08:00:00          1
2 2023-04-03 09:00:00          1
3 2023-04-03 10:00:00          1
4 2023-04-03 10:00:00          1
5 2023-04-03 12:00:00          1
6 2023-04-03 14:00:00          1
7 2023-04-03 18:00:00          1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading