Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

spread then subtract consecutive rows in pandas data frame

I have a data frame where I need to pair consecutive events within the same day and subtract the later from the former. Each has a timestamp and a date.

         time                     date   event   score
0 2022-03-07 06:45:00+00:00  2022-03-07  light   80.066667
1 2022-03-07 18:12:00+00:00  2022-03-07  dark   79.857667
2 2022-03-30 06:25:00+00:00  2022-03-30  light  107.060833
3 2022-03-30 13:38:00+00:00  2022-03-30  dark  105.324000
4 2022-03-30 13:40:00+00:00  2022-03-30  dark  105.239750
5 2022-03-30 15:47:00+00:00  2022-03-30  light  106.863143
6 2022-04-01 06:25:00+00:00  2022-04-01  light  101.271867

I have tried spreading the data frame using

df = df.pivot(index='time', columns='event', values='score')

event                             light       dark
time                                             
2022-03-07 06:45:00+00:00   80.066667         NaN
2022-03-07 18:12:00+00:00         NaN   79.857667
2022-03-30 06:25:00+00:00  107.060833         NaN
2022-03-30 13:38:00+00:00         NaN  105.324000
2022-03-30 13:40:00+00:00         NaN  105.239750
2022-03-30 15:47:00+00:00  106.863143         NaN
2022-04-01 06:25:00+00:00  101.271867         NaN

however because the events happen at different times the spread data frame has NaNs. I would ideally end up with this, where I keep the time of the first occurrence in the pair (light or dark), align the events (note: corresponding dark to match light as not yet occurred for 2022-04-01) and when light comes first I subtract dark from light and when dark comes first I subtract dark from light.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

event                             light  dark          diff
time                                             
2022-03-07 06:45:00+00:00   80.066667    79.857667    -0.208999
2022-03-30 06:25:00+00:00  107.060833    105.324000   -1.7368
2022-03-30 13:40:00+00:00  106.863143    105.239750   -1.6233
2022-04-01 06:25:00+00:00  101.271867    NaN          NaN

>Solution :

Here’s one way using groupby + cumcount to create groups, then use that groups in another groupby + first to get the first time each event happens on a day. Then pivot:

out = (df.assign(time=df.groupby(df.groupby('event').cumcount())['time'].transform('first'))
       .pivot('time', 'event', 'score').reset_index().rename_axis([None], axis=1))

Output:

                        time        dark       light
0  2022-03-07 06:45:00+00:00   79.857667   80.066667
1  2022-03-30 06:25:00+00:00  105.324000  107.060833
2  2022-03-30 13:40:00+00:00  105.239750  106.863143
3  2022-04-01 06:25:00+00:00         NaN  101.271867
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading