Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How can I group by a datetime column with timezone?

import pandas as pd

df = pd.DataFrame(
    {
        'StartDate':['2020-01-01 00:00:00-04:00', '2020-01-01 01:00:00-04:00', '2020-01-01 01:55:00-04:00', '2020-01-02 02:00:00-02:00', '2020-01-02 02:00:00-04:00'],
        'Weight':[100, 110, 120, 125, 155]
    }
)
df['StartDate'] = pd.to_datetime(df['StartDate'])

df

df1

I want to group the data by the hour and sum up the Weight column. So, the end result would be a df with 3 rows: current index 0, current indexes 1&2, current indexes 3&4.

I came across the Grouper function and I tried the following but it didn’t work:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df = df.groupby(pd.Grouper(key='StartDate', freq='H')).sum()

I get the following error:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or
PeriodIndex, but got an instance of ‘Index’

Does anyone know what I’m doing wrong or can someone provide a solution?

Thanks

>Solution :

You first need to convert to datetime, taking into account the timezones:

df['StartDate'] = pd.to_datetime(df['StartDate'], utc=True)

df.groupby(pd.Grouper(key='StartDate', freq='H')).sum()

Output:

                           Weight
StartDate                        
2020-01-01 04:00:00+00:00     100
2020-01-01 05:00:00+00:00     230
2020-01-01 06:00:00+00:00       0
2020-01-01 07:00:00+00:00       0
2020-01-01 08:00:00+00:00       0
2020-01-01 09:00:00+00:00       0
2020-01-01 10:00:00+00:00       0
2020-01-01 11:00:00+00:00       0
2020-01-01 12:00:00+00:00       0
2020-01-01 13:00:00+00:00       0
2020-01-01 14:00:00+00:00       0
2020-01-01 15:00:00+00:00       0
2020-01-01 16:00:00+00:00       0
2020-01-01 17:00:00+00:00       0
2020-01-01 18:00:00+00:00       0
2020-01-01 19:00:00+00:00       0
2020-01-01 20:00:00+00:00       0
2020-01-01 21:00:00+00:00       0
2020-01-01 22:00:00+00:00       0
2020-01-01 23:00:00+00:00       0
2020-01-02 00:00:00+00:00       0
2020-01-02 01:00:00+00:00       0
2020-01-02 02:00:00+00:00       0
2020-01-02 03:00:00+00:00       0
2020-01-02 04:00:00+00:00     125
2020-01-02 05:00:00+00:00       0
2020-01-02 06:00:00+00:00     155

without "blanks"

df.groupby(pd.to_datetime(df['StartDate'], utc=True).dt.floor('h'))['Weight'].sum()

StartDate
2020-01-01 04:00:00+00:00    100
2020-01-01 05:00:00+00:00    230
2020-01-02 04:00:00+00:00    125
2020-01-02 06:00:00+00:00    155
Name: Weight, dtype: int64
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading