Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Daily maximum in pandas with missing dates

I am currently somewhat stuck on getting the daily maximum for my dataset.
It looks like this:

                      Date  Value
0      1996-03-07 21:30:00  360.0
1      1996-03-07 21:45:00  360.0
2      1996-03-07 22:00:00  360.0
3      1996-03-07 22:15:00  360.0
4      1996-03-07 22:30:00  360.0
...                    ...    ...
867882 2021-02-03 12:45:00  361.9
867883 2021-02-03 13:00:00  361.8
867884 2021-02-03 13:15:00  361.8
867885 2021-02-03 13:30:00  361.7
867886 2021-02-03 13:45:00  361.8
[867887 rows x 2 columns]

The problem is that inside the dataset entire days are missing.
If I understood correctly the grouper in pandas needs continues days to work properly.
So I refilled the dates:

df.set_index('Date', inplace=True)
all_days = pd.date_range(df.index.min(), df.index.max(), freq='15T')
df = df.reindex(all_days)

But when I now run my code to get the daily maximum

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

daily_maximum = df.loc[df.groupby(pd.Grouper(freq='D')).idxmax().iloc[:, 0]]

I get the following error message:

The DTypes <class 'numpy.dtype[float64]'> and <class 'numpy.dtype[datetime64]'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is `object`.

When I check with df.index, I get

DatetimeIndex(['1996-03-07 21:30:00', '1996-03-07 21:45:00',
           '1996-03-07 22:00:00', '1996-03-07 22:15:00',
           '1996-03-07 22:30:00', '1996-03-07 22:45:00',
           '1996-03-07 23:00:00', '1996-03-07 23:15:00',
           '1996-03-07 23:30:00', '1996-03-07 23:45:00',
           ...
           '2021-02-03 11:30:00', '2021-02-03 11:45:00',
           '2021-02-03 12:00:00', '2021-02-03 12:15:00',
           '2021-02-03 12:30:00', '2021-02-03 12:45:00',
           '2021-02-03 13:00:00', '2021-02-03 13:15:00',
           '2021-02-03 13:30:00', '2021-02-03 13:45:00'],
          dtype='datetime64[ns]', length=873474, freq='15T')

and checking the dtype of my Value column returns dtype('float64').

I am probably missing something very obvious here, but I’m honestly not familiar at all with dtypes and date formats.

>Solution :

I think here is problem is not defined column after groupby, so is not returned Series, but DataFrame:

daily_maximum = df.loc[df.groupby(pd.Grouper(freq='D'))['value'].idxmax()]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading