Use Pandas to convert a column of non-continuous dates to hourly values of those dates

I have a list of non-continuous dates. I'm reading these into a dataframe and would like to expand each date to an hourly level.

All of my searching led to resampling. I managed to convert the date column to a DatetimeIndex and resample, but that filled in every single hour from the first date to the last, and I'm looking for just 24 hours at a time.

import pandas as pd

d = {'date': ["1/1/2014", "5/26/2014", "7/4/2014"]}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.tz_localize('US/Eastern')
df = df.set_index('date')
# 'h' is the hourly alias; the uppercase 'H' is deprecated as of pandas 2.2
df = df.resample('h').ffill()

This creates a dataframe whose index contains every hour (4416 in total) from 1/1/2014 to 7/4/2014.
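As a rough check on that count, the sketch below rebuilds the resampled frame and compares its length with the elapsed time between the first and last date. The column name `flag` and the variables `dates`, `filled`, and `elapsed_hours` are placeholders made up for this illustration.

```python
import pandas as pd

# Same three dates as above, localized to US/Eastern.
dates = pd.to_datetime(
    ["1/1/2014", "5/26/2014", "7/4/2014"], format="%m/%d/%Y"
).tz_localize("US/Eastern")

# 'flag' is a placeholder column so there is something to forward-fill.
df = pd.DataFrame({"flag": 1}, index=dates)
filled = df.resample("h").ffill()

# Elapsed hours between the first and last timestamp. The span crosses
# the spring DST change, so it is one hour short of 184 * 24.
elapsed_hours = (dates[-1] - dates[0]) / pd.Timedelta(hours=1)

print(len(filled))     # one row per hour, both endpoints included
print(elapsed_hours)
```

The resampled index includes both endpoints, so its length is the elapsed hours plus one, which is why the gaps between the three dates are filled in as well.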

What I actually want is to have a dataframe that looks like this:

date
2014-01-01 00:00:00-05:00
2014-01-01 01:00:00-05:00
2014-01-01 02:00:00-05:00
2014-01-01 03:00:00-05:00
2014-01-01 04:00:00-05:00
2014-01-01 05:00:00-05:00
2014-01-01 06:00:00-05:00
2014-01-01 07:00:00-05:00
2014-01-01 08:00:00-05:00
2014-01-01 09:00:00-05:00
2014-01-01 10:00:00-05:00
2014-01-01 11:00:00-05:00
2014-01-01 12:00:00-05:00
2014-01-01 13:00:00-05:00
2014-01-01 14:00:00-05:00
2014-01-01 15:00:00-05:00
2014-01-01 16:00:00-05:00
2014-01-01 17:00:00-05:00
2014-01-01 18:00:00-05:00
2014-01-01 19:00:00-05:00
2014-01-01 20:00:00-05:00
2014-01-01 21:00:00-05:00
2014-01-01 22:00:00-05:00
2014-01-01 23:00:00-05:00
2014-05-26 00:00:00-04:00
2014-05-26 01:00:00-04:00
2014-05-26 02:00:00-04:00
2014-05-26 03:00:00-04:00
2014-05-26 04:00:00-04:00
2014-05-26 05:00:00-04:00
2014-05-26 06:00:00-04:00
2014-05-26 07:00:00-04:00
2014-05-26 08:00:00-04:00
2014-05-26 09:00:00-04:00
2014-05-26 10:00:00-04:00
2014-05-26 11:00:00-04:00
2014-05-26 12:00:00-04:00
2014-05-26 13:00:00-04:00
2014-05-26 14:00:00-04:00
2014-05-26 15:00:00-04:00
2014-05-26 16:00:00-04:00
2014-05-26 17:00:00-04:00
2014-05-26 18:00:00-04:00
2014-05-26 19:00:00-04:00
2014-05-26 20:00:00-04:00
2014-05-26 21:00:00-04:00
2014-05-26 22:00:00-04:00
2014-05-26 23:00:00-04:00

The frame should then skip ahead to 7/4 and include only the 24 hours of that date.

>Solution:

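Instead of resampling, build a date range for each date that runs from the date to the date plus 23 hours at an hourly frequency (24 timestamps per date), then explode the ranges into a single Series. This uses the tz-localized date column, so run it before the set_index call:

hours = df['date'].apply(lambda x: pd.date_range(x, x + pd.tseries.offsets.Hour(23), freq='h')).explode()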

Output:

>>> hours
0   2014-01-01 00:00:00-05:00
0   2014-01-01 01:00:00-05:00
0   2014-01-01 02:00:00-05:00
0   2014-01-01 03:00:00-05:00
0   2014-01-01 04:00:00-05:00
               ...           
2   2014-07-04 19:00:00-04:00
2   2014-07-04 20:00:00-04:00
2   2014-07-04 21:00:00-04:00
2   2014-07-04 22:00:00-04:00
2   2014-07-04 23:00:00-04:00
Name: date, Length: 72, dtype: datetime64[ns, US/Eastern]
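For comparison, here is a minimal self-contained sketch of the same idea that asks for 24 hourly periods per date instead of computing an end point, and returns a dataframe rather than an exploded Series. The names `ranges` and `hourly` are made up for this example, and it assumes the date column has not yet been moved into the index.

```python
import pandas as pd

d = {"date": ["1/1/2014", "5/26/2014", "7/4/2014"]}
df = pd.DataFrame(data=d)
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y").dt.tz_localize("US/Eastern")

# One range per source date: the start timestamp plus the next 23 hours.
ranges = [pd.date_range(start=day, periods=24, freq="h") for day in df["date"]]

# Stack the ranges into a single tz-aware column with a fresh 0..71 index.
hourly = pd.concat([pd.Series(r) for r in ranges], ignore_index=True).to_frame(name="date")

print(hourly.head())
print(len(hourly))
```

Using `periods=24` always yields 24 consecutive hourly steps from each start. If one of the dates fell on a daylight-saving transition, those 24 steps would not line up exactly with the calendar day, so that case would need separate handling.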