pandas: extract time of day as a feature for classification

I have time series data with a Unix-seconds column (among other columns):

import pandas as pd

df = pd.DataFrame(
    {
        'user': [3, 3, 3, 3, 3, 6, 6, 6],
        # Unix timestamps in seconds
        'timestamp': [1459467971, 1459468020, 1459468026, 1459468031,
                      1459468036, 1513974852, 1513974853, 1513974854]
    }
)

The dataset is for classifying events: some are frequent in the morning, others in the afternoon.

I think time of day could also be an important discriminating feature for this task.

How do I extract the time of day as a number of seconds from this?

Note:
I know that converting with pandas to_datetime and then taking dt.time gives the time as HH:MM:SS:

df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
#df['date'] = df['timestamp'].dt.date
df['time'] = df['timestamp'].dt.time
df
    user      timestamp          time
0   3   2016-03-31 23:46:11     23:46:11
1   3   2016-03-31 23:47:00     23:47:00
2   3   2016-03-31 23:47:06     23:47:06
3   3   2016-03-31 23:47:11     23:47:11
4   3   2016-03-31 23:47:16     23:47:16
5   6   2017-12-22 20:34:12     20:34:12
6   6   2017-12-22 20:34:13     20:34:13
7   6   2017-12-22 20:34:14     20:34:14

Is there an easy way to get the number of seconds since midnight instead, like this?

    user      timestamp       time-of-day
0   3   2016-03-31 23:46:11       85571
1   3   2016-03-31 23:47:00       85620
2   3   2016-03-31 23:47:06       85626
3   3   2016-03-31 23:47:11       85631
4   3   2016-03-31 23:47:16       85636
5   6   2017-12-22 20:34:12       74052
6   6   2017-12-22 20:34:13       74053
7   6   2017-12-22 20:34:14       74054

Solution:

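You can use dt.normalize, which resets each timestamp to midnight. Subtracting that from the original timestamp leaves the time of day as a Timedelta, which dt.total_seconds converts to seconds: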

df['time-of-day'] = (df['timestamp'].sub(df['timestamp'].dt.normalize())
                                    .dt.total_seconds().astype(int))
print(df)

# Output (the date column appears only if the commented-out df['date'] line from the question is run)
   user           timestamp        date      time  time-of-day
0     3 2016-03-31 23:46:11  2016-03-31  23:46:11        85571
1     3 2016-03-31 23:47:00  2016-03-31  23:47:00        85620
2     3 2016-03-31 23:47:06  2016-03-31  23:47:06        85626
3     3 2016-03-31 23:47:11  2016-03-31  23:47:11        85631
4     3 2016-03-31 23:47:16  2016-03-31  23:47:16        85636
5     6 2017-12-22 20:34:12  2017-12-22  20:34:12        74052
6     6 2017-12-22 20:34:13  2017-12-22  20:34:13        74053
7     6 2017-12-22 20:34:14  2017-12-22  20:34:14        74054
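
As a cross-check on the numbers above, here is a minimal sketch of two other ways to get the same values. It is not part of the original answer. It assumes the timestamps are plain Unix seconds interpreted as UTC (which is what pd.to_datetime with unit='s' does), and the column name time-of-day-mod is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'user': [3, 3, 3, 3, 3, 6, 6, 6],
        'timestamp': [1459467971, 1459468020, 1459468026, 1459468031,
                      1459468036, 1513974852, 1513974853, 1513974854],
    }
)

# Option 1 (hypothetical column name): on the raw integers, the remainder
# after dividing by 86400 is the number of seconds since midnight UTC.
df['time-of-day-mod'] = df['timestamp'] % 86400

# Option 2: after converting to datetime, combine the component accessors.
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
ts = df['timestamp'].dt
df['time-of-day'] = ts.hour * 3600 + ts.minute * 60 + ts.second

print(df)
```

Both columns should match the dt.normalize result. If the events were recorded in a local time zone, the timestamps would need to be converted to that zone first, otherwise "morning" and "afternoon" would be shifted.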

Note: I’m not sure that time of day is useful as a continuous variable. You might consider discretizing it into 8 blocks of 3 hours instead.
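
One possible way to do that discretization, sketched here under the assumption that the feature is already in seconds since midnight: integer-divide by the block length. The sample values and the variable names time_of_day and block are illustrative, not taken from the original answer.

```python
import pandas as pd

# A few seconds-since-midnight values, including block boundaries
time_of_day = pd.Series([85571, 85620, 74052, 0, 10799, 10800])

# 3 hours = 10800 seconds, so integer division yields a block index 0..7
block = time_of_day // (3 * 3600)

# Optionally mark the result as categorical so a model does not treat it as ordered
block_cat = block.astype('category')

print(block.tolist())
```

Block 0 covers 00:00 to 02:59:59, block 7 covers 21:00 to 23:59:59. Whether these blocks work better than the raw seconds depends on the classifier and the data, so it is worth trying both.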
