Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python Pandas: resample based on just one of the columns

I have the following data and I’m resampling my data to find out how many bikes arrive at each of the stations every 15 minutes. However, my code is aggregating my stations too, and I only want to aggregate the variable "dtm_end_trip"

Sample data:

id_trip dtm_start_trip dtm_end_trip start_station end_station
1 2018-10-01 10:15:00 2018-10-01 10:17:00 A B
2 2018-10-01 10:17:00 2018-10-01 10:18:00 B A
999999 2021-12-31 23:58:00 2022-01-01 00:22:00 C A
1000000 2021-12-31 23:59:00 2022-01-01 00:29:00 A D

Trial code:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df2 =  df(['end_station', 'dtm_end_trip']).size().to_frame(name = 'count').reset_index()
df2 = df2.sort_values(by='count', ascending=False)

df2= df2.set_index('dtm_end_trip')

df2 = df2.resample('15T').count()

Output I get:

dtm_end_trip end_station count
2018-10-01 00:15:00 2 2
2018-10-01 00:30:00 0 0
2018-10-01 00:45:00 1 1
2018-10-01 01:00:00 2 2
2018-10-01 01:15:00 1 1

Desired output:

dtm_end_trip end_station count
2018-10-01 00:15:00 A 2
2018-10-01 00:15:00 B 0
2018-10-01 00:15:00 C 1
2018-10-01 00:15:00 D 2
2018-10-01 00:30:00 A 3
2018-10-01 00:30:00 B 2

The count column of the table above was, in this case, constructed with random numbers with the sole purpose of exemplifying the architecture of the desired output.

>Solution :

You can use pd.Grouper like this:

out = df.groupby([
    pd.Grouper(freq='15min', key='dtm_end_trip'),
    'end_station',
]).size()

>>> out
dtm_end_trip         end_station
2018-10-01 10:15:00  A              1
                     B              1
2022-01-01 00:15:00  A              1
                     D              1
dtype: int64

The result is a Series, but you can easily convert it to a DataFrame with the same headings as per your desired output:

>>> out.to_frame('count').reset_index()
         dtm_end_trip end_station  count
0 2018-10-01 10:15:00           A      1
1 2018-10-01 10:15:00           B      1
2 2022-01-01 00:15:00           A      1
3 2022-01-01 00:15:00           D      1

Note: this is the result from the four rows in your sample input data.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading