Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

categorical sorting within date group in dataframe

I have a dataframe like this one and would like to make sure to sort it by date and have the
same ordering for side for status emtries, I would like to see B entries on top before S for each date. But also keep the dates where there is only one entry.

tempDF = pd.DataFrame({ 'id': [12,12,12,12,45,45,45,51,51,51,51,51,51,76,76,76],
                'date': ['2015-05-01','2015-05-22','2015-05-14','2015-05-06','2015-05-03','2015-05-12','2015-05-02','2015-05-05','2015-05-01','2015-05-23','2015-05-17','2015-05-03','2015-05-05','2015-05-04','2015-05-22','2015-05-08'],
                'status': ['B','S','B','S','B','B','S','B','S','B','B','S','S','B','B','S']})
tempDF['date'] = pd.to_datetime(tempDF['date'])

I tried

tempDF.sort_values(['status']).groupby(tempDF['date']).head(2)

but this ends up the entries sorted before being grouped

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

If I understand correctly, first sort_values then grouby.head with sort=False:

(tempDF.sort_values(by=['date', 'status'], ascending=True)
       .groupby('date', sort=False).head(2)
)

Output:

    id       date status
0   12 2015-05-01      B
8   51 2015-05-01      S
6   45 2015-05-02      S
4   45 2015-05-03      B
11  51 2015-05-03      S
13  76 2015-05-04      B
7   51 2015-05-05      B
12  51 2015-05-05      S
3   12 2015-05-06      S
15  76 2015-05-08      S
5   45 2015-05-12      B
2   12 2015-05-14      B
10  51 2015-05-17      B
14  76 2015-05-22      B
1   12 2015-05-22      S
9   51 2015-05-23      B

Similarly, if you want all Bs before all Ss:

tempDF.sort_values(by=['status', 'date'], ascending=True)
       .groupby('date', sort=False).head(2)
)

Output:

    id       date status
0   12 2015-05-01      B
4   45 2015-05-03      B
13  76 2015-05-04      B
7   51 2015-05-05      B
5   45 2015-05-12      B
2   12 2015-05-14      B
10  51 2015-05-17      B
14  76 2015-05-22      B
9   51 2015-05-23      B
8   51 2015-05-01      S
6   45 2015-05-02      S
11  51 2015-05-03      S
12  51 2015-05-05      S
3   12 2015-05-06      S
15  76 2015-05-08      S
1   12 2015-05-22      S
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading