Reorder dataframe groupby medians following custom order

I have a DataFrame with the columns params and value. I’d like to count how many rows each params entry has (to use as labels in a boxplot), so I use sizes_bp_log_df['params'].value_counts(), which shows this:

slidingwindow_250     11574
hotspots_1k_100        8454
slidingwindow_500      5793
slidingwindow_100      5366
hotspots_5k_500        3118
slidingwindow_1000     2898
hotspots_10k_1k        1772
slidingwindow_2500     1160
slidingwindow_5000      580
Name: params, dtype: int64

I have a list of all the entries in params, in the order I wish to display them in a boxplot. I tried sort_index(level=myorder) to get the counts in my custom order, but the function ignores myorder and just sorts the index alphabetically.

myorder = ["slidingwindow_100",
          "slidingwindow_250",
          "slidingwindow_500",
          "slidingwindow_1000",
          "slidingwindow_2500",
          "slidingwindow_5000",
          "hotspots_1k_100",
          "hotspots_5k_500",
          "hotspots_10k_1k"]

sizes_bp_log_df['params'].value_counts().sort_index(level=myorder)

hotspots_10k_1k        1772
hotspots_1k_100        8454
hotspots_5k_500        3118
slidingwindow_100      5366
slidingwindow_1000     2898
slidingwindow_250     11574
slidingwindow_2500     1160
slidingwindow_500      5793
slidingwindow_5000      580
Name: params, dtype: int64

How can I get the index of my value counts in the order I want them to be in?

In addition, I’ll be using the median of each distribution as coordinates for the boxplot labels too, which I retrieve using sizes_bp_log_df.groupby(['params']).median(); hopefully your suggested sort methods will also work for that task.

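Solution:

Use reindex instead of sort_index. The level argument of sort_index selects which index level to sort on (it matters for a MultiIndex); it does not take a custom ordering, so the result is still sorted alphabetically. reindex(myorder) returns the entries in exactly the order of the list, for example sizes_bp_log_df['params'].value_counts().reindex(myorder). The same call should work on the medians: sizes_bp_log_df.groupby(['params']).median().reindex(myorder).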
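As a minimal sketch of the reindex approach, the snippet below applies it to both the value counts and the groupby medians. The small DataFrame and its numbers are made up for illustration and only stand in for sizes_bp_log_df; myorder is shortened to three entries for the same reason.

```python
import pandas as pd

# Toy stand-in for sizes_bp_log_df (values are invented for illustration).
df = pd.DataFrame({
    "params": ["hotspots_1k_100", "slidingwindow_250", "slidingwindow_100",
               "slidingwindow_250", "hotspots_1k_100", "slidingwindow_250"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Shortened version of the custom order from the question.
myorder = ["slidingwindow_100", "slidingwindow_250", "hotspots_1k_100"]

# reindex returns the rows in exactly the order given by myorder.
counts = df["params"].value_counts().reindex(myorder)

# The same call reorders the per-group medians.
medians = df.groupby("params")["value"].median().reindex(myorder)

print(counts)
print(medians)
```

One thing to keep in mind: any label in myorder that does not occur in the data comes back as NaN after reindex. For the counts, passing fill_value=0 to reindex is one way to avoid that.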
