
What is the sorting order of a pandas DataFrame?

input1

import pandas as pd
import numpy as np
np.random.seed(0)
data = {'item': np.random.choice(['skirt', 'shirt', 'coat'], 6),
        'size': np.random.choice(['S', 'M', 'L', 'XL'], 6)}
df1 = pd.DataFrame(data)

df1:

    item    size
0   skirt   S
1   shirt   XL
2   skirt   L
3   shirt   S
4   shirt   S
5   coat    S

When I sort by size:

df1.sort_values('size')

out:

    item    size
2   skirt   L
0   skirt   S
3   shirt   S
4   shirt   S
5   coat    S
1   shirt   XL

The data is sorted by the size column, and rows with the same size keep their original relative order.


input2

import pandas as pd
import numpy as np
pd.options.display.max_rows = 6
np.random.seed(0)
data1 = {'item': np.random.choice(['skirt', 'shirt', 'coat'], 1000000),
        'size': np.random.choice(['S', 'M', 'L', 'XL'], 1000000)}
df2 = pd.DataFrame(data1)

df2

         item size
0       skirt    M
1       shirt    L
2       skirt    M
...       ...  ...
999997   coat    S
999998  shirt    S
999999  skirt    L

[1000000 rows x 2 columns]

df2 has 1M rows.

When I sort by size:

df2.sort_values('size')

out:

         item size
999999  skirt    L   <- why top?
645704  shirt    L
645714  shirt    L
...       ...  ...
822256   coat   XL
699230   coat   XL
400737  skirt   XL

[1000000 rows x 2 columns]

I don't know why row 999999 is at the top in df2.

Shouldn’t the existing order be followed if size is the same?

Solution:

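What you want is a "stable" sort. "Stable" means rows with identical keys keep their current relative order. The default algorithm, quicksort, is not stable, so rows with equal keys can come out in any order. On a small frame like df1 the original order may happen to survive, but that is not guaranteed. Pass kind='stable' to sort_values: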

df2.sort_values('size', kind='stable')
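
To check that a stable sort really preserves the original order among ties, here is a minimal sketch. The frame `df`, its size `n`, and the helper `keeps_original_order` are illustrative names, not part of the question; the sketch also assumes the default RangeIndex, so index labels equal the original row positions.

```python
import numpy as np
import pandas as pd

# Hypothetical test frame, similar in shape to df2 but smaller.
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    'item': rng.choice(['skirt', 'shirt', 'coat'], n),
    'size': rng.choice(['S', 'M', 'L', 'XL'], n),
})


def keeps_original_order(sorted_df, key):
    """Return True if, within each key group, index labels are increasing.

    With a default RangeIndex this means tied rows kept their original order.
    """
    return all(
        group.index.is_monotonic_increasing
        for _, group in sorted_df.groupby(key, sort=False)
    )


# Stable sort: ties keep their original relative order.
stable = df.sort_values('size', kind='stable')
print(keeps_original_order(stable, 'size'))

# 'mergesort' is also a stable kind, so it should give the same result.
merge = df.sort_values('size', kind='mergesort')
print(stable.equals(merge))

# The default kind makes no promise about ties; the result of this check
# can differ between inputs and library versions.
default = df.sort_values('size')
print(keeps_original_order(default, 'size'))
```

The first two checks should print True for any input, since a stable sort has exactly one valid result. The last check is included only for comparison and should not be relied on either way.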