Sort a subset of Pandas DataFrame

import pandas as pd
data = [[1, 1, 2, 1, 0], [2, 2, 2, 1, 4], [3, 1, 0, 1, 4], [4, 1, 3, 1, 4],
        [5, 1, 6, 1, 4], [6, 1, 2, 0, 4], [7, 1, 2, 7, 4], [8, 1, 2, 1, 1],
        [9, 1, 2, 1, 2], [10, 1, 2, 1, 3], [11, 1, 2, 1, 5], [12, 1, 2, 1, 6]]
df = pd.DataFrame(data, columns=['Id', 'c1', 'c2', 'c3', 'c4'])

# cdist lives in scipy.spatial.distance, so that is the module to import
import scipy.spatial.distance
mat = scipy.spatial.distance.cdist(
    df[['c1','c2','c3','c4']], 
    df[['c1','c2','c3','c4']], 
    metric='euclidean'
)
new_df = pd.DataFrame(mat, index=df['Id'], columns=df['Id']) 
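If SciPy is not available, the same distance matrix can be sketched with NumPy broadcasting. This is a swapped-in technique, not what the question uses; it assumes NumPy is installed and should agree with cdist(..., metric='euclidean') up to floating-point rounding.

```python
import numpy as np
import pandas as pd

# Same input data as in the question
data = [[1, 1, 2, 1, 0], [2, 2, 2, 1, 4], [3, 1, 0, 1, 4], [4, 1, 3, 1, 4],
        [5, 1, 6, 1, 4], [6, 1, 2, 0, 4], [7, 1, 2, 7, 4], [8, 1, 2, 1, 1],
        [9, 1, 2, 1, 2], [10, 1, 2, 1, 3], [11, 1, 2, 1, 5], [12, 1, 2, 1, 6]]
df = pd.DataFrame(data, columns=['Id', 'c1', 'c2', 'c3', 'c4'])

# Feature matrix of shape (n, 4)
x = df[['c1', 'c2', 'c3', 'c4']].to_numpy(dtype=float)

# (n, 1, 4) - (1, n, 4) broadcasts to all pairwise differences, shape (n, n, 4);
# summing the squares over the last axis and taking the root gives Euclidean distances
diff = x[:, None, :] - x[None, :, :]
mat = np.sqrt((diff ** 2).sum(axis=-1))

# Label rows and columns by Id, as in the question
new_df = pd.DataFrame(mat, index=df['Id'], columns=df['Id'])
```

For example, new_df.loc[1, 2] is sqrt(17), about 4.123106, and the diagonal is all zeros.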

When I sort the whole DataFrame, it works:

new_df.sort_values(by=1, ascending=True, kind="mergesort", axis=1)

But if I sort a subset of the DataFrame, it does not work:

i = 1
j = 2
new_dff = new_df[i:j]
new_dff.sort_values(by=1, ascending=True, kind="mergesort", axis=1)
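A minimal sketch that checks which rows each kind of slice keeps. It rebuilds the distance matrix with NumPy broadcasting instead of SciPy, an assumption made only to keep the example dependency-light; the slicing behaviour does not depend on how the matrix was computed.

```python
import numpy as np
import pandas as pd

data = [[1, 1, 2, 1, 0], [2, 2, 2, 1, 4], [3, 1, 0, 1, 4], [4, 1, 3, 1, 4],
        [5, 1, 6, 1, 4], [6, 1, 2, 0, 4], [7, 1, 2, 7, 4], [8, 1, 2, 1, 1],
        [9, 1, 2, 1, 2], [10, 1, 2, 1, 3], [11, 1, 2, 1, 5], [12, 1, 2, 1, 6]]
df = pd.DataFrame(data, columns=['Id', 'c1', 'c2', 'c3', 'c4'])
x = df[['c1', 'c2', 'c3', 'c4']].to_numpy(dtype=float)
# Euclidean distance matrix via broadcasting (stand-in for scipy's cdist)
mat = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
new_df = pd.DataFrame(mat, index=df['Id'], columns=df['Id'])

i, j = 1, 2

# [] with an integer slice is positional and end-exclusive: only position 1 (Id 2)
by_position = new_df[i:j]

# .loc slices by label and includes both end points: Ids 1 and 2
by_label = new_df.loc[i:j]

print(by_position.index.tolist())
print(by_label.index.tolist())

# Sorting the positional slice by row label 1 fails, because that row is absent
try:
    by_position.sort_values(by=1, ascending=True, kind="mergesort", axis=1)
    failed = False
except KeyError:
    failed = True
print(failed)
```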


Solution:

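Plain new_df[i:j] slices rows by position and excludes the end point, so new_df[1:2] returns only the second row (Id 2). The row labelled 1 is then no longer in the subset, so sort_values(by=1, axis=1) has no row to sort by and raises a KeyError. To select a subset of rows by label, use DataFrame.loc, which includes both end points: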

i = 1
j = 2

new_dff = new_df.loc[i:j]
print(new_dff)
Id        1         2         3         4         5         6         7   \
Id                                                                         
1   0.000000  4.123106  4.472136  4.123106  5.656854  4.123106  7.211103   
2   4.123106  0.000000  2.236068  1.414214  4.123106  1.414214  6.082763   

Id        8         9         10        11        12  
Id                                                    
1   1.000000  2.000000  3.000000  5.000000  6.000000  
2   3.162278  2.236068  1.414214  1.414214  2.236068

Sorting the subset then works as expected:

new_dff = new_dff.sort_values(by=1, ascending=True, kind="mergesort", axis=1)
print(new_dff)
Id        1         8         9         10        2         4         6   \
Id                                                                         
1   0.000000  1.000000  2.000000  3.000000  4.123106  4.123106  4.123106   
2   4.123106  3.162278  2.236068  1.414214  0.000000  1.414214  1.414214   

Id        3         11        5         12        7   
Id                                                    
1   4.472136  5.000000  5.656854  6.000000  7.211103  
2   2.236068  1.414214  4.123106  2.236068  6.082763  
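To make explicit what sort_values(by=1, axis=1) does: it reorders the columns so that the row labelled 1 is ascending, and every other row in the subset is carried along with its column. Because kind="mergesort" is stable, the tied columns 2, 4 and 6 keep their original order. A sketch that checks this, again building the matrix with NumPy broadcasting in place of SciPy (an assumption for portability):

```python
import numpy as np
import pandas as pd

data = [[1, 1, 2, 1, 0], [2, 2, 2, 1, 4], [3, 1, 0, 1, 4], [4, 1, 3, 1, 4],
        [5, 1, 6, 1, 4], [6, 1, 2, 0, 4], [7, 1, 2, 7, 4], [8, 1, 2, 1, 1],
        [9, 1, 2, 1, 2], [10, 1, 2, 1, 3], [11, 1, 2, 1, 5], [12, 1, 2, 1, 6]]
df = pd.DataFrame(data, columns=['Id', 'c1', 'c2', 'c3', 'c4'])
x = df[['c1', 'c2', 'c3', 'c4']].to_numpy(dtype=float)
# Euclidean distance matrix via broadcasting (stand-in for scipy's cdist)
mat = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
new_df = pd.DataFrame(mat, index=df['Id'], columns=df['Id'])

# Select rows labelled 1..2, then reorder columns by the values in row 1
new_dff = new_df.loc[1:2].sort_values(by=1, ascending=True, kind="mergesort", axis=1)

# Row 1 (the sort key) is now non-decreasing from left to right
key_row_sorted = new_dff.loc[1].is_monotonic_increasing
# Row 2 is only permuted along with the columns, so it is not sorted itself
other_row_sorted = new_dff.loc[2].is_monotonic_increasing
print(new_dff.columns.tolist())
```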

Or, for a subset of columns, use : to select all rows:

i = 1
j = 2

new_dff = new_df.loc[:, i:j]
print(new_dff)
Id         1         2
Id                    
1   0.000000  4.123106
2   4.123106  0.000000
3   4.472136  2.236068
4   4.123106  1.414214
5   5.656854  4.123106
6   4.123106  1.414214
7   7.211103  6.082763
8   1.000000  3.162278
9   2.000000  2.236068
10  3.000000  1.414214
11  5.000000  1.414214
12  6.000000  2.236068
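Because .loc works on labels, the column slice new_df.loc[:, 1:2] includes both labels 1 and 2. Here the Ids 1 to 12 happen to sit at positions 0 to 11, so the same block can be taken positionally with iloc, whose end point is exclusive. A small check, building the matrix with NumPy broadcasting in place of SciPy (an assumption to keep the sketch self-contained):

```python
import numpy as np
import pandas as pd

data = [[1, 1, 2, 1, 0], [2, 2, 2, 1, 4], [3, 1, 0, 1, 4], [4, 1, 3, 1, 4],
        [5, 1, 6, 1, 4], [6, 1, 2, 0, 4], [7, 1, 2, 7, 4], [8, 1, 2, 1, 1],
        [9, 1, 2, 1, 2], [10, 1, 2, 1, 3], [11, 1, 2, 1, 5], [12, 1, 2, 1, 6]]
df = pd.DataFrame(data, columns=['Id', 'c1', 'c2', 'c3', 'c4'])
x = df[['c1', 'c2', 'c3', 'c4']].to_numpy(dtype=float)
# Euclidean distance matrix via broadcasting (stand-in for scipy's cdist)
mat = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
new_df = pd.DataFrame(mat, index=df['Id'], columns=df['Id'])

i, j = 1, 2

# Label-based, inclusive: columns labelled 1 and 2
by_label = new_df.loc[:, i:j]

# Position-based, exclusive: label k sits at position k - 1 in this frame
by_position = new_df.iloc[:, i - 1:j]

same = by_label.equals(by_position)
print(by_label.shape, same)
```

The iloc form only matches here because the labels are consecutive integers starting at 1; .loc keeps working if the Ids are reordered or have gaps.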

Or both:

i = 1
j = 2

new_dff = new_df.loc[i:j, i:j]
print(new_dff)
Id         1         2
Id                    
1   0.000000  4.123106
2   4.123106  0.000000
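The same pattern works for any rectangular block, as long as the row passed to by= is inside the row slice. As an illustration, with bounds chosen only for this example (rows 1 to 2, columns 8 to 12, sorted by row 2), and with the matrix again built by NumPy broadcasting instead of SciPy:

```python
import numpy as np
import pandas as pd

data = [[1, 1, 2, 1, 0], [2, 2, 2, 1, 4], [3, 1, 0, 1, 4], [4, 1, 3, 1, 4],
        [5, 1, 6, 1, 4], [6, 1, 2, 0, 4], [7, 1, 2, 7, 4], [8, 1, 2, 1, 1],
        [9, 1, 2, 1, 2], [10, 1, 2, 1, 3], [11, 1, 2, 1, 5], [12, 1, 2, 1, 6]]
df = pd.DataFrame(data, columns=['Id', 'c1', 'c2', 'c3', 'c4'])
x = df[['c1', 'c2', 'c3', 'c4']].to_numpy(dtype=float)
# Euclidean distance matrix via broadcasting (stand-in for scipy's cdist)
mat = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
new_df = pd.DataFrame(mat, index=df['Id'], columns=df['Id'])

# Rows labelled 1..2 and columns labelled 8..12, both inclusive
block = new_df.loc[1:2, 8:12]

# Reorder the columns by the values in the row labelled 2;
# mergesort is stable, so tied columns keep their original relative order
block_sorted = block.sort_values(by=2, ascending=True, kind="mergesort", axis=1)
print(block_sorted.columns.tolist())
```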