Sort a subset of Pandas DataFrame

by MR

August 31, 2022

import pandas as pd
data = [[1, 1, 2, 1, 0], [2, 2, 2, 1, 4], [3, 1, 0, 1, 4], [4, 1, 3, 1, 4],
        [5, 1, 6, 1, 4], [6, 1, 2, 0, 4], [7, 1, 2, 7, 4], [8, 1, 2, 1, 1],
        [9, 1, 2, 1, 2], [10, 1, 2, 1, 3], [11, 1, 2, 1, 5], [12, 1, 2, 1, 6]]
df = pd.DataFrame(data, columns=['Id', 'c1', 'c2', 'c3', 'c4'])

# cdist lives in scipy.spatial.distance, so import that submodule explicitly
import scipy.spatial.distance
mat = scipy.spatial.distance.cdist(
    df[['c1','c2','c3','c4']], 
    df[['c1','c2','c3','c4']], 
    metric='euclidean'
)
new_df = pd.DataFrame(mat, index=df['Id'], columns=df['Id'])

Sorting the columns of the full DataFrame by the row labelled 1 works:

new_df.sort_values(by=1,ascending=True,kind="mergesort",axis=1)

But the same sort on a subset of the DataFrame does not work:

i = 1
j = 2
new_dff = new_df[i:j]
new_dff.sort_values(by=1, ascending=True, kind="mergesort", axis=1)

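Solution:

new_df[i:j] slices rows by position and excludes the end position, so new_df[1:2] holds only the row labelled 2. Sorting with by=1 then fails because the row labelled 1 is not in the subset. To select rows by label, use DataFrame.loc, which includes both endpoints: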

i = 1
j = 2

new_dff = new_df.loc[i:j]
print(new_dff)
Id        1         2         3         4         5         6         7   \
Id                                                                         
1   0.000000  4.123106  4.472136  4.123106  5.656854  4.123106  7.211103   
2   4.123106  0.000000  2.236068  1.414214  4.123106  1.414214  6.082763   

Id        8         9         10        11        12  
Id                                                    
1   1.000000  2.000000  3.000000  5.000000  6.000000  
2   3.162278  2.236068  1.414214  1.414214  2.236068
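
The difference between the two kinds of slicing can be checked on a small table. The following is a minimal sketch, assuming a hand-built 4x4 frame named demo as a stand-in for new_df (the name and the values are made up for illustration and are not from the question):

```python
import pandas as pd

# Hypothetical stand-in for new_df: a symmetric table labelled 1..4 on both axes.
ids = pd.Index([1, 2, 3, 4], name="Id")
demo = pd.DataFrame(
    [[0.0, 4.0, 2.0, 1.0],
     [4.0, 0.0, 3.0, 5.0],
     [2.0, 3.0, 0.0, 6.0],
     [1.0, 5.0, 6.0, 0.0]],
    index=ids, columns=ids,
)

i, j = 1, 2

positional = demo[i:j]    # position 1 only (end excluded): the row labelled 2
by_label = demo.loc[i:j]  # labels 1 through 2, end label included

print(positional.index.tolist())  # [2]
print(by_label.index.tolist())    # [1, 2]

# Sorting columns by the row labelled 1 requires that row to be present.
try:
    positional.sort_values(by=1, axis=1)
    raised = False
except KeyError:
    raised = True
print(raised)  # True

sorted_cols = by_label.sort_values(by=1, axis=1, kind="mergesort")
print(sorted_cols.columns.tolist())  # [1, 4, 3, 2]
```

The KeyError in the positional case is the likely reason the original attempt appeared not to work.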

Sorting then works, because the row labelled 1 is part of the selection:

new_dff = new_dff.sort_values(by=1, ascending=True, kind="mergesort", axis=1)
print(new_dff)
Id        1         8         9         10        2         4         6   \
Id                                                                         
1   0.000000  1.000000  2.000000  3.000000  4.123106  4.123106  4.123106   
2   4.123106  3.162278  2.236068  1.414214  0.000000  1.414214  1.414214   

Id        3         11        5         12        7   
Id                                                    
1   4.472136  5.000000  5.656854  6.000000  7.211103  
2   2.236068  1.414214  4.123106  2.236068  6.082763
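
Note that the result is assigned back to new_dff. By default sort_values returns a new DataFrame and leaves the original untouched, so calling it without keeping the result has no visible effect. A minimal sketch, assuming a small made-up frame named block (not data from the question):

```python
import pandas as pd

# Hypothetical 2x3 frame; row labels 1..2, column labels 1..3.
rows = pd.Index([1, 2], name="Id")
cols = pd.Index([1, 2, 3], name="Id")
block = pd.DataFrame([[0.0, 4.0, 2.0],
                      [4.0, 0.0, 3.0]], index=rows, columns=cols)

# sort_values returns a reordered copy; block itself is not modified.
result = block.sort_values(by=1, axis=1, kind="mergesort")

original_order = block.columns.tolist()
sorted_order = result.columns.tolist()

print(original_order)  # [1, 2, 3]
print(sorted_order)    # [1, 3, 2]
```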

For a subset of columns, pass : as the row selector to keep all rows:

i = 1
j = 2

new_dff = new_df.loc[:, i:j]
print(new_dff)
Id         1         2
Id                    
1   0.000000  4.123106
2   4.123106  0.000000
3   4.472136  2.236068
4   4.123106  1.414214
5   5.656854  4.123106
6   4.123106  1.414214
7   7.211103  6.082763
8   1.000000  3.162278
9   2.000000  2.236068
10  3.000000  1.414214
11  5.000000  1.414214
12  6.000000  2.236068

Or select rows and columns by label at the same time:

i = 1
j = 2

new_dff = new_df.loc[i:j, i:j]
print(new_dff)
Id         1         2
Id                    
1   0.000000  4.123106
2   4.123106  0.000000
