I'm trying to find the closest value in a reference data frame but it is not outputting the correct row. Where am I going wrong?

I’m trying to find the closest value in a reference data frame but it is not outputting the correct row.

I am using the below data frame which is then used to find the relevant row corresponding to the closest value in column ‘P’ to a defined variable. For example if p = 0.22222 then the code should output row 2.

DF:

      P   n1      n2      n3      n4      n5      n6      n7      n8      n9   
0   NaN  0.0  0.2000  0.4000  0.6000  0.8000  1.0000  1.2000  1.4000  1.6000
1   0.0  1.0  0.8039  0.6286  0.4855  0.3753  0.2929  0.2318  0.1863  0.1520   
2   0.2  1.0  0.7983  0.6201  0.4771  0.3683  0.2876  0.2279  0.1835  0.1500   
3   0.4  1.0  0.7789  0.5924  0.4508  0.3473  0.2720  0.2167  0.1754  0.1442   
4   0.6  1.0  0.7349  0.5377  0.4043  0.3124  0.2470  0.1989  0.1628  0.1351   
5   0.8  1.0  0.6301  0.4433  0.3368  0.2658  0.2147  0.1762  0.1465  0.1234   
6   1.0  0.5  0.3828  0.3105  0.2559  0.2130  0.1787  0.1510  0.1286  0.1102   
7   1.2  0.0  0.1544  0.1871  0.1795  0.1621  0.1433  0.1257  0.1103  0.0965   
8   1.4  0.0  0.0717  0.1101  0.1216  0.1197  0.1120  0.1024  0.0925  0.0831   
9   1.6  0.0  0.0400  0.0682  0.0829  0.0876  0.0865  0.0824  0.0765  0.0707   
10  1.8  0.0  0.0249  0.0449  0.0580  0.0647  0.0668  0.0659  0.0633  0.0597   
11  2.0  0.0  0.0168  0.0312  0.0418  0.0485  0.0519  0.0528  0.0520  0.0502   
12  3.0  0.0  0.0042  0.0082  0.0118  0.0149  0.0174  0.0193  0.0207  0.0216

The function I am using however outputs the incorrect value:

p = 0.2020202
closest_p = df.iloc[(df['P']-p).abs().argsort()[:1]]

Expected output:

      P   n1      n2      n3      n4      n5      n6      n7      n8      n9   
2   0.2  1.0  0.7983  0.6201  0.4771  0.3683  0.2876  0.2279  0.1835  0.1500

However it is only outputting the last row –

      P   n1      n2      n3      n4      n5      n6      n7      n8      n9   
12  3.0  0.0  0.0042  0.0082  0.0118  0.0149  0.0174  0.0193  0.0207  0.0216

Where am i going wrong????

>Solution :

You need to use idxmin:

closest_p = (df['P']-p).abs().idxmin()

Output: 2

For the row: df.loc[(df['P']-p).abs().idxmin()]

A fix of your approach would have been to use sort_values (but it’s less efficient):

closest_p = df.loc[(df['P']-p).abs().sort_values().index[0]]

Leave a Reply