I’m trying to find the closest value in a reference data frame but it is not outputting the correct row.
I am using the below data frame which is then used to find the relevant row corresponding to the closest value in column ‘P’ to a defined variable. For example if p = 0.22222 then the code should output row 2.
DF:
P n1 n2 n3 n4 n5 n6 n7 n8 n9
0 NaN 0.0 0.2000 0.4000 0.6000 0.8000 1.0000 1.2000 1.4000 1.6000
1 0.0 1.0 0.8039 0.6286 0.4855 0.3753 0.2929 0.2318 0.1863 0.1520
2 0.2 1.0 0.7983 0.6201 0.4771 0.3683 0.2876 0.2279 0.1835 0.1500
3 0.4 1.0 0.7789 0.5924 0.4508 0.3473 0.2720 0.2167 0.1754 0.1442
4 0.6 1.0 0.7349 0.5377 0.4043 0.3124 0.2470 0.1989 0.1628 0.1351
5 0.8 1.0 0.6301 0.4433 0.3368 0.2658 0.2147 0.1762 0.1465 0.1234
6 1.0 0.5 0.3828 0.3105 0.2559 0.2130 0.1787 0.1510 0.1286 0.1102
7 1.2 0.0 0.1544 0.1871 0.1795 0.1621 0.1433 0.1257 0.1103 0.0965
8 1.4 0.0 0.0717 0.1101 0.1216 0.1197 0.1120 0.1024 0.0925 0.0831
9 1.6 0.0 0.0400 0.0682 0.0829 0.0876 0.0865 0.0824 0.0765 0.0707
10 1.8 0.0 0.0249 0.0449 0.0580 0.0647 0.0668 0.0659 0.0633 0.0597
11 2.0 0.0 0.0168 0.0312 0.0418 0.0485 0.0519 0.0528 0.0520 0.0502
12 3.0 0.0 0.0042 0.0082 0.0118 0.0149 0.0174 0.0193 0.0207 0.0216
The function I am using however outputs the incorrect value:
p = 0.2020202
closest_p = df.iloc[(df['P']-p).abs().argsort()[:1]]
Expected output:
P n1 n2 n3 n4 n5 n6 n7 n8 n9
2 0.2 1.0 0.7983 0.6201 0.4771 0.3683 0.2876 0.2279 0.1835 0.1500
However it is only outputting the last row –
P n1 n2 n3 n4 n5 n6 n7 n8 n9
12 3.0 0.0 0.0042 0.0082 0.0118 0.0149 0.0174 0.0193 0.0207 0.0216
Where am i going wrong????
>Solution :
You need to use idxmin
:
closest_p = (df['P']-p).abs().idxmin()
Output: 2
For the row: df.loc[(df['P']-p).abs().idxmin()]
A fix of your approach would have been to use sort_values
(but it’s less efficient):
closest_p = df.loc[(df['P']-p).abs().sort_values().index[0]]