I have a dataframe and want to get the idxmin of a values column, but for each ID (which can occur multiple times).
My df:
```python
import pandas as pd

data = pd.DataFrame({'ID': [123, 122, 122, 122, 123, 125, 126],
                     'values': [2, 1, 2, 8, 6, 4, 5]})
```
Now I would use a lambda function: filter the df to get a sub-selected df containing all occurrences of the ID, then use idxmin to get the index of the minimum value in that subselection. When I use the different parts on their own, they work as intended, but when I put them together, the result is the same index (the overall idxmin) for every row.
```python
data.assign(TEST = lambda x: data.loc[data["ID"]==x["ID"],"values"].idxmin())
```
Output:
| Index | ID | values | TEST |
|---|---|---|---|
| 0 | 123 | 2 | 1 |
| 1 | 122 | 1 | 1 |
| 2 | 122 | 2 | 1 |
| 3 | 122 | 8 | 1 |
| 4 | 123 | 6 | 1 |
| 5 | 125 | 4 | 1 |
| 6 | 126 | 5 | 1 |
Does anybody know why it behaves like that instead of producing:
| Index | ID | values | TEST |
|---|---|---|---|
| 0 | 123 | 2 | 0 |
| 1 | 122 | 1 | 1 |
| 2 | 122 | 2 | 1 |
| 3 | 122 | 8 | 1 |
| 4 | 123 | 6 | 0 |
| 5 | 125 | 4 | 5 |
| 6 | 126 | 5 | 6 |
Thanks!
Solution:
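In your `assign`, the lambda is called once with the whole DataFrame as `x`, not once per row. So `x["ID"]` is the entire ID column, `data["ID"]==x["ID"]` is True for every row, and the expression reduces to:
```python
data.loc[data["ID"]==data["ID"],"values"].idxmin()
```
This returns 1, the index label of the smallest value in the whole column. That single scalar is then broadcast to every row, so your code is equivalent to:
```python
data.assign(TEST=1)
```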
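A minimal sketch to check this, rebuilding the question's `data` and assuming a reasonably recent pandas. The helper `make_test_column` and the `calls` list are hypothetical names, added only to record what `assign` passes to the callable. Filtering with a single scalar ID gives the per-ID result the question expects, while the callable receives the whole DataFrame and returns one scalar.

```python
import pandas as pd

data = pd.DataFrame({'ID': [123, 122, 122, 122, 123, 125, 126],
                     'values': [2, 1, 2, 8, 6, 4, 5]})

# Filtering on one scalar ID keeps only that ID's rows, so idxmin is per ID.
idx_122 = data.loc[data["ID"] == 122, "values"].idxmin()  # rows 1, 2, 3 -> 1
idx_123 = data.loc[data["ID"] == 123, "values"].idxmin()  # rows 0, 4 -> 0

calls = []  # hypothetical: records what assign() hands to the callable

def make_test_column(x):
    calls.append(type(x).__name__)  # x is the whole DataFrame, not a single row
    mask = data["ID"] == x["ID"]     # full column compared with itself: all True
    return data.loc[mask, "values"].idxmin()  # one scalar: the overall idxmin

result = data.assign(TEST=make_test_column)

print(idx_122, idx_123)         # 1 0
print(calls)                    # ['DataFrame']
print(result["TEST"].tolist())  # [1, 1, 1, 1, 1, 1, 1]
```

The callable runs exactly once, which is why every row ends up with the same value.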
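You need to use `groupby` here, so that `idxmin` is computed within each ID group and `transform` broadcasts each group's result back to that group's rows:
```python
data['TEST'] = data.groupby('ID')['values'].transform('idxmin')
```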
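Output:
| Index | ID | values | TEST |
|---|---|---|---|
| 0 | 123 | 2 | 0 |
| 1 | 122 | 1 | 1 |
| 2 | 122 | 2 | 1 |
| 3 | 122 | 8 | 1 |
| 4 | 123 | 6 | 0 |
| 5 | 125 | 4 | 5 |
| 6 | 126 | 5 | 6 |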
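For comparison, here is a sketch of two other ways to get the same column, assuming the same `data`. These are alternatives to the answer's `transform` call, not part of it: `test_map` computes one `idxmin` per ID and looks each row's ID up with `map`, and `test_rowwise` keeps the original lambda idea but evaluates it per row so that each comparison uses a single scalar ID. The row-wise version rescans the frame for every row, so it is much slower on large data.

```python
import pandas as pd

data = pd.DataFrame({'ID': [123, 122, 122, 122, 123, 125, 126],
                     'values': [2, 1, 2, 8, 6, 4, 5]})

# The answer's approach: per-group idxmin broadcast back to every row.
data['TEST'] = data.groupby('ID')['values'].transform('idxmin')

# Alternative 1: a Series of one idxmin per ID (indexed by ID),
# then each row's ID is looked up in it with map().
per_id = data.groupby('ID')['values'].idxmin()
test_map = data['ID'].map(per_id)

# Alternative 2: the original filter-then-idxmin idea, applied row by row,
# so each comparison is against one scalar ID instead of the whole column.
test_rowwise = data['ID'].apply(
    lambda i: data.loc[data['ID'] == i, 'values'].idxmin())

print(data['TEST'].tolist())  # [0, 1, 1, 1, 0, 5, 6]
print(test_map.tolist())      # [0, 1, 1, 1, 0, 5, 6]
print(test_rowwise.tolist())  # [0, 1, 1, 1, 0, 5, 6]
```

The `map` variant is handy if you also want the per-ID result on its own (`per_id`); otherwise `transform('idxmin')` does both steps in one call.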