I have a dataframe df which looks something like this:
| key | id |
|---|---|
| x | 0.6 |
| x | 0.5 |
| x | 0.43 |
| x | 0.56 |
| y | 13 |
| y | 14 |
| y | 0.4 |
| y | 0.1 |
I’d like to replace the last value for every key with 0, so that the df looks like this:
| key | id |
|---|---|
| x | 0.6 |
| x | 0.5 |
| x | 0.43 |
| x | 0 |
| y | 13 |
| y | 14 |
| y | 0.4 |
| y | 0 |
I’ve tried the following:
for i in df['key'].unique():
    df.loc[df['key'] == i, 'id'].iat[-1] = 0
The problem is that it does not replace the actual value in the df. What am I missing? Perhaps there is also a better-performing way to tackle this problem.
>Solution :
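Your loop does not change df because df.loc[df['key'] == i, 'id'] returns a new Series (a copy), so .iat[-1] = 0 modifies that temporary object and not df itself. Use Series.duplicated to find the last row per key and set 0 with DataFrame.loc: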
df.loc[~df['key'].duplicated(keep='last'), 'id'] = 0
print (df)
key id
0 x 0.60
1 x 0.50
2 x 0.43
3 x 0.00
4 y 13.00
5 y 14.00
6 y 0.40
7 y 0.00
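For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample in the question, and the variable name is_last is only an illustrative choice.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "key": ["x", "x", "x", "x", "y", "y", "y", "y"],
    "id": [0.6, 0.5, 0.43, 0.56, 13, 14, 0.4, 0.1],
})

# duplicated(keep='last') is False only for the last occurrence of each key,
# so inverting it gives True exactly on the rows we want to overwrite.
is_last = ~df["key"].duplicated(keep="last")

# A single .loc assignment writes into df itself (no chained indexing).
df.loc[is_last, "id"] = 0

print(df)
```

Note that this does not require the rows of one key to be adjacent: duplicated(keep='last') marks the last occurrence of each key wherever it appears in the frame.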
How it works:
print (df.assign(mask=df['key'].duplicated(keep='last'),
                 invert_mask=~df['key'].duplicated(keep='last')))
key id mask invert_mask
0 x 0.60 True False
1 x 0.50 True False
2 x 0.43 True False
3 x 0.00 False True
4 y 13.00 True False
5 y 14.00 True False
6 y 0.40 True False
7 y 0.00 False True
Another solution is to simply multiply the id column by the boolean mask (True keeps the value, False turns it into 0):
df['id'] = df['key'].duplicated(keep='last').mul(df['id'])
print (df)
key id
0 x 0.60
1 x 0.50
2 x 0.43
3 x 0.00
4 y 13.00
5 y 14.00
6 y 0.40
7 y 0.00
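One caveat with the multiplication trick, shown as a small sketch: if the last value of a key is NaN, NaN * False is still NaN, so it will not become 0. Series.where is a different technique from the one above that replaces those rows regardless. The sample data here is made up for illustration, and the names not_last, by_mul and by_where are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up data: the last value of each key is NaN.
df = pd.DataFrame({
    "key": ["x", "x", "y", "y"],
    "id": [0.6, np.nan, 13.0, np.nan],
})

# True for every row that is NOT the last occurrence of its key.
not_last = df["key"].duplicated(keep="last")

# Multiplication: NaN * False stays NaN, so the last rows are not zeroed.
by_mul = not_last.mul(df["id"])

# Series.where keeps values where the condition is True and puts 0 elsewhere.
by_where = df["id"].where(not_last, 0)

print(by_mul)
print(by_where)
```

If the column never contains NaN in the last rows, both approaches should give the same result as the .loc assignment.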