I have a table of users and foods, with a flag marking which food is each user's favorite.
| user | food | is_favorite |
|---|---|---|
| 1 | Beef | False |
| 1 | Pork | False |
| 3 | Pork | False |
| 3 | Beef | False |
| 3 | Potatoes | False |
| 4 | Beef | False |
The same user appears in several rows. I need to set exactly 1 of the rows per user as favorite (is_favorite=True):
| user | food | is_favorite |
|---|---|---|
| 1 | Beef | True |
| 1 | Pork | False |
| 3 | Pork | True |
| 3 | Beef | False |
| 3 | Potatoes | False |
| 4 | Beef | True |
Now every user has exactly 1 favorite food.
I successfully got exactly 1 row for each user, but can’t apply it to my initial df. I’m pretty sure it’s something simple I’m missing, but I don’t know pandas that well. It also feels like this is the wrong way to do it:
import pandas as pd
df = pd.DataFrame(
    dict(
        user=[1, 1, 3, 3, 3, 4],
        food=['Beef', 'Pork', 'Pork', 'Beef', 'Potatoes', 'Beef'],
        is_favorite=[False, False, False, False, False, False]))
# This works. It gives me exactly 1 row per user
first_food_per_user = df.groupby('user').nth(0).reset_index()
# This doesn't work
for _, row in first_food_per_user.iterrows():
    df['is_favorite'].loc[
        (df['user'] == row['user'])
        &
        df['food'] == row['food'],
    ] = True
>Solution :
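No need for groupby. duplicated() flags every repeat of a user after the first occurrence, so negating it gives a boolean mask that selects each user's first row: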
df.loc[~df['user'].duplicated(), 'is_favorite'] = True
Output:
user food is_favorite
0 1 Beef True
1 1 Pork False
2 3 Pork True
3 3 Beef False
4 3 Potatoes False
5 4 Beef True
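To see why the one-liner works, here is a minimal sketch that builds the same frame as in the question and keeps the intermediate mask in a variable. The names first_row_mask and favorites_per_user are only illustrative, and the final groupby is just a sanity check, not part of the solution.

```python
import pandas as pd

df = pd.DataFrame({
    "user": [1, 1, 3, 3, 3, 4],
    "food": ["Beef", "Pork", "Pork", "Beef", "Potatoes", "Beef"],
    "is_favorite": [False] * 6,
})

# duplicated() is True for every repeat of a user after the first occurrence,
# so its negation is True only on each user's first row.
first_row_mask = ~df["user"].duplicated()

# Write through a single .loc call (rows, column) to avoid chained assignment.
df.loc[first_row_mask, "is_favorite"] = True

# Sanity check: count the favorites per user; each count should be 1.
favorites_per_user = df.groupby("user")["is_favorite"].sum()
print(favorites_per_user)
```

Selecting rows and column in one .loc call matters here: the loop in the question assigns through df['is_favorite'].loc[...], which is chained indexing and may not write back to df.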
If you want to mark a random row per user instead, use groupby.sample:
idx = df.groupby('user')['is_favorite'].sample(n=1).index
df.loc[idx, 'is_favorite'] = True
Example (the selected rows vary between runs):
user food is_favorite
0 1 Beef True
1 1 Pork False
2 3 Pork False
3 3 Beef True
4 3 Potatoes False
5 4 Beef True
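If the random pick needs to be reproducible, one option is to pass random_state to sample. This is a sketch under the assumption that pandas 1.1 or later is installed (where groupby sample is available); the seed value 0 is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "user": [1, 1, 3, 3, 3, 4],
    "food": ["Beef", "Pork", "Pork", "Beef", "Potatoes", "Beef"],
    "is_favorite": [False] * 6,
})

# Draw one row per user; a fixed seed makes the draw repeatable across runs.
idx = df.groupby("user").sample(n=1, random_state=0).index

# The sampled rows keep their original index labels, so .loc can use them directly.
df.loc[idx, "is_favorite"] = True

# Each user should now have exactly one favorite, whichever row was drawn.
favorites_per_user = df.groupby("user")["is_favorite"].sum()
print(df)
```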