I have a dataframe df1 :-
| Store_id | fruit | region |
|---|---|---|
| 1 | orange | x |
| 2 | apple | y |
| 3 | NotKnown | z |
| 5 | Notknown | q |
| 6 | banana | w |
I have a dataframe df2 :-
| Store_id | fruit | region |
|---|---|---|
| 1 | orange | x |
| 2 | apple | y |
| 3 | pears | z |
| 5 | strawberry | q |
| 6 | banana | w |
| 8 | mango | i |
Expected df1 :-
| Store_id | fruit | region |
|---|---|---|
| 1 | orange | x |
| 2 | apple | y |
| 3 | pears | z |
| 5 | strawberry | q |
| 6 | banana | w |
Store_id is the primary key.
How do I update column fruit of df1 based on df2 column fruit for value NotKnown under fruit column of df1
>Solution :
reg_to_fru = df2.set_index("region")["fruit"]
df1.fruit = df1.region.map(reg_to_fru)
you can form a mapper (a Series) from df2 as region -> fruit, then map the region in df1 with it:
In [39]: reg_to_fru = df2.set_index("region")["fruit"]
In [40]: reg_to_fru
Out[40]:
region
x orange
y apple
z pears
q strawberry
w banana
i mango
Name: fruit, dtype: object
In [41]: df1.fruit = df1.region.map(reg_to_fru)
In [42]: df1
Out[42]:
Store_id fruit region
0 1 orange x
1 2 apple y
2 3 pears z
3 5 strawberry q
4 6 banana w