I am trying to replace a value in a pandas dataframe based on another column value.
I made sample code below to replicate the issue, but essentially I want to add a column to an existing dataframe and then replace placeholder information based on another column’s value. The dataframe I am using (not in the example) is based on an excel document that will be used to webscrape information that I want returned in the new column based on the other column’s value. Just wanted to mention that before people asked why not just start the ex_list data in the dataframe. Additionally I only want it replace where the condition is met, not replacing the entire column with a set value.
Sample Code
## this would be the excel document df
sample_df = pd.DataFrame({"a":[1,2,3,4,5]})
sample_df["b"] = ""
## this data would be webscrapped using information above
ex_list = [[1, "CHANGE"],[4, "CHANGE"]]
for sub in ex_list:
location = sample_df.loc[sample_df['a']==sub[0], 'b'].iloc[0]
sample_df.replace(location, sub[1])
sample.head()
I also tried this just as a quick playaround but it produced the same output
sample_df = pd.DataFrame({"a":[1,2,3,4,5]})
sample_df["b"] = ""
ex_list = [[1, "CHANGE"],[4, "CHANGE"]]
for sub in ex_list:
sample_df[sample_df['a']==sub[0], 'b'].iloc[0] += sub[1]
sample_df.head()
Both outputs are the same and show no change:
a b
0 1
1 2
2 3
3 4
4 5
The output I am hoping for is this
a b
0 1 CHANGE
1 2
2 3
3 4 CHANGE
4 5
I would appreciate a second pair of eyes on this. Is my method of ‘locating the value’ logic off? I thought the .loc/.iloc would be best but perhaps another way of indexing is best? I would be open to any solutions!
>Solution :
IMHO, the title makes your question an XY. I think you want to simply merge both objects :
# sample_df["b"] = "" # no need for this line anymore
out = sample_df.merge(pd.DataFrame(ex_list, columns=["a", "b"]), how="left")
Output :
print(out)
a b
0 1 CHANGE
1 2 NaN
2 3 NaN
3 4 CHANGE
4 5 NaN