I have a DataFrame df with roughly this format:
| alpha | result |
|---|---|
| 0.5 | NaN |
| 1 | 0.89 |
| 1.5 | NaN |
I want to insert a new value in the result column where alpha == 0.5. When I try that, however, the existing values in that column are deleted.
To insert the value, I tried this code:
import pandas as pd
df = pd.read_excel(file, engine="openpyxl")
df_index = df.index[df['alpha_loss'] == 0.5]
df.loc[df_index, 'result'] = 0.2
I’ve also tried replacing the last line with this:
df.at[df_index, 'result'] = 0.2
but get the same result.
The DataFrame I was expecting is:
| alpha | result |
|---|---|
| 0.5 | 0.2 |
| 1 | 0.89 |
| 1.5 | NaN |
Instead, this is the result:
| alpha | result |
|---|---|
| 0.5 | 0.2 |
| 1 | NaN |
| 1.5 | NaN |
What is the problem here? Why does this remove the other values in my column?
I am using pandas 2.0.3 and Python 3.9.
Edit:
It was my own fault. I mistyped the column name but did not notice, because I was trying to write to the last column and print(df) only showed the last column. The actual result was something like this:
| alpha | result | result2 |
|---|---|---|
| 0.5 | NaN | 0.2 |
| 1 | 0.89 | NaN |
| 1.5 | NaN | NaN |
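The behaviour described in the edit can be reproduced with a small sketch. The frame below is hand-built to mirror the tables above (using `alpha` as the column name), and `set_existing` is a hypothetical helper, not part of pandas; it only illustrates one way to catch a mistyped column name before `.loc` quietly creates a new column.

```python
import pandas as pd

# Hand-built stand-in for the Excel data shown in the question.
df = pd.DataFrame(
    {"alpha": [0.5, 1.0, 1.5], "result": [float("nan"), 0.89, float("nan")]}
)

# A mistyped column name does not raise: .loc adds a new column instead,
# filling the rows that were not selected with NaN.
typo = df.copy()
typo.loc[typo["alpha"] == 0.5, "result2"] = 0.2
print(typo)


def set_existing(frame, mask, column, value):
    """Hypothetical helper: assign where mask is True, but refuse to create columns."""
    if column not in frame.columns:
        raise KeyError(f"column {column!r} does not exist")
    frame.loc[mask, column] = value


# With the correct name, the existing 0.89 is left untouched.
set_existing(df, df["alpha"] == 0.5, "result", 0.2)
print(df)
```

Printing `df.columns.tolist()` (or `df.info()`) after the assignment is another quick way to spot an unexpected extra column.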
>Solution :
Your code should work with a default range index. However, the separate df_index lookup is not needed. Use boolean indexing directly:
df.loc[df['alpha_loss'] == 0.5, 'result'] = 0.2
Output:
   alpha  result
0    0.5    0.20
1    1.0    0.89
2    1.5     NaN
Your code might fail if the index contains duplicate labels, because selecting by label then hits every row that shares a matched label. Boolean indexing is independent of the index.
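The duplicate-index caveat can be illustrated with a minimal sketch. The frame and its index `[0, 0, 1]` are made up for the demonstration (with `alpha` as the column name) and are not taken from the asker's file.

```python
import pandas as pd

# Hypothetical frame in which the index label 0 appears twice.
df = pd.DataFrame(
    {"alpha": [0.5, 1.0, 1.5], "result": [float("nan"), 0.89, float("nan")]},
    index=[0, 0, 1],
)

# Label-based approach from the question: look up the labels, then assign.
by_label = df.copy()
labels = by_label.index[by_label["alpha"] == 0.5]
by_label.loc[labels, "result"] = 0.2  # also overwrites the other row labelled 0

# Boolean indexing: only rows where the condition holds are touched.
by_mask = df.copy()
by_mask.loc[by_mask["alpha"] == 0.5, "result"] = 0.2

print(by_label)
print(by_mask)
```

In the label-based copy, the 0.89 in the second row is replaced because that row shares the label 0. In the boolean-mask copy it survives.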