I have a data frame with a column representing phases (‘A’, ‘B’, and ‘C’). I need to select only the rows where the phase is ‘A’, divide their values by 2, and add the result to the values of phase ‘B’.
Here is an example:
import pandas as pd
data = {'time': [1, 2, 1, 2, 1, 2],
        'phase': ['A', 'A', 'B', 'B', 'C', 'C'],
        'value': [2, 3, 4, 5, 6, 7]}
df = pd.DataFrame(data)
print(df)
time phase value
0 1 A 2
1 2 A 3
2 1 B 4
3 2 B 5
4 1 C 6
5 2 C 7
slice_A = df.loc[df['phase']== 'A', 'value'] /2
print(slice_A)
0 1.0
1 1.5
Name: value, dtype: float64
df.loc[df['phase']=='B', 'value'] += slice_A
df
time phase value
0 1 A 2.0
1 2 A 3.0
2 1 B NaN
3 2 B NaN
4 1 C 6.0
5 2 C 7.0
I understand this is because the index of slice_A is not the same as:
df.loc[df['phase']=='B', 'value']
I tried resetting the index of the series to match the index of the sliced data frame, and I also tried working with data frames instead of series, but I still get NaN values.
>Solution :
Convert your Series to a NumPy array to avoid index alignment:
df.loc[df['phase']=='B', 'value'] += slice_A.to_numpy()
print(df)
# Output
time phase value
0 1 A 2.0
1 2 A 3.0
2 1 B 5.0
3 2 B 6.5
4 1 C 6.0
5 2 C 7.0
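To see why the original `+=` produced NaN, here is a minimal sketch of the alignment step on its own. The names `b_values` and `half_a` are made up for illustration; they stand in for the ‘B’ rows and for slice_A, with the same index labels as in the question.

```python
import pandas as pd

# Stand-ins for the question's data: 'B' rows carry labels 2 and 3,
# while the halved 'A' values carry labels 0 and 1.
b_values = pd.Series([4.0, 5.0], index=[2, 3])
half_a = pd.Series([1.0, 1.5], index=[0, 1])

# Series + Series aligns on index labels first. No label is shared,
# so every position in the result is NaN.
aligned = b_values + half_a

# Series + NumPy array has no labels to align, so the addition is
# positional and the result keeps the index of b_values.
positional = b_values + half_a.to_numpy()

print(aligned)
print(positional)
```

When the aligned result is written back through `df.loc`, only labels 2 and 3 are used, and both are NaN.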
The to_numpy() approach works here only because you have as many ‘A’ rows as ‘B’ rows. In general, you can add to B:
- a scalar value, like 3
- an array of one element, like [3]
- an array of the same shape (here (2,))
- a series with the same index
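The four cases above can be sketched as follows. This is only an illustration: `make_df` and `add_to_b` are hypothetical helpers, and the ‘value’ column is created as floats (an assumption, not in the original data) so that adding 1.5 does not change the column dtype.

```python
import numpy as np
import pandas as pd

def make_df():
    # Same data as the question, but with float values (assumption)
    return pd.DataFrame({
        'time': [1, 2, 1, 2, 1, 2],
        'phase': ['A', 'A', 'B', 'B', 'C', 'C'],
        'value': [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    })

def add_to_b(increment):
    # Add `increment` to the 'B' rows of a fresh frame and return the new 'B' values
    df = make_df()
    mask_b = df['phase'] == 'B'
    df.loc[mask_b, 'value'] += increment
    return df.loc[mask_b, 'value'].tolist()

# 1. A scalar is broadcast to every 'B' row
scalar_result = add_to_b(3)

# 2. A one-element array is broadcast the same way
one_elem_result = add_to_b(np.array([3]))

# 3. An array of the same shape, (2,), is added position by position
same_shape_result = add_to_b(np.array([1.0, 1.5]))

# 4. A Series carrying the same index labels as the 'B' rows (2 and 3)
same_index_result = add_to_b(pd.Series([1.0, 1.5], index=[2, 3]))

print(scalar_result, one_elem_result, same_shape_result, same_index_result)
```

If slice_A has to stay a Series, relabelling it with the index of the ‘B’ rows, for example with `slice_A.set_axis(df.index[df['phase'] == 'B'])`, should give the fourth case.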