Suppose I have a dataframe, where every 5th row is filled with NAs, like this:
Number | A | B | C |
---|---|---|---|
4 | 2 | 1 | 0 |
5 | NA | NA | NA |
6 | 5 | 6 | 7 |
100 | 3 | 5 | 2 |
I want to replace the NA values you can find in the 5th row with the values in the 3rd row.
What I’ve tried so far:
df.loc[df['Number'] % 5 == 0, ['A','B','C']] = df.loc[df['Number'] % 3 == 0, ['A','B','C']]
This approach does not throws any error, but it does nothing. Howewer, if I replace the right side of the assignment with a constant, it works fine, like this:
df.loc[df['Number'] % 5 == 0, ['A','B','C']] = 3
Any idea on how to make it work?
EDIT:
I did not want to give the full original task, howewer this dummy example is a little bit wrong. So let’s assume that the right side has as many rows as the left side. The main objective is still, to replace every 5th row’s values with a subset of the dataframe based on conditions. I hope it is a little bit more understandable, and sorry for the miscralification.
>Solution :
Your indexes are misaligned that’s why it doesn’t work:
# Left hand side
>>> df.loc[df['Number'] % 5 == 0, ['A','B','C']]
A B C
1 NA NA NaN
3 3 5 2.0
# Right hand side
>>> df.loc[df['Number'] % 3 == 0, ['A','B','C']]
A B C
2 5 6 7.0
Pandas can’t align index between (1, 3) and 2. The solution is to convert the right hand side to numpy array with .values
or .to_numpy()
. However if you do this, you also change the fourth row.
Update:
So let’s assume that the right side has as many rows as the left side
In this case, use:
df.loc[df['Number'] % 5 == 0, ['A','B','C']] = \
df.loc[df['Number'] % 3 == 0, ['A','B','C']].values
# Before (I slightly modified your example)
>>> df
Number A B C
0 3 2.0 1.0 0.0
1 5 NaN NaN NaN
2 6 5.0 6.0 7.0
3 100 NaN NaN NaN
# After
>>> df
Number A B C
0 3 2.0 1.0 0.0
1 5 2.0 1.0 0.0
2 6 5.0 6.0 7.0
3 100 5.0 6.0 7.0