Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

In Pandas, how can I replace a subset of the dataframe with another subset of the dataframe?

Suppose I have a dataframe, where every 5th row is filled with NAs, like this:

Number A B C
4 2 1 0
5 NA NA NA
6 5 6 7
100 3 5 2

I want to replace the NA values you can find in the 5th row with the values in the 3rd row.

What I’ve tried so far:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df.loc[df['Number'] % 5 == 0, ['A','B','C']] = df.loc[df['Number'] % 3 == 0, ['A','B','C']]

This approach does not throws any error, but it does nothing. Howewer, if I replace the right side of the assignment with a constant, it works fine, like this:

df.loc[df['Number'] % 5 == 0, ['A','B','C']] = 3

Any idea on how to make it work?

EDIT:

I did not want to give the full original task, howewer this dummy example is a little bit wrong. So let’s assume that the right side has as many rows as the left side. The main objective is still, to replace every 5th row’s values with a subset of the dataframe based on conditions. I hope it is a little bit more understandable, and sorry for the miscralification.

>Solution :

Your indexes are misaligned that’s why it doesn’t work:

# Left hand side
>>> df.loc[df['Number'] % 5 == 0, ['A','B','C']]
     A    B    C
1  NA   NA   NaN
3   3    5   2.0

# Right hand side
>>> df.loc[df['Number'] % 3 == 0, ['A','B','C']]
    A   B    C
2  5   6   7.0

Pandas can’t align index between (1, 3) and 2. The solution is to convert the right hand side to numpy array with .values or .to_numpy(). However if you do this, you also change the fourth row.

Update:

So let’s assume that the right side has as many rows as the left side

In this case, use:

df.loc[df['Number'] % 5 == 0, ['A','B','C']] = \
    df.loc[df['Number'] % 3 == 0, ['A','B','C']].values
# Before (I slightly modified your example)
>>> df
   Number    A    B    C
0       3  2.0  1.0  0.0
1       5  NaN  NaN  NaN
2       6  5.0  6.0  7.0
3     100  NaN  NaN  NaN

# After
>>> df
   Number    A    B    C
0       3  2.0  1.0  0.0
1       5  2.0  1.0  0.0
2       6  5.0  6.0  7.0
3     100  5.0  6.0  7.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading