Replace specific values in a dataframe by column mean in pandas

I’m a Python beginner and I’m trying to do some operations with dataframes that I usually do in R.

I have a large dataframe with 2592 rows and 205 columns, and I want to replace each 0.0 value with half the minimum non-zero value of its column.

An example with a random dataframe would be:

>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.randint(0,10, size=(3,5)), columns = ['A', 'B', 'C', 'D', 'E'])
>>> print(df)
   A  B  C  D  E
0  5  8  9  5  0
1  0  1  7  6  9
2  2  4  5  2  4

And the result I’m looking for is:

   A  B  C  D  E
0  5  8  9  5  2
1  1  1  7  6  9
2  2  4  5  2  4

Intuitively I would do it like this:

>>> for column in df:
        for element in column:
            if element == 0:
                element = df[column].min()/2

But it doesn’t work… any help?

Thank you!

Solution:

Use DataFrame.mask to replace the zeros with each column's minimum non-zero value divided by 2:

df1 = df.mask(df.eq(0), df.replace(0, np.nan).min().div(2), axis=1)
print(df1)
   A  B  C  D  E
0  5  8  9  5  2
1  1  1  7  6  9
2  2  4  5  2  4
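
To make the one-liner easier to follow, here is a minimal sketch that splits it into named steps. The DataFrame is typed in by hand from the values printed in the question, and the intermediate names (is_zero, nonzero_min, half_min) are only illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same values as the example DataFrame printed in the question.
df = pd.DataFrame(
    [[5, 8, 9, 5, 0],
     [0, 1, 7, 6, 9],
     [2, 4, 5, 2, 4]],
    columns=["A", "B", "C", "D", "E"],
)

# Step 1: boolean mask marking the cells to replace.
is_zero = df.eq(0)

# Step 2: per-column minimum that ignores zeros.
# Zeros become NaN, and min() skips NaN by default.
nonzero_min = df.replace(0, np.nan).min()

# Step 3: the replacement value for each column.
half_min = nonzero_min.div(2)

# Step 4: where the mask is True, take the column's replacement value.
# axis=1 aligns the Series index (A..E) with the column labels.
df1 = df.mask(is_zero, half_min, axis=1)
print(df1)
```

Because half of a minimum is not necessarily an integer, the columns that receive replacements may be upcast to a floating-point dtype.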

A more efficient solution is also possible (thanks @mozway):

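# Boolean mask of the zero cells, computed once and reused
m = df.eq(0)
# df[~m] turns the zeros into NaN, so min() returns each column's smallest non-zero value
df1 = df.mask(m, df[~m].min().div(2), axis=1)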
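
As for why the loop in the question has no effect: iterating over a DataFrame yields its column labels, so the inner loop walks over the characters of the label rather than the cell values, and assigning to element only rebinds a local name without touching df. If an explicit loop is preferred for readability, the following sketch shows one way to write it. The name result, the cast to float, and the choice to skip all-zero columns are assumptions made for illustration; the vectorized mask versions should be faster on a 2592 x 205 frame.

```python
import pandas as pd

# Same values as the example DataFrame printed in the question.
df = pd.DataFrame(
    [[5, 8, 9, 5, 0],
     [0, 1, 7, 6, 9],
     [2, 4, 5, 2, 4]],
    columns=["A", "B", "C", "D", "E"],
)

# Work on a float copy so non-integer replacements such as 0.5 can be stored.
result = df.astype(float)

for column in result.columns:
    values = result[column]
    nonzero = values[values != 0]
    if nonzero.empty:
        # Assumed policy: a column made only of zeros is left unchanged.
        continue
    # Write through .loc so the DataFrame itself is modified.
    result.loc[values == 0, column] = nonzero.min() / 2

print(result)
```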