I’m a python beginner and I’m trying to do some operations with dataframes that I usually do with R language.
I Have a large dataframe with 2592 rows and 205 columns and I want to replace the 0.0 values by half the minimum value of its column.
An example with a random dataframe would be:
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.randint(0,10, size=(3,5)), columns = ['A', 'B', 'C', 'D', 'E'])
>>> print(df)
A B C D E
0 5 8 9 5 0
1 0 1 7 6 9
2 2 4 5 2 4
And the result I’m looking for is:
A B C D E
0 5 8 9 5 2
1 1 1 7 6 9
2 2 4 5 2 4
Intuitively I would do it like this:
>>> for column in df:
for element in column:
if element == 0:
element = df[column].min()/2
But it doesn’t work… any help?
Thank you!
>Solution :
Use DataFrame.mask with replace minimum values without 0 divide by 2:
df1 = df.mask(df.eq(0), df.replace(0, np.nan).min().div(2), axis=1)
print(df1)
A B C D E
0 5 8 9 5 2
1 1 1 7 6 9
2 2 4 5 2 4
For more efficient solution is possible use (thanks @mozway):
m = df.eq(0)
df1 = df.mask(m, df[~m].min().div(2), axis=1)