Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to create 0/1 DataFrame based on two other DataFrame with conditions?

I try to create a DataFrame df3 based on two other DataFrames df1 and df2. If the value in df1 is bigger or equal the value in df2, then I want to assign a 1 in df3 and otherwise a 0.

This would be a sample data for illustration:

df1
    Date        01K  02K  03K   04K
0   2021-01-01  NaN  3.5  4.2   NaN
1   2021-01-02  2.3  0.1  5.2   2.6
2   2021-01-03  0.3  NaN  2.5   8.2
3   2021-01-04  0.4  NaN  3.0   4.2

df2
    Date        01K  02K  03K   04K
0   2021-01-01  NaN  NaN  NaN   NaN
1   2021-01-02  2.5  0.6  5.8   2.3
2   2021-01-03  0.2  NaN  2.5   8.1
3   2021-01-04  0.3  NaN  2.8   4.2

df3
    Date        01K  02K  03K   04K
0   2021-01-01  0    0    0     0
1   2021-01-02  0    0    0     1
2   2021-01-03  1    0    1     1
3   2021-01-04  1    0    1     1

For reproducibility:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df1 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K':['NaN', 2.3, 0.3, 0.4], 
    '02K':[3.5, 0.1, 'NaN', 'NaN'], 
    '03K':[4.2, 5.2, 2.5, 3.0], 
    '04K':['NaN', 2.6, 8.2, 4.2]}) 
df1 = df1.replace('NaN',np.nan)

df2 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K':['NaN', 2.5, 0.2, 0.3], 
    '02K':['NaN', 0.6, 'NaN', 'NaN'], 
    '03K':['NaN', 5.8, 2.5, 2.8], 
    '04K':['NaN', 2.3, 8.1, 4.2]}) 
df2 = df2.replace('NaN',np.nan)

I tried to do it with np.where(), but I get an array as output.

df3 = np.where(df1>=df2, 1, 0)

array([[1, 0, 0, 0, 0],
       [1, 0, 0, 0, 1],
       [1, 1, 0, 1, 1],
       [1, 1, 0, 1, 1]])

Is there an efficient way to create df3 with the same structure as the other DataFrames for a big dataset?

Thanks a lot!

>Solution :

Convert Date to DatetimeIndex in both, so possible compare with casting ouput to integers for map True, Falses to 1, 0:

df1 = df1.set_index('Date')
df2 = df2.set_index('Date')

df3 = (df1>=df2).astype(int)

If need instead 1 using -1 use:

df3 = (df1>=df2).astype(int).mul(-1)

df3 = pd.DataFrame(np.where(df1>=df2, -1, 0), 
                  index=df1.index, 
                  columns=df1.columns)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading