Create binary pandas DataFrame based on two DataFrames with different number of columns

December 8, 2021

I have two DataFrames, df1 and df2, where df2 has only one column, and I am trying to create df3 from the two of them. Where the df1 value and the df2 value in the same row are both > 0, I want a one, otherwise a zero.

df1:
            01K  02K  03K   04K
Date                
2021-01-01  NaN  3.5  4.2   NaN
2021-01-02  -2.3 -0.1 5.2   2.6
2021-01-03  0.3  NaN  -2.5  8.2
2021-01-04  -0.4 NaN  3.0   -4.2

df2:
            XX
Date    
2021-01-01  NaN
2021-01-02  2.5
2021-01-03  -0.2
2021-01-04  0.3

df3 (expected output):
            01K  02K  03K   04K
Date                
2021-01-01  0    0    0     0
2021-01-02  0    0    1     1
2021-01-03  0    0    0     0
2021-01-04  0    0    1     0

For reproducibility:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K': [np.nan, -2.3, 0.3, -0.4],
    '02K': [3.5, -0.1, np.nan, np.nan],
    '03K': [4.2, 5.2, -2.5, 3.0],
    '04K': [np.nan, 2.6, 8.2, -4.2]})
df1 = df1.set_index('Date')

df2 = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'XX': [np.nan, 2.5, -0.2, 0.3]})
df2 = df2.set_index('Date')

I don’t know how to write the condition so that the comparison works between two DataFrames with a different number of columns.

I tried the following, but it assumes both DataFrames have the same dimensions:

df3 = ((df1 > 0) & (df2 > 0)).astype(int)

Thanks a lot!

Solution:

Use DataFrame.mul to multiply the first DataFrame by a Series, row by row:

df = (df1 > 0).astype(int).mul((df2.iloc[:, 0] > 0).astype(int), axis=0)
print(df)
            01K  02K  03K  04K
Date                          
2021-01-01    0    0    0    0
2021-01-02    0    0    1    1
2021-01-03    0    0    0    0
2021-01-04    0    0    1    0
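
As a minimal, self-contained sketch of why this works: `mul` with `axis=0` aligns the Series on the row index and applies it to every column, so each 0/1 flag derived from df2 scales a whole row of df1's flags. The data is copied from the question; the names `flags1` and `flags2` are only illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.Index(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
                 name='Date')
df1 = pd.DataFrame({
    '01K': [np.nan, -2.3, 0.3, -0.4],
    '02K': [3.5, -0.1, np.nan, np.nan],
    '03K': [4.2, 5.2, -2.5, 3.0],
    '04K': [np.nan, 2.6, 8.2, -4.2]}, index=dates)
df2 = pd.DataFrame({'XX': [np.nan, 2.5, -0.2, 0.3]}, index=dates)

# 1 where the value is positive, 0 otherwise (NaN > 0 is False, so NaN -> 0)
flags1 = (df1 > 0).astype(int)          # DataFrame, one flag per cell
flags2 = (df2['XX'] > 0).astype(int)    # Series, one flag per row

# axis=0 matches the Series to the rows; 1 * 1 = 1, anything else = 0
df3 = flags1.mul(flags2, axis=0)
print(df3)
```

Since the Series is matched by index label rather than by position, this variant should still pair the right rows if df2 happens to be ordered differently from df1.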

Or use NumPy broadcasting:

df = ((df1 > 0) & (df2.iloc[:, [0]].to_numpy() > 0)).astype(int)
print(df)
            01K  02K  03K  04K
Date                          
2021-01-01    0    0    0    0
2021-01-02    0    0    1    1
2021-01-03    0    0    0    0
2021-01-04    0    0    1    0
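
To make the broadcasting step explicit, here is a small sketch on the same data. Selecting with a list (`iloc[:, [0]]`) keeps df2 two-dimensional, so `to_numpy()` returns an array of shape (4, 1), which is stretched across the four columns of df1. The name `positive2` is only illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.Index(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
                 name='Date')
df1 = pd.DataFrame({
    '01K': [np.nan, -2.3, 0.3, -0.4],
    '02K': [3.5, -0.1, np.nan, np.nan],
    '03K': [4.2, 5.2, -2.5, 3.0],
    '04K': [np.nan, 2.6, 8.2, -4.2]}, index=dates)
df2 = pd.DataFrame({'XX': [np.nan, 2.5, -0.2, 0.3]}, index=dates)

# Boolean column vector without labels; NaN > 0 evaluates to False
positive2 = df2.iloc[:, [0]].to_numpy() > 0
print(positive2.shape)  # (4, 1)

# The (4, 1) array is broadcast against the (4, 4) boolean DataFrame
df3 = ((df1 > 0) & positive2).astype(int)
print(df3)
```

Note that `to_numpy()` discards the index, so rows are matched by position. This assumes df1 and df2 list the dates in the same order.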