How to intersect values over multiple columns in pandas

September 22, 2022

Please find below my input/output (desired) :

INPTUT :

  Col1 Col2 Col3 Col4 Col5
0    A  NaN  NaN  NaN  NaN
1  NaN    B    C  NaN  NaN
2  NaN  NaN  NaN    D  NaN
3  NaN    E  NaN  NaN  NaN
4  NaN  NaN  NaN  NaN    F

OUTPUT (desired) :

  Col1 Col2 Col3 Col4 Col5 Intersection
0    A  NaN  NaN  NaN  NaN            A
1  NaN    B    C  NaN  NaN       [B, C]
2  NaN  NaN  NaN    D  NaN            D
3  NaN    E  NaN  NaN  NaN            E
4  NaN  NaN  NaN  NaN    F            F

I’m trying to add a column Intersection that will hold the intersection of all the columns.

My dataset looks exactly like this dataframe (except that he has thousands of rows and dozens of columns) :

import pandas as pd
import numpy as np

data = {'Col1': ['A', np.nan, np.nan, np.nan, np.nan],
        'Col2': [np.nan, 'B', np.nan, 'E', np.nan],
        'Col3': [np.nan, 'C', np.nan, np.nan, np.nan],
        'Col4': [np.nan, np.nan, 'D', np.nan, np.nan],
        'Col5': [np.nan, np.nan, np.nan, np.nan, 'F']
       }

df = pd.DataFrame(data)

Do you have any suggestions, please ?

>Solution :

Use DataFrame.stack for reshape with remove missing values and aggregate custom function for avoid one element lists:

f = lambda x: list(x) if len(x) > 1 else x
df['Intersection'] = df.stack().groupby(level=0).agg(f)

print (df)
  Col1 Col2 Col3 Col4 Col5 Intersection
0    A  NaN  NaN  NaN  NaN            A
1  NaN    B    C  NaN  NaN       [B, C]
2  NaN  NaN  NaN    D  NaN            D
3  NaN    E  NaN  NaN  NaN            E
4  NaN  NaN  NaN  NaN    F            F

Or if one element list should be instead scalars use:

df['Intersection'] = df.stack().groupby(level=0).agg(list)

print (df)
  Col1 Col2 Col3 Col4 Col5 Intersection
0    A  NaN  NaN  NaN  NaN          [A]
1  NaN    B    C  NaN  NaN       [B, C]
2  NaN  NaN  NaN    D  NaN          [D]
3  NaN    E  NaN  NaN  NaN          [E]
4  NaN  NaN  NaN  NaN    F          [F]