Home Check if a value is present in the row and extract the name of column – Pandas

Questions

Check if a value is present in the row and extract the name of column – Pandas

July 7, 2022

I have a dataframe as follows:

df = 

        A          col_1     col_45    col_9    col_10
        
1.0     4.0        45.0      NaN       34.9     NaN
2.0     4.0        NaN       NaN       23.4     45.6      
3.0     49.2       10.8      NaN       NaN      23.8

For every row in the dataframe, I want to check if there is any value present for col_1, col_45, col_9 and col_10. If there is a value present, I want to get the number for the name of the column and put in a list/array. For example.

In the first row,there are values present for col_1 and col_9. So I want to get the numbers 1 and 9 from the column names and put in a list/array. This should be done for every row.

Is there an easy way to do this?

>Solution :

For column names, you could use a stack (to get rid of all NaN automatically), then a GroupBy.agg to form the lists:

out = (df.set_index('A', append=True)
         .stack().reset_index(-1)
         .groupby(level=0)['level_2'].agg(list)
       )

output:

1.0     [col_1, col_9]
2.0    [col_9, col_10]
3.0    [col_1, col_10]
Name: level_2, dtype: object

For digits:

out = (df.set_index('A', append=True)
         .stack().reset_index(-1)
         ['level_2'].str.extract('_(\d+)', expand=False).astype(int)
         .groupby(level=0).agg(list)
       )

output:

1.0     [1, 9]
2.0    [9, 10]
3.0    [1, 10]
Name: level_2, dtype: object

Alternative

Other approach, you can use a mask and a matrix multiplication:

cols = df.filter(like='col').columns
# Index(['col_1', 'col_45', 'col_9', 'col_10'], dtype='object')
int_cols = cols.str.extract('_(\d+)$', expand=False).astype(int)
# Int64Index([1, 45, 9, 10], dtype='int64')

m = df[cols].notna()
#      col_1  col_45  col_9  col_10
# 1.0   True   False   True   False
# 2.0  False   False   True    True
# 3.0   True   False  False    True

out = (m.astype(int).mul(int_cols).where(m).convert_dtypes()
        .stack().groupby(level=0).agg(list)
      )
# 1.0     [1, 9]
# 2.0    [9, 10]
# 3.0    [1, 10]
# dtype: object