Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Check if a value is present in the row and extract the name of column – Pandas

I have a dataframe as follows:

df = 

        A          col_1     col_45    col_9    col_10
        
1.0     4.0        45.0      NaN       34.9     NaN
2.0     4.0        NaN       NaN       23.4     45.6      
3.0     49.2       10.8      NaN       NaN      23.8

For every row in the dataframe, I want to check if there is any value present for col_1, col_45, col_9 and col_10. If there is a value present, I want to get the number for the name of the column and put in a list/array. For example.

In the first row,there are values present for col_1 and col_9. So I want to get the numbers 1 and 9 from the column names and put in a list/array. This should be done for every row.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Is there an easy way to do this?

>Solution :

For column names, you could use a stack (to get rid of all NaN automatically), then a GroupBy.agg to form the lists:

out = (df.set_index('A', append=True)
         .stack().reset_index(-1)
         .groupby(level=0)['level_2'].agg(list)
       )

output:

1.0     [col_1, col_9]
2.0    [col_9, col_10]
3.0    [col_1, col_10]
Name: level_2, dtype: object

For digits:

out = (df.set_index('A', append=True)
         .stack().reset_index(-1)
         ['level_2'].str.extract('_(\d+)', expand=False).astype(int)
         .groupby(level=0).agg(list)
       )

output:

1.0     [1, 9]
2.0    [9, 10]
3.0    [1, 10]
Name: level_2, dtype: object
Alternative

Other approach, you can use a mask and a matrix multiplication:

cols = df.filter(like='col').columns
# Index(['col_1', 'col_45', 'col_9', 'col_10'], dtype='object')
int_cols = cols.str.extract('_(\d+)$', expand=False).astype(int)
# Int64Index([1, 45, 9, 10], dtype='int64')

m = df[cols].notna()
#      col_1  col_45  col_9  col_10
# 1.0   True   False   True   False
# 2.0  False   False   True    True
# 3.0   True   False  False    True

out = (m.astype(int).mul(int_cols).where(m).convert_dtypes()
        .stack().groupby(level=0).agg(list)
      )
# 1.0     [1, 9]
# 2.0    [9, 10]
# 3.0    [1, 10]
# dtype: object
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading