Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Create a column aggregating column names based on boolean state of a row in Python Pandas

I am attempting to scan a Pandas dataframe and identify all the columns that are True for a particular row and aggregate the column names in the output column.

Given a sample input as follows:

a       b       c
True    False   True
True    False   False

The goal is to accomplish the following output:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

a       b       c     output
True    False   True  ['a', 'c']
True    False   False ['a']

My attempt here was to use np.where as exemplified below but this is rather inefficient to scale.

df['output_1'] = np.where(df['a']==True, 'a', '')
df['output_2'] = np.where(df['b']==True, 'b', '')
df['output_3'] = np.where(df['c']==True, 'c', '')
df['output'] = df['output_1'] + df['output_2'] + df['output_3']

>Solution :

You can use melt

Data:

data = {'a': {0: True, 1: True}, 'b': {0: False, 1: False}, 'c': {0: True, 1: False}}
df = pd.DataFrame(data=data)

Code:

df["output"] = (pd
                .melt(frame=df.reset_index(), id_vars=["index"])
                .query(expr="value.eq(True)")
                .groupby(by="index")
                .variable.apply(func=list)
                )

print(df)

Result:

      a      b      c  output
0  True  False   True  [a, c]
1  True  False  False     [a]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading