I am attempting to scan a Pandas dataframe and identify all the columns that are True for a particular row and aggregate the column names in the output column.
Given a sample input as follows:
a b c
True False True
True False False
The goal is to accomplish the following output:
a b c output
True False True ['a', 'c']
True False False ['a']
My attempt here was to use np.where as exemplified below but this is rather inefficient to scale.
df['output_1'] = np.where(df['a']==True, 'a', '')
df['output_2'] = np.where(df['b']==True, 'b', '')
df['output_3'] = np.where(df['c']==True, 'c', '')
df['output'] = df['output_1'] + df['output_2'] + df['output_3']
>Solution :
You can use melt
Data:
data = {'a': {0: True, 1: True}, 'b': {0: False, 1: False}, 'c': {0: True, 1: False}}
df = pd.DataFrame(data=data)
Code:
df["output"] = (pd
.melt(frame=df.reset_index(), id_vars=["index"])
.query(expr="value.eq(True)")
.groupby(by="index")
.variable.apply(func=list)
)
print(df)
Result:
a b c output
0 True False True [a, c]
1 True False False [a]