I have the following DataFrame:
| Index | Time Lost | Cause 1 | Cause 2 | Cause 3 |
|---|---|---|---|---|
| 0 | 40 | x | Nan | Nan |
| 1 | 15 | Nan | x | Nan |
| 2 | 65 | x | Nan | Nan |
| 3 | 10 | Nan | Nan | x |
There is only one "X" per row which identifies the cause of the time lost column. I am trying to iterate through each row (and column) to determine which column holds the "X". I would then like to add a "Type" column with the name of the column header that was True for each row. This is what I would like as a result:
| Index | Time Lost | Cause 1 | Cause 2 | Cause 3 | Type |
|---|---|---|---|---|---|
| 0 | 40 | x | Nan | Nan | Cause 1 |
| 1 | 15 | Nan | x | Nan | Cause 2 |
| 2 | 65 | x | Nan | Nan | Cause 1 |
| 3 | 10 | Nan | Nan | x | Cause 3 |
Currently my code looks like this, I am trying to iterate through the DataFrame. However, I’m not sure if there is a function or non-iterative approach to assign the proper value to the "Type" column:
cols = ['Cause1', 'Cause 2', 'Cause 3']
for index, row in df.iterrows():
for col in cols:
if df.loc[index,col] =='X':
df.loc[index,'Type'] = col
continue
else:
df.loc[index,'Type'] = 'Other'
continue
The issue I get with this code is that it iterates but only identifies rows with the last item in the cols list and the remainder go to "Other".
Any help is appreciated!
>Solution :
You could use idxmax on the boolean array of your data:
df['Type'] = df.drop('Time Lost', axis=1).eq('x').idxmax(axis=1)
Note that this only report the first cause if several
output:
Time Lost Cause 1 Cause 2 Cause 3 Type
0 40 x Nan Nan Cause 1
1 15 Nan x Nan Cause 2
2 65 x Nan Nan Cause 1
3 10 Nan Nan x Cause 3