import numpy as np
import pandas as pd

df = pd.DataFrame({
    'subsegment': ['corp', np.nan, 'terr'],
    'region': ['japan', np.nan, np.nan],
    'subregion': [np.nan, 'se', 'ne'],
    'segment': [np.nan, 'ent', 'comm'],
})
I am trying to iterate through the above DataFrame and, for every value that is not NaN, add its column header to a new column, "mode". Depending on how many values in a row are NaN, a header is either the whole value of "mode" or one part of it, with the parts joined by "-".
Original DF
| subsegment | region | subregion | segment |
|---|---|---|---|
| corp | japan | NaN | NaN |
| NaN | NaN | se | ent |
| terr | NaN | ne | comm |
Desired Output DF
| subsegment | region | subregion | segment | mode |
|---|---|---|---|---|
| corp | japan | NaN | NaN | subsegment-region |
| NaN | NaN | se | ent | subregion-segment |
| terr | NaN | ne | comm | subsegment-subregion-segment |
I have tried creating separate, smaller DataFrames, one for each combination of columns that are not null, and then concatenating them, but this seems extremely inefficient.
df1 = df.loc[df['subsegment'].notna() & df['region'].notna()]
df2 = df.loc[df['region'].notna() & df['subregion'].notna() & df['segment'].notna()]
df3 = df.loc[df['subsegment'].notna() & df['subregion'].notna() & df['segment'].notna()]
pd.concat([df1, df2, df3])  # ..... plus one DataFrame per remaining combination
Solution:
You can use the code below. It uses `apply` with a custom function that finds the non-null columns in each row and joins their names with "-".
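from itertools import compress

def temp_func(x, cols):
    # True for each column whose value in this row is not NaN
    list_a = x.notna().to_list()
    # Keep only the names of the non-null columns, in column order
    ret = list(compress(cols, list_a))
    return '-'.join(ret)

# axis=1 passes each row to temp_func as a Series
df['mode'] = df.apply(lambda x: temp_func(x, list(df.columns)), axis=1)
print(df)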
Output:
subsegment region subregion segment mode
0 corp japan NaN NaN subsegment-region
1 NaN NaN se ent subregion-segment
2 terr NaN ne comm subsegment-subregion-segment
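If the row-wise `apply` turns out to be slow on a large frame, a vectorized alternative (a different technique from the `apply` answer, offered here only as a sketch) is to take the boolean `notna()` mask and matrix-multiply it by the column names with a separator appended. This assumes every column in the frame should contribute to `mode` and that no column name itself ends in "-":

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'subsegment': ['corp', np.nan, 'terr'],
    'region': ['japan', np.nan, np.nan],
    'subregion': [np.nan, 'se', 'ne'],
    'segment': [np.nan, 'ent', 'comm'],
})

# Boolean mask: True where a cell holds a value, False where it is NaN.
mask = df.notna()

# True * 'name-' gives 'name-' and False * 'name-' gives '', so the
# matrix product concatenates "colname-" for every non-null cell in a row.
# rstrip then removes the trailing separator (assumes no column name ends in '-').
df['mode'] = mask.dot(mask.columns + '-').str.rstrip('-')
print(df)
```

Because the mask is built before `mode` is assigned, the new column is not counted. If you re-run this on a frame that already has a `mode` column, drop it first so it does not end up in its own value.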