I have the following dataframe:
df = pd.DataFrame(data={'flag': ['col3', 'col2', 'col2'],
'col1': [1, 3, 2],
'col2': [5, 2, 4],
'col3': [6, 3, 6],
'col4': [0, 4, 4]},
index=pd.Series(['A', 'B', 'C'], name='index'))
| index | flag | col1 | col2 | col3 | col4 |
|---|---|---|---|---|---|
| A | col3 | 1 | 5 | 6 | 0 |
| B | col2 | 3 | 2 | 3 | 4 |
| C | col2 | 2 | 4 | 6 | 4 |
For each row, I want to get the value when column name is equal to the flag.
| index | flag | col1 | col2 | col3 | col4 | col_val |
|---|---|---|---|---|---|---|
| A | col3 | 1 | 5 | 6 | 0 | 6 |
| B | col2 | 3 | 2 | 3 | 4 | 2 |
| C | col2 | 2 | 4 | 6 | 4 | 4 |
– Index A has a flag of col3. So col_val should be 6 because df['col3'] for that row is 6.
– Index B has a flag of col2. So col_val should be 2 because df['col2'] for that row is 2.
– Index C has a flag of col2. So col_val should be 4 because df['col2'] for that row is 3.
>Solution :
Per this page:
idx, cols = pd.factorize(df['flag'])
df['COl_VAL'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
Output:
>>> df
flag col1 col2 col3 col4 COl_VAL
index
A col3 1 5 6 0 6
B col2 3 2 3 4 2
C col2 2 4 6 4 4