Pick a column value based on column index stored in another column (Pandas)

May 20, 2022

Let’s say we have four columns:
Column1, Column2, Column3, ind

import pandas as pd


tbl = {
    'Column1': ['Spark', 10000, 'Python', '35days'],
    'Column2': [500, 'PySpark', 22000, 30000],
    'Column3': ['30days', '40days', '35days', 'pandas'],
    'ind': [1, 2, 1, 3],
}
df = pd.DataFrame(tbl)

Is there a way to add a new column, without a loop, that gathers values from the first three columns based on the 1-based column index stored in the ‘ind’ column? The new column should be:

'Course': ['Spark', 'PySpark', 'Python', 'pandas']

I’ve tried several combinations of iloc, lambda and apply, but none of them worked.

Expected output:

  Column1  Column2 Column3  ind   Course
0   Spark      500  30days    1    Spark
1   10000  PySpark  40days    2  PySpark
2  Python    22000  35days    1   Python
3  35days    30000  pandas    3   pandas

>Solution :

IIUC, you can use apply along the rows (axis=1). Since ‘ind’ is 1-based, subtract 1 to get the position for iloc:

df['Course'] = df.apply(lambda row: row.iloc[row['ind']-1], axis=1)

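Or you can index the underlying NumPy array directly, pairing each row position with its column position (this needs numpy imported):

import numpy as np
df['Course'] = df.values[np.arange(len(df['ind'])), df['ind'].sub(1)]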

print(df)

  Column1  Column2 Column3  ind   Course
0   Spark      500  30days    1    Spark
1   10000  PySpark  40days    2  PySpark
2  Python    22000  35days    1   Python
3  35days    30000  pandas    3   pandas
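
To make the NumPy variant easier to follow, here is a minimal, self-contained sketch of the same idea. The names rows, cols and values are just illustrative, and it assumes every ‘ind’ value is between 1 and 3 so that it points at one of the three data columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column1': ['Spark', 10000, 'Python', '35days'],
    'Column2': [500, 'PySpark', 22000, 30000],
    'Column3': ['30days', '40days', '35days', 'pandas'],
    'ind': [1, 2, 1, 3],
})

# Row positions 0..n-1, and the zero-based column position wanted for each row
# ('ind' is 1-based, hence the subtraction).
rows = np.arange(len(df))
cols = df['ind'].to_numpy() - 1

# Restrict the lookup to the three data columns (an assumption of this sketch),
# then let integer array indexing pair rows[i] with cols[i]: one cell per row.
values = df[['Column1', 'Column2', 'Column3']].to_numpy()
df['Course'] = values[rows, cols]

print(df['Course'].tolist())
# ['Spark', 'PySpark', 'Python', 'pandas']
```

Because the lookup happens in a single NumPy indexing step rather than one Python call per row, this form will usually scale better than apply on large frames, though for a handful of rows either version is fine.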