Let’s say we have four columns:
Column1, Column2, Column3, ind
import pandas as pd
tbl = {
    'Column1': ['Spark', 10000, 'Python', '35days'],
    'Column2': [500, 'PySpark', 22000, 30000],
    'Column3': ['30days', '40days', '35days', 'pandas'],
    'ind': [1, 2, 1, 3],
}
df = pd.DataFrame(tbl)
Is there a way to add a new column, without a loop, that picks a value from the first three columns according to the 1-based position stored in the 'ind' column? The new column should be:
'Course': ['Spark', 'PySpark', 'Python', 'pandas']
I've tried several combinations of iloc, lambda and apply, but none of them worked.
Expected output:
  Column1  Column2  Column3  ind   Course
0   Spark      500   30days    1    Spark
1   10000  PySpark   40days    2  PySpark
2  Python    22000   35days    1   Python
3  35days    30000   pandas    3   pandas
>Solution :
IIUC, you can use apply on rows (axis=1), treating ind as a 1-based position within each row:
df['Course'] = df.apply(lambda row: row.iloc[row['ind']-1], axis=1)
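Or, for a vectorized version, you can use NumPy integer indexing (one row index and one column index per row):
import numpy as np
df['Course'] = df.values[np.arange(len(df)), df['ind'].sub(1)]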
print(df)
  Column1  Column2  Column3  ind   Course
0   Spark      500   30days    1    Spark
1   10000  PySpark   40days    2  PySpark
2  Python    22000   35days    1   Python
3  35days    30000   pandas    3   pandas
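If it helps, here is a self-contained sketch of the NumPy-indexing idea that selects the three value columns by name, so the result does not depend on where 'ind' sits in the frame. The helper name pick_by_position and the bounds check are my own additions for illustration, not part of pandas; it assumes 'ind' holds 1-based integer positions with no missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column1': ['Spark', 10000, 'Python', '35days'],
    'Column2': [500, 'PySpark', 22000, 30000],
    'Column3': ['30days', '40days', '35days', 'pandas'],
    'ind': [1, 2, 1, 3],
})


def pick_by_position(frame, value_cols, ind_col):
    """Hypothetical helper: for each row, return the value from
    value_cols at the 1-based position stored in ind_col."""
    values = frame[value_cols].to_numpy()   # shape (n_rows, len(value_cols))
    cols = frame[ind_col].to_numpy() - 1    # convert 1-based to 0-based positions
    # Reject positions that fall outside the selected columns
    # (negative values would otherwise wrap around silently).
    if ((cols < 0) | (cols >= len(value_cols))).any():
        raise IndexError(f"{ind_col!r} must be between 1 and {len(value_cols)}")
    rows = np.arange(len(frame))            # 0, 1, ..., n_rows - 1
    return values[rows, cols]               # one (row, column) pair per row


df['Course'] = pick_by_position(df, ['Column1', 'Column2', 'Column3'], 'ind')
print(df['Course'].tolist())  # ['Spark', 'PySpark', 'Python', 'pandas']
```

Selecting the columns explicitly also guards against 'ind' accidentally pointing at itself or at a column added later.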