I have two dataframes:
`df_1_long = pd.DataFrame({
'company_name': ['Company A', 'Company B', 'Company C'],
'company_country': ['USA', 'Poland', 'Canada'],
'keyword': ['holding', 'services', 'source'],
'value': [1,0,1]
})`
and a second dataframe:
`df_training = pd.DataFrame({
'holding': [1, 0, 0],
'services': [0, 1, 0],
'source': [0, 0, 1],
'sector': ['Retail', 'Finance', 'Energy']
})`
The columns ['holding', 'services', 'source'] in df_training are the keywords that appear in the 'keyword' column of df_1_long. I would like to assign a sector to each row of df_1_long: if 'value' is 1 in df_1_long and the df_training column named after that row's keyword is also 1, assign the sector from the matching df_training row. Otherwise the sector should be 'no_sector'.
The output should look like this:
`expected_output = pd.DataFrame({
'company_name': ['Company A', 'Company B', 'Company C'],
'company_country': ['USA', 'Poland', 'Canada'],
'keyword': ['holding', 'services', 'source'],
'value': [1,0,1],
'sector': ['Retail', 'no_sector', 'Energy']
})`
I tried this piece of code, but keep getting errors:
`merged_df = pd.merge(df_1_long, df_training, left_on='keyword', right_on=df_training.columns[:-1])
df_1_long['sector'] = merged_df['sector'].where(merged_df['value'] == 1, np.nan)`
>Solution :
- You can use a for loop to iterate through the rows of df_1_long.
- Use the loc indexer to check whether the 'keyword' value in df_1_long matches a column in df_training and the 'value' in both dataframes is 1.
- Then assign the corresponding 'sector' value from df_training.
`df_1_long['sector'] = 'no_sector'
for i, row in df_1_long.iterrows():
    if row['value'] == 1:
        sector = df_training.loc[df_training[row['keyword']] == 1, 'sector']
        if not sector.empty:
            df_1_long.loc[i, 'sector'] = sector.values[0]`
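If the dataframes are large, the row-by-row loop can get slow. The same logic can probably be expressed without a loop by reshaping df_training into a keyword-to-sector lookup and mapping it onto df_1_long. This is only a sketch using the sample data from the question; the names `long_training`, `flag` and `keyword_to_sector` are made up for illustration, and it assumes that when several df_training rows flag the same keyword, the first one should win (which is what `sector.values[0]` does in the loop).

```python
import pandas as pd

df_1_long = pd.DataFrame({
    'company_name': ['Company A', 'Company B', 'Company C'],
    'company_country': ['USA', 'Poland', 'Canada'],
    'keyword': ['holding', 'services', 'source'],
    'value': [1, 0, 1],
})
df_training = pd.DataFrame({
    'holding': [1, 0, 0],
    'services': [0, 1, 0],
    'source': [0, 0, 1],
    'sector': ['Retail', 'Finance', 'Energy'],
})

# Reshape df_training to long form: one row per (sector, keyword, flag).
long_training = df_training.melt(
    id_vars='sector', var_name='keyword', value_name='flag'
)

# Keep only flagged pairs and the first sector per keyword
# (assumption: first match wins, as in the loop version).
keyword_to_sector = (
    long_training[long_training['flag'] == 1]
    .drop_duplicates('keyword')
    .set_index('keyword')['sector']
)

# Look up the sector for each keyword, blank it where value != 1,
# and fill every unmatched row with the default label.
df_1_long['sector'] = (
    df_1_long['keyword']
    .map(keyword_to_sector)
    .where(df_1_long['value'] == 1)
    .fillna('no_sector')
)

print(df_1_long)
```

Keywords that have no column in df_training simply fail the lookup and end up as 'no_sector', whereas the loop version would raise a KeyError for them.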