Map keyword values to add an extra column to a long DataFrame

I have two dataframes:

`df_1_long = pd.DataFrame({
'company_name': ['Company A', 'Company B', 'Company C'],
'company_country': ['USA', 'Poland', 'Canada'],
'keyword': ['holding', 'services', 'source'],
'value': [1,0,1]
})`

and a second DataFrame:

`df_training = pd.DataFrame({
 'holding': [1, 0, 0],
 'services': [0, 1, 0],
 'source': [0, 0, 1],
 'sector': ['Retail', 'Finance', 'Energy']
 })`

The columns of df_training, ['holding', 'services', 'source'], are the keywords that appear in the 'keyword' column of df_1_long. I would like to assign a sector to each row of df_1_long: if the row's 'value' is 1 and df_training has a 1 in the column named after that row's keyword, assign the sector from the matching df_training row; otherwise use 'no_sector'.
The output should look like this:

`expected_output = pd.DataFrame({
'company_name': ['Company A', 'Company B', 'Company C'],
'company_country': ['USA', 'Poland', 'Canada'],
'keyword': ['holding', 'services', 'source'],
'value': [1,0,1],
'sector': ['Retail', 'no_sector', 'Energy']
})`

I tried this piece of code, but keep getting errors:

`merged_df = pd.merge(df_1_long, df_training, left_on='keyword', right_on=df_training.columns[:-1])
df_1_long['sector'] = merged_df['sector'].where(merged_df['value'] == 1, np.nan)`

> Solution:

  1. Use a for loop to iterate over the rows of df_1_long.

  2. For each row whose 'value' is 1, use the loc indexer to select the rows of df_training where the column named by the row's 'keyword' is also 1.

  3. If any row matches, assign its 'sector' value from df_training; otherwise the row keeps the default 'no_sector'.

    # Default for rows that find no match
    df_1_long['sector'] = 'no_sector'
    for i, row in df_1_long.iterrows():
        if row['value'] == 1:
            # Sectors of the df_training rows where this keyword's column is 1
            sector = df_training.loc[df_training[row['keyword']] == 1, 'sector']
            if not sector.empty:
                df_1_long.loc[i, 'sector'] = sector.values[0]
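
The loop above is fine for small frames. For larger ones, a vectorized variant may be worth trying. The following is only a sketch and not part of the original answer. It assumes every column of df_training other than 'sector' is a keyword flag, and that when several rows flag the same keyword the first one wins, as in the loop. The names `lookup`, `flag`, and `keyword_to_sector` are introduced here for illustration.

```python
import pandas as pd

df_1_long = pd.DataFrame({
    'company_name': ['Company A', 'Company B', 'Company C'],
    'company_country': ['USA', 'Poland', 'Canada'],
    'keyword': ['holding', 'services', 'source'],
    'value': [1, 0, 1],
})

df_training = pd.DataFrame({
    'holding': [1, 0, 0],
    'services': [0, 1, 0],
    'source': [0, 0, 1],
    'sector': ['Retail', 'Finance', 'Energy'],
})

# Reshape df_training to long form: one row per (sector, keyword) pair,
# then keep only the pairs flagged with 1.
lookup = df_training.melt(id_vars='sector', var_name='keyword', value_name='flag')
lookup = lookup[lookup['flag'] == 1]

# Assumption: if a keyword is flagged in several rows, keep the first sector,
# which mirrors sector.values[0] in the loop version.
keyword_to_sector = lookup.drop_duplicates('keyword').set_index('keyword')['sector']

# Look up a sector for every keyword, blank it out where value != 1,
# and fill the remaining gaps with the default label.
sector = df_1_long['keyword'].map(keyword_to_sector)
df_1_long['sector'] = sector.where(df_1_long['value'] == 1).fillna('no_sector')

print(df_1_long)
```

Building a keyword-to-sector Series first keeps the row order and row count of df_1_long unchanged, which a direct merge would not guarantee if a keyword matched more than one training row.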
