I have two dataframes,
adv_text with about 9,000 rows and
events with over 900,000 rows.
events is essentially an expanded version of
adv_text with about 100 rows per row in
adv_text. I want to add three columns from
The following code is a partial addition of one column.
events_x = events.head(30000).copy() def add_date(game_id): date = adv_text[adv_text['id_odsp'] == game_id]['date'] return(date.iloc) events_x['date'] = events_x['id_odsp'].apply(add_date)
This test code takes almost 25 seconds for 30,000 rows. At this speed, adding all three columns over the full dataframe will take nearly 40 minutes. Is this typical? Is there a faster way to accomplish this task?
IIUC, one way is to use
events_x['date'] = events_x.merge(adv_text[['id_odsp', 'date']], on='id_odsp')['date']
More information: Pandas Merging 101