Efficiently add a value to a new column in a large DataFrame

I have two DataFrames: adv_text, with about 9,000 rows, and events, with over 900,000 rows. events is essentially an expanded version of adv_text, with about 100 rows per adv_text row; the two are linked by the id_odsp column. I want to add three columns from adv_text to events.

The following code adds just one of those columns, and only for the first 30,000 rows of events:

events_x = events.head(30000).copy()  # test on a 30,000-row sample

def add_date(game_id):
    # Find the adv_text row for this game and return its date
    date = adv_text[adv_text['id_odsp'] == game_id]['date']
    return date.iloc[0]

events_x['date'] = events_x['id_odsp'].apply(add_date)

This test code takes almost 25 seconds for 30,000 rows. At that rate, the full 900,000-row DataFrame needs about 30 times as long per column (roughly 12.5 minutes), so adding all three columns would take nearly 40 minutes. Is this typical? Is there a faster way to accomplish this task?
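The slowdown comes from the lookup pattern itself: every call to add_date compares game_id against all ~9,000 id_odsp values, so the 30,000-row sample alone performs on the order of 30,000 × 9,000 = 270 million comparisons. A minimal sketch to reproduce this on synthetic data, assuming made-up game ids and dates (only the column names id_odsp and date come from the question):

```python
import time

import numpy as np
import pandas as pd

# Hypothetical synthetic data shaped like the question's: 9,000 games,
# 100 events per game. Only 'id_odsp' and 'date' mirror the real columns.
n_games, events_per_game = 9_000, 100
adv_text = pd.DataFrame({
    "id_odsp": [f"game{i}" for i in range(n_games)],
    "date": pd.date_range("2012-01-01", periods=n_games, freq="D"),
})
events = pd.DataFrame({
    "id_odsp": np.repeat(adv_text["id_odsp"].to_numpy(), events_per_game),
})

def add_date(game_id):
    # Same row-by-row lookup as in the question: each call builds a boolean
    # mask over all of adv_text, so total work is (event rows) x (adv_text rows).
    return adv_text.loc[adv_text["id_odsp"] == game_id, "date"].iloc[0]

# Time a small sample only; the full 900,000 rows would take far longer.
sample = events.head(2_000).copy()
start = time.perf_counter()
sample["date"] = sample["id_odsp"].apply(add_date)
elapsed = time.perf_counter() - start
print(f"{len(sample)} rows in {elapsed:.2f} s")
```

The absolute timing depends on the machine; what matters is that it grows with the product of the two table sizes.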


Solution:

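IIUC (if I understand correctly), one way is to use merge, which joins the two tables in one vectorized pass instead of searching adv_text once per row. Pass how='left' so every row of events is kept, in its original order; the default inner join drops events whose id_odsp has no match in adv_text. Also, merge returns a new DataFrame with a fresh 0..n-1 index, so assigning one of its columns back with events_x['date'] = ... only lines up if events_x has that same default index. Replacing the whole frame avoids the problem:

events_x = events_x.merge(adv_text[['id_odsp', 'date']], on='id_odsp', how='left')

Add the other two adv_text columns to the column list to get all three in one merge. If id_odsp is not unique in adv_text, the merge will duplicate event rows.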
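A minimal, self-contained sketch of the merge approach for several columns at once, assuming tiny made-up frames; home_team and league are hypothetical stand-ins for the other two adv_text columns, whose real names the question does not give. It also shows Series.map as an alternative for a single column:

```python
import pandas as pd

# Hypothetical frames; 'home_team' and 'league' stand in for the two
# other adv_text columns (real names not given in the question).
adv_text = pd.DataFrame({
    "id_odsp": ["g1", "g2", "g3"],
    "date": pd.to_datetime(["2012-08-03", "2012-08-04", "2012-08-05"]),
    "home_team": ["A", "B", "C"],
    "league": ["L1", "L1", "L2"],
})
events = pd.DataFrame({
    "id_odsp": ["g2", "g1", "g1", "g3", "g2"],
    "event_id": [10, 11, 12, 13, 14],
})

cols = ["id_odsp", "date", "home_team", "league"]

# how="left" keeps every event row in its original order.
# validate="many_to_one" raises if id_odsp is duplicated in adv_text,
# instead of silently duplicating event rows.
events = events.merge(adv_text[cols], on="id_odsp", how="left",
                      validate="many_to_one")

# Alternative for a single column: a lookup Series keyed by id_odsp.
# Series.map looks each id up and leaves the events index unchanged.
date_by_game = adv_text.set_index("id_odsp")["date"]
events["date_via_map"] = events["id_odsp"].map(date_by_game)

print(events)
```

merge is the simpler choice when pulling several columns; map is handy for one column when the existing index of events must be preserved.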

More information: Pandas Merging 101
