I have two DataFrames with the same columns: one holds historic data and the other holds 'new' data. The new data may sometimes contain rows that are already in the historic data. If the value of 'comment_id' in the new data is already present in the historic data, I want to do nothing. Otherwise, I want to add that row to the historic data.
I tried doing this:
historic_comments = [x for x in filtered_comments if filtered_comments['comment_id'] not in historic_comments['comment_id']]
But got error:
TypeError: unhashable type: 'Series'
>Solution :
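The error occurs because `in` applied to a Series looks its left operand up in the index, which requires hashing it, and a Series is not hashable. Use a boolean mask with isin instead: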
m = ~filtered_comments['comment_id'].isin(historic_comments['comment_id'])
out = pd.concat([historic_comments, filtered_comments[m]], axis=0, ignore_index=True)
Output:
>>> out  # new historic_comments dataframe
   comment_id
0     bonjour
1       hello
2       world
3         new
>>> filtered_comments
   comment_id
0       hello
1         new
2       world
>>> historic_comments
   comment_id
0     bonjour
1       hello
2       world
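For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The two sample frames are rebuilt from the output shown above, and the result is assigned back to historic_comments, which is what the question asks for. Treat it as an illustration rather than the only way to do this.

```python
import pandas as pd

# Sample data rebuilt from the printed frames above
historic_comments = pd.DataFrame({"comment_id": ["bonjour", "hello", "world"]})
filtered_comments = pd.DataFrame({"comment_id": ["hello", "new", "world"]})

# True for rows of the new data whose comment_id is not yet in the historic data
m = ~filtered_comments["comment_id"].isin(historic_comments["comment_id"])

# Append only the unseen rows and rebuild a clean 0..n-1 index
historic_comments = pd.concat(
    [historic_comments, filtered_comments[m]], axis=0, ignore_index=True
)

print(historic_comments)
```

Note that isin only compares the new rows against the historic ones. If the new data could itself contain the same comment_id more than once, you would probably also want to call drop_duplicates(subset="comment_id") on the result.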