Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Groupby merge values from one dataframe to the other

I have a dataframe of "words", and I would like to check which words were included, by each person in this dataframe by comparison to a word list key.

Example data:

df = pd.DataFrame({
    'person': [1, 1, 1, 2, 3, 4, 4, 4, 4],
    'word': ['apple', 'orange', 'pear', 'apple', 'grape', 'orange', 'apple', 'pear', 'berry'],
    'count': [1, 1, 1, 1, 1, 1, 1, 1, 1]
})

word_list = ['apple', 'orange', 'pear', 'berry', 'grape']
word_df = pd.DataFrame({'word': word_list})

For example, person 1 only included apple, orange, and pear and did not include berry and grape. I know how to pd.merge values but have not been able to successfully map the values of word_df["word"] onto something like df.groupby(["person"] so that I can check what each person chose to include.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Desired output:

result_df = pd.DataFrame({
    'person': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4],
    'word': ['apple', 'orange', 'pear', 'berry', 'grape', 
             'apple', 'orange', 'pear', 'berry', 'grape', 
             'apple', 'orange', 'pear', 'berry', 'grape', 
             'orange', 'apple', 'pear', 'berry', 'grape'],
    'count': [1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
})

>Solution :

If performance isn’t really a concern this looks like it does the trick.

Basically create an intermediate dataframe with all possible person and fruit combinations via cross join. Then use that as the driver, "count" from the outer table will be null if that combination was not found in the original dataframe so coalescing to 0 gets 0’s for those.

df = pd.DataFrame({
    'person': [1, 1, 1, 2, 3, 4, 4, 4, 4],
    'word': ['apple', 'orange', 'pear', 'apple', 'grape', 'orange', 'apple', 'pear', 'berry'],
    'count': [1, 1, 1, 1, 1, 1, 1, 1, 1]
})

word_list = ['apple', 'orange', 'pear', 'berry', 'grape']
all_person_word_combos = word_df.merge(df['person'].drop_duplicates(), how='cross')
final_result = (
  all_person_word_combos.
  merge(df, 
        how='left', 
        on=['word', 'person']).
  fillna(0).
  sort_values(['person','word']))
final_result.display()
word person count
apple 1 1
berry 1 0
grape 1 0
orange 1 1
pear 1 1
apple 2 1
berry 2 0
grape 2 0
orange 2 0
pear 2 0
apple 3 0
berry 3 0
grape 3 1
orange 3 0
pear 3 0
apple 4 1
berry 4 1
grape 4 0
orange 4 1
pear 4 1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading