Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to handle unordered sets of tags in Python Pandas Dataframe in terms of merging

I have two Dataframes which contain tags in one of the columns. Tags can be present or absent in the list and are stored in an unordered manner. I would like to get the merge of the two datasets with rows that have the same tags. In plain python I can store them as the set datatype.

tags1 = {"tag1","tag2"}
tags2 = {"tag2","tag1"}
tags1 == tags2
True

However I cant do the same in Pandas since sets are not hashable.

df1 = pd.DataFrame([{'name': 'foo', 'tags': {"tag1","tag2"}}, {'name': 'bar', 'tags': {"tag3","tag4"}}])
df2 = pd.DataFrame([{'name': 'baz', 'tags': {"tag4","tag3"}}, {'name': 'qux', 'tags': {"tag5","tag6"}}])
df1.merge(df2, how="inner", indicator=True, on="tags")

What I would like to see

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

     name_x  value           name_y _merge
0    bar     {"tag3","tag4"} baz    both

What I get

TypeError: unhashable type: 'set'

What would be the best approach here? Unfortunatelly I cant define a fixed amount of columns since the quantity of the tags is not fixed.

>Solution :

Use frozensets and assign to left_on and right_on parameters if need original sets columns:

out = (df1.merge(df2, how="inner", indicator=True, 
                left_on=df1['tags'].apply(frozenset),
                right_on=df2['tags'].apply(frozenset)))
print (out)
          key_0 name_x        tags_x name_y        tags_y _merge
0  (tag3, tag4)    bar  {tag3, tag4}    baz  {tag3, tag4}   both

Or assign frozensets to tags columns:

out = (df1.assign(tags=df1['tags'].apply(frozenset))
          .merge(df2.assign(tags=df2['tags'].apply(frozenset)), 
                how="inner", 
                indicator=True, 
                on="tags"))
print (out)
  name_x          tags name_y _merge
0    bar  (tag3, tag4)    baz   both
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading