filter dataframe of frozensets if they have a certain element

I would like to filter a DataFrame that holds association rule results. I want the rows whose antecedents contain a particular element, such as H or L in my case. The antecedents are frozenset objects. I tried the following:

Hrules=fdem_rules['H'  in fdem_rules['antecedents']]
Hrules=fdem_rules[frozenset({'H'})  in fdem_rules['antecedents']] 

Neither of these works.

In the example below, I only need rows 46 and 89, because their antecedents contain H.


import pandas as pd
df = pd.DataFrame({'antecedents': [frozenset({'N', 'M', '60'}), frozenset({'H', 'AorE'}), frozenset({'0-35', 'H', 'AorE', '60'}), frozenset({'AorE', 'M', '60', '0'}), frozenset({'0-35', 'F'})]}, index=[75, 46, 89, 103, 38])
             antecedents
75            (N, M, 60)
46             (H, AorE)
89   (0-35, H, AorE, 60)
103     (AorE, M, 60, 0)
38             (0-35, F)

Solution:
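Why the attempts in the question fail: the in operator applied to a pandas Series tests membership in the Series' index labels, not in its values. Both expressions therefore evaluate to a single boolean rather than one value per row, and indexing the DataFrame with that scalar looks for a column named False instead of filtering rows. A minimal illustration on a small made-up frame (rules is a hypothetical stand-in for fdem_rules):

```python
import pandas as pd

# Hypothetical stand-in for fdem_rules, with integer index labels
rules = pd.DataFrame(
    {'antecedents': [frozenset({'H', 'AorE'}), frozenset({'N', 'M', '60'})]},
    index=[46, 75],
)

# 'in' on a Series checks the index labels, not the stored frozensets
print('H' in rules['antecedents'])               # False: 'H' is not an index label
print(frozenset({'H'}) in rules['antecedents'])  # False: same reason
print(46 in rules['antecedents'])                # True: 46 is an index label

# What filtering needs instead is one boolean per row
mask = ['H' in s for s in rules['antecedents']]
print(mask)                                      # [True, False]
print(rules[mask].index.tolist())                # [46]
```

Every approach below builds such a per-row mask in a different way.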

set/frozenset methods

You can use apply with a set/frozenset method. To check whether at least one of H or L is present, negate the result of {'H', 'L'}.isdisjoint:

match = {'H', 'L'}
df['H or L'] = ~df['antecedents'].apply(match.isdisjoint)
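The isdisjoint check on its own, applied to frozensets taken from the example: it returns True when the two sets share no element, so its negation is True exactly when a row contains H, L, or both.

```python
match = {'H', 'L'}

# isdisjoint is True when the two sets have no element in common
print(match.isdisjoint(frozenset({'N', 'M', '60'})))  # True: neither H nor L present
print(match.isdisjoint(frozenset({'H', 'AorE'})))     # False: H is present

# Negated, it means "contains at least one of H or L".
# bool(match & x) gives the same answer but builds an intersection set first,
# whereas isdisjoint can return as soon as it finds a shared element.
x = frozenset({'0-35', 'H', 'AorE', '60'})
print(not match.isdisjoint(x), bool(match & x))       # True True
```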

A faster variant of the apply approach uses a list comprehension, which avoids the per-row overhead of Series.apply:

match = {'H', 'L'}
df['H or L'] = [not match.isdisjoint(x) for x in df['antecedents']]
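To check the speed difference on your own machine, a rough timeit comparison on a synthetic frame can help. The 10,000-row size and the randomly generated item sets are arbitrary assumptions, and timings vary with hardware and pandas version, so no numbers are quoted here. Note that the list comprehension uses not rather than ~: on a plain Python bool, ~ is bitwise inversion (~True is -2), not logical negation.

```python
import random
import timeit

import pandas as pd

random.seed(0)
items = ['H', 'L', 'N', 'M', 'F', 'AorE', '60', '0-35']
# Synthetic rules frame (assumption): each row holds a random frozenset of 1-4 items
big = pd.DataFrame(
    {'antecedents': [frozenset(random.sample(items, random.randint(1, 4)))
                     for _ in range(10_000)]}
)
match = {'H', 'L'}

def with_apply():
    # ~ is safe here because apply returns a bool-dtype Series
    return ~big['antecedents'].apply(match.isdisjoint)

def with_comprehension():
    # Plain Python bools, so logical 'not' is required
    return [not match.isdisjoint(x) for x in big['antecedents']]

# Both approaches must agree before their timings are worth comparing
assert with_apply().tolist() == with_comprehension()

for fn in (with_apply, with_comprehension):
    seconds = timeit.timeit(fn, number=20)
    print(f'{fn.__name__}: {seconds / 20 * 1000:.2f} ms per call')
```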

explode+isin+aggregate

Another option is to explode the frozenset, use isin, and aggregate the result with groupby+any:

match = {'H', 'L'}
df['H or L'] = df['antecedents'].explode().isin(match).groupby(level=0).any()

output:

>>> df[['antecedents', 'H or L']]
             antecedents  H or L
75            (N, M, 60)   False
46             (H, AorE)    True
89   (0-35, H, AorE, 60)    True
103     (AorE, M, 60, 0)   False
38             (0-35, F)   False
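The explode + isin + groupby one-liner, split into its steps on the example frame. explode emits one row per set element and repeats the original index label for each; isin flags the elements that are H or L; groupby(level=0).any() folds the flags back into one value per original row. Two things worth knowing: groupby sorts the labels by default (harmless when assigning back, since pandas aligns on the index; sort=False keeps the original order for display), and the approach assumes unique index labels, because rows sharing a label would be merged into one group. An empty frozenset explodes to NaN, which isin treats as no match.

```python
import pandas as pd

df = pd.DataFrame(
    {'antecedents': [frozenset({'N', 'M', '60'}), frozenset({'H', 'AorE'}),
                     frozenset({'0-35', 'H', 'AorE', '60'}),
                     frozenset({'AorE', 'M', '60', '0'}), frozenset({'0-35', 'F'})]},
    index=[75, 46, 89, 103, 38],
)
match = {'H', 'L'}

# Step 1: one row per set element; each element keeps its row's index label
exploded = df['antecedents'].explode()
print(sorted(exploded.loc[46]))     # ['AorE', 'H']

# Step 2: flag the elements that are H or L
hits = exploded.isin(match)

# Step 3: fold back to one flag per original row (sort=False keeps df's row order)
per_row = hits.groupby(level=0, sort=False).any()
print(per_row.tolist())             # [False, True, True, False, False]

# Assignment aligns on the index, so group order would not matter here anyway
df['H or L'] = per_row
```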

slicing matching rows

To keep only the matching rows, use the boolean list as a mask:
match = {'H', 'L'}
idx = [not match.isdisjoint(x) for x in df['antecedents']]
df[idx]

output (on the full rules DataFrame, which also has consequents and other columns):

            antecedents consequents other_cols
46            (H, AorE)         (N)        ...
89  (0-35, H, AorE, 60)         (0)        ...
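If you need this filter repeatedly, for example with different items or on another frozenset column, the mask-and-slice step can be wrapped in a small function. The name rules_with_any and its parameters are hypothetical, chosen for this sketch; the example frame only has an antecedents column.

```python
import pandas as pd

def rules_with_any(rules: pd.DataFrame, column: str, items) -> pd.DataFrame:
    """Hypothetical helper: keep the rows whose frozenset in `column`
    shares at least one element with `items`."""
    wanted = frozenset(items)
    mask = [not wanted.isdisjoint(s) for s in rules[column]]
    return rules.loc[mask]

df = pd.DataFrame(
    {'antecedents': [frozenset({'N', 'M', '60'}), frozenset({'H', 'AorE'}),
                     frozenset({'0-35', 'H', 'AorE', '60'}),
                     frozenset({'AorE', 'M', '60', '0'}), frozenset({'0-35', 'F'})]},
    index=[75, 46, 89, 103, 38],
)

print(rules_with_any(df, 'antecedents', {'H', 'L'}).index.tolist())  # [46, 89]
print(rules_with_any(df, 'antecedents', ['F']).index.tolist())       # [38]
```

Any iterable works for items, since it is converted to a frozenset first; an empty one matches no rows.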
