Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python product frequently bought with

I have retail store transactional data and want to see what categories are bought together. The data is in the below format:

transaction_no product_id category
1 100012 A
1 121111 A
1 121127 B
1 121127 G
2 465222 N
2 121127 M
3 121127 F
3 121127 G
3 121127 F
4 465222 M
4 121127 N

Rules:

  1. The result should be aggregated based on unique transaction numbers.
  2. Also, the order shouldn’t matter in this case (e.g. A bought with B is the same as B bought with A).
  3. If a category is repeated within the same transaction, it should be counted as 1 only (e.g. in transaction_no = 1, category A is counted once)

Expected output:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

bucket count
A, B, G 1
N, M 2
F, G 1

How do I achieve this?

>Solution :

Use GroupBy.agg for aggregate frozenset, then count values by Series.value_counts and last create DataFrame with join for strings from frozensets:

df1 = (df.groupby('transaction_no')['category']
        .agg(frozenset)
        .value_counts()
        .rename(lambda x: ', '.join(sorted(x)))
        .rename_axis('bucket')
        .reset_index(name='count'))
print (df1)
    bucket  count
0     M, N      2
1     F, G      1
2  A, B, G      1

Another idea:

df1 = (df.groupby('transaction_no')['category']
        .agg(lambda x: ', '.join(sorted(set((x)))))
        .value_counts()
        .rename_axis('bucket')
        .reset_index(name='count')
        )
print (df1)
    bucket  count
0     M, N      2
1     F, G      1
2  A, B, G      1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading