Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

unique value counts of products purchased by the customer in previous year

I am building a function that creates an aggregate table with certain variables developed from df. I think I got most things right, but I am struggling understanding how to count a number of distinct products bought by the customer in previous year.

Example of my df:

df = pd.DataFrame({'year': [2020, 2021, 2021, 2021, 2022],
                   'id': [3, 1, 1, 2, 1],
                   'price': [12, 20, 30, 40, 50],
                   'age': [20, 30, 30, 44, 31],
                   'product': [book, toy, book, tv, book],
                   'quantity': [2,1,2,5,9})

here is my code so far:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df['revenue'] = df.price*df.quantity
df['year'] = pd.DatetimeIndex(df['date']).year 

def table(df):
   return df.groupby(['year','id'])\
    .agg(revenue = ('revenue', 'sum'), age = ('age', 'unique'), product_year_before = ('product', 'nunique'))

It counts products correctly, but it needs to be for previous year and not for current year.

>Solution :

For previous year you can subtract one year in MultiIndex:

df1 = table(df)

s = df1['product_year_before'].rename(lambda x: x - 1, level=0)
print (s)
year  id
2019  3     1
2020  1     2
      2     1
2021  1     1
Name: product_year_before, dtype: int64

All together – if need new column use DataFrame.join with s Series:

def table(df):
   df1 = (df.groupby(['year','id'])
            .agg(revenue = ('revenue', 'sum'), 
                 age = ('age', 'unique'), 
                 product_year_before = ('product', 'nunique')))
   s = df1['product_year_before'].rename(lambda x: x - 1, level=0)
   return df1.drop('product_year_before', axis=1).join(s)
                       
df1 = table(df)
print (df1)
         revenue   age  product_year_before
year id                                    
2020 3        24  [20]                  NaN
2021 1        80  [30]                  1.0
     2       200  [44]                  NaN
2022 1       450  [31]                  NaN

With original column:

def table(df):
   df1 = (df.groupby(['year','id'])
            .agg(revenue = ('revenue', 'sum'), 
                 age = ('age', 'unique'), 
                 product_year= ('product', 'nunique')))
   s = df1['product_year'].rename(lambda x: x - 1, level=0)
   return df1.join(s.rename('product_year_before'))
                       
df1 = table(df)
print (df1)
         revenue   age  product_year  product_year_before
year id                                                  
2020 3        24  [20]             1                  NaN
2021 1        80  [30]             2                  1.0
     2       200  [44]             1                  NaN
2022 1       450  [31]             1                  NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading