Count How Many Times Customer Has Purchases

March 15, 2022

Lets say I have a dataframe like this:

           day  uid  orders     
0   2022-03-15  1      20
1   2022-03-15  2      10
2   2022-03-15  3      50 
3   2022-03-15  4       1 
4   2022-03-16  1      20  
5   2022-03-16  2      10
6   2022-03-16  3      50 
7   2022-03-16  4       1  
8   2022-03-17  1      20 
9   2022-03-17  2      10 
10  2022-03-17  3      50 
11  2022-03-17  4       1  
12  2022-03-18  1      20 
13  2022-03-18  2      10
14  2022-03-18  3      50 
15  2022-03-18  4       1

How can I get a dataframe finding the purchases done by each user id throughout the entire dataframe. Something like

 orders  users_ordered  % of total     
0   1     4              25%
1   20    4              25%
2   50     4              25%

This would mean that, throughtout the days, 4 users have 1 order, 4 users have 20 orders and 4 users have 50 orders.

% of total I think can be calculated via

 df['% of total'] = 100 * df['orders'] / df.groupby('customer__id')['orders'].transform('sum')

If i can just get how to get my target datframe.

>Solution :

Use Series.value_counts:

s = df['orders'].value_counts().rename('users_ordered')
new_df = \
    pd.concat((s,
               s.div(s.sum()).mul(100).astype(int)
                .astype(str).add('%').rename('% of total')), axis=1)\
      .rename_axis(index='orders')\
      .reset_index()
print(new_df)


   orders  users_ordered % of total
0      20              4        25%
1      10              4        25%
2      50              4        25%
3       1              4        25%

Or:

new_df = \
    pd.concat((df['orders'].value_counts().rename('users_ordered'),
               df['orders'].value_counts(normalize=True).mul(100)
                           .astype(int)
                           .astype(str).add('%').rename('% of total')), axis=1)\
      .rename_axis(index='orders')\
      .reset_index()