Lets say I have a dataframe like this:
day uid orders
0 2022-03-15 1 20
1 2022-03-15 2 10
2 2022-03-15 3 50
3 2022-03-15 4 1
4 2022-03-16 1 20
5 2022-03-16 2 10
6 2022-03-16 3 50
7 2022-03-16 4 1
8 2022-03-17 1 20
9 2022-03-17 2 10
10 2022-03-17 3 50
11 2022-03-17 4 1
12 2022-03-18 1 20
13 2022-03-18 2 10
14 2022-03-18 3 50
15 2022-03-18 4 1
How can I get a dataframe finding the purchases done by each user id throughout the entire dataframe. Something like
orders users_ordered % of total
0 1 4 25%
1 20 4 25%
2 50 4 25%
This would mean that, throughtout the days, 4 users have 1 order, 4 users have 20 orders and 4 users have 50 orders.
% of total I think can be calculated via
df['% of total'] = 100 * df['orders'] / df.groupby('customer__id')['orders'].transform('sum')
If i can just get how to get my target datframe.
>Solution :
Use Series.value_counts:
s = df['orders'].value_counts().rename('users_ordered')
new_df = \
pd.concat((s,
s.div(s.sum()).mul(100).astype(int)
.astype(str).add('%').rename('% of total')), axis=1)\
.rename_axis(index='orders')\
.reset_index()
print(new_df)
orders users_ordered % of total
0 20 4 25%
1 10 4 25%
2 50 4 25%
3 1 4 25%
Or:
new_df = \
pd.concat((df['orders'].value_counts().rename('users_ordered'),
df['orders'].value_counts(normalize=True).mul(100)
.astype(int)
.astype(str).add('%').rename('% of total')), axis=1)\
.rename_axis(index='orders')\
.reset_index()