I have a dataframe that looks like below
customer_id,month,Group
A1,Jan,Premium
A2,Jan,Lost
A3,Jan,Lost
A4,Jan,Lost
A5,Jan,Loyal
A6,Jan,Need Attention
A1,Mar,Premium
A2,Mar,Need Attention
A3,Mar,Lost
A4,Mar,Need Attention
A5,Mar,Loyal
A6,Mar,Lost
t1 = pd.read_clipboard(sep=',')
I would like to do the below
a) Create a matrix against Jan and Mar month
b) Fill the matrix with customer count under each group
I expect my output to be in a table like as below
I tried the below but not sure how to get everything in a neat table
cust_info = t1.groupby(['customer_id','month','Group']).size().reset_index()
group_info = t1.groupby(['customer_id','Group']).size().reset_index()
group_info.merge(cust_info,on='customer_id',how='left')
Is there anyway to capture their movement from one group another between the months Jan and Mar? I have a big data of 20K customers. Is there any elegant way to produce the below output?
>Solution :
Lets do dot product for efficient calculation
s = pd.crosstab([t1['month'], t1['Group']], t1['customer_id'])
s.loc['Jan'] @ s.loc['Mar'].T
Result
Group Lost Loyal Need Attention Premium
Group
Lost 1 0 2 0
Loyal 0 1 0 0
Need Attention 1 0 0 0
Premium 0 0 0 1
