I have a dataframe that looks has a breakdown by date-bucket-group (i.e. for each day, we have many buckets and within those buckets we have two groups) and looks like this:
date | bucket | Group |purchase
2020-01-01 | 1 | A | 12
2020-01-01 | 1 | B | 11
2020-01-01 | 2 | A | 14
2020-01-01 | 2 | B | 14
2020-02-01 | 1 | A | 11
2020-02-01 | 1 | B | 10
I would like to create a new dataframe, with a "difference" column that looks like this:
date | bucket | purchase | difference
2020-01-01 | 1 | 12-11=1 (Group A - Group B for that day/bucket)
2020-01-01 | 2 | 0
2020-02-01 | 1 | 1
How can I shape my df in such way?
>Solution :
You can reshape your dataframe before compute the diff:
out = (df.set_index(['date', 'bucket', 'group'])['purchase']
.unstack('group').diff(-1, axis=1)['A']
.rename('difference').reset_index())
print(out)
# Output
date bucket difference
0 2020-01-01 1 1
1 2020-01-01 2 0
2 2020-02-01 1 1
Or with pivot:
out = (df.pivot(['date', 'bucket'], 'group', 'purchase')
.diff(-1, axis=1)['A'].rename('difference').reset_index())