I have a DataFrame like this:
| Index | Product | Purchase_Address | Order_Date |
|---|---|---|---|
| 0 | A | 604 Cherry st, Dallas | 2019-10-28 |
| 1 | B | 225 5th st, LA | 2019-10-29 |
| 2 | C | 604 Cherry st, Dallas | 2019-10-28 |
| 3 | D | 225 5th st, LA | 2019-10-29 |
| 4 | E | 967 12th st, NY | 2019-10-27 |
| 5 | F | 967 12th st, NY | 2019-10-27 |
| 6 | A | 628 Jefferson St, NY | 2019-10-20 |
| 7 | B | 628 Jefferson St, NY | 2019-10-20 |
| 8 | A | 694 Meadow St, Atlanta | 2019-10-25 |
| 9 | B | 694 Meadow St, Atlanta | 2019-10-25 |
| 10 | C | 27 Wilson St, Austin | 2019-10-26 |
| 11 | D | 27 Wilson St, Austin | 2019-10-26 |
I need to make a new DataFrame in which the products are merged into a single column when the address and order date are the same (meaning they were ordered at the same time).
The df should look something like this:
| Index | Product | Purchase_Address |
|---|---|---|
| 0 | A, C | 604 Cherry st, Dallas |
| 1 | B, D | 225 5th st, LA |
| 2 | E, F | 967 12th st, NY |
| 3 | A, B | 628 Jefferson St, NY |
| 4 | A, B | 694 Meadow St, Atlanta |
| 5 | C, D | 27 Wilson St, Austin |
Then, from that, I need another df that counts how many times each combination occurs:
| Index | Product_Combination | Nr_Of_Times |
|---|---|---|
| 0 | A, C | 1 |
| 1 | B, D | 1 |
| 2 | E, F | 1 |
| 3 | A, B | 2 |
| 4 | C, D | 1 |
How would I achieve something like this?
Thanks!
>Solution :
Use GroupBy.agg to join the products for each address/date pair, then GroupBy.count with Series.to_frame to count each combination:
In [1783]: out = df.groupby(['Purchase_Address', 'Order_Date']).agg({'Product': ','.join}).groupby('Product')['Product'].count().to_frame('Nr_Of_Times').reset_index()
In [1784]: out
Out[1784]:
  Product  Nr_Of_Times
0     A,B            2
1     A,C            1
2     B,D            1
3     C,D            1
4     E,F            1
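If the intermediate per-order frame from the question is also needed, the same idea can be split into two steps. The following is only a sketch under a few assumptions: the names `orders` and `combos` are illustrative, the sample data is retyped from the question, the separator is ", " to match the desired output, and the `sorted()` call is an addition (not part of the one-liner above) so that an order listed as B, A is counted together with A, B.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "Product": list("ABCDEFABABCD"),
    "Purchase_Address": [
        "604 Cherry st, Dallas", "225 5th st, LA",
        "604 Cherry st, Dallas", "225 5th st, LA",
        "967 12th st, NY", "967 12th st, NY",
        "628 Jefferson St, NY", "628 Jefferson St, NY",
        "694 Meadow St, Atlanta", "694 Meadow St, Atlanta",
        "27 Wilson St, Austin", "27 Wilson St, Austin",
    ],
    "Order_Date": [
        "2019-10-28", "2019-10-29", "2019-10-28", "2019-10-29",
        "2019-10-27", "2019-10-27", "2019-10-20", "2019-10-20",
        "2019-10-25", "2019-10-25", "2019-10-26", "2019-10-26",
    ],
})

# Step 1: one row per order (same address and same date).
# sort=False keeps orders in first-appearance order;
# sorted() (an assumption, see above) normalizes the product order.
orders = (
    df.groupby(["Purchase_Address", "Order_Date"], sort=False)["Product"]
      .agg(lambda s: ", ".join(sorted(s)))
      .reset_index()
)

# Step 2: count how many orders share each product combination.
combos = (
    orders.groupby("Product", sort=False)
          .size()
          .reset_index(name="Nr_Of_Times")
          .rename(columns={"Product": "Product_Combination"})
)

print(orders[["Product", "Purchase_Address"]])
print(combos)
```

Here `orders` holds one row per address/date pair and `combos` holds the count per combination. Dropping `sort=False` returns the combinations in alphabetical order instead, as in the output above.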