I have a pandas data frame with data like this…
df0
| Customer | Product | Price |
|---|---|---|
| A | x | $12 |
| B | y | $15 |
| B | y | $10 |
| C | x | $19 |
| C | y | $9 |
| D | x | $16 |
| D | y | $8 |
| D | y | $10 |
I need to separate this data set into two data frames:
One data frames where customers only have one product. Another data frame where customers have multiple products like this…
df1
| Customer | Product | Price |
|---|---|---|
| A | x | $12 |
| B | y | $15 |
| B | y | $10 |
df2
| Customer | Product | Price |
|---|---|---|
| C | x | $19 |
| C | y | $9 |
| D | x | $16 |
| D | y | $8 |
| D | y | $10 |
I then will do some analysis on prices as a follow-on step.
I know how to filter on one column but I can’t figure out a filter which needs to use both columns like this.
Would anyone please help with the code which would split the data into two data frames?
Many thanks
>Solution :
If you use
df0.Product.groupby(df0.Customer).transform('nunique')
Will return the number of unique products for the customer in the row. So
df0[df0.Product.groupby(df0.Customer).transform('nunique') > 1]
will create a DataFrame with all those with more than one product, for example.