How can I aggreate all product names of the grouped dataframe into a new column as list or set:
import pandas as pd # 2.0.3
df = pd.DataFrame(
{
"customer_id": [1, 2, 3, 2, 1],
"order_id": [1, 2, 3, 4, 1],
"products": ["foo", "bar", "baz", "foo", "bar"],
"amount": [1, 1, 1, 1, 1]
}
)
print(df)
grouped = df.groupby(["customer_id", "order_id"])
df["product_order_count"] = grouped["amount"].transform("sum")
df["all_products"] = grouped["products"].agg(list).reset_index()
print(df)
Although I followed another question (Pandas groupby: How to get a union of strings) an exception is thrown:
Traceback (most recent call last):
File "C:\temp\tt.py", line 15, in <module>
df["all_orders"] = grouped["products"].agg(list).reset_index()
File "c:\Users\foo\.venvs\kapa_monitor-38\lib\site-packages\pandas\core\frame.py", line 3940, in __setitem__
self._set_item_frame_value(key, value)
File "c:\Users\foo\.venvs\kapa_monitor-38\lib\site-packages\pandas\core\frame.py", line 4094, in _set_item_frame_value
raise ValueError(
ValueError: Cannot set a DataFrame with multiple columns to the single column all_products
Expected output (all_products, as list or set):
customer_id order_id products amount product_order_count all_products
0 1 1 foo 1 2 'foo', 'bar'
1 2 2 bar 1 1 'bar'
2 3 3 baz 1 1 'baz'
3 2 4 foo 1 1 'foo'
4 1 1 bar 1 2 'foo', 'bar'
>Solution :
You could use transform with a function that returns something that is the same length with the group:
df["all_products"] = grouped["products"].transform(lambda x: [list(x)]*len(x))
Output:
customer_id order_id products amount product_order_count all_products
0 1 1 foo 1 2 [foo, bar]
1 2 2 bar 1 1 [bar]
2 3 3 baz 1 1 [baz]
3 2 4 foo 1 1 [foo]
4 1 1 bar 1 2 [foo, bar]
Or you can joint the strings (I don’t really recommend lists in the data):
df["all_products"] = grouped["products"].transform(','.join)
which gives
customer_id order_id products amount product_order_count all_products
0 1 1 foo 1 2 foo,bar
1 2 2 bar 1 1 bar
2 3 3 baz 1 1 baz
3 2 4 foo 1 1 foo
4 1 1 bar 1 2 foo,bar