I am looking to groupby several columns if the prefix is similar and take the sum based off of categorical values within a column.
Data
name type size month
AA:3400 5 august
AA:3401 FALSE 1 august
AA:3402 FALSE 2 august
AA:3404 TRUE 0 august
AA:3409 FALSE 1 september
AA:3410 FALSE 8 september
AA:3412 FALSE 9 september
BB:3400 TRUE 4 august
BB:3401 FALSE 7 august
Desired
name type size month
AA TRUE 0 august
AA FALSE 3 august
AA 5 august
BB TRUE 4 august
BB FALSE 7 august
AA TRUE 0 september
AA FALSE 18 september
Doing
However, how can I group if the value has the same prefix? Any suggestion is appreciated.
out = (
df.assign(type= df["name"].astype(
pd.CategoricalDtype(["TRUE", "FALSE"], ordered=True)))
.groupby([df["name", "date"].str.split(":").str[0], "type"],
dropna=False, group_keys=False)["size"].sum().reset_index()
)
However, I am not sure how to incorporate multiple fields in this grouping. Any suggestion is appreciated.
>Solution :
You can do
out = df.groupby([df["name"].str.split(":").str[0],
df['month'],
df['type']])['size'].sum().unstack('type',fill_value=0).stack().reset_index(name='size')
Out[61]:
name month type size
0 AA august FALSE 3
1 AA august None 5
2 AA august TRUE 0
3 AA september FALSE 18
4 AA september None 0
5 AA september TRUE 0
6 BB august FALSE 7
7 BB august None 0
8 BB august TRUE 4