Imagine I have this table:
Col-1 | Col-2
A | 2
A | 3
B | 1
B | 4
C | 7
Groupby on Col-1 with a sum aggregation on Col-2 will sum A to 5, B to 5, and C to 7.
What I want to know is if there is a baked in feature that allows aggregation on a target value in a column and then groups all other entries into another bin. For example, if I wanted to groupby on Col-1 targeting A and grouping all other entries into a label named other, I would end up with A as 5 and Other as 12.
Does that make sense? I know I could do some filtering sorcery and merging datasets back together, but figured there had to be a cleaner, more Pythonic way I am missing.
I have tried going through the documentation, but nothing jumped out at me.
>Solution :
One solution is to make pd.Categorical from the Column 1 -> with two categories A for string A and Other for other strings. Then group by this categorical:
tmp = (
pd.Categorical(df["Col1"], categories=["A"]).add_categories("Other").fillna("Other")
)
out = df.groupby(tmp, observed=False)["Col2"].sum()
print(out)
Prints:
A 5
Other 12
Name: Col2, dtype: int64
Another solution, group by boolean mask:
out = (
df.groupby(df["Col1"].eq("A"))["Col2"]
.sum()
.rename(index={True: "A", False: "Other"})
)
print(out)
Prints:
Col1
Other 12
A 5
Name: Col2, dtype: int64