CONTEXT
I want to group by both a rule_id and calc_id and transform multiple columns into one row where each variable is concatenated with a ",’
DATA EXAMPLE
Calc_ID Rule_ID Name Tracked?
100 Rule1 Y
100 Rule2 N
100 Rule3 N
YYY Test1 Y
YYY Test2 Y
YYY Test3 N
EXPECTED OUTCOME
Calc_ID Rule_ID Name Tracked?
100 Rule1, Rule2, Rule3 Y, N, N
YYY Test1, Test2, Test3 Y, Y, N
CURRENT CODE
I tried to apply a groupby one at a time for each of the columns but that doesn’t work.
import pandas as pd
pd = read_csv(path)
pd = pd.fillna('') # <- to fix nans on groupby calc_id / rule_id
pd = pd.groupby(['Rule_ID', 'Calc_ID'])['Name'].apply(','.join).reset_index()
# pd = pd.groupby(['Rule_ID', 'Calc_ID'])['Tracked?'].apply(','.join).reset_index()
# ^ but this doesn't work because the initial groupby removes other columns
>Solution :
You should use agg to aggregate the values in each group, instead of apply:
df = df.groupby(["Calc_ID", "Rule_ID"], as_index=False).agg(", ".join)
Calc_ID Rule_ID Name Tracked?
0 100 Rule1, Rule2, Rule3 Y, N, N
1 YYY Test1, Test2, Test3 Y, Y, N