Pandas Dataframe Aggregation

December 19, 2021

I have the following dataframe (I didnt include an index here, but obvisouly there is also an index)

ID_1	ID_2	Count
55	62	1000
62	55	1200
…	…	…

Now I would like to aggregate those two columns, since I do not care if the ID is in the column ID_1 or in ID_2.

I would like to get the following result:

ID_1	ID_2	Count
55	62	2200
62	55	2200
…	…	…

That means that I want to sum the Count column over all the rows in my dataframe where two IDs are the same (doesnt care if they are in ID_1 column or ID_2 column).

I thought about grouping the dataframe, but that did not work properly.

I am happy for any help!

>Solution :

Create virtual groups:

make_group = lambda x: tuple(sorted(x))

df['Count'] = df.groupby(df[['ID_1', 'ID_2']].apply(make_group, axis=1))['Count'] \
                .transform('sum')

Output:

>>> df
   ID_1  ID_2  Count
0    55    62   2200
1    62    55   2200

# virtual groups
>>> df[['ID_1', 'ID_2']].apply(make_group, axis=1)
0    (55, 62)
1    (55, 62)
dtype: object