I have the following dataframe (I didnt include an index here, but obvisouly there is also an index)
| ID_1 | ID_2 | Count |
|---|---|---|
| 55 | 62 | 1000 |
| 62 | 55 | 1200 |
| … | … | … |
Now I would like to aggregate those two columns, since I do not care if the ID is in the column ID_1 or in ID_2.
I would like to get the following result:
| ID_1 | ID_2 | Count |
|---|---|---|
| 55 | 62 | 2200 |
| 62 | 55 | 2200 |
| … | … | … |
That means that I want to sum the Count column over all the rows in my dataframe where two IDs are the same (doesnt care if they are in ID_1 column or ID_2 column).
I thought about grouping the dataframe, but that did not work properly.
I am happy for any help!
>Solution :
Create virtual groups:
make_group = lambda x: tuple(sorted(x))
df['Count'] = df.groupby(df[['ID_1', 'ID_2']].apply(make_group, axis=1))['Count'] \
.transform('sum')
Output:
>>> df
ID_1 ID_2 Count
0 55 62 2200
1 62 55 2200
# virtual groups
>>> df[['ID_1', 'ID_2']].apply(make_group, axis=1)
0 (55, 62)
1 (55, 62)
dtype: object