I have this dataframe below for airline flights:

```
df = pd.DataFrame({
'ORIGIN_x':["LGA","ORD"],
'DESTINATION_x':["ORD","LGA"],
'number_of_flights_x':[3576,3580],
'ORIGIN_y':["ORD","LGA"],
'DESTINATION_y':["LGA","ORD"],
'number_of_flights_y':[3580,3576]
})
```

I am trying to get below output:

```
df_final = pd.DataFrame({
'ORIGIN_x':["LGA"],
'DESTINATION_x':["ORD"],
'ORIGIN_y':["ORD"],
'DESTINATION_y':["LGA"],
'number_of_flights':[7156], #sum of 3576+3580
})
```

So two sample rows should match and I should see only one row because ORIGIN_x=DESTINATION_y & ORIGIN_y=DESTINATION_x. And I need to sum the number of flights(i.e. sum of number_of_flights_x column)

PS: Sorry if the title of my question is not clear

I tried to do a self-inner join but it is still multiplying the rows.

Thank you!

### >Solution :

Given your format with duplicated information, I would drop the `_y`

and `groupby.agg`

:

```
import numpy as np
# only keep _x
tmp = df.filter(like='_x').copy()
# sort ORIGIN/DESTINATION always in the same order
tmp[['ORIGIN_x', 'DESTINATION_x']] = np.sort(tmp[['ORIGIN_x', 'DESTINATION_x']])
# aggregate
out = tmp.groupby(['ORIGIN_x', 'DESTINATION_x'], as_index=False).sum()
print(out)
```

Output:

```
ORIGIN_x DESTINATION_x number_of_flights_x
0 LGA ORD 7156
```

Then, if needed, you can recreate the `_y`

columns by copying the existing ones (swapping ORIGIN and DESTINATION), but IMO this is not necessary.