Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas: How to self-join cross-functionally

I have this dataframe below for airline flights:

df = pd.DataFrame({
    
    'ORIGIN_x':["LGA","ORD"],
    'DESTINATION_x':["ORD","LGA"],
    'number_of_flights_x':[3576,3580],
    'ORIGIN_y':["ORD","LGA"],
    'DESTINATION_y':["LGA","ORD"],
    'number_of_flights_y':[3580,3576]
    
})

I am trying to get below output:

df_final = pd.DataFrame({
    
    'ORIGIN_x':["LGA"],
    'DESTINATION_x':["ORD"],
    'ORIGIN_y':["ORD"],
    'DESTINATION_y':["LGA"],
    'number_of_flights':[7156], #sum of 3576+3580
    
})

So two sample rows should match and I should see only one row because ORIGIN_x=DESTINATION_y & ORIGIN_y=DESTINATION_x. And I need to sum the number of flights(i.e. sum of number_of_flights_x column)

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

PS: Sorry if the title of my question is not clear

I tried to do a self-inner join but it is still multiplying the rows.

Thank you!

>Solution :

Given your format with duplicated information, I would drop the _y and groupby.agg:

import numpy as np

# only keep _x
tmp = df.filter(like='_x').copy()

# sort ORIGIN/DESTINATION always in the same order
tmp[['ORIGIN_x', 'DESTINATION_x']] = np.sort(tmp[['ORIGIN_x', 'DESTINATION_x']])

# aggregate
out = tmp.groupby(['ORIGIN_x', 'DESTINATION_x'], as_index=False).sum()

print(out)

Output:

  ORIGIN_x DESTINATION_x  number_of_flights_x
0      LGA           ORD                 7156

Then, if needed, you can recreate the _y columns by copying the existing ones (swapping ORIGIN and DESTINATION), but IMO this is not necessary.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading