List all non unique columns of a dataframe

August 17, 2022

I have a datafame that contains relationship between parent-child-origin-destination, it looks something like this.

    parent_origin   parent_destination  child_origin    child_destination
0      ABD                 NCL             ABD               ALM
1      ABD                 NCL             ABD               DHM
2      ABD                 YRK             ABD               ALM
3      ABD                 YRK             ABD               NTR
4      ABD                 KGX             ABD               SVG

I would like to group by on child_origin & child_destination to know if there are any child_origin, child_destination pairs that have 2 diffrent parent_origin & parent_destination and list out the result. I also want to print out the list of parent_origin & parent_destination that have the same child od pair.

For example, in the above dataframe i want the expected output to be like:

    child_origin    child_destination   parent_origin   parent_destination
1      ABD                 ALM             ABD               NCL
                                           ABD               YRK

What i have tried:

I can do a group by to get the values which have duplicate parents & the count of duplicates but i am not able to figure out how to diplay the actual parents values.

>>> grp = df.groupby(['child_origin','child_destination']).size().reset_index().rename(columns={0:'count'})
>>> grp[grp['count] > 1]

This gives me the count of all child_ods that have multiple parents but i want to knwo the value of parents as well.

PS: I am fairly new to pandas.

>Solution :

How about:

df.loc[df.groupby(['child_origin','child_destination'])['parent_origin'].transform("count") >1]

If you want the columns in order:

df.loc[df.groupby(['child_origin','child_destination'])['parent_origin'].transform("count") >1, 
       ['child_origin', 'child_destination', 'parent_origin', 'parent_destination']]