List all non-unique columns of a dataframe

I have a dataframe that contains the relationships between parent and child origin-destination pairs. It looks something like this:

    parent_origin   parent_destination  child_origin    child_destination
0      ABD                 NCL             ABD               ALM
1      ABD                 NCL             ABD               DHM
2      ABD                 YRK             ABD               ALM
3      ABD                 YRK             ABD               NTR
4      ABD                 KGX             ABD               SVG

I would like to group by child_origin and child_destination to find any (child_origin, child_destination) pairs that have two or more different (parent_origin, parent_destination) pairs, and list the result. I also want to print the parent_origin and parent_destination values that share the same child origin-destination pair.

For example, for the dataframe above I want the expected output to look like this:

    child_origin    child_destination   parent_origin   parent_destination
1      ABD                 ALM             ABD               NCL
                                           ABD               YRK

What I have tried:

I can do a group by to find the child pairs that have multiple parents, along with the count, but I cannot figure out how to display the actual parent values.

>>> grp = df.groupby(['child_origin','child_destination']).size().reset_index().rename(columns={0:'count'})
>>> grp[grp['count'] > 1]

This gives me the count for every child origin-destination pair that has multiple parents, but I want to know the parent values as well.

PS: I am fairly new to pandas.

>Solution :

How about:

df.loc[df.groupby(['child_origin','child_destination'])['parent_origin'].transform("count") >1]

If you want the columns in order:

df.loc[df.groupby(['child_origin','child_destination'])['parent_origin'].transform("count") >1, 
       ['child_origin', 'child_destination', 'parent_origin', 'parent_destination']]
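
To try this end to end, here is a minimal sketch that rebuilds the sample dataframe from the question and applies the same filter. The `drop_duplicates` step and the names `child_cols`, `parent_cols`, `unique_pairs` and `parents_per_child` are my own additions, not part of the original answer. They rest on the assumption that the same parent pair might appear more than once for one child pair, in which case a plain row count would overstate the number of different parents.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "parent_origin": ["ABD"] * 5,
    "parent_destination": ["NCL", "NCL", "YRK", "YRK", "KGX"],
    "child_origin": ["ABD"] * 5,
    "child_destination": ["ALM", "DHM", "ALM", "NTR", "SVG"],
})

# Illustrative helper names (not from the original post).
child_cols = ["child_origin", "child_destination"]
parent_cols = ["parent_origin", "parent_destination"]

# Assumption: drop exact repeats first, so a parent pair listed twice for the
# same child pair is not mistaken for two different parents.
unique_pairs = df.drop_duplicates(subset=child_cols + parent_cols)

# Count the rows in each child group and broadcast that count back to every
# row of the group, as in the answer above.
parents_per_child = unique_pairs.groupby(child_cols)["parent_origin"].transform("count")

# Keep only child pairs with more than one parent pair, child columns first.
result = unique_pairs.loc[parents_per_child > 1, child_cols + parent_cols]
print(result)
```

On the sample data this keeps the two rows for the ABD to ALM child pair, with parents ABD to NCL and ABD to YRK. If you are sure the data has no repeated rows, the `drop_duplicates` step can be skipped and the filter applied to `df` directly.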