How to loop through and return any value if it is found inside any other column within a dataframe using pandas?

How can I loop through a dataframe and collect, in a list, every value that also appears in at least one other column? It doesn’t matter how many times the value is found, only that it occurs at least once in a different column. A value that repeats only within the same column should not be included. In other words, each value must be compared against the values of every other column, but not against its own column.

import pandas as pd

combined_insp = []  # should end up holding the values shared across columns
test_df = pd.DataFrame({'area_1': ['John', 'Mike', 'Mary', 'Sarah'],
                        'area_2': ['John', 'Bob', 'Mary', 'Mary'],
                        'area_3': ['Jane', 'Sarah', 'David', 'Michael'],
                        'area_4': ['Diana', 'Mike', 'Bill', 'Bill']})

Expected output would be

combined_insp = ['John', 'Mary', 'Sarah', 'Mike']

Solution:

A solution with itertools and set algebra:

from itertools import combinations

# Intersect the values of every pair of distinct columns, then union the results.
combined_insp = set.union(*[set(test_df[c1]).intersection(test_df[c2])
                            for (c1, c2) in combinations(test_df.columns, 2)])

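For each unique pair of columns we take the intersection of their values, then we take the union of all the pairwise results. Duplicates within a single column are ignored because each column is converted to a set first. Note that the result is a set rather than a list, so its element order is not guaranteed; call list(combined_insp) if a list is required.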
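If a list in a predictable order matters, one possible alternative (a sketch, not part of the answer above) is to record which columns each value appears in and keep the values seen in more than one column. The helper name `seen_in` is made up for illustration; the data is the question's `test_df`.

```python
import pandas as pd

test_df = pd.DataFrame({'area_1': ['John', 'Mike', 'Mary', 'Sarah'],
                        'area_2': ['John', 'Bob', 'Mary', 'Mary'],
                        'area_3': ['Jane', 'Sarah', 'David', 'Michael'],
                        'area_4': ['Diana', 'Mike', 'Bill', 'Bill']})

# Map each value to the set of column names it appears in.
# (`seen_in` is a hypothetical helper name, not from the original answer.)
seen_in = {}
for col in test_df.columns:
    # unique() drops repeats within a column, so they cannot inflate the count
    for value in test_df[col].unique():
        seen_in.setdefault(value, set()).add(col)

# Keep values found in at least two different columns,
# in order of first appearance (dicts preserve insertion order).
combined_insp = [value for value, cols in seen_in.items() if len(cols) > 1]
print(combined_insp)
```

Unlike the set-based version, this produces a list directly and also works when the dataframe has fewer than two columns, where it simply returns an empty list.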
