How can I loop through a DataFrame and return every value that is also found in another column, storing the results in a list using pandas? It doesn't matter how many times the value is found, only that it appears at least once more in a different column. If a value repeats only within the same column, it is not included in the list. In other words, each value must be compared with every value in the other columns, but never with values in its own column.
import pandas as pd

combined_insp = []
test_df = pd.DataFrame({'area_1': ['John', 'Mike', 'Mary', 'Sarah'],
                        'area_2': ['John', 'Bob', 'Mary', 'Mary'],
                        'area_3': ['Jane', 'Sarah', 'David', 'Michael'],
                        'area_4': ['Diana', 'Mike', 'Bill', 'Bill']})
Expected output would be
combined_insp = ['John', 'Mary', 'Sarah', 'Mike']
>Solution :
A solution with itertools and set algebra:
from itertools import combinations
combined_insp = set.union(*[set(test_df[c1]).intersection(test_df[c2])
                            for (c1, c2) in combinations(test_df.columns, 2)])
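For each unique pair of columns we take the intersection of their values. Then we take the union of all those intersections. Note that the result is a set, so its order is not guaranteed; wrap it in list(...) or sorted(...) if a list is needed.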
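For completeness, here is a self-contained sketch of the same pairwise-intersection idea that can be run as-is. The helper name values_shared_across_columns is only an illustrative choice, and returning a sorted list (rather than the order shown in the question) is an assumption made so the output is deterministic.

```python
from itertools import combinations

import pandas as pd


def values_shared_across_columns(df):
    """Return, as a sorted list, the values found in at least two different columns.

    Hypothetical helper name; the logic is the pairwise intersection plus union
    described in the answer above.
    """
    # Intersect the distinct values of every unordered pair of columns.
    pairwise_common = [
        set(df[c1]).intersection(df[c2])
        for c1, c2 in combinations(df.columns, 2)
    ]
    # Starting from an empty set keeps this working when there are no pairs
    # (a frame with fewer than two columns).
    return sorted(set().union(*pairwise_common))


test_df = pd.DataFrame({'area_1': ['John', 'Mike', 'Mary', 'Sarah'],
                        'area_2': ['John', 'Bob', 'Mary', 'Mary'],
                        'area_3': ['Jane', 'Sarah', 'David', 'Michael'],
                        'area_4': ['Diana', 'Mike', 'Bill', 'Bill']})

combined_insp = values_shared_across_columns(test_df)
print(combined_insp)  # ['John', 'Mary', 'Mike', 'Sarah']
```

'Bill' is left out because it only repeats inside area_4, and 'Mary' is counted once even though it appears twice in area_2, since each column is reduced to a set before comparing.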