Home How to get all unique combinations of values in one column that are in another column

Questions

How to get all unique combinations of values in one column that are in another column

December 30, 2021

Starting with a dataframe like this:

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['a', 'b', 'b', 'b', 'a']})
   A  B
0  1  a
1  2  b
2  3  b
3  4  b
4  5  a

What is the best way of getting to a dataframe like this?

pd.DataFrame({'source': [1, 2, 2, 3], 'target': [5, 3, 4, 4]})
   source  target
0       1       5
1       2       3
2       2       4
3       3       4

For each time a row in column A has the same value in column B as another row in column A, I want to save the unique instances of that relationship in a new dataframe.

This is pretty close:

df.groupby('B')['A'].unique()
B
a       [1, 5]
b    [2, 3, 4]
Name: A, dtype: object

But I’d ideally convert it into a single dataframe now and my brain has gone kaput.

>Solution :

In your case , you can do itertools.combinations

import itertools
s = df.groupby('B')['A'].apply(lambda x : set(list(itertools.combinations(x, 2)))).explode().tolist()
out = pd.DataFrame(s,columns=['source','target'])
out
Out[312]: 
   source  target
0       1       5
1       3       4
2       2       3
3       2       4