I have a table:
| id | Name |
|---|---|
| 1 | A |
| 2 | A |
| 2 | B |
| 2 | C |
| 3 | B |
| 3 | C |
| 4 | C |
| 4 | A |
And I need a new table where unique IDs are calculated at the intersection of names. That is, for example, ID 2 has both A and B, that is, at the intersection of A and B, there is already one ID. Something like this:
| Name | A | B | C |
|---|---|---|---|
| A | – | 1 | 2 |
| B | 1 | – | 2 |
| C | 2 | 2 | – |
I tried with pandas.pivot_table, but i couldn’t understand how
>Solution :
Let us use crosstab to create frequency table, then calculate the inner product to generate similarity / intersection matrix and mask the diagonal values with 0
s = pd.crosstab(df['Name'], df['id']).clip(upper=1)
s = s @ s.T
np.fill_diagonal(s.values, 0)
print(s)
Name A B C
Name
A 0 1 2
B 1 0 2
C 2 2 0