Suppose I have a dataframe like this:
player teammates
0 A [C,F]
1 C [A,F]
2 B [B]
3 D [H,J,K]
4 H [J,K]
5 Q [D]
Now rows 3, 4 and 5 represent some challenging data points. If the teammates column contained the entire team for each player, the problem would be trivial.
The expected output is a list of all teams, like so:
[[A,C,F], [B], [D,H,J,K,Q]]
The first step could be to just consolidate both columns into one via
df.apply(lambda row: list(set([row['player']]+row['teammates'])), axis=1), like so
0 [A,C,F]
1 [A,C,F]
2 [B]
3 [D,H,J,K]
4 [H,J,K]
5 [Q,D]
but checking pairwise for common elements and further consolidating seems very inefficient. Is there an efficient way to get the desired output?
>Solution :
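Treat each player/teammate pair as an edge of a graph: DataFrame.explode turns the teammates lists into one row per pair, and the teams are then the connected components of that graph, which networkx can compute with connected_components: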
import networkx as nx
# Create the graph from the dataframe
g = nx.Graph()
g.add_edges_from(df[['player','teammates']].explode('teammates').itertuples(index=False))
new = list(nx.connected_components(g))
print(new)
[{'F', 'A', 'C'}, {'B'}, {'Q', 'K', 'H', 'J', 'D'}]
If you need lists instead of sets:
L = [list(x) for x in new]
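If networkx is not available, the same grouping can be sketched in plain Python with a union-find (disjoint-set) structure. This is a swapped-in technique, not what the answer above uses; the function name group_teams and the two plain lists standing in for the player and teammates columns are hypothetical, chosen only for illustration. The result is sorted so that it matches the order of the expected output in the question.

```python
def group_teams(players, teammates):
    """Group players into teams with a union-find (disjoint-set) structure.

    `players` and `teammates` are assumed to be parallel sequences, e.g.
    the two dataframe columns; the name `group_teams` is hypothetical.
    """
    parent = {}

    def find(x):
        # Register unseen names as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for player, mates in zip(players, teammates):
        find(player)  # players without teammates must still appear
        for mate in mates:
            union(player, mate)

    # Collect every known name under the root of its set.
    teams = {}
    for name in list(parent):
        teams.setdefault(find(name), []).append(name)
    return sorted(sorted(team) for team in teams.values())


# Same data as in the question, written as plain lists.
players = ["A", "C", "B", "D", "H", "Q"]
teammates = [["C", "F"], ["A", "F"], ["B"], ["H", "J", "K"], ["J", "K"], ["D"]]

teams = group_teams(players, teammates)
print(teams)
```

With the dataframe from the question, passing df['player'] and df['teammates'] in place of the two lists should work the same way, assuming every entry of teammates is a list of names.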