Given a DataFrame with one column of players and another column with a subset of their teammates, form the complete teams

Suppose I have a DataFrame like this:

    player  teammates
0   A       [C,F]
1   C       [A,F]
2   B       [B]
3   D       [H,J,K]
4   H       [J,K]
5   Q       [D]
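To reproduce the example, the DataFrame can be built directly. This is a minimal sketch that assumes the teammates column holds Python lists of single-letter strings; the question shows the values unquoted, so the string type is an assumption.

```python
import pandas as pd

# Example data from the question (values assumed to be strings):
# each row holds a player and a possibly incomplete list of teammates.
df = pd.DataFrame({
    'player': ['A', 'C', 'B', 'D', 'H', 'Q'],
    'teammates': [['C', 'F'], ['A', 'F'], ['B'], ['H', 'J', 'K'], ['J', 'K'], ['D']],
})
print(df)
```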

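Rows 3, 4 and 5 are the challenging cases: no single row lists the whole team D, H, J, K, Q, so it can only be recovered by chaining rows together (Q links to D, and D links to H, J and K). If the teammates column contained the entire team for each player, the problem would be trivial.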

The expected output is a list of all teams, like:

[[A,C,F], [B], [D,H,J,K,Q]]

A first step could be to merge both columns into one list per row via

df.apply(lambda row: list(set([row['player']]+row['teammates'])), axis=1)

which gives:

0  [A,C,F]
1  [A,C,F]
2  [B]
3  [D,H,J,K]
4  [H,J,K]
5  [Q,D]

But then checking every pair of rows for common elements and merging them repeatedly until nothing changes seems very inefficient. Is there an efficient way to get the desired output?
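The consolidation step from the question can be run as a self-contained sketch. It rebuilds the example data (values assumed to be strings) and swaps list() for sorted() so that the order inside each merged row is deterministic; that change is an addition, not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    'player': ['A', 'C', 'B', 'D', 'H', 'Q'],
    'teammates': [['C', 'F'], ['A', 'F'], ['B'], ['H', 'J', 'K'], ['J', 'K'], ['D']],
})

# Add each row's player to its teammates list; set() drops the duplicate
# in rows such as B -> [B], and sorted() fixes the element order.
merged = df.apply(lambda row: sorted(set([row['player']] + row['teammates'])), axis=1)
print(merged)
```

Rows 3 and 5 still end up in separate lists, which is why this step alone does not produce complete teams.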

Solution:

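Treat every (player, teammate) pair as an edge of an undirected graph; each team is then a connected component. DataFrame.explode turns the teammates lists into one row per pair, and networkx's connected_components collects the components: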

import networkx as nx

# Create the graph from the dataframe
g = nx.Graph()

g.add_edges_from(df[['player','teammates']].explode('teammates').itertuples(index=False))

new = list(nx.connected_components(g))
print(new)
[{'F', 'A', 'C'}, {'B'}, {'Q', 'K', 'H', 'J', 'D'}]
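To see which edges add_edges_from receives, the explode step can be run on its own. This is a sketch on the rebuilt example data (values assumed to be strings). The dropna call is an addition not in the answer above: an empty teammates list explodes to NaN, which would otherwise become a spurious NaN node in the graph.

```python
import pandas as pd

df = pd.DataFrame({
    'player': ['A', 'C', 'B', 'D', 'H', 'Q'],
    'teammates': [['C', 'F'], ['A', 'F'], ['B'], ['H', 'J', 'K'], ['J', 'K'], ['D']],
})

# explode() gives each list element its own row, repeating the player,
# so every (player, teammate) pair becomes one candidate edge.
edges = df[['player', 'teammates']].explode('teammates')
# Guard (added): drop NaN rows produced by empty teammates lists.
edges = edges.dropna(subset=['teammates'])
# name=None yields plain tuples instead of namedtuples.
pairs = list(edges.itertuples(index=False, name=None))
print(pairs)
```

The pair ('B', 'B') is a self-loop, which is why B still shows up as its own component. If rows with empty lists are dropped this way, those players can be kept as one-person teams by also calling g.add_nodes_from(df['player']).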

If you need lists instead of sets:

L = [list(x) for x in new]
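If adding networkx as a dependency is undesirable, the same grouping can be computed with a disjoint-set (union-find) structure. This is a swapped-in technique, not part of the answer above, and find_teams is a hypothetical helper name. It processes each (player, teammate) pair once instead of comparing rows pairwise; with a DataFrame it can be called as find_teams(zip(df['player'], df['teammates'])).

```python
def find_teams(rows):
    """Group players into teams using union-find.

    rows: iterable of (player, teammates) pairs (hypothetical helper).
    """
    parent = {}

    def find(x):
        # Register an unseen player as its own root.
        parent.setdefault(x, x)
        # Path halving: point x at its grandparent while walking up.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for player, teammates in rows:
        find(player)  # keeps players whose teammates list is empty
        for mate in teammates:
            union(player, mate)

    # Collect every player under the root of its set.
    teams = {}
    for x in parent:
        teams.setdefault(find(x), []).append(x)
    return [sorted(team) for team in teams.values()]


rows = [('A', ['C', 'F']), ('C', ['A', 'F']), ('B', ['B']),
        ('D', ['H', 'J', 'K']), ('H', ['J', 'K']), ('Q', ['D'])]
print(find_teams(rows))  # [['A', 'C', 'F'], ['B'], ['D', 'H', 'J', 'K', 'Q']]
```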