Pandas create groups from column values

I have a dataframe df as follows:

Col1    Col2
A1      A1
B1      A1
B1      B1
C1      C1
D1      A1
D1      B1
D1      D1
E1      A1

I am trying to achieve the following:

Col1    Group
A1      A1
B1      A1
D1      A1
E1      A1
C1      C1

That is, every value in df that is related to another value, directly or indirectly, should be grouped under a single value. In the example above, (A1, A1), (B1, A1), (B1, B1), (D1, A1), (D1, B1), (D1, D1) and (E1, A1) can all be linked, directly or indirectly, to A1 (the first in alphabetical order), so they are all assigned the group id A1, and so on.

I am not sure how to do this.

>Solution :

This can be approached using a graph.

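Here is your graph. Each row of df is treated as an edge between its Col1 and Col2 values, which gives two connected components, {A1, B1, D1, E1} and {C1}:

[figure: graph]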
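To make the "directly or indirectly linked" idea concrete, here is a minimal sketch that finds the same groups without any graph library, using a small union-find structure. This is an illustration under assumptions, not the method used in this answer: the names `edges`, `parent`, `find` and `union` are hypothetical helpers, and the edge list is copied by hand from the rows of df.

```python
# Edges copied from the rows of df in the question: (Col1, Col2)
edges = [("A1", "A1"), ("B1", "A1"), ("B1", "B1"), ("C1", "C1"),
         ("D1", "A1"), ("D1", "B1"), ("D1", "D1"), ("E1", "A1")]

parent = {}  # hypothetical helper: maps each node to its parent node


def find(x):
    """Return the representative (root) of x's group."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # shorten the path as we walk up
        x = parent[x]
    return x


def union(a, b):
    """Merge the groups of a and b, keeping the alphabetically smaller root."""
    ra, rb = find(a), find(b)
    if ra != rb:
        lo, hi = sorted((ra, rb))
        parent[hi] = lo


for a, b in edges:
    union(a, b)

# Every node mapped to the alphabetically first node of its group
groups = {node: find(node) for node in sorted(parent)}
print(groups)
```

Because the smaller root always wins a merge, each group's representative ends up being its alphabetically first member, which is the labelling the question asks for.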

You can use networkx to find the connected_components:

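import networkx as nx
import pandas as pd

# Each row of df becomes an edge between its Col1 and Col2 values
G = nx.from_pandas_edgelist(df, source='Col1', target='Col2')

# Map every node to the alphabetically first node of its component
d = {}
for g in nx.connected_components(G):
    g = sorted(g)
    for x in g:
        d[x] = g[0]

out = pd.Series(d)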

output:

A1    A1
B1    A1
D1    A1
E1    A1
C1    C1
dtype: object
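
If you want the result as a DataFrame with Col1 and Group columns, as in the question, something along these lines should work. It is a sketch that rebuilds the example df by hand; the names `group_of` and `result` are only illustrative, and it assumes every value you want listed appears in Col1 (a value that occurs only in Col2 would get a group but no row).

```python
import networkx as nx
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "Col1": ["A1", "B1", "B1", "C1", "D1", "D1", "D1", "E1"],
    "Col2": ["A1", "A1", "B1", "C1", "A1", "B1", "D1", "A1"],
})

# Each row becomes an edge between its Col1 and Col2 values
G = nx.from_pandas_edgelist(df, source="Col1", target="Col2")

# Label every node with the alphabetically first node of its component
group_of = {}
for component in nx.connected_components(G):
    label = min(component)
    for node in component:
        group_of[node] = label

# One row per distinct Col1 value, with its group attached
result = (
    df[["Col1"]]
    .drop_duplicates()
    .assign(Group=lambda d: d["Col1"].map(group_of))
    .sort_values(["Group", "Col1"])
    .reset_index(drop=True)
)
print(result)
```

Sorting by Group and then Col1 keeps the members of each group together, which matches the row order shown in the question.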