Home How to group by one column or another in pandas

Questions

How to group by one column or another in pandas

May 23, 2022

I have a table like:

    col1    col2
0   1       a
1   2       b
2   2       c
3   3       c
4   4       d

I’d like rows to be grouped together if they have a matching value in col1 or col2. That is, I’d like something like this:

> (
    df
    .groupby(set('col1', 'col2'))  # Made-up syntax
    .ngroup())
0  0
1  1
2  1
3  1
4  2

Is there a way to do this with pandas?

>Solution :

This is not easy to achieve simply with pandas. Indeed, two far away groups can become connected when two items are connected in the second group.

You can approach this using graph theory. Find the connected components using edges formed by the two (or more) groups. A python library for this is networkx:

import networkx as nx

g1 = df.groupby('col1').ngroup()
g2 = 'a'+df.groupby('col2').ngroup().astype(str)

# make graph and get connected components to form a mapping dictionary
G = nx.from_edgelist(zip(g1, g2))
d = {k:v for v,s in enumerate(nx.connected_components(G)) for k in s}

# find common group
group = g1.map(d)

df.groupby(group).ngroup()

output:

0    0
1    1
2    1
3    1
4    2
dtype: int64

graph:

byMR

Published May 23, 2022

Add a comment

Does running a synchronous method using Task.Run() make it asynchronous

byMR

May 23, 2022

Questions

Shared object reference between different Rust containers without copy

byMR

May 23, 2022

Questions

Filter sumproduct formula based on array

byMR

May 23, 2022

Questions

How to check values from list in DataFrame column and insert value by condition?

byMR

May 23, 2022

Questions

Converting local time to UTC in snowflake

byMR

May 23, 2022

Questions

Google Sheets, writing array to sheet via script

byMR

May 23, 2022

How to group by one column or another in pandas

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

Does running a synchronous method using Task.Run() make it asynchronous

Shared object reference between different Rust containers without copy

Filter sumproduct formula based on array

How to check values from list in DataFrame column and insert value by condition?

Converting local time to UTC in snowflake

Google Sheets, writing array to sheet via script

Keep Up to Date with the Most Important News

How to group by one column or another in pandas

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

Does running a synchronous method using Task.Run() make it asynchronous

Shared object reference between different Rust containers without copy

Filter sumproduct formula based on array

How to check values from list in DataFrame column and insert value by condition?

Converting local time to UTC in snowflake

Google Sheets, writing array to sheet via script

Discover more from Dev solutions