Home Pandas groupby with capture groups from extractall

Questions

Pandas groupby with capture groups from extractall

February 6, 2024

I am working with Pandas extract() method to feed the capture group into groupby. A small example of the result of this process is something like the one illustrated below:

import pandas as pd
import re
from io import StringIO

DATA = StringIO('''
colA;colB;colC
Foo;1;1
Bar;2;2
Foo,Bar;3;3
''')
df = pd.read_csv(DATA, sep=';')

m1 = df['colA'].str.extract('(Bar|Foo)', flags=re.IGNORECASE, expand=False)

for t, _d in df.groupby(m1):
    t
    _d

# 'Bar'
#   colA  colB  colC
# 1  Bar     2     2
#
# 'Foo'
#       colA  colB  colC
# 0      Foo     1     1
# 2  Foo,Bar     3     3

However, the row with index 2 (third row) is only captured in the Foo group, whereas I wanted to capture it in both Foo and Bar.

Playing around with the extractall() method seems to capture all matched groups, but apparently cannot be used together with groupby() because pandas complaints about the grouper not being 1-dimensional: ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional

m2 = df['colA'].str.extractall('(Bar|Foo)', flags=re.IGNORECASE)

#            0
#   match
# 0 0      Foo
# 1 0      Bar
# 2 0      Foo
#   1      Bar

The desired output for groupby() would be something like the following:

for t, _d in df.groupby(somematch):
    t
    _d

# 'Bar'
#       colA  colB  colC
# 1      Bar     2     2
# 2  Foo,Bar     3     3
#
# 'Foo'
#       colA  colB  colC
# 0      Foo     1     1
# 2  Foo,Bar     3     3

Any suggestions are welcome.

>Solution :

groupby is designed so that each row belongs to exactly one group, which is not true in your use-case. You’d need to extract the rows manually. One way to do so:

words = ['Foo','Bar']
for word in words:
    print(df.loc[df['colA'].str.contains(word)])

Output:

      colA  colB  colC
0      Foo     1     1
2  Foo,Bar     3     3
      colA  colB  colC
1      Bar     2     2
2  Foo,Bar     3     3

Or, if your data is given as in the example, you can use get_dummies to split the keywords within each cell:

keys = df['colA'].str.lower().str.get_dummies(',')
for key,data in keys.items():
    print(df.loc[data.eq(1)])

pandas

byMR

Published February 06, 2024

Add a comment

Scraping data using python and requests html and export in excel file

byMR

February 6, 2024

Questions

Async Entity Framework Core Query Throws InvalidOperation Exception

byMR

February 6, 2024

Questions

Numpy array not updating during a for loop

byMR

February 6, 2024

Questions

How do I make a button in HTML work similarly to the scroll wheel on a mouse or like an arrow key when viewing a webpage?

byMR

February 6, 2024

Questions

How to convert from If Statement to Select Case Statement

byMR

February 6, 2024

Questions

Why does Angular MDC adds appearance="outlined" to template files?

byMR

February 6, 2024

Pandas groupby with capture groups from extractall

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

Scraping data using python and requests html and export in excel file

Async Entity Framework Core Query Throws InvalidOperation Exception

Numpy array not updating during a for loop

How do I make a button in HTML work similarly to the scroll wheel on a mouse or like an arrow key when viewing a webpage?

How to convert from If Statement to Select Case Statement

Why does Angular MDC adds appearance="outlined" to template files?

Keep Up to Date with the Most Important News

Pandas groupby with capture groups from extractall

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

Scraping data using python and requests html and export in excel file

Async Entity Framework Core Query Throws InvalidOperation Exception

Numpy array not updating during a for loop

How do I make a button in HTML work similarly to the scroll wheel on a mouse or like an arrow key when viewing a webpage?

How to convert from If Statement to Select Case Statement

Why does Angular MDC adds appearance="outlined" to template files?

Discover more from Dev solutions