Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filtering data-frame columns using regex, then using .groupby to calculate sum

I have a dataframe which I want to group, filter columns by regex, and then sum.

My code looks like this:

import pandas as pd

df = pd.DataFrame({'ID':[1,1,2,2,3,3], 
                   'Invasive' : [12,1,1,0,1,0], 
                   'invasive': [1,4,5,3,4,6],
                   'Wild':[4,7,1,0,0,0],
                   'wild':[0,0,9,8,3,2],
                   'Crop':[0,0,0,0,0,0],
                   'Crop_2':[2,3,2,2,1,2]})

df.groupby(['ID']).filter(regex='(Invasive)|(invasive)|(Wild)|(wild)').sum()

The error message I get is:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

DataFrameGroupBy.filter() missing 1 required positional argument: 'func'

I get the same Err message if groupby comes after filter

Why does this happen? Where do I input the func argument?

EDIT:

My Expected output is one column that has summed across the filtered columns and is grouped by ID. E.g.:

   ID  Output
0   1      29
1   2      27
2   3      16

>Solution :

What you want to do doesn’t make sense, groupby.filter is to filter rows, not to be confused with DataFrame.filter.

You likely want to filter the columns, then to aggregate:

df.filter(regex='(?i)(Invasive|Wild)').groupby(df['ID']).sum()

NB. I replaced (Invasive)|(invasive)|(Wild)|(wild) by (?i)(Invasive|Wild), which means ‘InvasiveORWild` independently of the case.

Output:

    Invasive  invasive  Wild  wild
ID                                
1         13         5    11     0
2          1         8     1    17
3          1        10     0     5

edit:

the output that you show needs a further summation per row:

out = (df.filter(regex='(?i)(Invasive|Wild)')
         .groupby(df['ID']).sum()
         .sum(axis=1)
         .reset_index(name='Output')
      )

# or with summation before:
out = (df.filter(regex='(?i)(Invasive|Wild)')
         .sum(axis=1)
         .groupby(df['ID']).sum()
         .reset_index(name='Output')
      )

Output:

   ID  Output
0   1      29
1   2      27
2   3      16
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading