Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Sum the values of a groupby from dataframe columns using a pattern in a list

Context: I’m trying to get the sum of the groups created using a groupby using a list of patterns that are present on the dataframe columns.

For example, let’s say we have this dataframe:

df = pd.DataFrame({'123_Pattern1_a':[0,1,2],'X_Y_Pattern2_X':[3,4,5],'Z_D_Pattern2_Y':[4,5,7],'312_Pattern1_Z':[8,2,4]})

I now would like to create a group by using the "Pattern" and get the sum of values for those columns for each row

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

If we have a list like this:

pattern = ['Pattern1','Pattern2']

With the dataframe above, the output should be another dataframe as such:

df_final = pd.DataFrame({'Pattern1':[8,3,6],'Pattern2':[7,9,12]}) 

Basically, "concatenating" all the columns that have a specific pattern on the given column name and get the sum of these values by row

I was trying something like this:

pattern = ['Pattern1','Pattern2','Pattern3',...]

grouped = pd.DataFrame(data_media.groupby(data_media.columns.str.extract(pattern, expand=False), axis=1))

But it doesn’t work since extract is a regex and I’m using a list with the patterns. How could I create a regex that would work for this problem? Or is there another way to do this?

Thank you!

>Solution :

Using melt and pivot_table:

pattern = ['Pattern1','Pattern2']

df_final = (df
 .reset_index().melt('index')
 .assign(variable=lambda d: d['variable'].str.extract(fr'({"|".join(pattern)})'))
 .pivot_table(index='index', columns='variable', values='value', aggfunc='sum')
)

One option using wide_to_long and groupby.sum (works with previous example before OP update):

pattern = ['Pattern1','Pattern2']

df_final = (pd
    .wide_to_long(df.reset_index(), stubnames=pattern, i='index', j='x',
                 sep='_', suffix='.+')
   .groupby(level=0).sum()
)

output:

       Pattern1  Pattern2
index                    
0           8.0       7.0
1           3.0       9.0
2           6.0      12.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading