Sum the values of a groupby from dataframe columns using a pattern in a list

August 8, 2022

Context: I'm trying to sum groups of dataframe columns row by row, where the groups are defined by a list of patterns that appear in the column names.

For example, let’s say we have this dataframe:

import pandas as pd
df = pd.DataFrame({'123_Pattern1_a':[0,1,2],'X_Y_Pattern2_X':[3,4,5],'Z_D_Pattern2_Y':[4,5,7],'312_Pattern1_Z':[8,2,4]})

I would now like to group the columns by "Pattern" and, for each row, get the sum of the values in each group.

If we have a list like this:

pattern = ['Pattern1','Pattern2']

With the dataframe above, the output should be another dataframe as such:

df_final = pd.DataFrame({'Pattern1':[8,3,6],'Pattern2':[7,9,12]})

Basically, I want to combine all the columns whose names contain a given pattern and sum their values row by row.

I was trying something like this:

pattern = ['Pattern1','Pattern2','Pattern3',...]

grouped = pd.DataFrame(data_media.groupby(data_media.columns.str.extract(pattern, expand=False), axis=1))

But it doesn't work, since extract expects a regex and I'm passing a list of patterns. How could I build a regex that would work for this problem? Or is there another way to do this?

Thank you!

Solution:

Using melt and pivot_table:

pattern = ['Pattern1','Pattern2']

df_final = (df
 .reset_index().melt('index')
 .assign(variable=lambda d: d['variable'].str.extract(fr'({"|".join(pattern)})'))
 .pivot_table(index='index', columns='variable', values='value', aggfunc='sum')
)
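The chain above is fairly dense, so here is a step-by-step sketch of the same idea on the question's dataframe. The intermediate names (long_df, regex, wide_df) are illustrative only, and expand=False is added so that extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    '123_Pattern1_a': [0, 1, 2],
    'X_Y_Pattern2_X': [3, 4, 5],
    'Z_D_Pattern2_Y': [4, 5, 7],
    '312_Pattern1_Z': [8, 2, 4],
})
pattern = ['Pattern1', 'Pattern2']

# Step 1: long form, one row per (original row, column name) pair.
# The resulting columns are 'index', 'variable' (old column name) and 'value'.
long_df = df.reset_index().melt('index')

# Step 2: replace each column name with the pattern it contains.
# The list is joined into a single alternation: (Pattern1|Pattern2)
regex = '(' + '|'.join(pattern) + ')'
long_df['variable'] = long_df['variable'].str.extract(regex, expand=False)

# Step 3: back to wide form, summing the values that share a row and a pattern.
wide_df = long_df.pivot_table(index='index', columns='variable',
                              values='value', aggfunc='sum')
print(wide_df)
```

After step 2 the 'variable' column holds only 'Pattern1' or 'Pattern2', which is what lets pivot_table collapse the four original columns into two.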

Another option uses wide_to_long and groupby.sum. It only works when the column names start with the pattern, as in the example the OP posted before updating the question:

pattern = ['Pattern1','Pattern2']

df_final = (pd
 .wide_to_long(df.reset_index(), stubnames=pattern, i='index', j='x',
               sep='_', suffix='.+')
 .groupby(level=0).sum()
)
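To see the wide_to_long variant run, the sketch below uses a dataframe whose column names begin with the pattern. The earlier version of the question is not shown on this page, so these column names are an assumption made purely for illustration; the values are the same as in the current example.

```python
import pandas as pd

# Hypothetical column names: each one starts with a pattern followed by '_'.
# wide_to_long matches stubnames as prefixes, so this layout is required.
df_prefixed = pd.DataFrame({
    'Pattern1_a': [0, 1, 2],
    'Pattern2_X': [3, 4, 5],
    'Pattern2_Y': [4, 5, 7],
    'Pattern1_Z': [8, 2, 4],
})
pattern = ['Pattern1', 'Pattern2']

# Reshape to long form: one row per (index, suffix) pair,
# with one column per stub name and NaN where a suffix is missing.
long_df = pd.wide_to_long(df_prefixed.reset_index(), stubnames=pattern,
                          i='index', j='x', sep='_', suffix='.+')

# Sum over the suffix level, leaving one row per original row.
result = long_df.groupby(level=0).sum()
print(result)
```

The sums come out as floats because the intermediate long form contains NaN wherever a suffix exists for one pattern but not the other.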

output:

       Pattern1  Pattern2
index                    
0           8.0       7.0
1           3.0       9.0
2           6.0      12.0
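As a possible third route, closer to the attempt in the question, the column names can be mapped to their pattern first and the grouping done on the transposed frame. This is only a sketch, not part of the answer above: the names regex, keys and grouped are illustrative, and re.escape is added on the assumption that the patterns are meant as literal text rather than regex syntax.

```python
import re
import pandas as pd

df = pd.DataFrame({
    '123_Pattern1_a': [0, 1, 2],
    'X_Y_Pattern2_X': [3, 4, 5],
    'Z_D_Pattern2_Y': [4, 5, 7],
    '312_Pattern1_Z': [8, 2, 4],
})
pattern = ['Pattern1', 'Pattern2']

# Join the list into one alternation with a single capture group.
# re.escape keeps characters such as '.' or '+' in a pattern literal.
regex = '(' + '|'.join(map(re.escape, pattern)) + ')'

# One key per column: the pattern found in its name (NaN if none matches).
keys = df.columns.str.extract(regex, expand=False).to_numpy()

# Transpose so columns become rows, group those rows by key, sum,
# then transpose back. Rows with a NaN key are dropped by groupby.
grouped = df.T.groupby(keys).sum().T
print(grouped)
```

Transposing avoids groupby's axis=1 argument, which recent pandas versions deprecate. Columns that match none of the patterns are left out of the result.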

pandas

byMR

Published August 08, 2022
