Pandas: Tidy up groupby aggregation

December 19, 2021

I really struggle with turning the result of an aggregation back into a "normal" DataFrame.
I had a table with these columns:

RnnSize     EmbSize     RnnLayer    Epochs  Alpha   Eval    Run     Result

So I calculated the mean and standard deviation of the Result column over multiple runs using this command:

df.groupby(["RnnSize", "EmbSize", "RnnLayer", "Epochs", "Alpha", "Eval"]).agg({'Result': ['mean', 'std']})

The output is a DataFrame like this:

                                                             Result
                                                             mean   std
RnnSize     EmbSize     RnnLayer    Epochs  Alpha   Eval

It looks a bit like three levels.

df.columns outputs the following MultiIndex:

MultiIndex([(   'index',    ''),
            ( 'RnnSize',    ''),
            ( 'EmbSize',    ''),
            ('RnnLayer',    ''),
            (  'Epochs',    ''),
            (   'Alpha',    ''),
            (    'Eval',    ''),
            (  'Result', 'mean'),
            (  'Result', 'std')],
           )

How do I flatten that again, removing "Result" and putting mean and std on the same level as the other columns?
There are so many methods, such as reset_index and droplevel, but I have not yet found out how to fix this. It confuses me quite a bit.

Edit: For reproducibility, here is my entire code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dfRuns = pd.read_csv("Results.csv", encoding="utf-8")
dfRuns

dfAv = dfRuns.copy()
dfAv = dfAv.groupby(["RnnSize", "EmbSize", "RnnLayer", "Epochs", "Alpha", "Eval"]).agg({'Result': ['mean', 'std']})

And the (shortened) csv file Results.csv:

RnnSize,EmbSize,RnnLayer,Epochs,Alpha,Eval,Run,Result
128,200,2,150,0.1,Precision,1,0.5940
128,200,2,150,0.1,Recall,1,0.5038
128,200,2,150,0.1,F1,1,0.5144
128,200,2,150,0.1,Precision,2,0.5851
128,200,2,150,0.1,Recall,2,0.4995
128,200,2,150,0.1,F1,2,0.5082

>Solution :

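Use reset_index() to turn the group keys back into ordinary columns, then flatten the column MultiIndex by joining its levels. The aggregated columns end up named "Result mean" and "Result std":

# Move the group keys from the row index back into regular columns.
df = df.reset_index()
# Join the two column levels with a space; rstrip() removes the trailing
# space left on the key columns, whose second level is the empty string.
df.columns = [' '.join(col).rstrip() for col in df.columns.to_numpy()]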
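The question also asked to drop the "Result" level entirely, so that the columns are just named mean and std. Below is a minimal sketch of two ways this might be done, using the sample rows from the question inlined as a string. The variable names (csv_text, keys, df_runs, flat_a, flat_b) are made up for illustration, and the second option assumes a pandas version that supports named aggregation (0.25 or later).

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch is self-contained.
csv_text = """RnnSize,EmbSize,RnnLayer,Epochs,Alpha,Eval,Run,Result
128,200,2,150,0.1,Precision,1,0.5940
128,200,2,150,0.1,Recall,1,0.5038
128,200,2,150,0.1,F1,1,0.5144
128,200,2,150,0.1,Precision,2,0.5851
128,200,2,150,0.1,Recall,2,0.4995
128,200,2,150,0.1,F1,2,0.5082
"""
keys = ["RnnSize", "EmbSize", "RnnLayer", "Epochs", "Alpha", "Eval"]
df_runs = pd.read_csv(io.StringIO(csv_text))

# Option 1: aggregate as in the question, drop the outer "Result" column
# level, then move the group keys back into ordinary columns.
agg = df_runs.groupby(keys).agg({"Result": ["mean", "std"]})
flat_a = agg.droplevel(0, axis=1).reset_index()

# Option 2: named aggregation produces flat column names directly,
# so no column MultiIndex is created in the first place.
flat_b = (
    df_runs.groupby(keys)
    .agg(mean=("Result", "mean"), std=("Result", "std"))
    .reset_index()
)

print(flat_a.columns.tolist())  # the six key columns followed by 'mean' and 'std'
print(flat_b.columns.tolist())  # same column names as flat_a
```

Both variants should give one row per combination of the group keys. Note that the order of the two steps in option 1 matters: calling reset_index() first would leave the key columns with an empty second level, and dropping level 0 afterwards would discard their names.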

aggregation

byMR

Published December 19, 2021
