Use entire groupby object on custom function

I have a data frame that looks something like:

df =

date           col1      col2      col3
---------------------------------------
2022/03/01     1         5         10
2022/03/01     3         6         12
2022/03/01     5         7         14
2022/03/02     6         8         15
2022/03/02     2         9         17
2022/03/02     8         10        19
2022/03/03     2         11        21
2022/03/03     10        12        22
2022/03/03     9         13        23

I then have a function that looks something like:

def my_func(df):
    <do something with the `df` given to the function>

    return result

In my case, the result is a single float computed from several operations on the data frame passed to the function.

What I would like to do is group the original data frame by date, pass each group to the function as a whole data frame, and assign the returned value to every row of that group, i.e. the resulting data frames would look something like:

df_group_object1 =

date           col1      col2      col3     result
--------------------------------------------------
2022/03/01     1         5         10       15
2022/03/01     3         6         12       15
2022/03/01     5         7         14       15


df_group_object2 =

date           col1      col2      col3     result
--------------------------------------------------
2022/03/02     6         8         15       25
2022/03/02     2         9         17       25
2022/03/02     8         10        19       25


df_group_object3 =

date           col1      col2      col3     result
--------------------------------------------------
2022/03/03     2         11        21       56
2022/03/03     10        12        22       56
2022/03/03     9         13        23       56

The values in the result column are just placeholders that I put in. The real values would come from my_func.

My idea was to do something like:

df["result"] = df.groupby(["date"]).transform(my_func)

But it seems that what gets passed to the function is not the entire data frame of the group, as I had expected.

So is there a way to do this?

Solution:

Assuming you want to do operations on the grouped DataFrames and then collect the results, you could just use a for loop on the groupby object:

import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2, 2, 3], 'col2': [1, 2, 3, 4, 5]})

def my_func(df):
    # example computation: returns one value per row of the chunk
    return df['col2'] + 1

# let's say you want to group by col1 and operate on the rest of the columns
group_object = []
for group_name, df_chunk in df.groupby('col1'):
    df_chunk = df_chunk.copy()  # work on a copy to avoid SettingWithCopyWarning
    df_chunk['result'] = my_func(df_chunk)
    group_object.append(df_chunk)

group_object[0]:

    col1    col2    result
0   1       1       2
1   1       2       3
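
If my_func returns a single float per group, as in the question, another possible approach is to compute the value once per date and then map it back onto the rows. The sketch below is only an illustration: the body of my_func is a made-up placeholder (sum of col1 plus mean of col2), since the real calculation is not shown in the question, and the names value_cols and per_date are introduced here for the example. The sample frame is the one from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2022/03/01"] * 3 + ["2022/03/02"] * 3 + ["2022/03/03"] * 3,
    "col1": [1, 3, 5, 6, 2, 8, 2, 10, 9],
    "col2": [5, 6, 7, 8, 9, 10, 11, 12, 13],
    "col3": [10, 12, 14, 15, 17, 19, 21, 22, 23],
})

def my_func(group):
    # Hypothetical placeholder: the real calculation is not given in the question.
    # It receives the whole sub-frame for one date and returns a single float.
    return float(group["col1"].sum() + group["col2"].mean())

# Call my_func once per date on the selected columns of each group.
# The result is a Series indexed by date, with one float per group.
value_cols = ["col1", "col2", "col3"]
per_date = df.groupby("date")[value_cols].apply(my_func)

# Broadcast each group's value to all rows that share that date.
df["result"] = df["date"].map(per_date)
```

This keeps everything in one data frame. If separate data frames per date are needed, as in the desired output, iterating over df.groupby("date") afterwards and collecting the chunks should give them. As far as I know, transform expects a function that can be applied column by column, which would explain why a function that needs several columns at once does not fit there.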