Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

dataframe groupby aggregation count function with condition for binning purpose

So I have a dataframe like this

df = pd.DataFrame({
'A': [1,1,2,2,3,3,3],
'B': [1,3,1,3,1,2,1],
'C': [1,3,5,3,7,7,1]})

    A   B   C
0   1   1   1
1   1   3   3
2   2   1   5
3   2   3   3
4   3   1   7
5   3   2   7
6   3   1   1

I want to create a binning of column B (count) with groupby of column A
for example B_bin1 where B < 3 and B_bin2 is the rest (>=3), C_bin1 for C < 5 and C_bin2 for the rest

From that example the output I want is like this

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

    A   B_bin1  B_bin2  C_bin1  C_bin2
0   1   1       1       2       0
1   2   1       1       1       1
2   3   3       0       1       2

I found similar question Pandas groupby with bin counts
, it is working for 1 bin

bins = [0,2,10]
temp_df=df.groupby(['A', pd.cut(df['B'], bins)])
temp_df.size().unstack()
B   (0, 2]  (2, 10]
A       
1   1       1
2   1       1
3   3       0

but when I tried using more than 1 bin, it is not working (my real data has a lot of binning groups)

bins = [0,2,10]
bins2 = [0,4,10]
temp_df=df.groupby(['A', pd.cut(df['B'], bins), pd.cut(df['C'], bins2)])
temp_df.size().unstack()
        C   (0, 4]  (4, 10]
A       B       
1   (0, 2]  1       0
    (2, 10] 1       0
2   (0, 2]  0       1
    (2, 10] 1       0
3   (0, 2]  1       2
    (2, 10] 0       0

My workaround is by create small temporary df and then binning them using 1 group 1 by 1 and then merge them in the end

I also still trying using aggregation (probably using pd.NamedAgg too) similar to this, but I wonder if that can works

df.groupby('A').agg(
    b_count = ('B', 'count'),
    b_sum = ('B', 'sum')
    c_count = ('C', 'count'),
    c_sum = ('C', 'sum')
)

Is anyone has another idea for this?

>Solution :

Because you need processing each bin separately instead groupby+size+unstack is used crosstab with join DataFrames by concat:

bins = [0,2,10]
bins2 = [0,4,10]

temp_df1=pd.crosstab(df['A'], pd.cut(df['B'], bins, labels=False)).add_prefix('B_')
temp_df2=pd.crosstab(df['A'], pd.cut(df['C'], bins2, labels=False)).add_prefix('C_')

df = pd.concat([temp_df1, temp_df2], axis=1).reset_index()
print (df)
   A  B_0  B_1  C_0  C_1
0  1    1    1    2    0
1  2    1    1    1    1
2  3    3    0    1    2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading