Is this group by size behavior correct?

June 20, 2022

I have this sample dataset:

import pandas as pd

mydf = pd.DataFrame({'city': ['Porto', 'Loa', 'Porto', 'Porto', 'Loa'],
                     'town': ['A', 'C', 'A', 'B', 'C']})
mydf['city'] = pd.Categorical(mydf['city'])
mydf['town'] = pd.Categorical(mydf['town'])
mydf
    city    town
0   Porto   A
1   Loa     C
2   Porto   A
3   Porto   B
4   Loa     C
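For context, a pd.Categorical column keeps its full list of categories separately from the row values, and that list is what groupby can enumerate for categorical keys. A minimal sketch that rebuilds the same sample frame and prints each column's categories:

```python
import pandas as pd

# Same sample data as in the question, with both columns converted to categoricals
mydf = pd.DataFrame({'city': ['Porto', 'Loa', 'Porto', 'Porto', 'Loa'],
                     'town': ['A', 'C', 'A', 'B', 'C']})
mydf['city'] = pd.Categorical(mydf['city'])
mydf['town'] = pd.Categorical(mydf['town'])

# Each column stores its own category list (inferred and sorted here),
# independent of which city/town pairs actually occur together in a row
print(list(mydf['city'].cat.categories))  # ['Loa', 'Porto']
print(list(mydf['town'].cat.categories))  # ['A', 'B', 'C']
```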

I want to count the occurrences of each city/town combination, so I tried this:

mydf.groupby(['city','town']).size().to_frame()
              0
city    town    
Loa     A     0
        B     0
        C     2
Porto   A     2
        B     1
        C     0

But this is wrong, since town C is located only in Loa, not in Porto, and towns A and B are located only in Porto. My expected result is this:

              0
city    town    
Loa     C     2
Porto   A     2
        B     1

Sure, I can avoid the pd.Categorical conversion of ‘city’ and ‘town’, but I don’t understand this behavior. Is there a parameter I should use to avoid it and get the simpler result I expect?

Solution:

Yes, the groupby + size behavior is expected.

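By default (observed=False), if any of the grouping columns is categorical, groupby shows every category of those columns, regardless of whether it appears in a particular group. With two categorical keys, the result index is the Cartesian product of both category lists, so combinations that never occur in the data, such as Loa/A, are listed with a size of 0.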
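One way to see this is to compare the default result against the Cartesian product of the two category lists. A minimal sketch, assuming pandas is installed; observed=False is passed explicitly so the behavior does not depend on the installed version's default:

```python
from itertools import product

import pandas as pd

mydf = pd.DataFrame({'city': ['Porto', 'Loa', 'Porto', 'Porto', 'Loa'],
                     'town': ['A', 'C', 'A', 'B', 'C']})
mydf['city'] = pd.Categorical(mydf['city'])
mydf['town'] = pd.Categorical(mydf['town'])

# Keep every category combination, as in the question's output
counts = mydf.groupby(['city', 'town'], observed=False).size()

# Every (city, town) pair that can be formed from the two category lists
expected_pairs = list(product(mydf['city'].cat.categories,
                              mydf['town'].cat.categories))
print(list(counts.index) == expected_pairs)  # True

# Combinations that never occur in the data are reported with size 0
print(counts[counts == 0])
```

The zero rows are exactly Loa/A, Loa/B and Porto/C, the combinations the question calls wrong.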

To turn this default behavior off, set the optional parameter observed=True in groupby. It then shows only the observed values (values that actually appear) of the categorical columns:

mydf.groupby(['city','town'], observed=True).size().to_frame()

            0
city  town   
Porto A     2
      B     1
Loa   C     2
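If you prefer other routes, filtering out the zero rows afterwards or dropping the categorical dtype (as the question suggests) give the same three rows. A sketch comparing them on the same sample data; the variable names are only for illustration:

```python
import pandas as pd

# Same sample data as in the question
mydf = pd.DataFrame({'city': ['Porto', 'Loa', 'Porto', 'Porto', 'Loa'],
                     'town': ['A', 'C', 'A', 'B', 'C']})
mydf['city'] = pd.Categorical(mydf['city'])
mydf['town'] = pd.Categorical(mydf['town'])

# Option 1: only observed category combinations (the fix from the answer)
observed = mydf.groupby(['city', 'town'], observed=True).size()

# Option 2: build the full category grid, then drop the zero-count rows
full = mydf.groupby(['city', 'town'], observed=False).size()
nonzero = full[full > 0]

# Option 3: convert back to plain strings, as the question suggests,
# so groupby only sees values that actually occur
plain = mydf.astype(str).groupby(['city', 'town']).size()

# Compare as sorted ((city, town), count) pairs so row order does not matter;
# each line should list the same three pairs
for name, s in [('observed', observed), ('nonzero', nonzero), ('plain', plain)]:
    print(name, sorted(zip(s.index, s.tolist())))
```

Filtering after the fact still builds the full grid first, which can get large when the categorical columns have many categories, so observed=True is usually the more direct choice. Recent pandas releases (2.1 and later) also emit a FutureWarning when observed is left at its default for categorical groupers, because that default is slated to change to True, so passing it explicitly is a good habit either way.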

pandas

by MR

Published June 20, 2022
