Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

filling a data frame with N count for two distinct categories

sample data:

type1 = ['bunny','cow','pig','chicken','elephant']
type2 = ['flower','tree','grass']

data_type1 = np.random.choice(a = type1, p = [0.05,0.15,0.25,0.05,0.5], size=1000)
data_type2 = np.random.choice(a = type2, p = [0.25,0.25,0.50], size=1000)
data_value = stats.poisson.rvs(loc = 18, mu = 30, size = 1000)

df = pd.DataFrame({'type1':data_type1,'type2':data_type2,'data':data_value})

grp1 = pd.unique(df.type1.values)
grp1.sort()
grp2 = pd.unique(df.type2.values)
grp2.sort()
m_df = pd.DataFrame(index = grp1,columns=grp2)

given a df structured as ‘df’, how would I populate ‘m_df’ with the count of each type combo. I want to accomplish df.groupby(['type1','type2'])['data'].count() but I’m not sure how to write it into a df so it looks nice and is more useful.

Edit: below is a simpler df that also works as an example. The expected output is the number of values for each type combination — the exact output shown when the input is df.groupby(['type1','type2'])['data'].count(). The question is how to represent that output as a dataframe resembling m_df.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df = pd.DataFrame({'type1': ['bunny','cow','pig','chicken','elephant','cow','pig'],
           'data': [32,23,45,35,20,28,55],
           'type2':['female', 'male','male','male','male','female','female']})

>Solution :

IIUC, use value_counts (same as groupby_count) then unstack the second index level:

>>> (df.value_counts(['type1', 'type2'])
       .unstack('type2', fill_value=0)
       .rename_axis(index=None, columns=None))

          female  male
bunny          1     0
chicken        0     1
cow            1     1
elephant       0     1
pig            1     1

You can also use pd.crosstab or pivot_table:

>>> (pd.crosstab(df['type1'], df['type2'], 1, aggfunc='count')
   .fillna(0).astype(int).rename_axis(index=None, columns=None))

>>> (df.assign(val=1).pivot_table('val', 'type1', 'type2', aggfunc=sum, fill_value=0)
   .rename_axis(index=None, columns=None))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading