Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Adding columns & values per group occurrence in pandas after filtering

I have a df

import seaborn as sns
import numpy as np
import pandas as pd

df = sns.load_dataset('titanic')

I am trying to add 8 columns from 1-8 which is the age of the passenger and to count how many passengers fit in to each of those categories 1-8

I tried:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df_temp = df.groupby('sex').agg({'fare':np.mean})

Which returns:

        fare
sex 
female  44.48
male    25.52

Then I have age_buckets = range(1,9) which are the age categories I am interested in.

The goal is to have this kind of structure:

sex            fare                 age_1       age_2       age_3         ...     age_8
male           25                   0           10          3                     x  
female         44                   1           2           2                     y  

Which would mean that there were 0 age_1 males and 1 girl that was 1 year old and so on.

So I tried the following:

df_age = df[df.age.isin(age_buckets)].groupby(['sex','age']) \
                                  .count() \
                                  .survived \
                                  .reset_index()

Which returns the data I want but not in the format I can add to my primary df_temp as a column per age.

So at this point I am not sure if this is the right way to achieve the desired result I am trying to create and don’t know how to work forward.

>Solution :

Use cut with crosstab and add DataFrame.add_prefix:

df = (df_temp.join(pd.crosstab(df['sex'],pd.cut(df['age'], 
                                                bins=range(0,9),
                                                labels=range(1,9)))
                     .add_prefix('age_')))
print (df)
             fare  age_1  age_2  age_3  age_4  age_5  age_6  age_7  age_8
sex                                                                      
female  44.479818      4      6      2      5      4      2      1      2
male    25.523893     10      4      4      5      0      1      2      2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading