Adding columns & values per group occurrence in pandas after filtering

December 3, 2021

I have a df

import seaborn as sns
import numpy as np
import pandas as pd

df = sns.load_dataset('titanic')

I am trying to add 8 columns from 1-8 which is the age of the passenger and to count how many passengers fit in to each of those categories 1-8

I tried:

df_temp = df.groupby('sex').agg({'fare':np.mean})

Which returns:

        fare
sex 
female  44.48
male    25.52

Then I have age_buckets = range(1,9) which are the age categories I am interested in.

The goal is to have this kind of structure:

sex            fare                 age_1       age_2       age_3         ...     age_8
male           25                   0           10          3                     x  
female         44                   1           2           2                     y

Which would mean that there were 0 age_1 males and 1 girl that was 1 year old and so on.

So I tried the following:

df_age = df[df.age.isin(age_buckets)].groupby(['sex','age']) \
                                  .count() \
                                  .survived \
                                  .reset_index()

Which returns the data I want but not in the format I can add to my primary df_temp as a column per age.

So at this point I am not sure if this is the right way to achieve the desired result I am trying to create and don’t know how to work forward.

>Solution :

Use cut with crosstab and add DataFrame.add_prefix:

df = (df_temp.join(pd.crosstab(df['sex'],pd.cut(df['age'], 
                                                bins=range(0,9),
                                                labels=range(1,9)))
                     .add_prefix('age_')))
print (df)
             fare  age_1  age_2  age_3  age_4  age_5  age_6  age_7  age_8
sex                                                                      
female  44.479818      4      6      2      5      4      2      1      2
male    25.523893     10      4      4      5      0      1      2      2