Resampling and grouping in more than one pandas column

Hi i want to convert my dataframe from

df = pd.DataFrame({
        'date': [
           '2019-08-01', '2019-12-01',
           '2019-23-01', '2019-15-01',
           '2020-26-02', '2020-10-10',
           '2020-08-10', '2021-01-04'
        ],
        'loc': [
           'axp','axp', 'axp', 'axp',
           'axe', 'axp', 'axp', 'axe'
        ],
        'category': [
           'domestic', 'domestic', 'domestic', 
           'domestic',
           'wild', 'domestic', 'domestic', 'wild'
        ],
        'status': [
           'found', 'found', 'not found', 'found',
           'found', 'found', 'found', 'not found'
        ]
    })

to this format where each loc is reduced to the count of its associated category and status per month and year

loc_mnth_yr n_domestic n_wild n_found n_not_found
axp_01_2019 4 0 3 1
axe_02_2020 0 1 1 0
axp_10_2020 2 0 2 0
axe_04_2021 0 1 0 1

i have tried using the below codes but am not yet getting the desired results

df = df.resample('M')['category'].value_counts().to_frame().rename({'category':'count'},axis=1)
df = df.pivot_table(index='date', columns='category')

>Solution :

Use DataFrame.assign for loc_mnth_yr column, then unpivot by DataFrame.melt ad count values by crosstab:

df['date'] = pd.to_datetime(df['date'], format='%Y-%d-%m')

df = (df.assign(loc_mnth_yr = df['loc']+ '_' + df['date'].dt.strftime('%m_%Y'))
        .melt(id_vars='loc_mnth_yr', value_vars=['category','status']))

df = pd.crosstab(df['loc_mnth_yr'], df['value']).add_prefix('n_')
print (df)
value        n_domestic  n_found  n_not found  n_wild
loc_mnth_yr                                          
axe_02_2020           0        1            0       1
axe_04_2021           0        0            1       1
axp_01_2019           4        3            1       0
axp_10_2020           2        2            0       0

If order is important use:

df['date'] = pd.to_datetime(df['date'], format='%Y-%d-%m')

df = (df.assign(mnth_yr = df['date'].dt.strftime('%m_%Y'))
        .melt(id_vars=['loc', 'mnth_yr'], value_vars=['category','status']))

df = (pd.crosstab([df['mnth_yr'], df['loc']], df['value'])
        .reindex(columns=df['value'].unique())
        .add_prefix('n_'))
df.index = [f'{b}_{a}' for a, b in df.index]

df = df.rename_axis(index='loc_mnth_yr', columns=None).reset_index()
print (df)
   loc_mnth_yr  n_domestic  n_wild  n_found  n_not found
0  axp_01_2019           4       0        3            1
1  axe_02_2020           0       1        1            0
2  axe_04_2021           0       1        0            1
3  axp_10_2020           2       0        2            0

Leave a Reply