Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Resampling and grouping in more than one pandas column

Hi i want to convert my dataframe from

df = pd.DataFrame({
        'date': [
           '2019-08-01', '2019-12-01',
           '2019-23-01', '2019-15-01',
           '2020-26-02', '2020-10-10',
           '2020-08-10', '2021-01-04'
        ],
        'loc': [
           'axp','axp', 'axp', 'axp',
           'axe', 'axp', 'axp', 'axe'
        ],
        'category': [
           'domestic', 'domestic', 'domestic', 
           'domestic',
           'wild', 'domestic', 'domestic', 'wild'
        ],
        'status': [
           'found', 'found', 'not found', 'found',
           'found', 'found', 'found', 'not found'
        ]
    })

to this format where each loc is reduced to the count of its associated category and status per month and year

loc_mnth_yr n_domestic n_wild n_found n_not_found
axp_01_2019 4 0 3 1
axe_02_2020 0 1 1 0
axp_10_2020 2 0 2 0
axe_04_2021 0 1 0 1

i have tried using the below codes but am not yet getting the desired results

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df = df.resample('M')['category'].value_counts().to_frame().rename({'category':'count'},axis=1)
df = df.pivot_table(index='date', columns='category')

>Solution :

Use DataFrame.assign for loc_mnth_yr column, then unpivot by DataFrame.melt ad count values by crosstab:

df['date'] = pd.to_datetime(df['date'], format='%Y-%d-%m')

df = (df.assign(loc_mnth_yr = df['loc']+ '_' + df['date'].dt.strftime('%m_%Y'))
        .melt(id_vars='loc_mnth_yr', value_vars=['category','status']))

df = pd.crosstab(df['loc_mnth_yr'], df['value']).add_prefix('n_')
print (df)
value        n_domestic  n_found  n_not found  n_wild
loc_mnth_yr                                          
axe_02_2020           0        1            0       1
axe_04_2021           0        0            1       1
axp_01_2019           4        3            1       0
axp_10_2020           2        2            0       0

If order is important use:

df['date'] = pd.to_datetime(df['date'], format='%Y-%d-%m')

df = (df.assign(mnth_yr = df['date'].dt.strftime('%m_%Y'))
        .melt(id_vars=['loc', 'mnth_yr'], value_vars=['category','status']))

df = (pd.crosstab([df['mnth_yr'], df['loc']], df['value'])
        .reindex(columns=df['value'].unique())
        .add_prefix('n_'))
df.index = [f'{b}_{a}' for a, b in df.index]

df = df.rename_axis(index='loc_mnth_yr', columns=None).reset_index()
print (df)
   loc_mnth_yr  n_domestic  n_wild  n_found  n_not found
0  axp_01_2019           4       0        3            1
1  axe_02_2020           0       1        1            0
2  axe_04_2021           0       1        0            1
3  axp_10_2020           2       0        2            0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading