Follow

Follow

Contact

Home Tricky max min groupby transformation using Pandas

Questions

Tricky max min groupby transformation using Pandas

byMR

June 6, 2023

I wish to groupby ID and find the min and max counts for each month.

Data

DATE        ID  name    
4/30/2023   AA  hi  
4/5/2023    AA  hi  
4/1/2023    AA  hi  
4/1/2023    AA  hello   
4/30/2023   AA  hello   
4/5/2023    AA  hello   
4/5/2023    AA  hey 
4/30/2023   AA  hey 
4/5/2023    AA  ok  
4/30/2023   AA  ok  
4/30/2023   AA  ok  
5/1/2023    AA  ok
5/1/2023    AA  hey
5/25/2023   AA  hi
4/1/2023    BB  hey 
4/2/2023    BB  hi  
4/2/2023    BB  hello

Desired

ID  DATE        stat    count
AA  4/1/2023    min     2
AA  4/30/2023   max     5
AA  5/25/2023   min     1
AA  5/1/2023    max     2
BB  4/1/2023    min     1
BB  4/2/2023    max     2

Doing

result = df.groupby(['ID', 'DATE', 'name']).size().reset_index(name='count')
result['stat'] = result.groupby(['ID', 'DATE'])['count'].transform(lambda x: 'min' if x.idxmin() == x.idxmax() else 'max')

however this is not stating the dates. Any suggestion is appreciated.

>Solution :

This should do the trick (pun intended ;])

import pandas as pd
import numpy as np

data = [["4/30/2023", "AA", "hi"]] # Fill rest of data, this was for testing
df = pd.DataFrame(data, columns=['DATE', 'ID', 'name'])

# Convert 'DATE' column to datetime format and extract month and year
df['DATE'] = pd.to_datetime(df['DATE'])
df['month'] = df['DATE'].dt.month
df['year'] = df['DATE'].dt.year

# Group by 'ID', 'month', and 'year' and calculate the count of names
result = df.groupby(['ID', 'year', 'month', 'DATE'])['name'].size().reset_index(name='count')

# Find the min and max counts for each ID and month combination
result_min = result.groupby(['ID', 'year', 'month'])['count'].min().reset_index(name='min_count')
result_max = result.groupby(['ID', 'year', 'month'])['count'].max().reset_index(name='max_count')

# Merge the min and max counts with the original result DataFrame
result = result.merge(result_min, on=['ID', 'year', 'month']).merge(result_max, on=['ID', 'year', 'month'])

# Create a 'stat' column based on the min and max counts
result['stat'] = np.where(result['count'] == result['min_count'], 'min', 'max')

# Drop unnecessary columns and reset index
result = result.drop(columns=['min_count', 'max_count']).reset_index(drop=True)

# We use .to_string() so we can remove the pandas indexing on the left of the print
print(result[['ID', 'DATE', 'stat', 'count']].to_string(index=False))

numpy

byMR

Published June 06, 2023

Add a comment

Leave a ReplyCancel reply

Read more

Questions

Is there an advantage to using Array.Empty vs. new List when initializing a List property?

byMR

June 6, 2023

Questions

How do I match rows within a group and filter the rows that do not agree?

byMR

June 6, 2023

Questions

How to Parse XML Using DOMDocument in VBA

byMR

June 6, 2023

Questions

How can I dynamically update a matrix containing f-string in Python?

byMR

June 6, 2023

Questions

Removing strings outside of parentheses in python

byMR

June 6, 2023

Questions

How to set line length in Bootstrap?

byMR

June 6, 2023