How to adjust the value counts based on a metric?


My input is this dataframe:

import pandas as pd

data = {
    'ISSUE': ['#1', '#2', '#3', '#4', '#5', '#6', '#7', '#8', '#9', '#10', '#11', '#12'],
    'CATEGORY': ['CAT1', 'CAT1', 'CAT2', 'CAT3', 'CAT1', 'CAT3', 'CAT3', 'CAT2', 'CAT2', 'CAT2', 'CAT4', 'CAT5'],
    'AGENT': ['A', 'A', 'B', 'A', 'C', 'C', 'A', 'A', 'B', 'C', 'C', 'A'],
    'STATUS': ['FINISHED'] * 12
}
df = pd.DataFrame(data)
print(df)

   ISSUE CATEGORY AGENT    STATUS
0     #1     CAT1     A  FINISHED
1     #2     CAT1     A  FINISHED
2     #3     CAT2     B  FINISHED
3     #4     CAT3     A  FINISHED
4     #5     CAT1     C  FINISHED
5     #6     CAT3     C  FINISHED
6     #7     CAT3     A  FINISHED
7     #8     CAT2     A  FINISHED
8     #9     CAT2     B  FINISHED
9    #10     CAT2     C  FINISHED
10   #11     CAT4     C  FINISHED
11   #12     CAT5     A  FINISHED

Each category has a complexity, i.e. a typical resolution time:

# EASY  (around 10 mins) :  CAT1+CAT3+CAT5
# MEDIUM (around 2 hours) : CAT2
# HARD  (>= 1 day) : CAT4

Using value_counts I get this:

AGENT
A    6
C    4
B    2
Name: count, dtype: int64

For example, someone interpreting these values might think that Agent A is the most productive one. That could be true, but we need to verify it.

For that I started writing some code, but I don't know how to proceed to readjust the value counts:

# estimated resolution time per category
complexity = {
    'CAT1': 10/60,
    'CAT2': 2,
    'CAT3': 10/60,
    'CAT4': 8*60,
    'CAT5': 10/60
}

df['TIME'] = df['CATEGORY'].map(complexity)

df['AGENT'].value_counts() * df.groupby('AGENT')['TIME'].sum()

# AGENT
# A      17.000000
# B       8.000000
# C    1929.333333
# dtype: float64

Have you encountered a similar problem? I'm open to any suggestions.

>Solution :

Since you’re now working with a quantitative value, value_counts is no longer appropriate; better to use a groupby.sum:

df.groupby('AGENT', sort=False)['TIME'].sum().sort_values(ascending=False)

Output:

AGENT
C    482.333333
B      4.000000
A      2.833333
Name: TIME, dtype: float64
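If it helps to see where each agent's total comes from, a pivot_table can break the summed time down per category; the row sums match the groupby totals above. This is just a sketch reusing the question's data (`pivot` is an illustrative name):

```python
import pandas as pd

# same data and complexity mapping as in the question
data = {
    'ISSUE': ['#1', '#2', '#3', '#4', '#5', '#6', '#7', '#8', '#9', '#10', '#11', '#12'],
    'CATEGORY': ['CAT1', 'CAT1', 'CAT2', 'CAT3', 'CAT1', 'CAT3',
                 'CAT3', 'CAT2', 'CAT2', 'CAT2', 'CAT4', 'CAT5'],
    'AGENT': ['A', 'A', 'B', 'A', 'C', 'C', 'A', 'A', 'B', 'C', 'C', 'A'],
    'STATUS': ['FINISHED'] * 12,
}
df = pd.DataFrame(data)
complexity = {'CAT1': 10/60, 'CAT2': 2, 'CAT3': 10/60, 'CAT4': 8*60, 'CAT5': 10/60}
df['TIME'] = df['CATEGORY'].map(complexity)

# total time per agent and category (0 where an agent handled none)
pivot = df.pivot_table(index='AGENT', columns='CATEGORY',
                       values='TIME', aggfunc='sum', fill_value=0)
print(pivot)
```

This makes it obvious that C's large total is driven almost entirely by the single CAT4 ticket.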

Or, if you want the productivity relative to the total number of tasks (i.e. the average time per task), then a mean:

df.groupby('AGENT', sort=False)['TIME'].mean().sort_values(ascending=False)

Output:

AGENT
C    120.583333
B      2.000000
A      0.472222
Name: TIME, dtype: float64
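To keep the raw ticket counts alongside the time-based totals and averages, a single named aggregation can produce all three in one frame. A sketch reusing the question's data (the column names `TICKETS`, `TOTAL_TIME`, `AVG_TIME` are illustrative):

```python
import pandas as pd

# same data and complexity mapping as in the question
data = {
    'ISSUE': ['#1', '#2', '#3', '#4', '#5', '#6', '#7', '#8', '#9', '#10', '#11', '#12'],
    'CATEGORY': ['CAT1', 'CAT1', 'CAT2', 'CAT3', 'CAT1', 'CAT3',
                 'CAT3', 'CAT2', 'CAT2', 'CAT2', 'CAT4', 'CAT5'],
    'AGENT': ['A', 'A', 'B', 'A', 'C', 'C', 'A', 'A', 'B', 'C', 'C', 'A'],
    'STATUS': ['FINISHED'] * 12,
}
df = pd.DataFrame(data)
complexity = {'CAT1': 10/60, 'CAT2': 2, 'CAT3': 10/60, 'CAT4': 8*60, 'CAT5': 10/60}
df['TIME'] = df['CATEGORY'].map(complexity)

# ticket count, total time, and mean time per agent in one pass
summary = (df.groupby('AGENT')['TIME']
             .agg(TICKETS='count', TOTAL_TIME='sum', AVG_TIME='mean')
             .sort_values('TOTAL_TIME', ascending=False))
print(summary)
```

This shows both views at once: A handles the most tickets, while C accumulates the most resolution time.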
