Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to adjust the value counts based on a metric?

My input is this dataframe :

data = {
    'ISSUE': ['#1', '#2', '#3', '#4', '#5', '#6', '#7', '#8', '#9', '#10', '#11', '#12'],
    'CATEGORY': ['CAT1', 'CAT1', 'CAT2', 'CAT3', 'CAT1', 'CAT3', 'CAT3', 'CAT2', 'CAT2', 'CAT2', 'CAT4', 'CAT5'],
    'AGENT': ['A', 'A', 'B', 'A', 'C', 'C', 'A', 'A', 'B', 'C', 'C', 'A'],
    'STATUS': ['FINISHED'] * 12
}
df = pd.DataFrame(data)
print(df)

   ISSUE CATEGORY AGENT    STATUS
0     #1     CAT1     A  FINISHED
1     #2     CAT1     A  FINISHED
2     #3     CAT2     B  FINISHED
3     #4     CAT3     A  FINISHED
4     #5     CAT1     C  FINISHED
5     #6     CAT3     C  FINISHED
6     #7     CAT3     A  FINISHED
7     #8     CAT2     A  FINISHED
8     #9     CAT2     B  FINISHED
9    #10     CAT2     C  FINISHED
10   #11     CAT4     C  FINISHED
11   #12     CAT5     A  FINISHED

Each category has a complexity of resolution time :

# EASY  (around 10 mins) :  CAT1+CAT3+CAT5
# MEDIUM (around 2 hours) : CAT2
# HARD  (>= 1 day) : CAT4

When using a value counts I got this :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

AGENT
A    6
C    4
B    2
Name: count, dtype: int64

For example one that interpret these values might think that Agent A is the most productive one. It could be but we need to verify that.

For that I started making a code but ignore how to purse so I can readjust the value counts :

complexity = {
    'CAT1': 10/60,
    'CAT2': 2,
    'CAT3': 10/60,
    'CAT4': 8*60,
    'CAT5': 10/60
}

df['TIME'] = df['CATEGORY'].map(complexity)

df['AGENT'].value_counts() * df.groupby('AGENT')['TIME'].sum()

# AGENT
# A      17.000000
# B       8.000000
# C    1929.333333
# dtype: float64

Have you guys encountered a similar subject ? I’m open to any suggestion.

>Solution :

Since you’re now using a quantitative value, value_counts is no longer appropriate, better use a groupby.sum:

df.groupby('AGENT', sort=False)['TIME'].sum().sort_values(ascending=False)

Output:

AGENT
C    482.333333
B      4.000000
A      2.833333
Name: TIME, dtype: float64

Or if you want to have the productivity relative to the total number of tasks, then a mean:

df.groupby('AGENT', sort=False)['TIME'].mean().sort_values(ascending=False)

Output:

AGENT
C    120.583333
B      2.000000
A      0.472222
Name: TIME, dtype: float64
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading