Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas value_counts with manual labels and sorting

I have a field containing codes (DMDEDUC2 in this example). I’d like to calculate a frequency table (value_counts) on this field and display it with user-specified labels. The code below achieves exactly what I want… but it feels to me like I have surely missed a more standard way to achieve the intended result.

Logically, the value_counts and the replace lines cannot be simplified. But surely the rest could be more elegant.

Is there a simpler way to get to this result? A more panda-ish solution?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

# Tiny dataset for clarity
import pandas as pd
df = pd.DataFrame({ 'DMDEDUC2': [5, 3, 3, 5, 4, 2, 4, 4] })
d = {
      1: "<9"
    , 2: "9-11"
    , 3: "HS/GED"
    , 4: "Some college/AA"
    , 5: "College"
    , 7: "Refused"
    , 9: "Don't know"
}

# First get value counts (vc) for DMDEDUC2
# This line gets all the data I need in the correct order... 
# but without the labels I need.
vc = df.DMDEDUC2.value_counts().sort_index()

# Convert the resulting Series to a DataFrame 
# to allow for clear labels in a logical order
vc = vc.to_frame()
vc['DMDEDUC2x'] = vc.index
vc.DMDEDUC2x = vc.DMDEDUC2x.replace(d)
vc = vc.set_index('DMDEDUC2x')
vc = vc.rename({'DMDEDUC2':'COUNTS'}, axis=1)
print(vc)

Desired output (it’s ordered by the [non-displayed] code, not by value or label):

                 COUNTS
DMDEDUC2x              
<9                  655
9-11                643
HS/GED             1186
Some college/AA    1621
College            1366
Don't know            3

Desired output for the tiny sample dataset:

                 COUNTS
DMDEDUC2x              
9-11                  1
HS/GED                2
Some college/AA       3
College               2

>Solution :

I think it can easily be condensed into two lines:

vc = df.DMDEDUC2.value_counts().sort_index().to_frame(name='COUNTS')
vc.index = vc.index.map(d).rename('DMDEDUC2')
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading