Different Data Name Output

I want to find the most frequent ages among the rows classified as Diabetes in this dataframe. The expected output looks like this:

age
25    14
31    13
41    13
29    13
43    11
22    11
28    10
33    10
38    10
36    10
Name: age, dtype: int64

However, when I run this command:

(data_clean['age'].where(data_clean['class'] == 'Diabetes')).value_counts().head(10)

The output produced is like this:

age
25.0    14
31.0    13
41.0    13
29.0    13
43.0    11
22.0    11
28.0    10
33.0    10
38.0    10
36.0    10
Name: count, dtype: int64

Here’s the csv file I used in this case: CSV file link

The index of the resulting output is float, but I expected it to be integer. The output is also named count, but I expected it to be named age. Do you have any suggestions? I appreciate any help you can give me. Thank you.

>Solution :

Don’t use where: it keeps every row and replaces the age of each non-Diabetes row with NaN, which forces the integer column to float. Instead, use boolean indexing to select only the matching rows:

out = (data_clean
        .loc[data_clean['class'] == 'Diabetes', 'age']
        .value_counts().head(10)
      )
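
To see the difference between the two approaches, here is a minimal sketch on a small made-up dataframe. The values below are hypothetical and only stand in for data_clean, since the original CSV is not reproduced here. It also shows one possible way to address the name: in pandas 2.0 and later, value_counts names its result count, so the sketch assumes a rename to age is what is wanted.

```python
import pandas as pd

# Hypothetical stand-in for data_clean; the real CSV is not reproduced here.
data_clean = pd.DataFrame({
    "age": [25, 25, 31, 40, 31, 25],
    "class": ["Diabetes", "Diabetes", "Diabetes", "Normal", "Diabetes", "Normal"],
})

# where() keeps every row and masks the non-matching ones with NaN,
# which forces the integer column to float64.
masked = data_clean["age"].where(data_clean["class"] == "Diabetes")
print(masked.dtype)

# Boolean indexing drops the non-matching rows, so the dtype stays integer.
selected = data_clean.loc[data_clean["class"] == "Diabetes", "age"]
counts = selected.value_counts().head(10)
print(counts.index.dtype)

# Assumption: the result should be named "age" instead of the default name.
counts_named = counts.rename("age")
print(counts_named)
```

Casting the float index back with astype(int) after where would also work, but selecting the rows first avoids creating the NaN values at all.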