Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Count only NAs across categories

I have a pandas dataframe that looks something like this:

Category Score ID
A nan 1
A 95 1
A nan 2
B nan 2
B nan 2
B nan 3
C 95 3
C nan 3

I want to find the count of those IDs with only nan values across each category. Based on the table above, my result would look like:

Category Count
A 1
B 2
C 0

Here’s what I have so far but I’m not sure this is providing the correct numbers:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

out = (df[df['score'].isna()]
        .groupby('category')
        .apply(lambda g: g.groupby('id')
        .filter(lambda x: len(x) >= 1)['id'].nunique())
        .to_frame('Count')
        .reset_index())
print(out)

Thanks for your help

>Solution :

Try this:

First you groupby Category and ID and check with isna() and all() if every value in a group is nan, which will return True and False. Converting True and False to integer will change the values to 1 and 0. Then groupby again, but only the Categorys and sum the values (sum the 1’s which came from True)

out = (df
       .groupby(['Category', 'ID'])['Score']
       .apply(lambda x: x.isna().all().astype(int))
       .groupby('Category')
       .sum()
       .reset_index(name='count')
      )
print(out)
  Category  count
0        A      1
1        B      2
2        C      0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading