I have a pandas dataframe that looks something like this:
| Category | Score | ID |
|---|---|---|
| A | nan | 1 |
| A | 95 | 1 |
| A | nan | 2 |
| B | nan | 2 |
| B | nan | 2 |
| B | nan | 3 |
| C | 95 | 3 |
| C | nan | 3 |
I want to find, for each category, the count of IDs whose values are all NaN within that category. Based on the table above, my result would look like:
| Category | Count |
|---|---|
| A | 1 |
| B | 2 |
| C | 0 |
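For reproducibility, the sample frame above can be built like this (column names and dtypes are assumed to match the table):

```python
import pandas as pd
import numpy as np

# Sample data matching the table above
df = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'Score':    [np.nan, 95, np.nan, np.nan, np.nan, np.nan, 95, np.nan],
    'ID':       [1, 1, 2, 2, 2, 3, 3, 3],
})
```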
Here's what I have so far, but I'm not sure it's producing the correct numbers (column names match the table above):

```python
out = (df[df['Score'].isna()]
       .groupby('Category')
       .apply(lambda g: g.groupby('ID')
                         .filter(lambda x: len(x) >= 1)['ID'].nunique())
       .to_frame('Count')
       .reset_index())
print(out)
```
Thanks for your help
> Solution:
Try this:
First group by `Category` and `ID` and use `isna()` with `all()` to check whether every value in a group is NaN, which returns True or False. Converting True and False to integers turns them into 1 and 0. Then group by `Category` alone and sum the values (summing the 1s that came from True):
```python
out = (df
       .groupby(['Category', 'ID'])['Score']
       .apply(lambda x: x.isna().all().astype(int))
       .groupby('Category')
       .sum()
       .reset_index(name='count')
       )
print(out)
```

```
  Category  count
0        A      1
1        B      2
2        C      0
```
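The same idea can also be written without the lambda: since True sums as 1, you can aggregate the NaN mask directly. A sketch, assuming the sample data from the question (the frame construction here is my own reconstruction of the table):

```python
import pandas as pd
import numpy as np

# Reconstruction of the sample table from the question
df = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'Score':    [np.nan, 95, np.nan, np.nan, np.nan, np.nan, 95, np.nan],
    'ID':       [1, 1, 2, 2, 2, 3, 3, 3],
})

# For each (Category, ID) pair, check whether every Score is NaN,
# then count the all-NaN IDs per Category (summing booleans counts Trues).
out = (df['Score'].isna()
       .groupby([df['Category'], df['ID']])
       .all()
       .groupby('Category')
       .sum()
       .reset_index(name='Count'))
print(out)
```

Because `groupby(...).all()` already yields booleans, no explicit cast to int is needed before summing.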