I have a dataset in which one column, let’s say column A, is an identifier. I’d like to count how many unique values in column A have entries where column B is between x and y and column C is equal to z.
To illustrate:
| Row | Column A | Column B | Column C |
|---|---|---|---|
| 1 | 1001 | 4 | 1 |
| 2 | 1001 | 3 | 0 |
| 3 | 1001 | 6 | 1 |
| 4 | 1001 | 4 | 1 |
| 5 | 1002 | 7 | 0 |
| 6 | 1002 | 7 | 1 |
| 7 | 1002 | 2 | 1 |
| 8 | 1002 | 3 | 1 |
| 9 | 1003 | 0 | 1 |
| 10 | 1003 | 3 | 0 |
| 11 | 1003 | 3 | 1 |
| 12 | 1003 | 4 | 1 |
What I want to achieve is the following:
Count how many unique values of column A have exactly two entries where column B is between 2-4 and column C is equal to 1.
Looking at the table, this would return 1, since only Column A = 1002 meets all criteria (rows 7 and 8).
I’ve tried some code, but I don’t know how to handle the unique-value criterion in column A.
> Solution:
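This should work. First I subset on your conditions, then I count the number of occurrences of each Column A value, check which counts equal 2, and sum those. Note that the filter keeps Column B values strictly greater than 1 and strictly less than 4 (so 2 or 3 for integer data); excluding 4 is what gives the expected result of 1 for your sample table.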
sum(df[(df['Column B'] > 1) & (df['Column B'] < 4) & (df['Column C'] == 1)]['Column A'].value_counts() == 2)
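To make the one-liner easier to check, here is a minimal sketch that rebuilds the sample table and runs the same three steps separately. It assumes the column names are exactly `Column A`, `Column B` and `Column C` (no trailing spaces) and that column B holds integers. The names `mask`, `counts`, `result` and `matching_ids` are only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: column names have no trailing spaces.
df = pd.DataFrame({
    "Column A": [1001] * 4 + [1002] * 4 + [1003] * 4,
    "Column B": [4, 3, 6, 4, 7, 7, 2, 3, 0, 3, 3, 4],
    "Column C": [1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1],
})

# Step 1: keep rows where 1 < B < 4 and C == 1 (same bounds as the one-liner).
mask = (df["Column B"] > 1) & (df["Column B"] < 4) & (df["Column C"] == 1)

# Step 2: count how many of the remaining rows each identifier has.
counts = df.loc[mask, "Column A"].value_counts()

# Step 3: count the identifiers with exactly two matching rows.
result = int((counts == 2).sum())

# Optional: list which identifiers qualified.
matching_ids = counts[counts == 2].index.tolist()

print(result)
print(matching_ids)
```

If the upper bound should include 4, change `< 4` to `<= 4`, but be aware that with this sample data 1001 and 1003 would then also have exactly two matching rows, so the count would no longer be 1.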