Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to estimate count for Pandas dataframe column values based on multiple conditions?

Given a pandas DataFrame df of the following format:-

StudyNo   Condition1      Condition2      ResultVal
1222          25              30              0.5
1245          30              45              0.12
1672          40              15              0.46
1222          25              30              0.95
1245          30              45              0.88
1245          30              45              0.23

I would like to calculate frequency/count of the number of ResultVal attributes existing in the dataset for a specific StudyNo under a given Condition1 and Condition2. There are no exactly duplicate rows in this dataset.

e.g. for StudyNo 1245 at Condition1 of 30 and Condition2 of 45, the count would be 3 (because 3 ResultVal attributes exist for this aspect). Also, I would like to print out a list of StudyNo, Condition1 and Condition2 in cases wherein < =1 ResultVal attributes exist (which in this example would be [1672,40,15].

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Any help in this regard would be much appreciated.

>Solution :

You can groupby "StudyNo", "Condition1" and "Condition2" and transform count of "ResultVal" to create column "Frequency".

Then create a boolean mask for rows where "Frequency" is less than or equal to 1 and filter the rows that satisfy it.

cols = ['StudyNo','Condition1','Condition2']
df['Frequency'] = df.groupby(cols)['ResultVal'].transform('count')
appears_only_once = df.loc[df['Frequency'] <= 1, cols].to_numpy().tolist()

Output:

   StudyNo  Condition1  Condition2  ResultVal  Frequency
0     1222          25          30       0.50          2
1     1245          30          45       0.12          3
2     1672          40          15       0.46          1
3     1222          25          30       0.95          2
4     1245          30          45       0.88          3
5     1245          30          45       0.23          3

The list:

[[1672, 40, 15]]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading