Given a pandas DataFrame df of the following format:-
StudyNo Condition1 Condition2 ResultVal
1222 25 30 0.5
1245 30 45 0.12
1672 40 15 0.46
1222 25 30 0.95
1245 30 45 0.88
1245 30 45 0.23
I would like to calculate frequency/count of the number of ResultVal attributes existing in the dataset for a specific StudyNo under a given Condition1 and Condition2. There are no exactly duplicate rows in this dataset.
e.g. for StudyNo 1245 at Condition1 of 30 and Condition2 of 45, the count would be 3 (because 3 ResultVal attributes exist for this aspect). Also, I would like to print out a list of StudyNo, Condition1 and Condition2 in cases wherein < =1 ResultVal attributes exist (which in this example would be [1672,40,15].
Any help in this regard would be much appreciated.
>Solution :
You can groupby "StudyNo", "Condition1" and "Condition2" and transform count of "ResultVal" to create column "Frequency".
Then create a boolean mask for rows where "Frequency" is less than or equal to 1 and filter the rows that satisfy it.
cols = ['StudyNo','Condition1','Condition2']
df['Frequency'] = df.groupby(cols)['ResultVal'].transform('count')
appears_only_once = df.loc[df['Frequency'] <= 1, cols].to_numpy().tolist()
Output:
StudyNo Condition1 Condition2 ResultVal Frequency
0 1222 25 30 0.50 2
1 1245 30 45 0.12 3
2 1672 40 15 0.46 1
3 1222 25 30 0.95 2
4 1245 30 45 0.88 3
5 1245 30 45 0.23 3
The list:
[[1672, 40, 15]]