Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to calculate the mean for each individual and subtract every observation from that mean value?

I am looking for a simple and fast solution to 1) calculate the mean SG (course) for each subid and 2) subtract each subid’s performance (index) from this value. So subid 2 had 1 and 10 in the SG course. Now I take those two values and calculate the mean. Next I every single performance for the subid in course A to this value. How can I do this?

sectionIndex = [1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10]
subid = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2]
course = ['A','A','A','A','A','A','A','A','SG','SG', 'A','A','A','A','A','A','A','A','SG','SG']
performance = [3,6,1,6,5,6,7,8,9,13,1,2,3,3,4,4,6,3,1,10]
ddict = {
    'subid': subid,
    'sectionIndex': sectionIndex,
    'course': course,
    'peformance': performance

}

df = pd.DataFrame(ddict)

>Solution :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Use GroupBy.transform with replace not matched values to missing values in Series.where with mean, last subtract peformance column:

s = df['peformance'].where(df['course'].eq('SG')).groupby(df['subid']).transform('mean')
df['new'] = s.sub(df['peformance'])

print (df)
    subid  sectionIndex course  peformance   new
0       1             1      A           3   8.0
1       1             2      A           6   5.0
2       1             3      A           1  10.0
3       1             4      A           6   5.0
4       1             5      A           5   6.0
5       1             6      A           6   5.0
6       1             7      A           7   4.0
7       1             8      A           8   3.0
8       1             9     SG           9   2.0
9       1            10     SG          13  -2.0
10      2             1      A           1   4.5
11      2             2      A           2   3.5
12      2             3      A           3   2.5
13      2             4      A           3   2.5
14      2             5      A           4   1.5
15      2             6      A           4   1.5
16      2             7      A           6  -0.5
17      2             8      A           3   2.5
18      2             9     SG           1   4.5
19      2            10     SG          10  -4.5

Details:

print (df['peformance'].where(df['course'].eq('SG')))
0      NaN
1      NaN
2      NaN
3      NaN
4      NaN
5      NaN
6      NaN
7      NaN
8      9.0
9     13.0
10     NaN
11     NaN
12     NaN
13     NaN
14     NaN
15     NaN
16     NaN
17     NaN
18     1.0
19    10.0
Name: peformance, dtype: float64
    
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading