Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Calculate %-deviation with values from a pandas Dataframe

I am fairly new to python and I have the following dataframe

    setting_id  subject_id  seconds  result_id  owner_id  average  duration_id
0            7           1        0     1680.5       2.0   24.000          1.0
1            7           1     3600     1690.5       2.0   46.000          2.0
2            7           1    10800     1700.5       2.0  101.000          4.0
3            7           2        0     1682.5       2.0   12.500          1.0
4            7           2     3600     1692.5       2.0   33.500          2.0
5            7           2    10800     1702.5       2.0   86.500          4.0
6            7           3        0     1684.5       2.0    8.500          1.0
7            7           3     3600     1694.5       2.0   15.000          2.0
8            7           3    10800     1704.5       2.0   34.000          4.0

What I need to do is Calculate the deviation (%) from averages with a "seconds"-value not equal to 0 from those averages with a seconds value of zero, where the subject_id and Setting_id are the same

i.e. setting_id ==7 & subject_id ==1 would be:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

(result/baseline)*100

——> for 3600 seconds: (46/24)*100 = +192%

——> for 10800 seconds: (101/24)*100 = +421%

…. baseline = average-result with a seconds value of 0

…. result = average-result with a seconds value other than 0

The resulting df should look like this

   setting_id  subject_id  seconds  owner_id  average  deviation  duration_id
0           7           1        0         2       24          0            1
1           7           1     3600         2       46        192            2
2           7           1    10800         2      101        421            4

I want to use these calculations then to plot a regression graph (with seaborn) of deviations from baseline

I have played around with this df for 2 days now and tried different forloops but I just can´t figure out the correct way.

>Solution :

You can use:

# identify rows with 0
m = df['seconds'].eq(0)
# compute the sum of rows with 0
s = (df['average'].where(m)
     .groupby([df['setting_id'], df['subject_id']])
     .sum()
    )

# compute the deviation per group
deviation = (
 df[['setting_id', 'subject_id']]
 .merge(s, left_on=['setting_id', 'subject_id'], right_index=True, how='left')
 ['average']
 .rdiv(df['average']).mul(100)
 .round().astype(int) # optional
 .mask(m, 0)
)

df['deviation'] = deviation
# or
# out = df.assign(deviation=deviation)

Output:

   setting_id  subject_id  seconds  result_id  owner_id  average  duration_id  deviation
0           7           1        0     1680.5       2.0     24.0          1.0          0
1           7           1     3600     1690.5       2.0     46.0          2.0        192
2           7           1    10800     1700.5       2.0    101.0          4.0        421
3           7           2        0     1682.5       2.0     12.5          1.0          0
4           7           2     3600     1692.5       2.0     33.5          2.0        268
5           7           2    10800     1702.5       2.0     86.5          4.0        692
6           7           3        0     1684.5       2.0      8.5          1.0          0
7           7           3     3600     1694.5       2.0     15.0          2.0        176
8           7           3    10800     1704.5       2.0     34.0          4.0        400
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading