Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Create relative value column by groups of two different columns

I have a DataFrame df1 which contains time-series data with Date, Location, and Value. I try to create df2 with a new column Relative_Value, which is the value of the specific row devided by the sum of the values for a location on a specific date (groupby date and location).

For example:

  • Date=20220101, Location=FE, Value=4: 4/(4+2+6+4+1) = 0.235
df1:
    Date        Location    Value
0   20220101    FE          4
1   20220101    FE          2
2   20220101    FE          6
3   20220101    FE          4
4   20220101    FE          1
5   20220101    RP          4
6   20220101    RP          6
7   20220101    RP          4
8   20220102    FE          1
9   20220102    FE          3
10  20220102    FE          4
11  20220102    FE          1
12  20220102    RP          8
13  20220102    RP          4
14  20220102    RP          5
15  20220102    RP          9

df2:
    Date        Location    Value   Relativ_Value
0   20220101    FE          4       0.235          <-- example
1   20220101    FE          2       0.117
2   20220101    FE          6       0.352
3   20220101    FE          4       0.235
4   20220101    FE          1       0.058
5   20220101    RP          4       0.285
6   20220101    RP          6       0.428
7   20220101    RP          4       0.285
8   20220102    FE          1       0.111
9   20220102    FE          3       0.333
10  20220102    FE          4       0.444
11  20220102    FE          1       0.111
12  20220102    RP          8       0.307
13  20220102    RP          4       0.153
14  20220102    RP          5       0.192
15  20220102    RP          9       0.346

I tried it by first groupby the two columns and the apply function:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df2 = df1.groupby(['Date', 'Location']).apply(lambda x: x/sum(x))

What would be an efficient approach to create df2?

For reproducability:

import pandas as pd
df1 = pd.DataFrame({
    'Date':[20220101, 20220101, 20220101, 20220101, 20220101, 20220101, 20220101, 20220101, 20220102, 20220102, 20220102, 20220102, 20220102, 20220102, 20220102, 20220102],
    'Location':['FE', 'FE', 'FE', 'FE', 'FE', 'RP', 'RP', 'RP', 'FE', 'FE', 'FE', 'FE', 'RP', 'RP', 'RP', 'RP'],
    'Value':[4, 2, 6, 4, 1, 4, 6, 4, 1, 3, 4, 1, 8, 4, 5, 9]})

Many thanks!

>Solution :

Use GroupBy.transform for improve performance:

df2 = df.assign(Relativ_Value=df['Value']
                        .div(df1.groupby(['Date', 'Location'])['Value'].transform('sum'))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading