Create relative value column by groups of two different columns

October 10, 2022

I have a DataFrame df1 which contains time-series data with the columns Date, Location, and Value. I want to create df2 with a new column Relativ_Value, which is the value of each row divided by the sum of the values for the same location on the same date (that is, grouped by Date and Location).

For example:

Date=20220101, Location=FE, Value=4: 4/(4+2+6+4+1) = 0.235

df1:
    Date        Location    Value
0   20220101    FE          4
1   20220101    FE          2
2   20220101    FE          6
3   20220101    FE          4
4   20220101    FE          1
5   20220101    RP          4
6   20220101    RP          6
7   20220101    RP          4
8   20220102    FE          1
9   20220102    FE          3
10  20220102    FE          4
11  20220102    FE          1
12  20220102    RP          8
13  20220102    RP          4
14  20220102    RP          5
15  20220102    RP          9

df2:
    Date        Location    Value   Relativ_Value
0   20220101    FE          4       0.235          <-- example
1   20220101    FE          2       0.117
2   20220101    FE          6       0.352
3   20220101    FE          4       0.235
4   20220101    FE          1       0.058
5   20220101    RP          4       0.285
6   20220101    RP          6       0.428
7   20220101    RP          4       0.285
8   20220102    FE          1       0.111
9   20220102    FE          3       0.333
10  20220102    FE          4       0.444
11  20220102    FE          1       0.111
12  20220102    RP          8       0.307
13  20220102    RP          4       0.153
14  20220102    RP          5       0.192
15  20220102    RP          9       0.346

I tried grouping by the two columns first and then using apply:

df2 = df1.groupby(['Date', 'Location']).apply(lambda x: x/sum(x))

What would be an efficient approach to create df2?

For reproducibility:

import pandas as pd
df1 = pd.DataFrame({
    'Date':[20220101, 20220101, 20220101, 20220101, 20220101, 20220101, 20220101, 20220101, 20220102, 20220102, 20220102, 20220102, 20220102, 20220102, 20220102, 20220102],
    'Location':['FE', 'FE', 'FE', 'FE', 'FE', 'RP', 'RP', 'RP', 'FE', 'FE', 'FE', 'FE', 'RP', 'RP', 'RP', 'RP'],
    'Value':[4, 2, 6, 4, 1, 4, 6, 4, 1, 3, 4, 1, 8, 4, 5, 9]})

Many thanks!

>Solution :

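Use GroupBy.transform to improve performance. It returns the group sums as a Series aligned with the original index, so each row can be divided by its group total directly:

df2 = df1.assign(Relativ_Value=df1['Value']
                        .div(df1.groupby(['Date', 'Location'])['Value'].transform('sum')))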
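As a rough end-to-end sketch of the transform approach, the snippet below rebuilds df1 from the question, computes the per-row group totals, and checks that the shares in each (Date, Location) group sum to 1. The names group_total and check are introduced here for illustration only and are not part of the original answer.

```python
import pandas as pd

# Same data as in the question, written compactly
df1 = pd.DataFrame({
    'Date': [20220101] * 8 + [20220102] * 8,
    'Location': ['FE'] * 5 + ['RP'] * 3 + ['FE'] * 4 + ['RP'] * 4,
    'Value': [4, 2, 6, 4, 1, 4, 6, 4, 1, 3, 4, 1, 8, 4, 5, 9],
})

# transform('sum') returns one total per row, aligned with df1's index
# (group_total is a hypothetical helper name)
group_total = df1.groupby(['Date', 'Location'])['Value'].transform('sum')

# Element-wise division of each value by its group's total
df2 = df1.assign(Relativ_Value=df1['Value'] / group_total)

# Sanity check: the shares inside every group should add up to 1
check = df2.groupby(['Date', 'Location'])['Relativ_Value'].sum()

print(df2)
print(check)
```

Splitting the calculation into group_total and the division is only a readability choice; it should be equivalent to the one-line assign/div form above.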


byMR
