Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Store the result in new columns, named based on another variable (Pandas)

I have a dataframe. What I need is to calculate the difference between the variables A and B, and store the result in the new columns based on the variable df['Value']. If the Value == 1, then the result is stored in column named Diff_1, if the Value == 2, then column Diff_2, and so on.

Here is the code so far, but obviously the line df_red['Diff_' + str(value) ] = df_red['A'] - df_red['B'] is not doing what I want:

import pandas as pd

df = pd.read_excel(r'E:\...\.xlsx')
print(df)

value = list(set(df['Value']))
print(value)

for value in value:
    df_red = df[df['Value'] == value]
    df_red['Diff_' + str(value) ] = df_red['A'] - df_red['B']


Out[126]: 
   ID  Value      A     B
0   1      1   56.0  49.0
1   2      3   56.0  50.0
2   3      4  103.0  44.0
3   4      2   89.0  44.0
4   5      1   84.0  41.0
5   6      1   77.0  43.0
6   7      2   71.0  35.0
7   8      4   77.0  32.0
print(value)
[1, 2, 3, 4]

After a simple operation of df['A'] - df['B'] the result should look like this.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Out[128]: 
   ID  Value      A     B  Diff_1  Diff_2  Diff_3  Diff_4
0   1      1   56.0  49.0     7.0     0.0     0.0     0.0
1   2      3   56.0  50.0     0.0     0.0     6.0     0.0
2   3      4  103.0  44.0     0.0     0.0     0.0    60.0
3   4      2   89.0  44.0     0.0    45.0     0.0     0.0
4   5      1   84.0  41.0    43.0     0.0     0.0     0.0
5   6      1   77.0  43.0    34.0     0.0     0.0     0.0
6   7      2   71.0  35.0     0.0    36.0     0.0     0.0
7   8      4   77.0  32.0     0.0     0.0     0.0    45.0

Not so great way of doing this would be like this, however I am looking for some more efficient, better ways:

df['Diff_1'] = df[df['Value']==1]['A'] - df[df['Value']==1]['B']
df['Diff_2'] = df[df['Value']==2]['A'] - df[df['Value']==2]['B']
df['Diff_3'] = df[df['Value']==3]['A'] - df[df['Value']==3]['B']
df['Diff_4'] = df[df['Value']==4]['A'] - df[df['Value']==4]['B']

>Solution :

Here is my approach which may not be fastest, but it’s a start:

for i in df['Value'].unique():
   df.loc[df['Value'] == i, 'Diff_' + str(i)] = df['A'] - df['B']
   df.fillna(0, inplace = True)

Output of my fake data:

  Value A   B   Diff_1  Diff_2  Diff_3  Diff_4
0   1   20  2   18.0    0.0     0.0     0.0
1   1   30  5   25.0    0.0     0.0     0.0
2   2   40  7   0.0     33.0    0.0     0.0
3   2   50  15  0.0     35.0    0.0     0.0
4   3   60  25  0.0     0.0     35.0    0.0
5   3   20  7   0.0     0.0     13.0    0.0
6   4   15  36  0.0     0.0     0.0     -21.0
7   4   14  3   0.0     0.0     0.0     11.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading