Home How to calculate new column based on values from multiple columns

Questions

How to calculate new column based on values from multiple columns

August 1, 2022

I’m trying to calculate the standard deviation of values from other columns, but also based on another column value, like this:

Consider this sample dataframe:

product	january	february	march	april	sales_months
prod1	3	6	7	1	4
prod2	0	0	5	14	2
prod3	2	4	67	1	4

The sales_months column indicates how many months of sales the product has had. So, if a product has only 2 months of sales (because it’s a new product, as is the case with the second row) we only use the data for the last 2 months to get the standard deviation. If it has a 4 in the sales_months column, it indicates it has been sold for the whole 4 month period, so we can use the data of all 4 columns to get the standar deviation.
So, I would like to get something like this:

product	january	february	march	april	sales_months	std_dev
prod1	3	6	7	1	4	2.38
prod2	0	0	5	14	2	4.5
prod3	2	4	67	1	4	28.02

How can I do this in for all rows of the dataframe. I tried building a function and applying it the dataframe, with my limited pandas knowledge, but I just caused the kernel to die:

def get_std_dev(row):
    std_dev = 0.0
    months = row.SALES_MONTHS
    if months < 4:
        m_tmp = int(months)
        std_dev = df_co_info.iloc[:, 1:m_tmp].std(axis=1)
    else:
        std_dev = df_co_info.iloc[:, 1:4].std(axis=1)
    return std_dev

df_co_info['stdDev'] = df_co_info.apply(get_std_dev, axis = 1)

>Solution :

Using numpy and a mask:

df2 = df.drop(columns=['product', 'sales_months'])
a = df2.to_numpy()
mask = np.arange(a.shape[1]) >= a.shape[1]-df['sales_months'].to_numpy()[:,None]

df['std_dev'] = df2.where(mask).std(axis=1)

output:

  product  january  february  march  april  sales_months    std_dev
0   prod1        3         6      7      1             4   2.753785
1   prod2        0         0      5     14             2   6.363961
2   prod3        2         4     67      1             4  32.357379

masked df2:

   january  february  march  april
0      3.0       6.0      7      1
1      NaN       NaN      5     14
2      2.0       4.0     67      1

dataframe

byMR

Published August 01, 2022

Add a comment

Override colour of a classes sub div

byMR

August 1, 2022

Questions

Remove specific td borders on a table

byMR

August 1, 2022

Questions

Mark a Date as "Overdue 10-13 days" in Google Sheets

byMR

August 1, 2022

Questions

Free memory by external function

byMR

August 1, 2022

Questions

Selenium python- Is it possible to draw a circle around clicked GUI elements?

byMR

August 1, 2022

Questions

Angular don't apply styles

byMR

August 1, 2022

How to calculate new column based on values from multiple columns

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

Override colour of a classes sub div

Remove specific td borders on a table

Mark a Date as "Overdue 10-13 days" in Google Sheets

Free memory by external function

Selenium python- Is it possible to draw a circle around clicked GUI elements?

Angular don't apply styles

Keep Up to Date with the Most Important News

How to calculate new column based on values from multiple columns

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

Override colour of a classes sub div

Remove specific td borders on a table

Mark a Date as "Overdue 10-13 days" in Google Sheets

Free memory by external function

Selenium python- Is it possible to draw a circle around clicked GUI elements?

Angular don't apply styles

Discover more from Dev solutions