I have to compute a column based on operation between two pandas dataframes. Let me give an example.
Some example would be:
df1 = pd.DataFrame({'data': [0.4, 0.112, 0.7]})
df2 = pd.DataFrame({'data': [321, 3 , 1, 1, 2, ,4, 5, 6, 7, 8, 9, 12, 0.4, 0.112]})
For the last value in the first dataframe, I would want to do the following computation:
(0.7 - df2.mean()) / df2.std()
For the value before the last value in the first dataframe, I would like to do the same operation, but ignoring the last value in the second dataframe:
(0.4 - df2[:-1].mean()) / df2[:-1].std()
same logic for the first value:
(0.112 - df2[:-2].mean()) / df2[:-2].std()
So, in general, the i-th value in my final result should be calculated as follows:
result[i] = (df1[i] - df2[:(df2.shape[0] - i].mean()) / df2[:(df2.shape[0] - i].std()
How to do this using Pandas without python for loops?
>Solution :
You can use expanding functions:
e = df2['data'].expanding()
df1['out'] = ((df1['data']-e.mean()[-len(df1):].to_numpy()[:len(df1)])
/e.std()[-len(df1):].to_numpy()
)
Output:
data out
0 0.400 -0.341906
1 0.112 -0.331316
2 0.700 -0.311916