Home How to add up value in column v, by multiplying previous cell with a fixed factor, then add current cell of another column

Questions

How to add up value in column v, by multiplying previous cell with a fixed factor, then add current cell of another column

byMR

September 23, 2024

Starting with a "pd.DataFrame" df :~

         n    v
    0    1  0.0
    1    2  0.0
    2    3  0.0
    3    4  0.0
    4    5  0.0
    5    6  0.0

I’d like to add up value in column "v", whereby a cell in column "v" is produced by multiplying previous cell of "v" with a fixed factor, then add current cell value of column "n". (See sample calculation table below)

    ## sample calculation table :~
    n   v[i]      n  + v[i-1] * fixed factor 
    1   1.0     = 1  + 0.0    * 0.5
    2   2.5     = 2  + 1.0    * 0.5   
    3   4.25    = 3  + 2.5    * 0.5
    4   6.125   = 4  + 4.25   * 0.5
    5   8.0625  = 5  + 6.125  * 0.5
    6  10.03125 = 6  + 8.063  * 0.5

Managed to do it with row-by-row iteration (see for-loop below).

However I think vectorised methods (like cumsum and shift) may be more efficient, but could not figure out how; because cumsum is complicated by multiplication, starting with an empty column v, and need to reference to previous cell of a column.

Wonder how to do this with vectorised methods ?

To reproduce my code :~

    df = pd.DataFrame({'n':[1,2,3,4,5,6]})
    df['v'] = 0.0

    def fnv(nu, vu):
        return nu + vu * 0.5

    for i in range(0, df.shape[0]):
        df.v.at[i] = fnv(df.n.at[i], df.v.at[i-1] if i>0 else 0.0)

    df (RESULTS) :~
         n      v
    0    1   1.00
    1    2   2.50
    2    3   4.25
    3    4   6.12
    4    5   8.06
    5    6  10.03

>Solution :

Since n is variable, you can’t easily vectorize this (you could using a matrix operation, see below, but this would take O(n^2) space).

A good tradeoff might be to use numba to speed the operation:

from numba import jit

@jit(nopython=True)
def fnv(n, factor=0.5):
    out = []
    prev = 0
    for x in n:
        out.append(x + prev*factor)
        prev = out[-1]
    return out

df['v'] = fnv(df['n'].to_numpy())

Output:

   n         v
0  1   1.00000
1  2   2.50000
2  3   4.25000
3  4   6.12500
4  5   8.06250
5  6  10.03125

vectorized approach

You could vectorize using a square matrix:

x = np.arange(len(df))
tmp = x[:, None]-x
df['v'] = np.nansum(np.where(tmp>=0, 0.5**tmp, np.nan) * df['n'].to_numpy(),
                    axis=1)

Intermediates:

# tmp
[[ 0 -1 -2 -3 -4 -5]
 [ 1  0 -1 -2 -3 -4]
 [ 2  1  0 -1 -2 -3]
 [ 3  2  1  0 -1 -2]
 [ 4  3  2  1  0 -1]
 [ 5  4  3  2  1  0]]

# tmp >= 0
[[ True False False False False False]
 [ True  True False False False False]
 [ True  True  True False False False]
 [ True  True  True  True False False]
 [ True  True  True  True  True False]
 [ True  True  True  True  True  True]]

# np.where(tmp>=0, 0.5**tmp, np.nan) * df['n'].to_numpy()
[[1.          nan     nan     nan     nan     nan]
 [0.5     2.          nan     nan     nan     nan]
 [0.25    1.      3.          nan     nan     nan]
 [0.125   0.5     1.5     4.          nan     nan]
 [0.0625  0.25    0.75    2.      5.          nan]
 [0.03125 0.125   0.375   1.      2.5     6.     ]]