Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to add up value in column v, by multiplying previous cell with a fixed factor, then add current cell of another column

Starting with a "pd.DataFrame" df :~

         n    v
    0    1  0.0
    1    2  0.0
    2    3  0.0
    3    4  0.0
    4    5  0.0
    5    6  0.0

I’d like to add up value in column "v", whereby a cell in column "v" is produced by multiplying previous cell of "v" with a fixed factor, then add current cell value of column "n". (See sample calculation table below)

    ## sample calculation table :~
    n   v[i]      n  + v[i-1] * fixed factor 
    1   1.0     = 1  + 0.0    * 0.5
    2   2.5     = 2  + 1.0    * 0.5   
    3   4.25    = 3  + 2.5    * 0.5
    4   6.125   = 4  + 4.25   * 0.5
    5   8.0625  = 5  + 6.125  * 0.5
    6  10.03125 = 6  + 8.063  * 0.5

Managed to do it with row-by-row iteration (see for-loop below).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

However I think vectorised methods (like cumsum and shift) may be more efficient, but could not figure out how; because cumsum is complicated by multiplication, starting with an empty column v, and need to reference to previous cell of a column.

Wonder how to do this with vectorised methods ?

To reproduce my code :~

    df = pd.DataFrame({'n':[1,2,3,4,5,6]})
    df['v'] = 0.0

    def fnv(nu, vu):
        return nu + vu * 0.5

    for i in range(0, df.shape[0]):
        df.v.at[i] = fnv(df.n.at[i], df.v.at[i-1] if i>0 else 0.0)
    df (RESULTS) :~
         n      v
    0    1   1.00
    1    2   2.50
    2    3   4.25
    3    4   6.12
    4    5   8.06
    5    6  10.03

>Solution :

Since n is variable, you can’t easily vectorize this (you could using a matrix operation, see below, but this would take O(n^2) space).

A good tradeoff might be to use numba to speed the operation:

from numba import jit

@jit(nopython=True)
def fnv(n, factor=0.5):
    out = []
    prev = 0
    for x in n:
        out.append(x + prev*factor)
        prev = out[-1]
    return out

df['v'] = fnv(df['n'].to_numpy())

Output:

   n         v
0  1   1.00000
1  2   2.50000
2  3   4.25000
3  4   6.12500
4  5   8.06250
5  6  10.03125

vectorized approach

You could vectorize using a square matrix:

x = np.arange(len(df))
tmp = x[:, None]-x
df['v'] = np.nansum(np.where(tmp>=0, 0.5**tmp, np.nan) * df['n'].to_numpy(),
                    axis=1)

Intermediates:

# tmp
[[ 0 -1 -2 -3 -4 -5]
 [ 1  0 -1 -2 -3 -4]
 [ 2  1  0 -1 -2 -3]
 [ 3  2  1  0 -1 -2]
 [ 4  3  2  1  0 -1]
 [ 5  4  3  2  1  0]]

# tmp >= 0
[[ True False False False False False]
 [ True  True False False False False]
 [ True  True  True False False False]
 [ True  True  True  True False False]
 [ True  True  True  True  True False]
 [ True  True  True  True  True  True]]

# np.where(tmp>=0, 0.5**tmp, np.nan) * df['n'].to_numpy()
[[1.          nan     nan     nan     nan     nan]
 [0.5     2.          nan     nan     nan     nan]
 [0.25    1.      3.          nan     nan     nan]
 [0.125   0.5     1.5     4.          nan     nan]
 [0.0625  0.25    0.75    2.      5.          nan]
 [0.03125 0.125   0.375   1.      2.5     6.     ]]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading