Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Some manipulations with groupby Pandas

I have this Dataframe

import pandas as pd
import math
from pandas import Timestamp

Date = [Timestamp('2024-03-16 23:59:42'), Timestamp('2024-03-16 23:59:42'), Timestamp('2024-03-16 23:59:44'), Timestamp('2024-03-16 23:59:44'), Timestamp('2024-03-16 23:59:44'), Timestamp('2024-03-16 23:59:47'), Timestamp('2024-03-16 23:59:48'), Timestamp('2024-03-16 23:59:48'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49'), Timestamp('2024-03-16 23:59:49')]
Price = [0.6729, 0.6728, 0.6728, 0.6728, 0.6728, 0.673, 0.6728, 0.6729, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6728, 0.6729, 0.6728]
Side = [-1, -1, -1, 1, -1, 1, -1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, -1]
Amount = [1579.2963000000002, 7.400799999999999, 6.728, 177.61919999999998, 797.2679999999999, 33650.0, 131.196, 48.448800000000006, 0.6728, 0.6728, 0.6728, 6.728, 0.6728, 1.3456, 0.6728, 0.6728, 0.6728, 0.6728, 0.6729, 0.6728]
buy = [math.nan, math.nan, math.nan, 177.61919999999998, math.nan, 33650.0, math.nan, 48.448800000000006, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, math.nan, 49.121700000000004, math.nan]

df = pd.DataFrame({
    'Date':Date,
    'Price':Price,
    'Side':Side,
    'Amount':Amount,
    'buy':buy
})

print(df)

I got buy column using

df['buy'] = df[df['Side'] == 1].groupby([df['Date'].dt.floor('H'), 'Price'])['Amount'].cumsum()

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

But I want to get 0 in the buy column instead of nan values, if this price has not yet been met in the group or the previous value of the cumulative sum

Result buy column need – [0,0,0,177.6192,177.6192,33650, 177.6192,48.4488, 177.6192,…..]

How can I implement this?

>Solution :

You could reindex, ffill and fillna:

df['buy'] = (df[df['Side'] == 1].groupby([df['Date'].dt.floor('H'), 'Price'])['Amount'].cumsum()
             .reindex(df.index).ffill().fillna(0)
            )

Or in two steps:

df['buy'] = df[df['Side'] == 1].groupby([df['Date'].dt.floor('H'), 'Price'])['Amount'].cumsum()
df['buy'] = df['buy'].ffill().fillna(0)

Output:

                  Date   Price  Side      Amount         buy
0  2024-03-16 23:59:42  0.6729    -1   1579.2963      0.0000
1  2024-03-16 23:59:42  0.6728    -1      7.4008      0.0000
2  2024-03-16 23:59:44  0.6728    -1      6.7280      0.0000
3  2024-03-16 23:59:44  0.6728     1    177.6192    177.6192
4  2024-03-16 23:59:44  0.6728    -1    797.2680    177.6192
5  2024-03-16 23:59:47  0.6730     1  33650.0000  33650.0000
6  2024-03-16 23:59:48  0.6728    -1    131.1960  33650.0000
7  2024-03-16 23:59:48  0.6729     1     48.4488     48.4488
8  2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
9  2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
10 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
11 2024-03-16 23:59:49  0.6728    -1      6.7280     48.4488
12 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
13 2024-03-16 23:59:49  0.6728    -1      1.3456     48.4488
14 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
15 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
16 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
17 2024-03-16 23:59:49  0.6728    -1      0.6728     48.4488
18 2024-03-16 23:59:49  0.6729     1      0.6729     49.1217
19 2024-03-16 23:59:49  0.6728    -1      0.6728     49.1217

Alternatively, if you want to fill with the first valid non-NA you could use:

df['buy'] = df[df['Side'] == 1].groupby([df['Date'].dt.floor('H'), 'Price'])['Amount'].cumsum()
m = df['buy'].notna()
df['buy'] = df['buy'].fillna(df.loc[df['buy'].first_valid_index(), 'buy']).where(m.cummax(), 0)

Output:

                  Date   Price  Side      Amount         buy
0  2024-03-16 23:59:42  0.6729    -1   1579.2963      0.0000
1  2024-03-16 23:59:42  0.6728    -1      7.4008      0.0000
2  2024-03-16 23:59:44  0.6728    -1      6.7280      0.0000
3  2024-03-16 23:59:44  0.6728     1    177.6192    177.6192
4  2024-03-16 23:59:44  0.6728    -1    797.2680    177.6192
5  2024-03-16 23:59:47  0.6730     1  33650.0000  33650.0000
6  2024-03-16 23:59:48  0.6728    -1    131.1960    177.6192
7  2024-03-16 23:59:48  0.6729     1     48.4488     48.4488
8  2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192
9  2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192
10 2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192
11 2024-03-16 23:59:49  0.6728    -1      6.7280    177.6192
12 2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192
13 2024-03-16 23:59:49  0.6728    -1      1.3456    177.6192
14 2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192
15 2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192
16 2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192
17 2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192
18 2024-03-16 23:59:49  0.6729     1      0.6729     49.1217
19 2024-03-16 23:59:49  0.6728    -1      0.6728    177.6192
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading