How to group each individual transaction from a buy and sell history

Given a data frame that is sorted by symbol and date. I would like to add a new column that identifies each individual transaction. Each individual transaction means that the position is opened and closed after a set of buy and sell orders.

Input data

time           symbol        side      price     size
08-19-2021      AAPL          buy        150       1 ----> position opened here
08-20-2021      AAPL          buy        100       1
08-21-2021      AAPL          buy        200       1
08-22-2021      AAPL          sell        300      1
08-23-2021      AAPL          sell       350       0.5
08-24-2021      AAPL          sell        400      1.5 ----> position closed here
08-25-2021      AAPL          buy        150       2  -----> position opened here
08-26-2021      AAPL          sell        100      2  ----> position closed here
08-27-2021      AAPL          buy        100       2 ---> position opened here
08-28-2021      AAPL          sell        150      1.5
08-29-2021      AAPL          buy        100       0.5
08-29-2021      AAPL          sell        50       0.5
08-29-2021      AAPL          sell       50        0.5 ---> position closed here
08-29-2021      AAPL          buy        100       1 ----> position opened here
08-29-2021      AAPL          sell        50       0.5 ---> position still open here

Desired output (added column trade #)

time           symbol        side      price     size   trade #
08-19-2021      AAPL          buy        150       1      0
08-20-2021      AAPL          buy        100       1      0
08-21-2021      AAPL          buy        200       1      0
08-22-2021      AAPL          sell        300      1      0
08-23-2021      AAPL          sell       350       0.5    0
08-24-2021      AAPL          sell        400      1.5    0
-------------------------------
08-25-2021      AAPL          buy        150       2      2
08-27-2021      AAPL          sell        100      2      2 
-------------------------------    
08-28-2021      AAPL          buy        100       2      3
08-29-2021      AAPL          sell        150      1.5    3
08-29-2021      AAPL          buy        100       0.5    3
08-29-2021      AAPL          sell        50       0.5    3
08-29-2021      AAPL          sell       50        0.5    3
------------------------------
08-29-2021      AAPL          buy        100       1      4
08-29-2021      AAPL          sell        50       0.5    4

The last trade is still open so I would like to still have a number for it.

Input data

pd.DataFrame.from_dict({‘symbol’: [‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’],
‘side’: [‘buy’, ‘buy’, ‘buy’, ‘sell’, ‘sell’, ‘sell’, ‘buy’, ‘sell’, ‘buy’, ‘sell’, ‘buy’, ‘sell’, ‘sell’, ‘buy’, ‘sell’],
‘size’: [1,1,1,1,0.5,1.5,2,2,2,1.5, 0.5, 0.5, 0.5, 1, 0.5]})

>Solution :

Your position is basically closed when your total size is zero. So in a first step compute the size of your position by

df['position_size']=(df.side.map({'buy':1,'sell':-1})*df['size']).cumsum()

Then you can identify the points where the position changes using by checking its position_size. The cummulative sum of these changepoints is your trade number.

df['trade_nr']=(df.position_size==0).shift(1,fill_value=0).cumsum()

Leave a Reply