Given a data frame that is sorted by symbol and date. I would like to add a new column that identifies each individual transaction. Each individual transaction means that the position is opened and closed after a set of buy and sell orders.
Input data
time symbol side price size
08-19-2021 AAPL buy 150 1 ----> position opened here
08-20-2021 AAPL buy 100 1
08-21-2021 AAPL buy 200 1
08-22-2021 AAPL sell 300 1
08-23-2021 AAPL sell 350 0.5
08-24-2021 AAPL sell 400 1.5 ----> position closed here
08-25-2021 AAPL buy 150 2 -----> position opened here
08-26-2021 AAPL sell 100 2 ----> position closed here
08-27-2021 AAPL buy 100 2 ---> position opened here
08-28-2021 AAPL sell 150 1.5
08-29-2021 AAPL buy 100 0.5
08-29-2021 AAPL sell 50 0.5
08-29-2021 AAPL sell 50 0.5 ---> position closed here
08-29-2021 AAPL buy 100 1 ----> position opened here
08-29-2021 AAPL sell 50 0.5 ---> position still open here
Desired output (added column trade #)
time symbol side price size trade #
08-19-2021 AAPL buy 150 1 0
08-20-2021 AAPL buy 100 1 0
08-21-2021 AAPL buy 200 1 0
08-22-2021 AAPL sell 300 1 0
08-23-2021 AAPL sell 350 0.5 0
08-24-2021 AAPL sell 400 1.5 0
-------------------------------
08-25-2021 AAPL buy 150 2 2
08-27-2021 AAPL sell 100 2 2
-------------------------------
08-28-2021 AAPL buy 100 2 3
08-29-2021 AAPL sell 150 1.5 3
08-29-2021 AAPL buy 100 0.5 3
08-29-2021 AAPL sell 50 0.5 3
08-29-2021 AAPL sell 50 0.5 3
------------------------------
08-29-2021 AAPL buy 100 1 4
08-29-2021 AAPL sell 50 0.5 4
The last trade is still open so I would like to still have a number for it.
Input data
pd.DataFrame.from_dict({‘symbol’: [‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’, ‘AAPL’],
‘side’: [‘buy’, ‘buy’, ‘buy’, ‘sell’, ‘sell’, ‘sell’, ‘buy’, ‘sell’, ‘buy’, ‘sell’, ‘buy’, ‘sell’, ‘sell’, ‘buy’, ‘sell’],
‘size’: [1,1,1,1,0.5,1.5,2,2,2,1.5, 0.5, 0.5, 0.5, 1, 0.5]})
>Solution :
Your position is basically closed when your total size is zero. So in a first step compute the size of your position by
df['position_size']=(df.side.map({'buy':1,'sell':-1})*df['size']).cumsum()
Then you can identify the points where the position changes using by checking its position_size
. The cummulative sum of these changepoints is your trade number.
df['trade_nr']=(df.position_size==0).shift(1,fill_value=0).cumsum()