I have a dataframe that looks like this:
ID TradeDate party Deal Asset Start Expire Fixed Quantity MTM Float
1 04/11/2024 party1 Sell HO 01/01/2024 02/01/2024 10.00 1000 2500.00 10.00
1 04/11/2024 party1 Sell HO 01/01/2024 02/01/2024 10.00 1000 2500.00 10.00
1 04/11/2024 party1 Sell HO 01/01/2024 02/01/2024 10.00 1000 2500.00 10.00
1 04/11/2024 party1 Buy HO 01/01/2024 02/01/2024 10.00 1000 2500.00 10.00
1 04/11/2024 party1 Buy HO 01/01/2024 02/01/2024 10.00 1000 2500.00 10.00
1 04/11/2024 party1 Sell WTI 01/01/2024 02/01/2024 10.00 1000 2500.00 10.00
1 04/11/2024 party1 Sell WTI 01/01/2024 02/01/2024 10.00 1000 2500.00 10.00
1 04/11/2024 party1 Buy WTI 01/01/2024 02/01/2024 10.00 1000 2500.00 10.00
I group the data by Start, Asset, and Deal and I perform caluculations to transform the dataframe into this:
groups = df.groupby(['Start', 'Asset', 'Deal'])
ID TradeDate party Deal Asset Start Expire Fixed Quantity MTM Float
1 04/11/2024 party1 Sell HO 01/01/2024 02/01/2024 10.00 1000 2500.00 10.00
1 04/11/2024 party1 Sell HO 01/01/2024 02/01/2024 10.00 1000 2500.00 10.00
1 04/11/2024 party1 Sell HO 01/01/2024 02/01/2024 10.00 1000 2500.00 10.00
total 3000 7500.00
ID TradeDate party Deal Asset Start Expire Fixed Quantity MTM Float
1 04/11/2024 party1 Buy HO 01/01/2024 02/01/2024 10.00 1000 2500.00 10.00
1 04/11/2024 party1 Buy HO 01/01/2024 02/01/2024 10.00 1000 2500.00 10.00
total 3000 5000.00
ID TradeDate party Deal Asset Start Expire Fixed Quantity MTM Float
1 04/11/2024 party1 Sell WTI 01/01/2024 02/01/2024 10.00 1000 2500.00 10.00
1 04/11/2024 party1 Sell WTI 01/01/2024 02/01/2024 10.00 1000 2500.00 10.00
total 3000 5000.00
ID TradeDate party Deal Asset Start Expire Fixed Quantity MTM Float
1 04/11/2024 party1 Buy WTI 01/01/2024 02/01/2024 10.00 1000 2500.00 10.00
total 1000 2500.00
My objective is to transform these groups another time so that I can output only the data I need. The expected output for this step should look something like this:
party Deal Asset Start MTM Float
party1 Sell HO 01/01/2024 7500.00 10.00
party1 Buy HO 01/01/2024 5000.00 10.00
party1 Sell WTI 01/01/2024 5000.00 10.00
party1 Buy WTi 01/01/2024 2500.00 10.00
Do I need to perform another groupby of some sort? or is there another function that could achieve this? Any suggestions would help.
Note: in the second step, those are individual dataframes that come from a list. You may need an intermediate step to concat these together and then get to the final output.
>Solution :
I think you just need pd.Groupby + .agg:
out = df.groupby(["Start", "Asset", "Deal"], sort=False, as_index=False).agg(
{"party": "first", "MTM": "sum", "Float": "first"}
)
print(out)
Prints:
Start Asset Deal party MTM Float
0 01/01/2024 HO Sell party1 7500.0 10.0
1 01/01/2024 HO Buy party1 5000.0 10.0
2 01/01/2024 WTI Sell party1 5000.0 10.0
3 01/01/2024 WTI Buy party1 2500.0 10.0