Unexpected output from pandas' DataFrameGroupBy.diff function

Consider the following piece of Python code, which is essentially copied from the first code block in the Transformation section of the "Group by: split-apply-combine" chapter of the pandas user guide.

    import pandas as pd
    import numpy as np
    speeds = pd.DataFrame(
        data = {'class': ['bird', 'bird', 'mammal', 'mammal', 'mammal'],
                'order': ['Falconiformes', 'Psittaciformes', 'Carnivora', 'Primates', 'Carnivora'],
                'max_speed': [389.0,…

April 23, 2024 MR

Pandas all() but with a threshold

Suppose we have the following dataframe and program logic:

    import pandas as pd
    df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                       'B': [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]})
    def more_than(series, threshold=5):
        try:
            trues = series.value_counts()[True]
            p = trues / len(series) * 100
        except KeyError:
            p =…
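
The percentage of True values that more_than computes in the excerpt can also be obtained without the try/except, because the mean of a boolean Series is its fraction of True values. The following is only a sketch: the excerpt is cut off before the threshold is used, so the comparison with the threshold is an assumption, and share_true and mostly_true are hypothetical names that do not appear in the original question.

```python
import pandas as pd

df = pd.DataFrame({"A": list(range(1, 11)), "B": list(range(4, 14))})


def share_true(series: pd.Series) -> float:
    """Percentage of True values in a non-empty boolean Series (hypothetical helper)."""
    # The mean of a boolean Series is the fraction of True values; it is
    # 0.0 when no value is True, so no KeyError handling is required.
    return float(series.mean() * 100)


def mostly_true(series: pd.Series, threshold: float = 5) -> bool:
    # Assumption: "all() with a threshold" means that strictly more than
    # `threshold` percent of the values are True.
    return share_true(series) > threshold


print(share_true(df["A"] > 5))   # 50.0
print(mostly_true(df["A"] > 5))  # True
```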

April 17, 2024 MR

How to create a summary table from a dictionary of lists with different lengths?

My input is this dict:

    response = {
        'A': ['CATEGORY 2'],
        'B': ['CATEGORY 1', 'CATEGORY 2'],
        'C': [],
        'D': ['CATEGORY 3'],
    }

And I'm trying to make this kind of dataframe:

    | ITEM | CATEGORY 1 | CATEGORY 2 | CATEGORY 3 |
    | A    |            | x          |            |
    | B    |…
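
One possible way to build such a table is to collect the distinct categories and mark membership per item. This is a sketch, not the answer from the original thread. The desired output is truncated in the excerpt, so keeping item C as an all-empty row and sorting the category columns alphabetically are both assumptions.

```python
import pandas as pd

response = {
    "A": ["CATEGORY 2"],
    "B": ["CATEGORY 1", "CATEGORY 2"],
    "C": [],
    "D": ["CATEGORY 3"],
}

# Distinct categories across all items, sorted so the column order is stable.
categories = sorted({cat for cats in response.values() for cat in cats})

# One row per item: "x" where the item has the category, "" otherwise.
rows = [["x" if cat in cats else "" for cat in categories]
        for cats in response.values()]

table = pd.DataFrame(
    rows,
    index=pd.Index(list(response), name="ITEM"),
    columns=categories,
).reset_index()

print(table)
```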

April 16, 2024 MR

min from columns from dict

I have a dict that maps item names to column names, and a df that contains the columns from the dict along with other columns. How can I add a column to df holding, for every item, the min value taken only from the columns listed for that item in the dict?

    import pandas as pd
    my_dict = {'Item1': ['Col1', 'Col3'],
               'Item2': ['Col2', 'Col4']}
    df = pd.DataFrame({
        'Col0': ['Item1', 'Item2'],
        'Col1': [20, 25],
        'Col2': [89, 15],
        'Col3': [26, 30],
        'Col4': [40, 108],
        'Col5': [55, 2]
    })
    df['min'] = ?

I tried df['min']=df[df.columns[df.columns.isin(my_dict)]].min(axis=1),…
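
A row-wise lookup is one way to approach this. The sketch below assumes the intended reading is that each row's Col0 value selects the list of columns whose minimum should be taken; that interpretation is not confirmed by the truncated excerpt.

```python
import pandas as pd

my_dict = {"Item1": ["Col1", "Col3"], "Item2": ["Col2", "Col4"]}

df = pd.DataFrame({
    "Col0": ["Item1", "Item2"],
    "Col1": [20, 25],
    "Col2": [89, 15],
    "Col3": [26, 30],
    "Col4": [40, 108],
    "Col5": [55, 2],
})

# For each row, look up the columns registered for its item in my_dict
# and take the minimum over just those columns.
df["min"] = df.apply(lambda row: row[my_dict[row["Col0"]]].min(), axis=1)

print(df[["Col0", "min"]])
```

With the sample data this gives 20 for Item1 (the smaller of Col1 and Col3) and 15 for Item2 (the smaller of Col2 and Col4). apply with axis=1 is simple but slow on large frames; a per-item loop over boolean masks would scale better.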

April 13, 2024 MR

create new dataframe after performing calculations from groupby

I have a dataframe that looks like this:

    ID  TradeDate   party   Deal  Asset  Start       Expire      Fixed  Quantity  MTM      Float
    1   04/11/2024  party1  Sell  HO     01/01/2024  02/01/2024  10.00  1000      2500.00  10.00
    1   04/11/2024  party1  Sell  HO     01/01/2024  02/01/2024  10.00  1000      2500.00  10.00
    1   04/11/2024  party1  Sell  HO     01/01/2024  02/01/2024  10.00  1000      2500.00  10.00
    1   04/11/2024  party1…

April 11, 2024 MR

How to get timestamp differences per group?

I have a "date" column and a "group" column in my dataset. I want to get the difference between the min and max date in the "date" column per group. How can I do that? Here is an example of my data:

    group   date
    main    2024-01-01
    main    2024-01-03
    main    2024-01-05
    second  2024-02-05
    second  2024-02-20

Desired result:

    group   date_diff
    main    4
    second…
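
A minimal sketch of one way to compute this, assuming the dates are parsed as datetimes and the difference is wanted in whole days (the 4 shown for main suggests so):

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["main", "main", "main", "second", "second"],
    "date": ["2024-01-01", "2024-01-03", "2024-01-05",
             "2024-02-05", "2024-02-20"],
})
df["date"] = pd.to_datetime(df["date"])

# Per-group max minus per-group min gives a Timedelta per group;
# .dt.days converts it to a whole number of days.
grouped = df.groupby("group")["date"]
date_diff = (
    (grouped.max() - grouped.min())
    .dt.days
    .rename("date_diff")
    .reset_index()
)

print(date_diff)
```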

April 10, 2024 MR

Create new row index and calculate sum for each column

I have a dataframe that is shaped like this:

            com1  com2  com3
    party1    10     0     0
    party2     0    20    10
    party3     0     0    25

I want to create a new row index called total, and then take the sum of each column and display it like this:

            com1  com2  com3
    party1    10     0     0
    party2…
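
One common way to append such a row is to assign the column sums to a new index label with .loc. The sketch below rebuilds the sample frame from the excerpt; with_total is a hypothetical name used here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"com1": [10, 0, 0], "com2": [0, 20, 0], "com3": [0, 10, 25]},
    index=["party1", "party2", "party3"],
)

# Work on a copy so the original frame stays unchanged.
with_total = df.copy()

# df.sum() adds up each column; assigning it to a new label appends a row.
with_total.loc["total"] = df.sum()

print(with_total)
```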

April 8, 2024 MR

How to highlight the membership of a column of identifiers?

My input is this df:

    df = pd.DataFrame({'group': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C'],
                       'identifier': [1, 2, 1, 3, 1, 2, 5, 4]})
    print(df)

      group  identifier
    0     A           1
    1     A           2
    2     A           1
    3     A           3
    4     B           1
    5     B           2
    6     B           5
    7     C           4

And my…

April 8, 2024 MR

Pandas rolling average in time window

I have the dataframe below. event_timestamp is a column of dtype datetime64[ns].

    event_timestamp                value
    2024-02-02 09:29:19.623481531  8
    2024-02-02 09:29:19.907333355  9
    2024-02-02 09:29:19.907373437  10
    2024-02-02 09:29:21.366842178  11
    2024-02-02 09:29:21.366886264  12
    2024-02-02 09:29:21.512928275  13
    2024-02-02 09:29:21.512968294  14
    2024-02-02 09:29:23.050536162  15
    2024-02-02 09:29:23.300983260  16
    2024-02-02 09:29:23.318874509  17
    2024-02-02 09:29:23.318916726  18

What I am trying to achieve: For…

April 5, 2024 MR

How can I efficiently optimize the creation of conditional columns based on multiple columns in Pandas DataFrames?

I have a dataframe with over 20 thousand rows and need to create a column based on more than 10 conditions from four other columns. Instead of writing multiple lines of .loc, I decided to create a function. To enhance the performance and readability of this function, I opted to group the necessary columns into…