Pandas most efficient way to filter dataframe based on groupby mask

June 27, 2022

I would like to filter a DataFrame based on the result of a groupby on one of its columns. For example, given a DataFrame with the columns ticker, year, and price, I'd like to drop every ticker whose first year is >= 1990.

More precisely, I want to keep the rows whose ticker evaluates to True in df.groupby('ticker')['year'].min() < 1990

I am currently doing it this way:

ticker_min_date_bool = df.groupby('ticker')['year'].min() < 1990 # get booleans
tickers_filt = [i for i in ticker_min_date_bool.index if ticker_min_date_bool[i]] # make list of tickers with criteria
df_new = df[df.ticker.isin(tickers_filt)] # filter df based on above list

However, this feels clumsy for something that takes three lines, and the list comprehension doesn't seem to scale well to larger datasets.

Are there any dataframe methods that accomplish this more efficiently?

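Solution:

Use transform. It broadcasts each group's minimum year back onto every row of that group, so the comparison yields a boolean mask aligned with df that can index it directly: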

ticker_min_date_bool = df.groupby('ticker')['year'].transform('min') < 1990
df_new = df[ticker_min_date_bool]
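
To make the broadcasting step concrete, here is a minimal sketch on a small, made-up DataFrame (the tickers, years, and prices are assumptions for illustration, not data from the question):

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB", "CCC"],
    "year":   [1985, 1995, 1990, 2000, 1989],
    "price":  [10.0, 12.0, 20.0, 25.0, 5.0],
})

# transform('min') returns one value per row: the minimum year of that row's ticker.
first_year = df.groupby("ticker")["year"].transform("min")

# Comparing it with 1990 gives a boolean Series that shares df's index.
mask = first_year < 1990

# Keep every row of the tickers whose first year is before 1990.
df_new = df[mask]
print(df_new)
```

Because the mask has the same index as df, no intermediate list of tickers is needed.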

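Or do it without groupby. A ticker's minimum year is below 1990 exactly when at least one of its rows has a year below 1990, so it is enough to collect those tickers and filter with isin: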

s = df.loc[df['year']<1990,'ticker']
df_new = df[df['ticker'].isin(s)]
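
As a quick sanity check, the sketch below compares the isin approach with the groupby/transform approach on a small, made-up DataFrame (the sample values are assumptions for illustration). On this data the two produce identical results; on real data it may be worth checking how rows with a missing ticker are handled, since groupby drops missing keys by default.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB", "CCC"],
    "year":   [1985, 1995, 1990, 2000, 1989],
    "price":  [10.0, 12.0, 20.0, 25.0, 5.0],
})

# Tickers that have at least one row before 1990.
# Duplicates in this Series are harmless because isin only tests membership.
early = df.loc[df["year"] < 1990, "ticker"]
df_isin = df[df["ticker"].isin(early)]

# The groupby/transform version, for comparison.
df_transform = df[df.groupby("ticker")["year"].transform("min") < 1990]

# Both approaches keep the same rows on this sample.
same = df_isin.equals(df_transform)
print(same)
```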

dataframe

by MR

Published June 27, 2022
