Pandas most efficient way to filter dataframe based on groupby mask

I would like to filter a dataframe based on the result of a groupby on one of its columns. For example, given a dataframe with columns ticker, year, and price, I’d like to keep only the tickers whose first year is before 1990 (that is, drop every ticker whose first year is >= 1990).

More precisely, I want to keep the rows whose ticker evaluates to True in df.groupby('ticker')['year'].min() < 1990

I am currently doing it this way:

ticker_min_date_bool = df.groupby('ticker')['year'].min() < 1990 # get booleans
tickers_filt = [i for i in ticker_min_date_bool.index if ticker_min_date_bool[i]] # make list of tickers with criteria
df_new = df[df.ticker.isin(tickers_filt)] # filter df based on above list

However this feels a little clumsy to do in 3 lines and doesn’t seem to scale well for larger datasets.

Are there any dataframe methods that accomplish this more efficiently?

Solution:

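Use transform. It broadcasts each group's minimum year back onto every row of that group, so the comparison yields a boolean mask aligned with df that can index it directly: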

ticker_min_date_bool = df.groupby('ticker')['year'].transform('min') < 1990
df_new = df[ticker_min_date_bool]
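
To make the broadcasting concrete, here is a minimal sketch on a small made-up dataframe. The tickers, years, and prices are illustrative assumptions, not data from the question.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB", "CCC"],
    "year": [1985, 1995, 1992, 2000, 1989],
    "price": [10.0, 12.0, 20.0, 25.0, 5.0],
})

# transform('min') returns one value per row (not one per group):
# each row receives the minimum year of its own ticker.
first_year = df.groupby("ticker")["year"].transform("min")

# Because first_year shares df's index, the comparison is a row-aligned mask.
mask = first_year < 1990
df_new = df[mask]

print(df_new)
```

With this sample, AAA and CCC first appear before 1990, so their rows are kept and both BBB rows are dropped.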

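Or without groupby. A ticker's minimum year is below 1990 exactly when at least one of its rows has year < 1990, so it is enough to collect the tickers of those rows: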

s = df.loc[df['year']<1990,'ticker']
df_new = df[df['ticker'].isin(s)]
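
As a sanity check, the sketch below runs both approaches on a small made-up dataframe (the sample values are assumptions for illustration) and confirms they select the same rows. It also shows groupby(...).filter, which some readers find easier to follow; it calls a Python function once per group, so it is likely slower on data with many groups.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB", "CCC"],
    "year": [1985, 1995, 1992, 2000, 1989],
    "price": [10.0, 12.0, 20.0, 25.0, 5.0],
})

# Approach 1: broadcast each ticker's minimum year to its rows.
via_transform = df[df.groupby("ticker")["year"].transform("min") < 1990]

# Approach 2: no groupby; any row before 1990 qualifies its ticker.
early_tickers = df.loc[df["year"] < 1990, "ticker"]
via_isin = df[df["ticker"].isin(early_tickers)]

# Alternative: keep whole groups for which the predicate is True.
via_filter = df.groupby("ticker").filter(lambda g: g["year"].min() < 1990)

# Both approaches from the answer should pick identical rows.
same_rows = via_transform.equals(via_isin)
print(same_rows)
```

Timing the variants on your own data (for example with timeit) is the reliable way to decide between them, since the relative cost depends on the number of rows and groups.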