Query/Filter a pandas df using a dict of lists

February 6, 2023

My problem

I have a dict d that can be of varying length consisting of the following format:

d = {
  "foo": [
    50,
    100
  ],
  "bar": [
    5,
    10
  ]
}

Each key is a column name, and each value is a two-element list holding the min and max values to filter that column of a dataframe df on. Thus, given the input above, I'd like to filter df.foo between 50 and 100 and df.bar between 5 and 10.

What I have tried

Of course, I could just hard code it like so:

df.loc[(df[list(d.items())[0][0]] > list(d.items())[0][1][0]) & (df[list(d.items())[0][0]] < list(d.items())[0][1][1]) ...]

etc., but the number of keys (columns to filter on) may vary, and this is also just incredibly ugly code. Is there a cleaner/vectorized way to do this?

Context

I am building a Streamlit app where a user can create n min/max filters on a dataframe, and the format listed above is the format Streamlit's slider returns.

>Solution :

IIUC, one way is to use pandas.Series.between, which includes both endpoints by default:

# sample
import numpy as np
import pandas as pd

np.random.seed(1234)

df = pd.DataFrame({"foo": np.random.random(10) * 100,
                   "bar": np.random.random(10) * 10})

         foo       bar
0  19.151945  3.578173
1  62.210877  5.009951
2  43.772774  6.834629
3  78.535858  7.127020
4  77.997581  3.702508
5  27.259261  5.611962
6  27.646426  5.030832
7  80.187218  0.137684
8  95.813935  7.728266
9  87.593263  8.826412

Code:

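# build one boolean mask per column, then AND them all together;
# .reduce handles any number of keys, not just two
new_df = df[np.logical_and.reduce([df[k].between(*v) for k, v in d.items()])]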
print(new_df)

Output:

         foo       bar
1  62.210877  5.009951
3  78.535858  7.127020
8  95.813935  7.728266
9  87.593263  8.826412
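
Since the number of keys in d may vary, it can help to wrap the idea in a small function. The following is only a sketch: the name filter_by_ranges and its inclusive parameter are made up for illustration, and it assumes pandas 1.3 or newer, where Series.between accepts the strings "both", "neither", "left" and "right". It combines the per-column masks with np.logical_and.reduce, which works for one, two, or many columns, and returns the dataframe unchanged when the dict is empty.

```python
import numpy as np
import pandas as pd


def filter_by_ranges(df, ranges, inclusive="both"):
    """Keep rows where every column in `ranges` lies within its [min, max] pair.

    `filter_by_ranges` is a hypothetical helper, not part of pandas.
    """
    if not ranges:
        return df  # no filters given, so nothing is excluded
    # one boolean Series per column
    masks = [df[col].between(lo, hi, inclusive=inclusive)
             for col, (lo, hi) in ranges.items()]
    # element-wise AND across all masks
    return df[np.logical_and.reduce(masks)]


df = pd.DataFrame({"foo": [10, 60, 80, 100],
                   "bar": [5, 7, 20, 10],
                   "baz": [1, 2, 3, 4]})
d = {"foo": [50, 100], "bar": [5, 10], "baz": [1, 4]}

result = filter_by_ranges(df, d)                       # endpoints included
strict = filter_by_ranges(df, d, inclusive="neither")  # strict > and <
print(result)
print(strict)
```

Passing inclusive="neither" matches the strict > and < comparisons from the question, while the default keeps rows that sit exactly on a slider bound.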

pandas

byMR

Published February 06, 2023
