My problem
I have a dict d of varying length in the following format:
d = {
"foo": [
50,
100
],
"bar": [
5,
10
]
}
Each key is a column name, and each value is a two-element list holding the min and max value of that column to filter a dataframe df on. Thus, given the input above, I’d like to filter df.foo between 50 and 100 and df.bar between 5 and 10.
What I have tried
Of course, I could just hard code it like so:
df.loc[(df[list(d.items())[0][0]] > list(d.items())[0][1][0]) & (df[list(d.items())[0][0]] < list(d.items())[0][1][1]) ...]
etc., but the number of keys (columns to filter on) may vary, and this is just incredibly ugly code. Is there a cleaner/vectorized way to do this?
Context
I am building a Streamlit app where a user can create n min/max filters on a dataframe, and the format listed above is the format Streamlit’s slider returns.
>Solution :
IIUC, one way is to use pandas.Series.between (both bounds are inclusive by default):
# sample
import numpy as np
import pandas as pd
np.random.seed(1234)
df = pd.DataFrame({"foo": np.random.random(10) * 100,
                   "bar": np.random.random(10) * 10})
         foo       bar
0  19.151945  3.578173
1  62.210877  5.009951
2  43.772774  6.834629
3  78.535858  7.127020
4  77.997581  3.702508
5  27.259261  5.611962
6  27.646426  5.030832
7  80.187218  0.137684
8  95.813935  7.728266
9  87.593263  8.826412
Code:
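# np.logical_and(*masks) only works for exactly two masks; .reduce handles any number of keys
new_df = df[np.logical_and.reduce([df[k].between(*v) for k, v in d.items()])]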
print(new_df)
Output:
         foo       bar
1  62.210877  5.009951
3  78.535858  7.127020
8  95.813935  7.728266
9  87.593263  8.826412
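Since the app may produce any number of filters (possibly none), the same idea can be wrapped in a small helper. The following is only a sketch: filter_by_ranges is a hypothetical name, not a pandas function, and the string values for inclusive ("both", "neither", "left", "right") assume pandas 1.3 or later. Passing inclusive="neither" gives the strict > / < comparison from the question.

```python
import pandas as pd


def filter_by_ranges(df, ranges, inclusive="both"):
    """Hypothetical helper: keep rows inside every (min, max) range in `ranges`."""
    # Start from an all-True mask so an empty dict returns df unchanged.
    mask = pd.Series(True, index=df.index)
    for column, (low, high) in ranges.items():
        # Narrow the mask one column at a time.
        mask &= df[column].between(low, high, inclusive=inclusive)
    return df[mask]


# Small made-up frame with values sitting exactly on the bounds.
df = pd.DataFrame({"foo": [10, 60, 80, 100], "bar": [7, 2, 5, 10]})
d = {"foo": [50, 100], "bar": [5, 10]}

inside = filter_by_ranges(df, d)                       # bounds included
strict = filter_by_ranges(df, d, inclusive="neither")  # bounds excluded
print(inside)
print(strict)
```

Building the mask in a loop avoids unpacking the list into a two-argument function, so it behaves the same for one, two, or many columns.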