Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filter pandas column based on ranges in a huge list

Trying to filter ‘time’ data into ‘time_filtered’ based on lut_lst ranges, ergo if ‘time’ value falls in any of the ranges, exchange with NaN otherwise copy value into new column.

import numpy as np
# creates look up list for ranges that need to be excluded
lut_lst = []
for i in range(0,2235,15):
    a= range(i,2+i)
    b= range(14+i, 15+i)
    lut_lst.append(a)
    lut_lst.append(b)

lut_lst
[range(0, 2),
 range(14, 15),
 range(15, 17),
 range(29, 30),
 range(30, 32),
 range(44, 45),
 range(45, 47),
 range(59, 60),
...
 range(2190, 2192),
 range(2204, 2205),
 range(2205, 2207),
 range(2219, 2220),
 range(2220, 2222),
 range(2234, 2235)]


## if 'time' value falls in any of the ranges of lut_lst, replace values with NaN (drop row)
data_cols = ['filename', 'time']
data_vals = [['cell1', 0.0186],
        ['cell1', 0.0774],
        ['cell1', 2.2852],
        ['cell1', 2.3788],
        ['cell1', 14.62],
        ['cell1', 15.04],
        ['cell2', 20.3416],
        ['cell2', 20.9128],
        ['cell2', 29.6784],
        ['cell2', 30.1194],
        ['cell2', 32.3304]]
        
df = pd.DataFrame(data_vals, columns=data_cols)

# trying to filter 'time' but can't get INTO the ranges
df['time_filtered'] = df['time'].apply(lambda x: x if (x not in lut_lst) else np.nan)

The output for df is not filtered. I tried using any(lut_lst) or all(lut_lst) but that just threw an error.

df
filename    record  time    time_filtered
0   cell1   1   0.0186  0.0186
1   cell1   1   0.0774  0.0774
2   cell1   1   2.2852  2.2852
3   cell1   25  2.3788  2.3788
4   cell1   25  14.6200 14.6200
5   cell1   101 15.0400 15.0400
6   cell2   2   20.3416 20.3416
7   cell2   2   20.9128 20.9128
8   cell2   50  29.6784 29.6784
9   cell2   50  30.1194 30.1194
10  cell2   80  32.3304 32.3304

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Use tuples instead of ranges in lut_lst, and change your filter slightly:

import numpy as np
# creates look up list for ranges that need to be excluded
lut_lst = []
for i in range(0,2235,15):
    a= i,2+i
    b= 14+i, 15+i
    lut_lst.append(a)
    lut_lst.append(b)

## if 'time' value falls in any of the ranges of lut_lst, replace values 
with NaN (drop row)
data_cols = ['filename', 'time']
data_vals = [['cell1', 0.0186],
    ['cell1', 0.0774],
    ['cell1', 2.2852],
    ['cell1', 2.3788],
    ['cell1', 14.62],
    ['cell1', 15.04],
    ['cell2', 20.3416],
    ['cell2', 20.9128],
    ['cell2', 29.6784],
    ['cell2', 30.1194],
    ['cell2', 32.3304]]
    
df = pd.DataFrame(data_vals, columns=data_cols)


df['time_filtered'] = df['time'].apply(lambda x: x if not any([a < x < b 
for a,b in lut_lst]) else np.nan)

df

Output:

    filename    time    time_filtered
0   cell1   0.0186  NaN
1   cell1   0.0774  NaN
2   cell1   2.2852  2.2852
3   cell1   2.3788  2.3788
4   cell1   14.6200 NaN
5   cell1   15.0400 NaN
6   cell2   20.3416 20.3416
7   cell2   20.9128 20.9128
8   cell2   29.6784 NaN
9   cell2   30.1194 NaN
10  cell2   32.3304 32.3304
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading