Making rows NaN based on many conditions

November 24, 2022

If I have a dataframe with some index and some value as follows:

import pandas as pd
from random import random

my_index = []
my_vals = []
for i in range(1000):
    my_index.append(i+random())
    my_vals.append(random())

df_vals = pd.DataFrame({'my_index': my_index, 'my_vals': my_vals})

And I have a second dataframe with a column start and end, a row must be read as an interval, so the first row would be interval from 1 to 4 (including 1 and 4). It is the following dataframe:

df_intervals = pd.DataFrame({'start': [1, 7, 54, 73, 136, 235, 645, 785, 968], 'end': [4, 34, 65, 90, 200, 510, 700, 805, 988]})

I would like to make all values in the my_vals column of df_vals a NaN if the row’s index (my_index) does not fall in to one of the intervals specified in the df_intervals dataframe. What is the best way to go about this automatically rather than specifying each condition manually?

(In my actual data set there are more than 9 intervals, this is some example data)

EDIT: in my actual data these indeces are not strictly integers, these can also be random floats

>Solution :

I believe this is a possible solution,

def index_in_range(index, df):
    for index_, row in df.iterrows():
        if (index >= row['start']) and (index <= row['end']):
            return True
    
    return False
    
df_vals['my_vals'] = df_vals.apply(lambda row: row['my_vals'] if index_in_range(row['my_index'], df_intervals) else None, axis=1)

To accomplish this without using a lambda function, you can do the following,

def index_in_range(index, df):
    for index_, row in df.iterrows():
        if (index >= row['start']) and (index <= row['end']):
            return True
    
    return False

for index_, row in df_vals.iterrows():
    if not index_in_range(row['my_index'], df_intervals):
        df_vals.at[index_, 'my_vals'] = None

Output:

    my_index    my_vals
0   0   NaN
1   1   0.126647
2   2   0.769215
3   3   0.819891
4   4   0.674466
... ... ...
995 995 NaN
996 996 NaN
997 997 NaN
998 998 NaN
999 999 NaN