How to Filter Pandas Dataframe using ‘in’ and ‘not in’ like in SQL



Filtering data is an essential task in data analysis and manipulation. When working with Pandas, you may often encounter scenarios where you need to filter a dataframe based on specific values, similar to using the ‘in’ and ‘not in’ operators in SQL queries. In this blog post, we will explore how to achieve this in Pandas and perform efficient data filtering.

Filtering with ‘in’ operator

The ‘in’ operator allows you to filter a Pandas dataframe based on a set of values. Here’s an example:

import pandas as pd

# Create a sample dataframe
data = {'Name': ['John', 'Emma', 'Peter', 'Sarah'],
        'Age': [25, 30, 28, 32],
        'City': ['New York', 'London', 'Paris', 'Tokyo']}

df = pd.DataFrame(data)

# Filter the dataframe based on a list of names
names = ['Emma', 'Sarah']
filtered_df = df[df['Name'].isin(names)]


In this example, we have a dataframe with columns ‘Name’, ‘Age’, and ‘City’. We use the ‘isin()’ method on the ‘Name’ column to filter rows where the name is either ‘Emma’ or ‘Sarah’. The resulting dataframe will only contain the matching rows.

Filtering with ‘not in’ operator

Similarly, you can use the ‘not in’ operator to exclude specific values from the dataframe. Here’s an example:

# Filter the dataframe to exclude rows with specific names
filtered_df = df[~df['Name'].isin(names)]


In this case, we use the ‘~’ operator to negate the filtering condition. The resulting dataframe will exclude rows where the name is ‘Emma’ or ‘Sarah’.


Filtering a Pandas dataframe using the ‘in’ and ‘not in’ operators provides a powerful mechanism to extract relevant data based on specific criteria. By leveraging these operators, you can mimic the functionality of SQL queries and perform efficient data filtering in Pandas. Experiment with different filtering conditions and explore the vast capabilities of Pandas for data manipulation and analysis.

Leave a ReplyCancel reply