Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas dataframe: how to select rows where one column-value is like 'values in a list'

I have a requirement where I need to select rows from a dataframe where one column-value is like values in a list.
The requirement is for a large dataframe with millions of rows and need to search for rows where column-value is like values of a list of thousands of values.

Below is a sample data.

NAME,AGE
Amar,80
Rameshwar,60
Farzand,90
Naren,60
Sheikh,45
Ramesh,55
Narendra,85
Rakesh,86
Ram,85
Kajol,80
Naresh,86
Badri,85
Ramendra,80

My code is like below. But problem is that I’m using a for loop, hence with increased number of values in the list-of-values (variable names_like in my code) I need to search, the number of loop and concat operation increases and it makes the code runs very slow.
I can’t use the isin() option as isin is for exact match and for me it is not an exact match, it a like condition for me.
Looking for a better more performance efficient way of getting the required result.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

My Code:-

import pandas as pd

infile = "input.csv"

df = pd.read_csv(infile)
print(f"df=\n{df}")
names_like = ['Ram', 'Nar']
df_res = pd.DataFrame(columns=df.columns)
for name in names_like:
    df1 = df[df['NAME'].str.contains(name, na=False)]
    df_res = pd.concat([df_res,df1], axis=0)

print(f"df_res=\n{df_res}")

My Output:-

df_res=
         NAME AGE
1   Rameshwar  60
5      Ramesh  55
8         Ram  85
12   Ramendra  80
3       Naren  60
6    Narendra  85
10     Naresh  86

Looking for a better more performance efficient way of getting the required result.

>Solution :

You can pass all names in joined list by | for regex or, loop is not necessary:

df_res = df[df['NAME'].str.contains('|'.join(names_like), na=False)]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading