How do I create a regex dynamically using strings in a list for use in a pandas dataframe search?

July 1, 2024

The following code successfully identifies the 2nd and 3rd texts, and only those texts, in a pandas dataframe by searching for rows that contain the word "cod" or "i":

import numpy as np
import pandas as pd
texts_df = pd.DataFrame({"id":[1,2,3,4],
                      "text":["she loves coding", 
                              "he was eating cod",
                              "i do not like fish",
                              "fishing is not for me"]})

texts_df.loc[texts_df["text"].str.contains(r'\b(cod|i)\b', regex=True)]

I would like to build the list of words up dynamically by inserting words from a long list but I can’t figure out how to do that successfully.

I’ve tried the following, but I get an error saying "r is not defined". I expected that, since r is not a variable, but I can’t make it part of the string either and I don’t know what to do instead:

kw_list = ["cod", "i"]

kw_regex_string = "\b("
for kw in kw_list:
  kw_regex_string = kw_regex_string + kw + "|"
kw_regex_string = kw_regex_string[:-1]  # remove the final "|" at the end
kw_regex_string = kw_regex_string + ")\b"

myregex = r + kw_regex_string
texts_df.loc[texts_df["text"].str.contains(myregex, regex=True)]

How can I build the "or" condition from the list of keywords and then insert it into the regex in a way that works in the pandas dataframe search?

Solution:

When I do this, I pass the list through map with re.escape to escape any characters that have a special meaning in a regex, join the results with | as the separator, and insert the result between the parentheses using string formatting:

import re

kw_list = ['cod', 'i']

my_regex = r'\b(?:%s)\b' % '|'.join(map(re.escape, kw_list))

texts_df.loc[texts_df['text'].str.contains(my_regex, regex=True)]
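For reference, here is a minimal end-to-end sketch of the same approach applied to the dataframe from the question. The helper name build_keyword_regex is hypothetical and only there for illustration; it wraps the one-liner above.

```python
import re

import pandas as pd

# Same data as in the question.
texts_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "text": [
        "she loves coding",
        "he was eating cod",
        "i do not like fish",
        "fishing is not for me",
    ],
})


def build_keyword_regex(keywords):
    """Hypothetical helper: pattern matching any keyword as a whole word."""
    # re.escape neutralises regex metacharacters in each keyword;
    # (?:...) groups the alternatives without creating a capture group.
    return r"\b(?:%s)\b" % "|".join(map(re.escape, keywords))


kw_list = ["cod", "i"]
my_regex = build_keyword_regex(kw_list)

matches = texts_df.loc[texts_df["text"].str.contains(my_regex, regex=True)]
print(matches["id"].tolist())  # [2, 3]
```

One thing to watch for: with an empty keyword list the pattern becomes \b(?:)\b, which matches at any word boundary, so it is probably worth checking that the list is non-empty before searching.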

Variant using an f-string:

my_regex = fr'\b(?:{"|".join(map(re.escape, kw_list))})\b'

Crafted regex: '\\b(?:cod|i)\\b'
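As for why the attempt in the question fails: the r prefix is part of the string literal syntax, not a variable, so it cannot be concatenated afterwards. Without it, "\b" in a normal string is the backspace character rather than a backslash followed by b. A small sketch that shows the difference and fixes the original concatenation by putting r on each literal piece (without re.escape, to stay close to the original attempt):

```python
import re

# In a normal string literal, "\b" is a single backspace character.
assert "\b" == "\x08"
assert len("\b") == 1

# In a raw string literal, r"\b" is two characters: a backslash and "b".
assert r"\b" == "\\b"
assert len(r"\b") == 2

kw_list = ["cod", "i"]

# The r prefix goes on each literal piece; the joined keywords are plain strings.
pattern = r"\b(" + "|".join(kw_list) + r")\b"

print(repr(pattern))                                  # '\\b(cod|i)\\b'
print(bool(re.search(pattern, "he was eating cod")))  # True
print(bool(re.search(pattern, "she loves coding")))   # False
```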

Example of escaping special characters:

kw_list = ['10.00$', '*word*', '(A)']

# crafted regex
'\\b(?:10\\.00\\$|\\*word\\*|\\(A\\))\\b'
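A caveat that may matter with keywords like these: \b only matches between a word character and a non-word character, so a keyword that starts or ends with punctuation (such as "(A)" or "10.00$") will not match when it is surrounded by spaces. The sketch below compares the \b version with a possible alternative based on lookarounds, (?<!\w) and (?!\w). The lookaround pattern is my own suggestion for this case, not something from the original code, and the sample text is made up.

```python
import re

kw_list = ["10.00$", "*word*", "(A)"]
alternation = "|".join(map(re.escape, kw_list))

# Pattern from the answer above: word boundaries on both sides.
word_boundary_regex = r"\b(?:%s)\b" % alternation

# Assumed alternative: require "no word character" on each side instead.
lookaround_regex = r"(?<!\w)(?:%s)(?!\w)" % alternation

text = "option (A) costs 10.00$ today"  # made-up sample text

print(re.findall(word_boundary_regex, text))  # []
print(re.findall(lookaround_regex, text))     # ['(A)', '10.00$']
```

For plain alphanumeric keywords like "cod" and "i" both patterns behave the same way, so the lookaround form is only worth considering when the list may contain punctuation.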

by MR
