I have a problem constructing a regular expression. [Python, Pandas]

February 26, 2022

I have a DataFrame in which a row in one column looks like this:

<title>Some text</title>

<selftext>Some text</selftext>

The text above is a single row in one column.
The problem is that not every row looks like this. I need to remove the rows that do not have this format.

I tried the code below:

pattern = "<title>[a-zA-Z0-9]</title>\n\n<selftext>[a-zA-Z0-9]</selftext>"
for row in df.column_name:
    if row == pattern:
        print(row)

No rows are printed, although some should be.
What am I doing wrong?
Does anyone know?

Solution:

My first guess at what is wrong with the pattern: each character class [a-zA-Z0-9] has no quantifier, so it matches exactly one character. Add + after each class to allow one or more characters inside the title and selftext tags:

pattern = "<title>[a-zA-Z0-9]+</title>\n\n<selftext>[a-zA-Z0-9]+</selftext>"

Also, you never applied the pattern as a regular expression; you only did a string comparison. Unless a row were literally the pattern text itself, [a-zA-Z0-9] included, it would not match.

Use it like this:

import re
pattern = "<title>[a-zA-Z0-9]+</title>\n\n<selftext>[a-zA-Z0-9]+</selftext>"
for row in df.column_name:
    if re.match(pattern, row):
        print(row)
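
To make the difference concrete, here is a minimal sketch comparing plain string equality, re.match, and re.fullmatch. The sample strings are invented for illustration and are not taken from the asker's data. Note that re.match only anchors at the start of the string, and that [a-zA-Z0-9]+ does not accept spaces, so a row like the "Some text" example in the question would still be rejected by this pattern.

```python
import re

# Pattern from the answer: one or more alphanumerics inside each tag.
pattern = "<title>[a-zA-Z0-9]+</title>\n\n<selftext>[a-zA-Z0-9]+</selftext>"

# Hypothetical sample rows, invented for illustration.
good = "<title>Hello</title>\n\n<selftext>World</selftext>"
trailing = good + "\nextra text"
spaced = "<title>Some text</title>\n\n<selftext>Some text</selftext>"

# Plain string comparison: the row is not literally the pattern text.
equal = good == pattern

# re.match applies the regex, anchored at the start of the string only.
matched = re.match(pattern, good) is not None
prefix_only = re.match(pattern, trailing) is not None

# re.fullmatch requires the entire string to match the pattern.
full = re.fullmatch(pattern, trailing) is not None

# The character class has no space in it, so "Some text" is rejected.
spaced_ok = re.match(pattern, spaced) is not None

print(equal, matched, prefix_only, full, spaced_ok)
```

If rows with trailing content should be rejected as well, re.fullmatch is probably the closer fit to "rows that look exactly like this".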

Edit: Unless you also want to restrict the content to exactly that set of letters and digits, I would recommend making the pattern much broader. Roughly speaking, the text inside an XML element can contain anything except markup, in particular a raw <. So you can simply match everything up to the next opening angle bracket. While you are at it, you can also allow empty tags, since these can occur in XML as well.

import re
pattern = "<title>[^<]*</title>\n\n<selftext>[^<]*</selftext>"
for row in df.column_name:
    if re.match(pattern, row):
        print(row)
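
Since the original goal was to remove the non-matching rows rather than print the matching ones, the same pattern can also be applied to the whole column at once. The following is only a sketch under assumptions: the sample data is made up, column_name is the placeholder name from the question, and it uses Series.str.fullmatch (available in pandas 1.1 and later) so that the entire cell has to match.

```python
import pandas as pd

# Hypothetical data; "column_name" mirrors the placeholder used in the question.
df = pd.DataFrame({
    "column_name": [
        "<title>Some text</title>\n\n<selftext>Some text</selftext>",
        "<title></title>\n\n<selftext></selftext>",
        "just some text without tags",
        "<title>Only a title</title>",
        None,
    ]
})

# Broad pattern from the answer: anything except "<" inside each tag.
pattern = "<title>[^<]*</title>\n\n<selftext>[^<]*</selftext>"

# Boolean mask: True where the whole cell matches the pattern.
# na=False makes missing values count as non-matching.
mask = df["column_name"].str.fullmatch(pattern, na=False)

# Keep only the matching rows and renumber the index.
filtered = df[mask].reset_index(drop=True)

print(filtered)
```

Using Series.str.match instead would behave like re.match and accept rows that merely start with the expected format.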

byMR

Published February 26, 2022
