re.findall outputs blanks along with correct

December 8, 2022

I’m trying to get the list output without subgroups or empty strings. I’d like to stick with a regex-only solution, because my re.split and array-manipulation approach is janky and rather slow.

HTML file (note that thing3 and thing4 are preceded by /b/ instead of /a/):

<!DOCTYPE html>
<html>
    <head></head>   
    <body>
        <a href="example.com/a/thing1"></a>
        <a href="example.com/a/thing2"></a>
        <a href="example.com/b/thing3"></a>
        <a href="example.com/b/thing4" ><img src="/thing4.png"></a>
    </body>
</html>

Python file:

import re

with open("help.html", "r") as f:
    html = f.read()
links = re.findall(r'((?<=\.com\/a\/).*(?="))|((?<=\.com\/b\/).*(?=" ><))|((?<=\.com\/b\/).*(?="><\/a))', html)

print(links)

What is printed when I run the Python file above:

[('thing1', '', ''), ('thing2', '', ''), ('', '', 'thing3'), ('', 'thing4', '')]

What I want it to output:

['thing1', 'thing2', 'thing3', 'thing4']

Solution:

You just have to remove the capturing groups. As stated in the documentation for re.findall:

Empty matches are included in the result.

The result depends on the number of capturing groups in the pattern. If there are no groups, return a list of strings matching the whole pattern. If there is exactly one group, return a list of strings matching that group. If multiple groups are present, return a list of tuples of strings matching the groups. Non-capturing groups do not affect the form of the result.
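To make the quoted rule concrete, here is a minimal sketch of how the shape of the result changes with the number of capturing groups. The sample string and patterns are made up for illustration and are not taken from the question.

```python
import re

text = "a1 b2"  # illustrative input, not from the question

# No groups: a list of whole-match strings.
no_groups = re.findall(r"[a-z]\d", text)

# Exactly one group: a list of strings for that group only.
one_group = re.findall(r"([a-z])\d", text)

# Multiple groups: a list of tuples, one slot per group.
# A group that did not take part in a match comes back as ''.
many_groups = re.findall(r"(a\d)|(b\d)", text)

# Non-capturing groups (?:...) do not change the shape of the result.
non_capturing = re.findall(r"(?:a|b)\d", text)

print(no_groups)      # ['a1', 'b2']
print(one_group)      # ['a', 'b']
print(many_groups)    # [('a1', ''), ('', 'b2')]
print(non_capturing)  # ['a1', 'b2']
```

The third case is the one the question runs into: each alternative is wrapped in its own capturing group, so every match yields a tuple with empty strings for the alternatives that did not match.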

For example, ((?<=\.com\/a\/).*(?=")) is a capturing group, so its outermost parentheses should be removed; the same applies to the other two groups:

links = re.findall(r'(?<=\.com\/a\/).*(?=")|(?<=\.com\/b\/).*(?=" ><)|(?<=\.com\/b\/).*(?="><\/a)', html)

Output:

['thing1', 'thing2', 'thing3', 'thing4']
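As a possible simplification (not part of the original answer), the three alternatives could be folded into one pattern. This sketch assumes that every link of interest has the form .com/a/... or .com/b/... and that the wanted text ends at the closing quote of the href attribute. The anchor lines are inlined here so that it runs without help.html.

```python
import re

# Inline copy of the anchor lines from the question (assumption: only these matter).
html = '''
<a href="example.com/a/thing1"></a>
<a href="example.com/a/thing2"></a>
<a href="example.com/b/thing3"></a>
<a href="example.com/b/thing4" ><img src="/thing4.png"></a>
'''

# [ab] covers both path prefixes inside a fixed-width lookbehind.
# [^"]+ stops at the first double quote, so no lookahead is needed.
# The pattern has no capturing groups, so findall returns plain strings.
links = re.findall(r'(?<=\.com/[ab]/)[^"]+', html)

print(links)  # ['thing1', 'thing2', 'thing3', 'thing4']
```

Using [^"]+ instead of a greedy .* also avoids having to list each variant of what follows the closing quote. For anything beyond a quick script, an HTML parser is usually more robust than a regex.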

python-re

by MR

Published December 08, 2022
