Removing newlines inside <a> tag

March 1, 2022

I am quite new to Python and also to BeautifulSoup. I am trying to get all of the <a> tags from a local HTML file. Part of my code looks like this:

from bs4 import BeautifulSoup

with open(dir_path, encoding="utf-8_sig") as html_file:
    soup = BeautifulSoup(html_file, 'html.parser')
tag = soup.find('a')
print(tag)

The output looks like this:

<a class="cmp-image__link" data-cmp-hook-image="link" href="/">
<img alt="logo" class="cmp-image__image" data-cmp-hook-image="image" itemprop="contentUrl" src="//imagesource"/>
</a>

What I want is a string with no newlines inside the <a> tag block:

<a class="cmp-image__link" data-cmp-hook-image="link" href="/"><img alt="logo" class="cmp-image__image" data-cmp-hook-image="image" itemprop="contentUrl" src="//imagesource"/></a>

I tried .strip() and .replace(), but neither worked. Please help!

>Solution :

.strip() won’t work here, as it only removes leading or trailing whitespace.
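
As a small illustration of that point, here is a sketch using a made-up sample string (the `text` value is hypothetical, not taken from the question). The newlines at both ends are removed, while the ones in the middle stay:

```python
# Hypothetical sample string with newlines at both ends and in the middle.
text = "\n<a>\n<img/>\n</a>\n"

# strip() only trims whitespace at the start and end of the string.
stripped = text.strip()

print(repr(stripped))  # '<a>\n<img/>\n</a>'
```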

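.replace() will work here, but you need to assign its return value, as it doesn’t modify the string in-place (because Python strings are immutable). Also note that soup.find() returns a BeautifulSoup Tag object rather than a string, so convert it with str(tag) before calling .replace().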

This:

tag = '''<a class="cmp-image__link" data-cmp-hook-image="link" href="/">
<img alt="logo" class="cmp-image__image" data-cmp-hook-image="image" itemprop="contentUrl" src="//imagesource"/>
</a>'''

result = tag.replace('\n', '')

print(result)

produces the desired output.
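
Applied to the question's code, where tag comes from soup.find('a'), the same idea could look like the sketch below. It assumes the bs4 package is installed, and the `html` variable is a stand-in for the contents of the local file. The `compact` variant is an optional extra, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Stand-in for the contents of the local HTML file from the question.
html = '''<a class="cmp-image__link" data-cmp-hook-image="link" href="/">
<img alt="logo" class="cmp-image__image" data-cmp-hook-image="image" itemprop="contentUrl" src="//imagesource"/>
</a>'''

soup = BeautifulSoup(html, 'html.parser')
tag = soup.find('a')

# str(tag) serializes the Tag to markup; replace() then returns a new string.
result = str(tag).replace('\n', '')

# Optional variant: also drop any indentation at the start or end of each line.
compact = ''.join(line.strip() for line in str(tag).splitlines())

print(result)
```

If the lines inside the tag are indented, plain .replace('\n', '') leaves that indentation in the result, which is where the `compact` variant may be the better fit.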