BeautifulSoup shuffles the attributes of html tags

I have an issue with BeautifulSoup. Whenever I parse an HTML input, it changes the order of the attributes (e.g. class, id) of the HTML tags.

For example:

from bs4 import BeautifulSoup

tags = BeautifulSoup('<span id="100" class="test"></span>', "html.parser")
print(str(tags))

Prints:

<span class="test" id="100"></span>

As you can see, the class and id order was changed. How can I prevent such behavior?

I am unfamiliar with web development, but I know that the order of the attributes doesn’t matter.

My main goal here is to preserve the original shape of the HTML input after parsing it because I want to loop through the tags and match them (at character-level) with other HTML texts.

>Solution :

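As you stated, the order of attributes in HTML doesn’t matter. Beautiful Soup’s output formatter sorts attributes by name when it serializes a tag, though, so if you really want them in their original order, you can subclass HTMLFormatter and override its attributes() method: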

from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter


class UnsortedAttributes(HTMLFormatter):
    def attributes(self, tag):
        yield from tag.attrs.items()


tags = BeautifulSoup('<span id="100" class="test"></span>', "html.parser")

print(tags.encode(formatter=UnsortedAttributes()).decode())

Prints:

<span id="100" class="test"></span>
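
Since the goal is character-level matching, it may help to check how close the round trip actually gets. The following is a minimal sketch, assuming the built-in "html.parser" backend; `roundtrip` is a hypothetical helper name introduced only for this illustration. It suggests that attribute order survives, but other details, such as single-quoted attribute values, are still normalized on output, so the result is not guaranteed to be identical to the input.

```python
from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter


class UnsortedAttributes(HTMLFormatter):
    """Formatter that emits attributes in the order the parser stored them."""

    def attributes(self, tag):
        yield from tag.attrs.items()


def roundtrip(html):
    """Parse `html` and serialize it again without sorting attributes.

    `roundtrip` is a hypothetical helper, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # decode() returns a str directly, so no encode()/decode() pair is needed.
    return soup.decode(formatter=UnsortedAttributes())


nested = '<div id="a" class="b"><span title="t" class="c" id="d"></span></div>'
single_quoted = "<span id='100' class='test'></span>"

print(roundtrip(nested) == nested)                # True: attribute order survives
print(roundtrip(single_quoted) == single_quoted)  # False: values are re-quoted with "
```

If exact character offsets matter, comparing against the re-serialized text of both documents (rather than the raw input) is probably the safer choice, because both sides then go through the same normalization.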