BeautifulSoup shuffles the attributes of HTML tags

August 1, 2023

I have an issue with BeautifulSoup. Whenever I parse an HTML input, it changes the order of the attributes (e.g. class, id) of the HTML tags.

For example:

from bs4 import BeautifulSoup

tags = BeautifulSoup('<span id="100" class="test"></span>', "html.parser")
print(str(tags))

Prints:

<span class="test" id="100"></span>

As you can see, the class and id order was changed. How can I prevent such behavior?

I am unfamiliar with web development, but I know that the order of the attributes doesn’t matter.

My main goal here is to preserve the original shape of the HTML input after parsing it, because I want to loop through the tags and match them (at the character level) against other HTML text.

>Solution :
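
It may help to first see where the reordering happens. The sketch below assumes a reasonably recent bs4 on Python 3.7+ and reuses the input from the question; the variable names attr_names and default_output are only illustrative. It compares the attribute order stored on the parsed tag with the order in the default string output.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<span id="100" class="test"></span>', "html.parser")

# The parsed tag keeps its attributes in a dict, so the keys should come
# back in the order they appeared in the source.
attr_names = list(soup.span.attrs)
print(attr_names)

# Converting back to a string goes through the default output formatter,
# which is where the attributes get sorted by name.
default_output = str(soup)
print(default_output)
```

If the first line printed shows id before class while the second shows the reordered tag from the question, the parse itself kept the order and only the output step changed it, which is why a custom formatter is enough.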

As you stated, the order of attributes in HTML doesn’t matter. But if you really want unsorted attributes, you can do:

from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter


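class UnsortedAttributes(HTMLFormatter):
    # The default formatter returns the attributes sorted by name;
    # yielding tag.attrs.items() as-is keeps the order from the source.
    def attributes(self, tag):
        yield from tag.attrs.items()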


tags = BeautifulSoup('<span id="100" class="test"></span>', "html.parser")

print(tags.encode(formatter=UnsortedAttributes()).decode())

Prints: