Select all <table> elements without classes or ids with BeautifulSoup

I am trying to select all <table> elements on some web pages with BeautifulSoup. The table elements do not have specific classes or ids.

import bs4
import requests

def get_keycode_soup(url):
    res = requests.get(url)
    res.raise_for_status()
    return bs4.BeautifulSoup(res.text, features="html.parser")

def parse_qmk_soup():
    qmk_soup = get_keycode_soup("https://docs.qmk.fm/#/keycodes")
    tables = qmk_soup.select("table")
    # pass line for breakpoint
    pass

def main():
    parse_qmk_soup()

if __name__ == "__main__":
    main()

I have also tried selecting all the different table elements with

tables = qmk_soup.find_all("table")
# and
table_rows = qmk_soup.find_all("tr")

Whenever I pause the debugger on the pass line, tables is always empty.

I have tried methods similar to those in this post and this post, but since the tables I’m trying to select do not appear to have any other descriptive tags, iterating feels inefficient.

Is there a way to simply select all the <table> elements on their own?

Edit: it appears that the page requires JS to load the tables, as suggested by @DeepSpace below. Additionally, see the answer from @MendelG on tracing where the data is loaded from, in case you can obtain the data directly from the source.
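
To illustrate the point about JS: the selector itself is fine, but it can only match tables that are present in the HTML that requests receives. A minimal sketch, assuming the server sends only a shell page that a script fills in later; both HTML strings and the count_tables helper are hypothetical stand-ins, not the actual QMK markup:

```python
import bs4

# Hypothetical stand-in for a page whose tables are in the served HTML.
static_html = "<html><body><table><tr><td>KC_A</td></tr></table></body></html>"

# Hypothetical stand-in for a JS-driven page: only an empty container and a
# script tag arrive over HTTP; a browser would build the tables afterwards.
shell_html = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'


def count_tables(html):
    """Count the <table> elements present in the given HTML text."""
    soup = bs4.BeautifulSoup(html, features="html.parser")
    return len(soup.select("table"))


print(count_tables(static_html))  # 1
print(count_tables(shell_html))   # 0
```

With no matching elements, select("table") and find_all("table") both return an empty result, so the parser has nothing to select until the content is fetched from wherever the script loads it.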

Solution:

If you open your browser’s Network tab and inspect the HTTP requests, you’ll see that the data is loaded from a different URL:

https://docs.qmk.fm/keycodes.md?cache-bust=1706627991267

Note that this is actually a Markdown file (.md), not HTML. On the plus side, you get the original source file.

So there isn’t really any HTML to parse: the Markdown file already contains the data in a readable format.
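
Because the file is Markdown, its tables are pipe-delimited text rather than <table> elements. A minimal sketch of extracting them, assuming the file uses ordinary pipe-table syntax with a |---| separator row under each header. The function names and the sample text are illustrative only and are not taken from the QMK file:

```python
def split_row(line):
    """Split one pipe-delimited Markdown table line into stripped cells."""
    inner = line.strip().removeprefix("|").removesuffix("|")
    return [cell.strip() for cell in inner.split("|")]


def is_separator(cells):
    """True for the |---|:--:| line that follows a table header."""
    return all(cell and set(cell) <= set("-:") for cell in cells)


def parse_markdown_tables(markdown_text):
    """Return a list of tables; each table is a list of rows (lists of cells)."""
    tables, current = [], []
    for line in markdown_text.splitlines():
        if line.strip().startswith("|"):
            cells = split_row(line)
            if not is_separator(cells):
                current.append(cells)
        elif current:
            # A non-table line ends the table collected so far.
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables


# Illustrative sample in the style of a Markdown docs page (not real data).
sample = """\
## First section

|Key    |Aliases|Description|
|-------|-------|-----------|
|`KEY_A`|`A`    |Example key|

## Second section

|Key    |Description|
|-------|-----------|
|`KEY_B`|Another key|
"""

tables = parse_markdown_tables(sample)
print(len(tables))   # 2
print(tables[0][0])  # ['Key', 'Aliases', 'Description']
```

In the question's script, the text would come from the same requests call as before: pass res.text for the .md URL to parse_markdown_tables instead of building a soup. The split is naive, so a cell containing a literal or escaped | would be divided incorrectly.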
