This is my first time using BeautifulSoup, and I’m having trouble extracting the table I want to work with (https://ansm.sante.fr/disponibilites-des-produits-de-sante/medicaments). I want to extract the table whose class is table table-products sortable searchable.
import requests
from bs4 import BeautifulSoup
url="https://ansm.sante.fr/disponibilites-des-produits-de-sante/medicaments"
html_content = requests.get(url).text
soup = BeautifulSoup(html_content, "html.parser")
table = soup.find("table", class_="table table-products sortable searchable ")
table_data = table.tbody.find_all("tr")
This outputs:
AttributeError: 'NoneType' object has no attribute 'tbody'.
I guess I’m not reaching the table correctly, which is why it comes out as 'NoneType'.
>Solution :
There are two tables on the webpage, and the class value table table-products sortable searchable selects both of them. The desired table is the second one (index 1), so I use pandas to pull the complete table data:
import pandas as pd
df = pd.read_html('https://ansm.sante.fr/disponibilites-des-produits-de-sante/medicaments')[1]
print(df)
Output:
Statut ... Remise à disposition
0 Rupture de stock ... NaN
1 Rupture de stock ... NaN
2 Rupture de stock ... NaN
3 Tension d'approvisionnement ... NaN
4 Remise à disposition ... NaN
.. ... ... ...
373 Arrêt de commercialisation ... NaN
374 Rupture de stock ... NaN
375 Rupture de stock ... NaN
376 Remise à disposition ... 2 mars 2021
377 Rupture de stock ... NaN
[378 rows x 4 columns]
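If you would rather stay with BeautifulSoup, here is a minimal sketch of the same idea. When class_ is given a string containing several classes, it has to match the class attribute exactly, so the trailing space in the question's string is a likely reason find() returned None. A CSS selector matches on individual classes and avoids that problem. The inline HTML and the product names below are made-up stand-ins for the real page, which I am assuming has the same two-table layout:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: two tables sharing the same classes.
html = """
<table class="table table-products sortable searchable">
  <tbody><tr><td>first table</td></tr></tbody>
</table>
<table class="table table-products sortable searchable">
  <tbody>
    <tr><td>Rupture de stock</td><td>Produit A</td></tr>
    <tr><td>Remise à disposition</td><td>Produit B</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# A multi-class string must equal the class attribute exactly;
# the trailing space from the question makes this lookup return None.
missing = soup.find("table", class_="table table-products sortable searchable ")

# A CSS selector matches a single class, so order and whitespace do not matter.
tables = soup.select("table.table-products")

# Fail with a clear message instead of an AttributeError on None.
if len(tables) < 2:
    raise ValueError(f"expected 2 tables, found {len(tables)}")

# Take the second table (index 1) and read the text of each cell, row by row.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all("td")]
    for tr in tables[1].tbody.find_all("tr")
]
print(rows)
```

On the real page you would build soup from requests.get(url).text as in the question, and it is worth checking len(tables) first to confirm which index holds the table you want.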