Extracting data from table using Beautiful Soup where <tr> tags have both <th> and <td> tags

There are some tables I am trying to get data from. I have been able to do this before when the rows contained only <td> tags; however, in this specific table, <th> and <td> cells are mixed on several rows:

[Picture of the HTML element]

I was able to extract the <td> and <th> tags separately using the code below, but then it is difficult to put the data back together.

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.pge.com/pipeline/operations/cgt_pipeline_status.page#flows'
res = requests.get(url)
file = BeautifulSoup(res.text, 'lxml')

##################################################################
find_table = file.find('table', class_='supply_demand_table')
rows = find_table.find_all('tr')

lps_td = []

for i in rows:
    table_data = i.find_all('td')
    data = [j.text for j in table_data]
    lps_td.append(data)

df_td = pd.DataFrame(lps_td)

lps_th = []

for i in rows:
    table_data = i.find_all('th')
    data = [j.text for j in table_data]
    lps_th.append(data)

df_th = pd.DataFrame(lps_th)

Any help on pulling the entire table would be really appreciated.
Thanks!

Solution:

pandas' read_html returns a list of DataFrames; index 5 is the table you want.

import pandas as pd


url = "https://www.pge.com/pipeline/operations/cgt_pipeline_status.page#flows"
df = pd.read_html(url)[5].rename(columns={"Unnamed: 0": ""}).set_index("")
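If you do want to stay with Beautiful Soup instead of read_html, you can keep the <th> and <td> cells together by passing both tag names to find_all in a single loop, so each row comes out as one list. A minimal sketch, using a small inline table with hypothetical data in place of the live PG&E page:

```python
from bs4 import BeautifulSoup

# Inline sample mimicking a table where the first cell of each
# data row is a <th> and the rest are <td> (hypothetical values).
html = """
<table class="supply_demand_table">
  <tr><th></th><th>Today</th><th>Tomorrow</th></tr>
  <tr><th>Supply</th><td>100</td><td>110</td></tr>
  <tr><th>Demand</th><td>90</td><td>95</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="supply_demand_table")

rows = []
for tr in table.find_all("tr"):
    # find_all with a list matches both tag names in document order,
    # so <th> and <td> cells stay together within each row.
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    rows.append(cells)

print(rows)
```

The resulting list of lists can be passed straight to pd.DataFrame, with the first row used as the header if desired.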