Web scraping with BeautifulSoup – when trying to find table the content is not returned

I am trying to scrape a website for a table but only the header is being returned.

I am new to Python and web scraping and have been following this tutorial, which was very helpful: https://medium.com/analytics-vidhya/how-to-scrape-a-table-from-website-using-python-ce90d0cfb607.

However, the following code only returns the header and not the body of the table.

from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Create a URL string
url = 'https://www.dividendmax.com/dividends/declared'

# Fetch the page with a browser-like User-Agent
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
page = urlopen(req)
html = page.read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")

# Obtain information from tag <table>
table1 = soup.find_all('table')
table1

Output:

[<table aria-label="Declared Dividends" class="mdc-data-table__table">
 <thead>
 <tr class="mdc-data-table__header-row">
 <th class="mdc-data-table__header-cell" role="columnheader" scope="col">Company</th>
 <th class="mdc-data-table__header-cell" role="columnheader" scope="col">Ticker</th>
 <th class="mdc-data-table__header-cell" role="columnheader" scope="col">Country</th>
 <th class="mdc-data-table__header-cell" role="columnheader" scope="col">Exchange</th>
 <th class="mdc-data-table__header-cell" role="columnheader" scope="col">Share Price</th>
 <th class="mdc-data-table__header-cell" role="columnheader" scope="col">Prev. Dividend</th>
 <th class="mdc-data-table__header-cell" role="columnheader" scope="col">Next Dividend</th>
 <th class="mdc-data-table__header-cell" role="columnheader" scope="col">Next Ex-date</th>
 </tr>
 </thead>
 <tbody></tbody>
 </table>]

I need to retrieve the tbody content (found when expanding the penultimate row of output).

Just as an FYI, the following code will be used to create the dataframe.

import pandas as pd

# find_all() returned a list, so take the first (and only) table
table = table1[0]

# Obtain every title of columns with tag <th>
headers = []
for th in table.find_all('th'):
    headers.append(th.text)

# Create a dataframe
mydata = pd.DataFrame(columns=headers)

# Create a for loop to fill mydata
for tr in table.find_all('tr')[1:]:
    row = [td.text for td in tr.find_all('td')]
    mydata.loc[len(mydata)] = row
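
As a hedged sketch of the loop above, the rows can also be collected into a list first and handed to pandas in one go. The helper name table_to_frame and the inline sample HTML are illustrative assumptions, not part of the original code. The sample mimics the empty tbody from the output above, so the resulting dataframe has the column headers but no rows.

```python
import pandas as pd
from bs4 import BeautifulSoup


def table_to_frame(html: str) -> pd.DataFrame:
    """Build a dataframe from the first <table> in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")  # find() returns one tag, find_all() a list
    if table is None:
        raise ValueError("no <table> found in the HTML")

    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip the header row, which has <th> but no <td>
            rows.append(cells)
    return pd.DataFrame(rows, columns=headers)


# Hypothetical sample: same shape as the scraped page, with an empty <tbody>.
sample = """
<table>
  <thead><tr><th>Company</th><th>Ticker</th></tr></thead>
  <tbody></tbody>
</table>
"""
df = table_to_frame(sample)
print(df.columns.tolist())
print(len(df))
```

If the length printed is zero while the headers are present, the table body was not in the HTML that was parsed, and no amount of looping over tr tags will recover it.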

>Solution :

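The page you are after does not work the same way as the one in the tutorial. The tbody in your output is empty, which suggests the rows are not part of the HTML that BeautifulSoup receives and are loaded separately, so it is probably not the best site if you're trying to learn or practice with BeautifulSoup. For me, though, the same URL returns the data in a nice JSON format, which pandas can load directly: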

import requests
import pandas as pd

# Create an URL object
url = 'https://www.dividendmax.com/dividends/declared'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'}

jsonData = requests.get(url, headers=headers).json()
df = pd.DataFrame(jsonData)

Output:

print(df)
                                            name  ...                 ind
0                                   3i Group plc  ...  [22, 25, 23, 3, 5]
1                          3I Infrastructure Plc  ...              [4, 5]
2                                AB Dynamics plc  ...                  []
3    Aberdeen Smaller Companies Income Trust plc  ...                  []
4      Aberdeen Standard Equity Income Trust plc  ...                  []
..                                           ...  ...                 ...
146                              Workspace Group  ...      [25, 4, 24, 5]
147                          Wynnstay Properties  ...                  []
148                                 XP Power Ltd  ...              [5, 4]
149                           Yew Grove REIT Plc  ...                  []
150                                       Yougov  ...                  []

[151 rows x 11 columns]
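
Because .json() raises an error when the server sends HTML instead of JSON, it may be worth checking the response before decoding it. The following is a minimal sketch under that assumption, not the answer's own method: payload_to_frame and fetch_declared are hypothetical helper names, the check relies on the Content-Type header (which the code above does not inspect), and the demonstration uses made-up records rather than real dividend data.

```python
import json

import pandas as pd
import requests


def payload_to_frame(content_type: str, text: str) -> pd.DataFrame:
    """Turn a JSON response body into a dataframe, or fail with a clear message."""
    if "json" not in content_type.lower():
        raise ValueError(f"expected JSON but got Content-Type {content_type!r}")
    return pd.DataFrame(json.loads(text))


def fetch_declared(url: str) -> pd.DataFrame:
    """Download the URL and convert its JSON payload (performs a network request)."""
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # surface 4xx/5xx errors early
    return payload_to_frame(response.headers.get("Content-Type", ""), response.text)


# Offline demonstration with made-up records (not real dividend data).
sample = '[{"name": "Example plc", "ind": [1, 2]}, {"name": "Other plc", "ind": []}]'
demo = payload_to_frame("application/json; charset=utf-8", sample)
print(demo.shape)
```

Calling fetch_declared with the URL from the answer would then either return the dataframe or report that the server answered with something other than JSON, which is easier to debug than a bare decoding error.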