Web scraping with BeautifulSoup – the table body is not returned when finding the table

I am trying to scrape a website for a table but only the header is being returned.

I am new to Python and web scraping, and I have been following this tutorial, which was very helpful: https://medium.com/analytics-vidhya/how-to-scrape-a-table-from-website-using-python-ce90d0cfb607

However, the following code only returns the header and not the body of the table.

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

# Create a URL object
url = 'https://www.dividendmax.com/dividends/declared'

# Request the page with a browser-like User-Agent
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
page = urlopen(req)
html = page.read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")

# Obtain information from tag <table>
table1 = soup.find_all('table')
table1

Output:

[<table aria-label="Declared Dividends" class="mdc-data-table__table">
 <thead>
 <tr class="mdc-data-table__header-row">
 <th class="mdc-data-table__header-cell" role="columnheader" scope="col">Company</th>
 <th class="mdc-data-table__header-cell" role="columnheader" scope="col">Ticker</th>
 <th class="mdc-data-table__header-cell" role="columnheader" scope="col">Country</th>
 <th class="mdc-data-table__header-cell" role="columnheader" scope="col">Exchange</th>
 <th class="mdc-data-table__header-cell" role="columnheader" scope="col">Share Price</th>
 <th class="mdc-data-table__header-cell" role="columnheader" scope="col">Prev. Dividend</th>
 <th class="mdc-data-table__header-cell" role="columnheader" scope="col">Next Dividend</th>
 <th class="mdc-data-table__header-cell" role="columnheader" scope="col">Next Ex-date</th>
 </tr>
 </thead>
 <tbody></tbody>
 </table>]

I need to retrieve the tbody content (found when expanding the penultimate row of output).

Just as an FYI, the following code will be used to create the dataframe.

import pandas as pd

# find_all returns a list of tables, so take the first one
table = table1[0]

# Obtain every title of columns with tag <th>
headers = []
for i in table.find_all('th'):
    title = i.text
    headers.append(title)

# Create a dataframe
mydata = pd.DataFrame(columns=headers)

# Create a for loop to fill mydata
for j in table.find_all('tr')[1:]:
    row_data = j.find_all('td')
    row = [i.text for i in row_data]
    length = len(mydata)
    mydata.loc[length] = row
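
As a rough illustration of what that loop should produce once the rows are actually present, here is a minimal sketch run against two small inline tables. The helper name `table_to_dataframe` and the sample HTML are made up for this example and are not taken from the real site. It collects the rows in a list and builds the dataframe once at the end, which is a swapped-in alternative to appending with `.loc`.

```python
from bs4 import BeautifulSoup
import pandas as pd


def table_to_dataframe(html):
    """Turn the first <table> in `html` into a DataFrame (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    # Header cells become the column names
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row only has <th> cells, so it is skipped here
            rows.append(cells)
    return pd.DataFrame(rows, columns=headers)


# Made-up sample data, not taken from the real site
SAMPLE_WITH_ROWS = """
<table>
  <thead><tr><th>Company</th><th>Ticker</th></tr></thead>
  <tbody>
    <tr><td>Example Co</td><td>EXM</td></tr>
    <tr><td>Sample plc</td><td>SMP</td></tr>
  </tbody>
</table>
"""

# Same shape as the output above: a header row and an empty <tbody>
SAMPLE_EMPTY_BODY = """
<table>
  <thead><tr><th>Company</th><th>Ticker</th></tr></thead>
  <tbody></tbody>
</table>
"""

filled = table_to_dataframe(SAMPLE_WITH_ROWS)
empty = table_to_dataframe(SAMPLE_EMPTY_BODY)
print(filled)
print(len(empty))
```

With an empty `<tbody>` the result has the right column names but no rows, which matches what the scrape above returns.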

>Solution :

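The page you are after is not built the same way as the one in the tutorial, so it is probably not the best site if you're trying to learn or practice with BeautifulSoup. The empty <tbody> in your output suggests the rows are not in the HTML that BeautifulSoup receives. For me, though, the data comes back in a nice JSON format: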

import requests
import pandas as pd

# Create an URL object
url = 'https://www.dividendmax.com/dividends/declared'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'}

jsonData = requests.get(url, headers=headers).json()
df = pd.DataFrame(jsonData)

Output:

print(df)
                                            name  ...                 ind
0                                   3i Group plc  ...  [22, 25, 23, 3, 5]
1                          3I Infrastructure Plc  ...              [4, 5]
2                                AB Dynamics plc  ...                  []
3    Aberdeen Smaller Companies Income Trust plc  ...                  []
4      Aberdeen Standard Equity Income Trust plc  ...                  []
..                                           ...  ...                 ...
146                              Workspace Group  ...      [25, 4, 24, 5]
147                          Wynnstay Properties  ...                  []
148                                 XP Power Ltd  ...              [5, 4]
149                           Yew Grove REIT Plc  ...                  []
150                                       Yougov  ...                  []

[151 rows x 11 columns]
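
Since the JSON response is only what I happened to get back, it may be worth guarding the `.json()` call in case the server sends HTML instead. The following is a minimal sketch under that assumption: `records_from_response` is a hypothetical helper, and `FakeResponse` is a stand-in for `requests.Response` so the example runs without a network call.

```python
import json


def records_from_response(response):
    """Return the decoded JSON body, or raise a clearer error if it is not JSON.

    `response` only needs a `.json()` method and a `.text` attribute,
    which `requests.Response` provides.
    """
    try:
        return response.json()
    except ValueError as exc:  # JSON decode errors are subclasses of ValueError
        snippet = response.text[:80]
        raise ValueError(f"Response was not JSON; it starts with: {snippet!r}") from exc


class FakeResponse:
    """Stand-in for requests.Response, used only for this illustration."""

    def __init__(self, text):
        self.text = text

    def json(self):
        return json.loads(self.text)


# Made-up payloads: one JSON body and one HTML body
ok = FakeResponse('[{"name": "Example Co", "ind": [4, 5]}]')
records = records_from_response(ok)

not_json = FakeResponse("<!DOCTYPE html><html><body>table page</body></html>")
try:
    records_from_response(not_json)
    message = ""
except ValueError as err:
    message = str(err)

print(records)
print(message)
```

With the real request, this would be used as `pd.DataFrame(records_from_response(requests.get(url, headers=headers)))`.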
