Extracting values from HTML in Python


My project involves web scraping using Python. I need to get data about a vehicle given its registration. I have managed to get the HTML from the site into Python, but I am struggling to extract the values.
I am using this website: https://www.carcheck.co.uk/audi/N18CTN

from bs4 import BeautifulSoup
import requests

url = "https://www.carcheck.co.uk/audi/N18CTN"

r = requests.get(url)

# Name the parser explicitly; omitting it makes bs4 emit a warning
soup = BeautifulSoup(r.text, "html.parser")

print(soup)

I need to get this information about the vehicle:

<td>AUDI</td>
</tr>
<tr>
<th>Model</th>
<td>A3</td>
</tr>
<tr>
<th>Colour</th>
<td>Red</td>
</tr>
<tr>
<th>Year of manufacture</th>
<td>2017</td>
</tr>
<tr>
<th>Top speed</th>
<td>147 mph</td>
</tr>
<tr>
<th>Gearbox</th>
<td>6 speed automatic</td>

How would I go about doing this?

Solution:

You can use this example as a starting point. It collects every table row that has both a <th> and a <td> cell, and labels each row with the nearest preceding <h1> heading:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.carcheck.co.uk/audi/N18CTN'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

all_data = []
# Select rows holding a <th> label and a <td> value, skipping rows that wrap a nested table
for row in soup.select('tr:has(th):has(td):not(:has(table))'):
    header = row.find_previous('h1').text.strip()  # section heading above the row
    title = row.th.text.strip()                    # field name, e.g. "Model"
    text = row.td.text.strip()                     # field value, e.g. "A3"
    all_data.append((header, title, text))

df = pd.DataFrame(all_data, columns = ['Header', 'Title', 'Value'])
# to_markdown() requires the optional "tabulate" package
print(df.head(20).to_markdown(index=False))

Prints:

| Header                    | Title                   | Value             |
|:--------------------------|:------------------------|:------------------|
| General information       | Make                    | AUDI              |
| General information       | Model                   | A3                |
| General information       | Colour                  | Red               |
| General information       | Year of manufacture     | 2017              |
| General information       | Top speed               | 147 mph           |
| General information       | Gearbox                 | 6 speed automatic |
| Engine & fuel consumption | Power                   | 135 kW / 184 HP   |
| Engine & fuel consumption | Engine capacity         | 1.968 cc          |
| Engine & fuel consumption | Cylinders               | 4                 |
| Engine & fuel consumption | Fuel type               | Diesel            |
| Engine & fuel consumption | Consumption city        | 42.0 mpg          |
| Engine & fuel consumption | Consumption extra urban | 52.3 mpg          |
| Engine & fuel consumption | Consumption combined    | 48.0 mpg          |
| Engine & fuel consumption | CO2 emission            | 129 g/km          |
| Engine & fuel consumption | CO2 label               | D                 |
| MOT history               | MOT expiry date         | 2023-10-27        |
| MOT history               | MOT pass rate           | 83 %              |
| MOT history               | MOT passed              | 5                 |
| MOT history               | Failed MOT tests        | 1                 |
| MOT history               | Total advice items      | 11                |
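
If you only need the handful of fields listed in the question, a plain dictionary keyed by the <th> label may be simpler than a DataFrame. The following is a minimal sketch, not a tested scraper for the live site: SAMPLE_HTML is a hypothetical stand-in that mirrors the rows shown in the question (so the example runs without a network request), and rows_to_dict and wanted are illustrative names, not part of any library.

```python
from bs4 import BeautifulSoup

# Hypothetical sample mirroring the rows in the question; in practice you
# would pass requests.get(url).text instead.
SAMPLE_HTML = """
<table>
<tr><th>Make</th><td>AUDI</td></tr>
<tr><th>Model</th><td>A3</td></tr>
<tr><th>Colour</th><td>Red</td></tr>
<tr><th>Year of manufacture</th><td>2017</td></tr>
<tr><th>Top speed</th><td>147 mph</td></tr>
<tr><th>Gearbox</th><td>6 speed automatic</td></tr>
</table>
"""


def rows_to_dict(html):
    """Map each row's <th> text to its <td> text (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    values = {}
    for row in soup.find_all("tr"):
        th = row.find("th")
        td = row.find("td")
        # Skip rows that lack either a label cell or a value cell
        if th is None or td is None:
            continue
        values[th.get_text(strip=True)] = td.get_text(strip=True)
    return values


details = rows_to_dict(SAMPLE_HTML)

# Field names assumed to match the <th> labels exactly
wanted = ["Make", "Model", "Colour", "Year of manufacture", "Top speed", "Gearbox"]
for key in wanted:
    # .get() returns None when the page has no row with that label
    print(f"{key}: {details.get(key)}")
```

Note that a dictionary keeps only the last value for a repeated label, so if the real page reuses a label in different sections, keep the section heading as part of the key or stay with the list of tuples above.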
