My project involves web scraping with Python. I need to get data about a vehicle given its registration. I have managed to get the HTML from the site into Python, but I am struggling to extract the values.
I am using this website: https://www.carcheck.co.uk/audi/N18CTN
from bs4 import BeautifulSoup
import requests
url = "https://www.carcheck.co.uk/audi/N18CTN"
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
print(soup)
I need to get this information about the vehicle:
<tr>
<th>Make</th>
<td>AUDI</td>
</tr>
<tr>
<th>Model</th>
<td>A3</td>
</tr>
<tr>
<th>Colour</th>
<td>Red</td>
</tr>
<tr>
<th>Year of manufacture</th>
<td>2017</td>
</tr>
<tr>
<th>Top speed</th>
<td>147 mph</td>
</tr>
<tr>
<th>Gearbox</th>
<td>6 speed automatic</td>
How would I go about doing this?
Solution:
You can use this example as a starting point for extracting the information from this page:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.carcheck.co.uk/audi/N18CTN'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

all_data = []
# Select rows that hold both a <th> label and a <td> value, skipping rows that wrap a nested table
for row in soup.select('tr:has(th):has(td):not(:has(table))'):
    header = row.find_previous('h1').text.strip()  # nearest preceding section heading
    title = row.th.text.strip()
    text = row.td.text.strip()
    all_data.append((header, title, text))

df = pd.DataFrame(all_data, columns=['Header', 'Title', 'Value'])
# to_markdown() requires the optional tabulate package
print(df.head(20).to_markdown(index=False))
Prints:
| Header | Title | Value |
|---|---|---|
| General information | Make | AUDI |
| General information | Model | A3 |
| General information | Colour | Red |
| General information | Year of manufacture | 2017 |
| General information | Top speed | 147 mph |
| General information | Gearbox | 6 speed automatic |
| Engine & fuel consumption | Power | 135 kW / 184 HP |
| Engine & fuel consumption | Engine capacity | 1.968 cc |
| Engine & fuel consumption | Cylinders | 4 |
| Engine & fuel consumption | Fuel type | Diesel |
| Engine & fuel consumption | Consumption city | 42.0 mpg |
| Engine & fuel consumption | Consumption extra urban | 52.3 mpg |
| Engine & fuel consumption | Consumption combined | 48.0 mpg |
| Engine & fuel consumption | CO2 emission | 129 g/km |
| Engine & fuel consumption | CO2 label | D |
| MOT history | MOT expiry date | 2023-10-27 |
| MOT history | MOT pass rate | 83 % |
| MOT history | MOT passed | 5 |
| MOT history | Failed MOT tests | 1 |
| MOT history | Total advice items | 11 |
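If you only need a handful of fields rather than the whole table, one option is to build a dict keyed by the th text and look values up by label. The following is a minimal sketch, not a tested solution for the live site: SAMPLE_HTML is a hypothetical fragment modelled on the rows shown in the question, and rows_to_dict is an illustrative helper name. It runs offline so you can check the logic before pointing it at the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the rows shown in the question;
# the real page may be structured differently.
SAMPLE_HTML = """
<h1>General information</h1>
<table>
  <tr><th>Make</th><td>AUDI</td></tr>
  <tr><th>Model</th><td>A3</td></tr>
  <tr><th>Colour</th><td>Red</td></tr>
  <tr><th>Year of manufacture</th><td>2017</td></tr>
  <tr><th>Top speed</th><td>147 mph</td></tr>
  <tr><th>Gearbox</th><td>6 speed automatic</td></tr>
</table>
"""


def rows_to_dict(html):
    """Map each <th> label to the text of the <td> in the same row."""
    soup = BeautifulSoup(html, "html.parser")
    details = {}
    for row in soup.find_all("tr"):
        th = row.find("th")
        td = row.find("td")
        # Skip rows that do not carry a label/value pair
        if th is None or td is None:
            continue
        details[th.get_text(strip=True)] = td.get_text(strip=True)
    return details


vehicle = rows_to_dict(SAMPLE_HTML)
print(vehicle["Make"], vehicle["Model"], vehicle["Year of manufacture"])
# prints: AUDI A3 2017
```

To use it on the real page, you would pass requests.get(url).text instead of SAMPLE_HTML. Note that if two sections reuse the same label, the later row overwrites the earlier one, so the (Header, Title, Value) tuples from the answer above are safer when you need every section.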