<html>
<head>
<title>Index of /pub/opera/desktop/</title>
</head>
<body>
<h1>Index of /pub/opera/desktop/</h1>
<hr>
<pre><a href="../">../</a>
<a href="15.0.1147.130/">15.0.1147.130/</a> 01-Jul-2013 15:18 -
<a href="15.0.1147.132/">15.0.1147.132/</a> 01-Jul-2013 15:18 -
<a href="15.0.1147.138/">15.0.1147.138/</a> 09-Jul-2013 12:11
I need to extract the version (e.g. 15.0.1147.130) and the date (e.g. 01-Jul-2013 15:18) for each entry.
However, my code only gives me the version:
import requests
from bs4 import BeautifulSoup
soup = BeautifulSoup(requests.get('https://get.geo.opera.com/pub/opera/desktop/').text, 'html.parser')
for item in soup.find('pre').find_all('a')[1:]:  # [1:] skips the "../" link
    print(item)
What am I missing to get the date text too?
Solution:
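You are iterating over the A tags, and they don't contain the date. In this listing the date and size are plain text that sits inside the pre element after each closing A tag, so you need the text of the pre element itself: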
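import requests
from bs4 import BeautifulSoup
soup = BeautifulSoup(requests.get('https://get.geo.opera.com/pub/opera/desktop/').text, 'html.parser')
# The pre element holds both the links and the dates, so print its whole text
for item in soup.find_all('pre'):
    # Remove the trailing "/" from directory names; dashes are kept so dates like 01-Jul-2013 stay intact
    print(item.get_text().replace('/', ''))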
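A sketch that stays closer to the original loop: keep iterating over the A tags, but read the text node that follows each one through BeautifulSoup's next_sibling attribute. SAMPLE_HTML is a trimmed copy of the listing from the question so the example runs offline, and versions_with_dates is a hypothetical helper name; the sketch assumes every link is immediately followed by a text node holding the date, time and size.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the directory listing from the question (no network request needed)
SAMPLE_HTML = """<pre><a href="../">../</a>
<a href="15.0.1147.130/">15.0.1147.130/</a> 01-Jul-2013 15:18 -
<a href="15.0.1147.132/">15.0.1147.132/</a> 01-Jul-2013 15:18 -
<a href="15.0.1147.138/">15.0.1147.138/</a> 09-Jul-2013 12:11
</pre>"""


def versions_with_dates(html):
    """Return (version, date_time) pairs from an autoindex-style <pre> listing."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for link in soup.find('pre').find_all('a')[1:]:  # skip the "../" link
        version = link.get_text().rstrip('/')
        # The date is in the text node right after </a>, e.g. " 01-Jul-2013 15:18 -\n"
        tail = str(link.next_sibling or '').split()
        date_time = ' '.join(tail[:2])  # keep date and time, drop the size column
        rows.append((version, date_time))
    return rows


for version, date_time in versions_with_dates(SAMPLE_HTML):
    print(version, date_time)
```

With the live page you would pass requests.get(url).text instead of SAMPLE_HTML. Reading next_sibling keeps each version paired with its own date, rather than re-splitting the whole block of text.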
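UPDATE: to get the version, date and time as separate values, split that text into lines and each line into whitespace-separated fields: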
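import requests
from bs4 import BeautifulSoup
soup = BeautifulSoup(requests.get('https://get.geo.opera.com/pub/opera/desktop/').text, 'html.parser')
# splitlines() handles both "\n" and "\r\n" line endings
lines = soup.find('pre').get_text().splitlines()
for line in lines[1:]:  # skip the "../" line
    # split() with no argument splits on runs of whitespace, e.g.
    # "15.0.1147.130/   01-Jul-2013 15:18   -" -> ['15.0.1147.130/', '01-Jul-2013', '15:18', '-']
    my_data = line.split()
    if len(my_data) < 3:
        continue  # ignore blank or incomplete lines
    version = my_data[0].rstrip('/')
    date = my_data[1]
    time = my_data[2]
    print(version, date, time)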
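If you also need the dates as real datetime objects (for example to sort or compare builds), here is a stdlib-only sketch that parses the text of the pre element with a regular expression and datetime.strptime. It assumes the nginx-style layout shown in the question ("name/  dd-Mon-yyyy HH:MM  size") and an English locale for the %b month abbreviation; ROW and parse_listing are hypothetical names.

```python
import re
from datetime import datetime

# Assumed row layout: "15.0.1147.130/   01-Jul-2013 15:18   -"
ROW = re.compile(r'^(?P<version>[^/\s]+)/\s+(?P<date>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2})')


def parse_listing(pre_text):
    """Turn the text of the <pre> element into (version, datetime) tuples."""
    rows = []
    for line in pre_text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # skips "../", blank lines and anything unexpected
        # %b expects English month abbreviations such as "Jul"
        stamp = datetime.strptime(match.group('date'), '%d-%b-%Y %H:%M')
        rows.append((match.group('version'), stamp))
    return rows


# Text as soup.find('pre').get_text() would return it for the listing in the question
sample = """../
15.0.1147.130/ 01-Jul-2013 15:18 -
15.0.1147.132/ 01-Jul-2013 15:18 -
15.0.1147.138/ 09-Jul-2013 12:11
"""

for version, stamp in parse_listing(sample):
    print(version, stamp.strftime('%d-%b-%Y %H:%M'))
```

Anchoring the pattern on the "name/ followed by a date" shape means the "../" line and any stray text are skipped without relying on fixed list positions such as lines[1:].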