I wrote some code to scrape data from a website and it’s working fine, but I’d like to export the results to an Excel file.
I am new to Python, so I don’t know exactly what I should do.
I thought of using pandas, but my output is a print of joined strings, so I couldn’t find a good solution.
This is my code:
from requests_html import HTMLSession
import pandas as pd
import tabulate
from tabulate import tabulate

matchlink = 'https://www.betexplorer.com/football/serbia/prva-liga/results/'
session = HTMLSession()
r = session.get(matchlink)
allmatch = r.html.find('.in-match')
results = r.html.find('.h-text-center a')
# search for elements containing "data-odd" attribute
matchodds = r.html.find('[data-odd]')
odds = [matchodd.attrs["data-odd"] for matchodd in matchodds]

idx = 0
for match, res in zip(allmatch, results):
    if res.text == 'POSTP.':
        continue
    print(f"{match.text} {res.text} {', '.join(odds[idx:idx+3])}")
    idx += 3
Thanks for your help.
>Solution :
Sure, use pandas!
Here is some sample output.
With a dataframe in hand, it’s easy to call
.to_csv(),
.to_excel(),
or whatever.
                                     result              odds
match
Dubocica - FK Indjija                   2:1  2.18, 2.93, 3.31
Mladost GAT - Smederevo                 1:1  1.63, 3.37, 5.17
Graficar Beograd - RFK Novi Sad         2:1  1.41, 4.31, 6.28
Tekstilac Odzaci - Radnicki Beograd     5:0  1.53, 3.79, 5.49
FK Indjija - Vrsac                      2:1  1.72, 3.16, 4.90
...                                     ...               ...
Jedinstvo U. - RFK Novi Sad             4:0  1.45, 4.42, 5.59
Metalac - Graficar Beograd              1:3  2.17, 3.14, 3.11
Sloboda - OFK Beograd                   0:2  1.87, 3.15, 4.02
Smederevo - FK Indjija                  2:0  2.76, 2.83, 2.59
Vrsac - Kolubara                        1:0  2.73, 2.92, 2.57

[160 rows x 2 columns]
I just put a trivial wrapper around your code.
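(BTW, regarding tabulate:
import tabulate and from tabulate import tabulate both bind the same name, so whichever runs last wins.
If you from x import x and then import x,
that undoes what you were trying to do.)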
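A quick way to see that rebinding in action, as a minimal sketch: it uses the standard-library pprint module as a stand-in for tabulate (an assumption made only so the snippet runs without extra installs; the import mechanics are the same for any module that exports a function with its own name).

```python
import types

from pprint import pprint  # the name `pprint` now refers to the function
assert callable(pprint)

import pprint  # the same name is rebound, now to the module
assert isinstance(pprint, types.ModuleType)
assert not callable(pprint)  # calling pprint(...) here would raise TypeError

# The function is still reachable as an attribute of the module.
pprint.pprint({"still": "works"})
```

In the question's code the two statements appear in the opposite order, so the function wins and the plain import is simply redundant.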
from typing import Generator

from requests_html import HTMLSession
import pandas as pd

matchlink = "https://www.betexplorer.com/football/serbia/prva-liga/results/"


def _get_rows(url: str) -> Generator[dict[str, str], None, None]:
    session = HTMLSession()
    r = session.get(url)  # fetch the URL that was passed in, not the global
    allmatch = r.html.find(".in-match")
    results = r.html.find(".h-text-center a")
    # search for elements containing "data-odd" attribute
    matchodds = r.html.find("[data-odd]")
    odds = [matchodd.attrs["data-odd"] for matchodd in matchodds]

    idx = 0
    for match, res in zip(allmatch, results):
        if res.text == "POSTP.":
            continue  # skip postponed matches
        # each yielded dict becomes one row of the dataframe
        yield {
            "match": match.text,
            "result": res.text,
            "odds": ", ".join(odds[idx : idx + 3]),
        }
        idx += 3  # three odds are consumed per listed match


if __name__ == "__main__":
    df = pd.DataFrame(_get_rows(matchlink)).set_index("match")
    print(df)
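To get from the dataframe to an actual file, something along these lines should work. This is only a sketch: the export helper, the "matches" file stem, and the "results" sheet name are made up for illustration, and the three rows are copied from the sample output so the snippet runs without hitting the website. Writing .xlsx needs an Excel engine such as openpyxl to be installed; the sketch falls back to CSV only if pandas cannot import one.

```python
from pathlib import Path

import pandas as pd

# Three rows copied from the sample output, standing in for _get_rows(matchlink).
rows = [
    {"match": "Dubocica - FK Indjija", "result": "2:1", "odds": "2.18, 2.93, 3.31"},
    {"match": "Mladost GAT - Smederevo", "result": "1:1", "odds": "1.63, 3.37, 5.17"},
    {"match": "Vrsac - Kolubara", "result": "1:0", "odds": "2.73, 2.92, 2.57"},
]
df = pd.DataFrame(rows).set_index("match")


def export(frame: pd.DataFrame, stem: str) -> list[Path]:
    """Hypothetical helper: write <stem>.csv and, if possible, <stem>.xlsx."""
    written = []

    csv_path = Path(f"{stem}.csv")
    frame.to_csv(csv_path)  # the index ("match") is written as the first column
    written.append(csv_path)

    xlsx_path = Path(f"{stem}.xlsx")
    try:
        frame.to_excel(xlsx_path, sheet_name="results")
        written.append(xlsx_path)
    except ImportError:
        # pandas could not import an Excel writer engine (e.g. openpyxl)
        print("No Excel engine installed; wrote CSV only.")

    return written


if __name__ == "__main__":
    for path in export(df, "matches"):
        print(f"wrote {path}")
```

With the real scraper, the same call would be export(df, "matches") right after print(df).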