Scraping data using Python and requests-html and exporting it to an Excel file

by MR

February 6, 2024

I wrote some code to scrape data from a website and it’s working fine, but I’d like to export the data to an Excel file.

I am a Python newbie, so I don’t know exactly what I should do.

I thought of using pandas, but my output is a print with join, so I haven’t found a good solution.

This is my code:

from requests_html import HTMLSession
import pandas as pd
import tabulate
from tabulate import tabulate
 
matchlink = 'https://www.betexplorer.com/football/serbia/prva-liga/results/'
 
session = HTMLSession()
 
r = session.get(matchlink)

allmatch = r.html.find('.in-match')
results = r.html.find('.h-text-center a')
# search for elements containing "data-odd" attribute
matchodds = r.html.find('[data-odd]')

odds = [matchodd.attrs["data-odd"] for matchodd in matchodds]

idx = 0
for match, res in zip(allmatch, results):
    if res.text == 'POSTP.':
        continue

    print(f"{match.text} {res.text} {', '.join(odds[idx:idx+3])}")
    
    idx += 3

Thanks for your help.

Solution:

Sure, use pandas!

Here is some sample output.
With a DataFrame in hand, it’s easy to call .to_csv(), .to_excel(), or whatever.

                                    result              odds
match                                                       
Dubocica - FK Indjija                  2:1  2.18, 2.93, 3.31
Mladost GAT - Smederevo                1:1  1.63, 3.37, 5.17
Graficar Beograd - RFK Novi Sad        2:1  1.41, 4.31, 6.28
Tekstilac Odzaci - Radnicki Beograd    5:0  1.53, 3.79, 5.49
FK Indjija - Vrsac                     2:1  1.72, 3.16, 4.90
...                                    ...               ...
Jedinstvo U. - RFK Novi Sad            4:0  1.45, 4.42, 5.59
Metalac - Graficar Beograd             1:3  2.17, 3.14, 3.11
Sloboda - OFK Beograd                  0:2  1.87, 3.15, 4.02
Smederevo - FK Indjija                 2:0  2.76, 2.83, 2.59
Vrsac - Kolubara                       1:0  2.73, 2.92, 2.57

[160 rows x 2 columns]

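I just put a trivial wrapper around your code.
(BTW, regarding tabulate: if you write "from x import x" and then "import x", the second statement rebinds the name to the module, which undoes what you were trying to do. Your order, "import x" first, works, but it makes the first import redundant.)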
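As an aside on that import remark, here is a minimal sketch of the rebinding. It uses the standard-library pprint module as a stand-in for tabulate, because both expose a function with the same name as the module; the variable names first and second are only for illustration.

```python
import types

from pprint import pprint  # the name "pprint" now refers to the function
first = pprint

import pprint  # the same name is rebound to the module
second = pprint

print(callable(first))                       # the function is callable
print(isinstance(second, types.ModuleType))  # the name now points at a module
```

After the second import, calling pprint(...) directly would fail, because a module object is not callable.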

from typing import Generator

from requests_html import HTMLSession
import pandas as pd

matchlink = "https://www.betexplorer.com/football/serbia/prva-liga/results/"


def _get_rows(url: str) -> Generator[dict[str, str], None, None]:
    session = HTMLSession()

    r = session.get(url)

    allmatch = r.html.find(".in-match")
    results = r.html.find(".h-text-center a")
    # search for elements containing "data-odd" attribute
    matchodds = r.html.find("[data-odd]")

    odds = [matchodd.attrs["data-odd"] for matchodd in matchodds]

    idx = 0
    for match, res in zip(allmatch, results):
        if res.text == "POSTP.":
            continue

        # Yield one row per played match; pandas turns the dict keys into columns.
        yield {
            "match": match.text,
            "result": res.text,
            "odds": ", ".join(odds[idx : idx + 3]),
        }

        idx += 3


if __name__ == "__main__":
    df = pd.DataFrame(_get_rows(matchlink)).set_index("match")
    print(df)
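
To get from the DataFrame to a file, something along these lines should work. It is only a sketch: the two sample rows are copied from the output above instead of being scraped, the file names matches.csv and matches.xlsx and the column names odds_1 to odds_3 are made up for illustration, and .to_excel() needs an optional engine such as openpyxl to be installed.

```python
import pandas as pd

# Sample rows shaped like the dicts yielded by _get_rows().
rows = [
    {"match": "Dubocica - FK Indjija", "result": "2:1", "odds": "2.18, 2.93, 3.31"},
    {"match": "Mladost GAT - Smederevo", "result": "1:1", "odds": "1.63, 3.37, 5.17"},
]

df = pd.DataFrame(rows).set_index("match")

# Optional: split the joined odds string into three numeric columns.
# The column names are hypothetical labels, not names used by the site.
odds_cols = df["odds"].str.split(", ", expand=True).astype(float)
odds_cols.columns = ["odds_1", "odds_2", "odds_3"]
df = df.drop(columns="odds").join(odds_cols)

# CSV export needs nothing beyond pandas itself.
df.to_csv("matches.csv")

# Excel export relies on an optional engine such as openpyxl.
try:
    df.to_excel("matches.xlsx", sheet_name="results")
except ImportError:
    print("Install openpyxl (pip install openpyxl) to enable .to_excel()")
```

Splitting the odds into numeric columns is optional, but it means Excel sees numbers it can sort and filter on, not one text cell per match.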