Scraping data with Python and requests-html and exporting it to an Excel file

I wrote some code to scrape data from a website, and it works fine, but I'd like to export the results to an Excel file.

I'm new to Python, so I don't know exactly what I should do.

I thought of using pandas, but my output is a print with join, so I haven't found a good solution.

This is my code:

from requests_html import HTMLSession
import pandas as pd
import tabulate
from tabulate import tabulate
 
matchlink = 'https://www.betexplorer.com/football/serbia/prva-liga/results/'
 
session = HTMLSession()
 
r = session.get(matchlink)

allmatch = r.html.find('.in-match')
results = r.html.find('.h-text-center a')
# search for elements containing "data-odd" attribute
matchodds = r.html.find('[data-odd]')

odds = [matchodd.attrs["data-odd"] for matchodd in matchodds]

idx = 0
for match, res in zip(allmatch, results):
    if res.text == 'POSTP.':
        continue

    print(f"{match.text} {res.text} {', '.join(odds[idx:idx+3])}")
    
    idx += 3


Thanks for your help.

>Solution :

Sure, use pandas! With a dataframe in hand, it's easy to call .to_csv(), .to_excel(), or whatever.

Here is some sample output:

                                    result              odds
match                                                       
Dubocica - FK Indjija                  2:1  2.18, 2.93, 3.31
Mladost GAT - Smederevo                1:1  1.63, 3.37, 5.17
Graficar Beograd - RFK Novi Sad        2:1  1.41, 4.31, 6.28
Tekstilac Odzaci - Radnicki Beograd    5:0  1.53, 3.79, 5.49
FK Indjija - Vrsac                     2:1  1.72, 3.16, 4.90
...                                    ...               ...
Jedinstvo U. - RFK Novi Sad            4:0  1.45, 4.42, 5.59
Metalac - Graficar Beograd             1:3  2.17, 3.14, 3.11
Sloboda - OFK Beograd                  0:2  1.87, 3.15, 4.02
Smederevo - FK Indjija                 2:0  2.76, 2.83, 2.59
Vrsac - Kolubara                       1:0  2.73, 2.92, 2.57

[160 rows x 2 columns]
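
Since the question was specifically about Excel, here is a minimal sketch of the export step. It assumes rows shaped like the sample output above; `SAMPLE_ROWS`, `export_matches`, and the file name `matches.xlsx` are made-up names for illustration, not part of the original code. Note that `.to_excel()` relies on an Excel writer engine such as openpyxl being installed.

```python
from pathlib import Path

import pandas as pd

# Hypothetical sample rows, copied from the output shown above.
SAMPLE_ROWS = [
    {"match": "Dubocica - FK Indjija", "result": "2:1", "odds": "2.18, 2.93, 3.31"},
    {"match": "Mladost GAT - Smederevo", "result": "1:1", "odds": "1.63, 3.37, 5.17"},
]


def export_matches(rows, path):
    """Build a dataframe from row dicts and write it to .xlsx or .csv."""
    df = pd.DataFrame(rows).set_index("match")
    path = Path(path)
    if path.suffix == ".xlsx":
        # Needs an Excel writer engine (for example openpyxl) to be installed.
        df.to_excel(path, sheet_name="results")
    else:
        # Anything else is written as CSV, which Excel can also open.
        df.to_csv(path)
    return df


if __name__ == "__main__":
    export_matches(SAMPLE_ROWS, "matches.xlsx")
```

The same function accepts a generator of dicts, so the scraped rows could be passed in directly instead of the sample list.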

I just put a trivial wrapper around your code.
(BTW, regarding tabulate: if you import x and then from x import x, or the other way round, the second statement rebinds the name x, which undoes the first. Keep only the one you need.)

from typing import Generator

from requests_html import HTMLSession
import pandas as pd

matchlink = "https://www.betexplorer.com/football/serbia/prva-liga/results/"


def _get_rows(url: str) -> Generator[dict[str, str], None, None]:
    session = HTMLSession()

    r = session.get(url)

    allmatch = r.html.find(".in-match")
    results = r.html.find(".h-text-center a")
    # search for elements containing "data-odd" attribute
    matchodds = r.html.find("[data-odd]")

    odds = [matchodd.attrs["data-odd"] for matchodd in matchodds]

    idx = 0
    for match, res in zip(allmatch, results):
        if res.text == "POSTP.":
            continue

        yield {
            "match": match.text,
            "result": res.text,
            "odds": ", ".join(odds[idx : idx + 3]),
        }

        idx += 3


if __name__ == "__main__":
    df = pd.DataFrame(_get_rows(matchlink)).set_index("match")
    print(df)
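
If the odds are going into a spreadsheet, it may be more convenient to have them as separate numeric columns rather than one joined string. The following is only a sketch under that assumption: `split_odds` and the column names `odds_1`, `odds_2`, `odds_3` are hypothetical, and it assumes every row carries at most three comma-separated odds, as in the sample output.

```python
import pandas as pd


def split_odds(df):
    """Return a copy of df with 'odds' expanded into three numeric columns."""
    # Split "2.18, 2.93, 3.31" into separate string columns.
    parts = df["odds"].str.split(", ", expand=True)
    # Convert to numbers; anything unparseable becomes NaN instead of raising.
    parts = parts.apply(pd.to_numeric, errors="coerce")
    parts.columns = ["odds_1", "odds_2", "odds_3"]  # hypothetical names
    return df.drop(columns="odds").join(parts)


if __name__ == "__main__":
    demo = pd.DataFrame(
        [{"match": "Dubocica - FK Indjija", "result": "2:1", "odds": "2.18, 2.93, 3.31"}]
    ).set_index("match")
    print(split_odds(demo))
```

Calling `split_odds(df)` before `.to_excel()` would then give numeric cells that can be sorted or averaged in the spreadsheet.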