How to accumulate parsed data in a DataFrame through a loop with pandas when web scraping?

I want to build a DataFrame with a historical dataset by scraping a website, but I am struggling to accumulate the full period inside the loop. I can download a single day, but when I loop over a range of dates I cannot accumulate the results in the DataFrame.

The DataFrame I want to build, covering start_date to end_date, looks like this:

    Fecha  PeríodeTU  TM°C  HRM%
    (indexed by single_date)

Here Fecha is a column added from the single_date variable in the code below, and the remaining columns are the actual data scraped from the website.

I have tried this:

from datetime import date, timedelta

import pandas as pd
import requests
from bs4 import BeautifulSoup

def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)

start_date = date(2020, 6, 1)
end_date = date(2021, 3, 3)


for single_date in daterange(start_date, end_date):
    # Meteo.cat API URL including the date
    url = "https://www.meteo.cat/observacions/xema/dades?codi=V3&dia=" + str(single_date) + "T00:00Z"

    # GET request to the API
    res = requests.get(url)
    soup = BeautifulSoup(res.content, 'lxml')
    table = soup.find_all('table')[2]
    df_table = pd.read_html(str(table))[0]
    df_table['Fecha'] = single_date


data['Fecha'] = df['Fecha']
data['Hora'] = df['PeríodeTU']
data['Temperatura_Media'] = df['TM°C']
data['Humedad_Relativa'] = df['HRM%']
data.to_csv('Data/tempset.csv', header=True, index=False)

df_table only keeps the last date, and I want to save the full iterated period.

Does anyone know how to deal with this situation?

>Solution :

You can append each daily DataFrame to a list and then concatenate them:

dfs = []
for single_date in daterange(start_date, end_date):
    #URL API Meteo.cat con la fecha
    url = "https://www.meteo.cat/observacions/xema/dades?codi=V3&dia="+str(single_date)+"T00:00Z"    

    # GET a la API
    res = requests.get(url)
    soup = BeautifulSoup(res.content,'lxml')
    table = soup.find_all('table')[2]
    dfs.append(pd.read_html(str(table))[0].assign(Fecha = single_date))

And finally after running the loop:

df_table = pd.concat(dfs)

This will create df_table with all the individual observations from the dataframes based on your loop.
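The same list-then-concat pattern can be checked offline with synthetic daily frames. This is a minimal sketch: the dummy values and the three-day range are illustrative, not real Meteo.cat data, and it also shows the column renaming from the question done on the combined frame.

```python
from datetime import date, timedelta

import pandas as pd

# Stand-in for the scraped daily table: one dummy row per day (no network call)
dfs = []
for n in range(3):
    single_date = date(2020, 6, 1) + timedelta(n)
    daily = pd.DataFrame({'PeríodeTU': ['00:00'],
                          'TM°C': [20.0 + n],
                          'HRM%': [60 + n]})
    # Tag every row of the daily frame with its date, then accumulate
    dfs.append(daily.assign(Fecha=single_date))

# One concat after the loop; ignore_index renumbers the rows 0..N-1
df_table = pd.concat(dfs, ignore_index=True)

# Rename to the export column names used in the question
data = df_table.rename(columns={'PeríodeTU': 'Hora',
                                'TM°C': 'Temperatura_Media',
                                'HRM%': 'Humedad_Relativa'})
print(data.shape)  # (3, 4)
```

Building the list inside the loop and calling pd.concat once at the end is also much faster than concatenating inside the loop, since each pd.concat copies all the data it receives.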
