Web scraping in Python: problems exporting data to Excel

I’m trying to export some data to Excel. I’m a total beginner, so I apologise for any dumb questions.

I’m practising scraping on the demo site webscraper.io, and so far I have scraped the data I want: the laptop names and the links for the products.

import requests
from bs4 import BeautifulSoup
from pprint import pprint

url ="https://webscraper.io/test-sites/e-commerce/allinone/computers/laptops"

r = requests.get(url)

html = r.text

soup = BeautifulSoup(html)

css_selector = {"class": "col-sm-4 col-lg-4 col-md-4"}

laptops = soup.find_all("div", attrs=css_selector)

for laptop in laptops:
    laptop_link = laptop.find('a')
    text = laptop_link.get_text()
    href = laptop_link['href']
    full_url = f"https://webscraper.io{href}"
    print(text)
    print (full_url)

I’m having major difficulty wrapping my head around how to export the text and full_url values to Excel.

I have seen it done like this:

import pandas as pd

df = pd.DataFrame(laptops)

df.to_excel("laptops_testing.xlsx", encoding="utf-8")

But when I do that, I get an .xlsx file containing a lot of data and markup that I don’t want. I just want the data I have been printing (text and full_url).

The data I’m seeing in Excel looks like this:

<div class="thumbnail">  
<img alt="item" class="img-responsive" src="/images/test-sites/e-commerce/items/cart2.png"/> 
<div class="caption">  
<h4 class="pull-right price">$295.99</h4>  
<h4>  
<a class="title" href="/test-sites/e-commerce/allinone/product/545" title="Asus VivoBook X441NA-GA190">Asus VivoBook X4...</a>  
</h4>  
<p class="description">Asus VivoBook X441NA-GA190 Chocolate Black, 14", Celeron N3450, 4GB, 128GB SSD, Endless OS, ENG kbd</p>  
</div>

<div class="ratings">  
<p class="pull-right">14 reviews</p>  
<p data-rating="3">  
<span class="glyphicon glyphicon-star"></span>  
<span class="glyphicon glyphicon-star"></span>  
<span class="glyphicon glyphicon-star"></span>  
</p>  
</div>  
</div>

Screenshot from Google Sheets (image not included).

Solution:

This is not hard to solve. Append each laptop name and URL to a list, build a pandas DataFrame from those lists, and then write the DataFrame to a new Excel file:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url ="https://webscraper.io/test-sites/e-commerce/allinone/computers/laptops"

r = requests.get(url)

html = r.text

# name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(html, "html.parser")

css_selector = {"class": "col-sm-4 col-lg-4 col-md-4"}

laptops = soup.find_all("div", attrs=css_selector)

laptop_name = []
laptop_url = []
for laptop in laptops:
    laptop_link = laptop.find('a')
    text = laptop_link.get_text()
    href = laptop_link['href']
    full_url = f"https://webscraper.io{href}"
    print(text)
    # collect the laptop name
    laptop_name.append(text)
    print(full_url)
    # collect the product URL
    laptop_url.append(full_url)

# build a DataFrame with one column per list
new_df = pd.DataFrame({'Laptop Name': laptop_name, 'Laptop url': laptop_url})

print(new_df)

# write the DataFrame to an Excel file (index=False leaves out the row-number column)
file_name = 'laptop.xlsx'
new_df.to_excel(file_name, index=False)
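
One thing to watch for: in the HTML shown in the question, the link text is cut off ("Asus VivoBook X4...") while the title attribute holds the full product name. The sketch below is one possible variation rather than the only way to do it. It reads the name from title, falls back to the link text, and collects the rows as a list of dicts. The function names, BASE_URL and SAMPLE_HTML are made up for illustration, and it assumes every product link on the page has class "title", as in the sample markup.

```python
from urllib.parse import urljoin

import pandas as pd
from bs4 import BeautifulSoup

# Assumption: relative product links are resolved against the site root.
BASE_URL = "https://webscraper.io"


def extract_laptops(html):
    """Return one dict (name + absolute URL) per product link in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for link in soup.select("a.title"):
        rows.append({
            # Prefer the untruncated name from the title attribute.
            "Laptop Name": link.get("title") or link.get_text(strip=True),
            "Laptop url": urljoin(BASE_URL, link["href"]),
        })
    return rows


def laptops_to_frame(html):
    """Build a two-column DataFrame from the extracted rows."""
    return pd.DataFrame(extract_laptops(html), columns=["Laptop Name", "Laptop url"])


# Hypothetical sample input, trimmed from the markup quoted in the question.
SAMPLE_HTML = """
<div class="col-sm-4 col-lg-4 col-md-4">
  <div class="thumbnail">
    <div class="caption">
      <h4>
        <a class="title" href="/test-sites/e-commerce/allinone/product/545"
           title="Asus VivoBook X441NA-GA190">Asus VivoBook X4...</a>
      </h4>
    </div>
  </div>
</div>
"""

df = laptops_to_frame(SAMPLE_HTML)
print(df)
```

With the real page you would pass r.text instead of SAMPLE_HTML and then call df.to_excel("laptop.xlsx", index=False). Writing .xlsx files requires an Excel writer engine such as openpyxl to be installed alongside pandas.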