Extracting a specific tag from XML in Python using BeautifulSoup

I have a metadata file that looks like this:

    <?xml version='1.0' encoding='utf-8'?>
    <package xmlns="http://www.idpf.org/2007/opf" unique-identifier="uuid_id" version="2.0">
      <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
        <dc:title>Princeton Review Digital SAT Premium Prep, 2024: 4 Practice Tests + Online Flashcards + Review &amp; Tools</dc:title>
        <dc:creator opf:file-as="Princeton Review, The" opf:role="aut">The Princeton Review</dc:creator>
        <dc:identifier opf:scheme="ISBN">9780593516874</dc:identifier>
        <dc:identifier opf:scheme="AMAZON">0593516877</dc:identifier>
        <dc:identifier opf:scheme="GOODREADS">63139948</dc:identifier>
        <dc:identifier opf:scheme="GOOGLE">o6i4EAAAQBAJ</dc:identifier>
      </metadata>
    </package>

I know…
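
One way to pull a single tag out of a file like this, as a minimal sketch. The question is cut off, so it is an assumption here that the wanted tag is the ISBN identifier; the helper name `identifier_by_scheme` is made up for illustration. The built-in "html.parser" keeps the `dc:` and `opf:` prefixes as part of the tag and attribute names, so they can be matched as plain strings.

```python
from bs4 import BeautifulSoup

# Shortened copy of the metadata from the question.
OPF = """<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="uuid_id" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:creator opf:file-as="Princeton Review, The" opf:role="aut">The Princeton Review</dc:creator>
    <dc:identifier opf:scheme="ISBN">9780593516874</dc:identifier>
    <dc:identifier opf:scheme="AMAZON">0593516877</dc:identifier>
  </metadata>
</package>"""


def identifier_by_scheme(xml_text, scheme):
    """Return the text of the dc:identifier with the given opf:scheme, or None."""
    soup = BeautifulSoup(xml_text, "html.parser")
    # The prefixed names are treated as ordinary strings by this parser.
    tag = soup.find("dc:identifier", attrs={"opf:scheme": scheme})
    return tag.get_text(strip=True) if tag is not None else None


print(identifier_by_scheme(OPF, "ISBN"))
```

Returning None for a missing scheme keeps the caller from hitting an AttributeError when a book has no such identifier.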

Check if element exists or not

I have code that grabs a price tag from this HTML section:

    <div class="main">
      <div class="cost-box">
        <ins><span>$</span><price>10.00</price></ins>
      </div>
    </div>

Here’s the code I use to get the 10.00 price:

    import requests
    from bs4 import BeautifulSoup as bs

    url = "https://www.sample.com/sample/123abcd"
    response = requests.get(url).text
    soup = bs(response, "html.parser")
    container = soup.find("div", class_="cost-box")
    price = container.price…
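
A minimal sketch of an existence check for this markup. Both `find` and attribute-style access such as `container.price` return None when nothing matches, so each step can be tested before it is used. The function name `extract_price` and the two sample strings are assumptions for illustration; no network request is made.

```python
from bs4 import BeautifulSoup

WITH_PRICE = (
    '<div class="main"><div class="cost-box">'
    "<ins><span>$</span><price>10.00</price></ins>"
    "</div></div>"
)
WITHOUT_PRICE = '<div class="main"><div class="cost-box"></div></div>'


def extract_price(html):
    """Return the price text, or None if the box or the price tag is missing."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_="cost-box")
    if container is None:
        # The whole cost box is absent.
        return None
    price = container.find("price")
    if price is None:
        # The box exists but holds no <price> element.
        return None
    return price.get_text(strip=True)


print(extract_price(WITH_PRICE))
print(extract_price(WITHOUT_PRICE))
```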

How do I web-scrape this site properly?

    from bs4 import BeautifulSoup
    import requests

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/117.0'
    }

    product = input("What do you want? ")

    def eurika(product):
        eurika_url = f'https://www.eureka.com.kw/?instant_records%5Bquery%5D={product}'
        response = requests.get(eurika_url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        product_span = soup.find_all('div', class_='caption')
        if product_span:
            for product_spa in product_span:
                title = product_spa.find('span', class_='display-block fwbold')…
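
A small offline sketch of the title lookup, under stated assumptions: the sample markup below is invented, and it says nothing about whether the live page delivers these elements in the HTML that `requests` receives. Passing a multi-class string to `class_` matches only that exact attribute value, so a CSS selector is used here to match both classes in any order, and a missing span is skipped instead of raising.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not taken from the real site.
SAMPLE = """
<div class="caption"><span class="fwbold display-block">Phone A</span></div>
<div class="caption"><span class="display-block fwbold">Phone B</span></div>
<div class="caption"><p>No title here</p></div>
"""


def product_titles(html):
    """Collect the title text from every caption block that has one."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for caption in soup.find_all("div", class_="caption"):
        # Matches spans carrying both classes, regardless of their order.
        title = caption.select_one("span.display-block.fwbold")
        if title is not None:
            titles.append(title.get_text(strip=True))
    return titles


print(product_titles(SAMPLE))
```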

Unexpected output when I use BeautifulSoup for web scraping

I don’t know why my checkTargetExist function is returning an unexpected result.

    from bs4 import BeautifulSoup

    def checkTargetExist(soup, id):
        errorDiv = soup.find(id)
        # If there is a <div> that reports an error message, then my target does not exist.
        if errorDiv:
            targetExist = False
        else:
            targetExist = True
        return targetExist

    # Create a BeautifulSoup object from your HTML…
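
One likely cause, sketched under an assumption: the excerpt is cut off before the function is called, so it is a guess that `id` holds an element id. The first positional argument of `find` is a tag name, so `soup.find("error")` looks for an `<error>` tag, while `soup.find(id="error")` matches the `id` attribute. The id value "error" and the sample pages are hypothetical.

```python
from bs4 import BeautifulSoup

PAGE_WITH_ERROR = '<div id="error">Target not found</div>'
PAGE_OK = '<div id="content">Hello</div>'


def target_exists(soup, error_id):
    """Return False when an element with the given id attribute is present."""
    error_div = soup.find(id=error_id)  # keyword argument filters on the id attribute
    return error_div is None


error_soup = BeautifulSoup(PAGE_WITH_ERROR, "html.parser")
ok_soup = BeautifulSoup(PAGE_OK, "html.parser")

# Positional form: searches for a tag *named* "error", which does not exist here.
print(error_soup.find("error"))
print(target_exists(error_soup, "error"))
print(target_exists(ok_soup, "error"))
```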

Web-scraping script is not finding element given its class name. Unsure what I'm missing in this code

Problem Description: I’m trying to get the URL of an image on this page: https://www2.hm.com/en_us/productpage.1109917007.html This first picture (pic #1) shows the desired image highlighted in blue + its class name. When you click on that image, you then get its fullscreen image (pic #2), whose URL I’m trying to get. I wrote the code…

BeautifulSoup automatically closes unclosed HTML tags

I have an issue with BeautifulSoup. Whenever I parse an HTML input, it closes HTML tags that weren’t closed (e.g. <input>, or tags that were left unclosed by mistake). For example:

    from bs4 import BeautifulSoup

    tags = BeautifulSoup('<span id="100" class="test">', "html.parser")
    print(str(tags))

Prints:

    <span id="100" class="test"></span>

My main goal here is to preserve the original…
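
BeautifulSoup builds a tree and serializes it again, so a closing tag appears in the output whether or not the source had one. The stated goal is cut off in the excerpt; assuming it is to know which tags were actually closed in the input, the sketch below swaps in a different technique: the standard library's event-based `html.parser.HTMLParser`, which reports only the tags that occur in the text. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class TagEvents(HTMLParser):
    """Record start and end tags exactly as they occur in the input."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(html):
    """Return the list of (kind, tag) events found in the markup."""
    parser = TagEvents()
    parser.feed(html)
    parser.close()
    return parser.events


print(tag_events('<span id="100" class="test">'))
print(tag_events('<span id="100" class="test"></span>'))
```

No end event is recorded for the first input, which is how an unclosed tag can be told apart from a closed one before the markup is handed to BeautifulSoup.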

BeautifulSoup shuffles the attributes of HTML tags

I have an issue with BeautifulSoup. Whenever I parse an HTML input, it changes the order of the attributes (e.g. class, id) of the HTML tags. For example:

    from bs4 import BeautifulSoup

    tags = BeautifulSoup('<span id="100" class="test"></span>', "html.parser")
    print(str(tags))

Prints:

    <span class="test" id="100"></span>

As you can see, the class and id order was changed. How…
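
A possible workaround, as a sketch that assumes Beautiful Soup 4.8 or later, where output formatters expose an `attributes` hook. The default formatter sorts attributes alphabetically when serializing; a subclass of `HTMLFormatter` can yield them in the order stored on the tag instead. The class name `KeepAttributeOrder` is made up for illustration, and a bare `HTMLFormatter()` may also handle entity escaping differently from the default output, which is worth checking on real data.

```python
from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter


class KeepAttributeOrder(HTMLFormatter):
    """Emit attributes in the order stored on the tag instead of sorting them."""

    def attributes(self, tag):
        for name, value in tag.attrs.items():
            yield name, value


soup = BeautifulSoup('<span id="100" class="test"></span>', "html.parser")
unsorted_html = soup.decode(formatter=KeepAttributeOrder())
print(unsorted_html)
```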

How to get the href from an <a> tag inside a <div> by text using BeautifulSoup?

    from bs4 import BeautifulSoup
    import re

    text = """<div class="content">
      <div class="body">
        <img src="https:/document-images/leasing/mynd-logo.png" alt="j0b" width="96px" height="auto" class="kjhjhv">
        <div class="title"> You’ve received a new message.
          <div class="subtitle"> Service request <a href="https://-2B06XEJ0r8DVmc_kuX2cI8baDAaOcj-2Fp3iIjU6R7PXKa3dAYAr0B7iMyKwz-2FaV0nnIuCVP1pcf8DEy1UidQbR2IywCV5ueXy1TowXMzFcIPYG2hp7HjP1WzHYI-2FJNMGLZMtC4LXybcCZ4cUOV4DnC6s-2FCIJ-2FrumGmdnE2leBJgM3rWJaEyXOwi4JiHjBHr4rtNh-2BPeP3JFBHpGNp5KWrZkxg-2F9zrih4tp7-2BUUrBo0g8hlG5It1yEVfz9Im2iRVAvdjqHvAqxn63TsV2OFNp8M8DMjuS6aRL3Vki8HfkXx0kD8fGJ6GAKUAOZv-2BCSAgxtcdnIpR8sQU6Jkcm9vxdjG2zmDYFEmVykg0-2BY3uD1ZbGl79dsB68mqJdbbQlgb8ERSRAW3t8cYTXsegrGd5-2Fox-2B9Yo-2FPmUC9cnqEaA-3D-3D" style="color:#5A88AA; text-decoration:underline;" target="_blank">#2819138</a> </div>
        </div>
        <!-- CONTENT OF EMAIL -->
        <p> May 17, 2023 05:33 PDT <b> Charisse </b> wrote: </p>
        <p style="white-space: pre-wrap;">Hi…
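
A minimal sketch of one way to do this, assuming "by text" means finding the <div> whose own text contains a phrase such as "Service request" and then reading the link inside it. The markup is a shortened stand-in for the email above, and the example.com URL and the helper name `href_by_div_text` are hypothetical.

```python
import re

from bs4 import BeautifulSoup

# Shortened stand-in for the email body; the URL is a placeholder.
HTML = """
<div class="title">You've received a new message.
  <div class="subtitle">Service request
    <a href="https://example.com/request/2819138" target="_blank">#2819138</a>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")


def href_by_div_text(soup, pattern):
    """Return the href of the first link in a <div> whose own text matches pattern."""
    for div in soup.find_all("div"):
        # Only text that is a direct child of this div, not text of nested tags.
        own_text = "".join(div.find_all(string=True, recursive=False))
        if re.search(pattern, own_text):
            link = div.find("a", href=True)
            if link is not None:
                return link["href"]
    return None


print(href_by_div_text(soup, r"Service request"))
```

Checking only the div's direct text keeps the outer "title" div from matching just because it contains the inner one.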