Using Scrapy to scrape some information

import scrapy
from scrapy.http import Request
from bs4 import BeautifulSoup


class TestSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://www.baroul-bucuresti.ro/index.php?urlpag=tablou-definitivi&p=1']
    
    def parse(self, response):
        base_url='https://www.baroul-bucuresti.ro'
        soup=BeautifulSoup(response.text, 'html.parser')
        tra = soup.find_all('div',class_='panel-title')
        productlinks=[]
        for links in tra:
            for link in links.find_all('a',href=True)[1:]:
                comp=base_url+link['href']
                yield Request(comp, callback=self.parse_book)
     
    d1=''
    def parse_book(self, response):
        title=response.xpath("//h1//text()").get()
        detail=response.xpath("//div[@class='av_bot_left left']//p")
        for i in range(len(detail)):
           
            if 'Decizia de intrare:' in detail[i].get():
                d1=response.xpath("//em[@class='ral_i']//text()").get()
                print(d1)

This gives me the following output:

Decizia de intrare:

But the output I actually want is the following, as shown on the page https://www.baroul-bucuresti.ro/avocat/15655/aanegroae-ana-maria :

Decizia de intrare: 2469/1-06.12.16

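Solution:

Try this:

Inside your if statement, the XPath is run against the whole response (the root node), so it returns the first matching em element on the page, which holds only the label. Instead, I run a relative XPath against the node you have already identified as containing the text you want. Then I strip the resulting text fragments and join them into one string.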

    def parse_book(self, response):
        title = response.xpath("//h1//text()").get()
        detail = response.xpath("//div[@class='av_bot_left left']//p")
        for p in detail:
            if 'Decizia de intrare:' in p.get():
                # Relative XPath: collect every text node inside this <p> only
                parts = p.xpath('.//text()').getall()
                # Drop whitespace-only fragments and join the rest with spaces
                d1 = " ".join(part.strip() for part in parts if part.strip())
                print(d1)
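
To see what the string formatting step does on its own, here is a minimal sketch that runs without Scrapy. The fragment list is only an assumed example of what getall() might return for that paragraph (the real page markup may differ), and join_fragments is a hypothetical helper name, not part of Scrapy.

```python
def join_fragments(fragments):
    """Strip each text fragment, drop the empty ones, and join with spaces."""
    return " ".join(part.strip() for part in fragments if part.strip())


# Assumed sample: text nodes as .getall() might return them for the <p>,
# with the label inside an <em> and the value as trailing text.
sample = ["\n  ", "Decizia de intrare:", " 2469/1-06.12.16\n"]

print(join_fragments(sample))  # prints: Decizia de intrare: 2469/1-06.12.16
```

Whitespace-only fragments usually come from line breaks and indentation between tags, which is why they are filtered out before joining.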