How to extract text from an <a> tag nested under an h2 using Scrapy?

I want to extract the name from the <a> tag.

response.css('h2.product-names::text').get()

But it is returning:

'<h2 class="product-names">

<a target="_blank" href="https://www.electronicsbazaar.com/dell-inspiron-13-7348-core-i5-5200u-2-20ghz-8gb-500gb-int-webcam-win-10-13-3-touch" title='Refurbished Dell Inspiron 13 7348 (Core I5 5Th Gen/8GB/500GB/Int/Win 10/13.3" Touch)'>\n                                                                                                            Refurbished Dell Inspiron 13 7348 (Core I5 5Th Gen/8GB/500GB/Int/Win 10/13.3" Touch)                                                                                                                                          </a>

</h2>

'

How can I get the text of the link?

I tried:

>>> response.css('h2.product-names').get()
'<h2 class="product-names">

<a target="_blank" href="https://www.electronicsbazaar.com/dell-inspiron-13-7348-core-i5-5200u-2-20ghz-8gb-500gb-int-webcam-win-10-13-3-touch" title='Refurbished Dell Inspiron 13 7348 (Core I5 5Th Gen/8GB/500GB/Int/Win 10/13.3" Touch)'>\n                                                                                                            Refurbished Dell Inspiron 13 7348 (Core I5 5Th Gen/8GB/500GB/Int/Win 10/13.3" Touch)                                                                                                                                          </a>

</h2>

'

Solution:

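The problem is that the name, if I read your screenshot correctly, is contained in the <a> tag nested inside the <h2>, so selecting the text of the h2 itself does not return it.
The right XPath, which reads the link's title attribute, is: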

response.xpath('//h2[@class="product-names"]/a/@title').extract()