JavaScript in requests package python

I want to get text from a site using Python.
But the site builds its content with JavaScript, so the requests package receives only the JavaScript code, not the rendered text.
Is there a way to get the text without using Selenium?

import requests as r

# requests does not execute JavaScript, so this is only the raw page source
a = r.get('https://aparat.com/').text

>Solution :

If the site loads its content with JavaScript, then that JavaScript has to run before the content exists in the page. I ran into this issue a while back when I did some web scraping, and ended up using Selenium. Yes, it’s slower than BeautifulSoup, but it’s the easiest solution.

If you know how the server works, you can instead send a request straight to it, and it should return content of some kind (HTML, JSON, etc.).

Edit: Open the browser's developer tools, go to the Network tab and refresh the page. Look for an XHR request and note the URL it uses. The data it returns may be enough for your needs.

For example I found these URLs:

https://www.aparat.com/api/fa/v1/etc/page/config/mode/full
https://www.aparat.com/api/fa/v1/video/video/list/tagid/1?next=1

If you navigate to these in your browser you will see JSON content, which you might be able to use. I think some of the text is written as Unicode escape sequences, e.g. \u062e\u0644\u0627\u0635\u0647 \u0628\u0627\u0632\u06cc -> خلاصه بازی
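As a small illustration of the escaped text above: in Python, the standard json module turns \uXXXX sequences into real characters while parsing, so no separate decoding step should be needed. This is only a sketch; the "title" key is a hypothetical name, not something taken from the site's actual response.

```python
import json

# A hand-written JSON fragment using the escaped text quoted above.
# The "title" key is hypothetical and only serves the illustration.
raw = r'{"title": "\u062e\u0644\u0627\u0635\u0647 \u0628\u0627\u0632\u06cc"}'

# json.loads converts each \uXXXX escape into the character it stands for.
data = json.loads(raw)
title = data["title"]

print(title)
```

The raw-string prefix (r'...') keeps the backslashes literal, which mimics what arrives over the network before parsing.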

I don’t know the specific Python implementation you might use. Look for libraries that support making HTTP requests and receiving data; that way you can avoid Selenium. But you must know the URLs beforehand, as shown above.

For example, this is what I would do:

  1. Make an HTTP request to the URL you find in the developer tools.
  2. If the content is JSON, use a JSON parser to get a table/array/dictionary natively. You can then traverse this in your programming language.
  3. Use a Unicode decoder to get the text in normal form. There might be a library to do this; for example, using an online "Decode/Unescape Unicode Entities" tool I was able to get the text.
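The three steps above could be sketched in Python roughly as follows, using the requests package from the question. The URL is one of those listed earlier; the structure of its response is not described here, so the hypothetical helper collect_strings simply walks whatever JSON comes back and gathers every string value. The function names and the 10-second timeout are assumptions made for the example, and the endpoint may reject or change its response at any time.

```python
import requests

# URL taken from the list above; the shape of its JSON is not assumed.
API_URL = "https://www.aparat.com/api/fa/v1/video/video/list/tagid/1?next=1"


def collect_strings(node):
    """Recursively gather every string value from parsed JSON."""
    if isinstance(node, str):
        return [node]
    if isinstance(node, dict):
        return [s for value in node.values() for s in collect_strings(value)]
    if isinstance(node, list):
        return [s for item in node for s in collect_strings(item)]
    return []  # numbers, booleans and null carry no text


def fetch_texts(url, timeout=10):
    """Request the URL, parse the JSON body and return its string values."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    # response.json() parses the body; \uXXXX escapes are decoded here.
    return collect_strings(response.json())


if __name__ == "__main__":
    for text in fetch_texts(API_URL):
        print(text)
```

Once you have looked at the real response in the developer tools, you would normally replace collect_strings with direct lookups of the keys you care about.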

I hope this helps.
