HTTP request works with curl but fails with Python with 403

I’m trying to download the RSS feed from "https://www.straitstimes.com/news/singapore/rss.xml". I have the following Python script:

import requests

r = requests.get('https://www.straitstimes.com/news/singapore/rss.xml')

# Print every response header, then the raw response body.
for k, v in r.headers.items():
    print(f"{k}: {v}")

print(r.content)

When I run this, I get the following response:

Cache-Control: max-age=0, no-cache, no-store
Content-Type: text/html
Date: Wed, 13 Dec 2023 03:06:00 GMT
Expires: Wed, 13 Dec 2023 03:05:59 GMT
Referrer-Policy: no-referrer-when-downgrade
Server: ECD (sgc/56B1)
Set-Cookie: sph_user_country=SG;Path=/;
X-EC-Security-Audit: 403
x-vmg-version: v10.5.70
Content-Length: 345
b'<?xml version="1.0" encoding="iso-8859-1"?>\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\t<head>\n\t\t<title>403 - Forbidden</title>\n\t</head>\n\t<body>\n\t\t<h1>403 - Forbidden</h1>\n\t</body>\n</html>\n'

When I fetch the same URL with curl using the following command (forcing HTTP/1.1 and removing the User-Agent and Accept headers from the request), I get the XML just fine. What am I doing wrong with requests?

curl https://www.straitstimes.com/news/singapore/rss.xml -v --http1.1 -H 'User-Agent:' -H 'Accept:'

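Solution:

By default, requests sends its own User-Agent (python-requests/<version>) and an Accept header, which the curl command above strips out. The server is probably rejecting one of those defaults. You can try overriding both headers with empty strings, like this: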

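import requests

# Empty strings override the default User-Agent and Accept values
# that requests would otherwise send.
headers = {
    'User-Agent': '',
    'Accept': ''
}

url = 'https://www.straitstimes.com/news/singapore/rss.xml'
r = requests.get(url, headers=headers, timeout=10)
print(r.status_code)
# Only print the feed if the server accepted the request.
if r.status_code == 200:
    print(r.text)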
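
To check what requests would actually send, you can build the request without touching the network and look at its headers. This is only a sketch: `prepared_headers` is a helper name made up for this example, and it shows the headers requests prepares, not a capture of what goes on the wire.

```python
import requests

URL = "https://www.straitstimes.com/news/singapore/rss.xml"


def prepared_headers(url, headers=None):
    """Return the headers requests would prepare for a GET, without sending it.

    Hypothetical helper for illustration; not part of the requests API.
    """
    session = requests.Session()
    request = requests.Request("GET", url, headers=headers)
    # prepare_request merges the session's default headers with the ones passed in.
    return session.prepare_request(request).headers


default = prepared_headers(URL)
overridden = prepared_headers(URL, {"User-Agent": "", "Accept": ""})
removed = prepared_headers(URL, {"User-Agent": None, "Accept": None})

print("default:   ", dict(default))
print("overridden:", dict(overridden))
print("removed:   ", dict(removed))
```

Note the difference from curl: `-H 'User-Agent:'` removes the header entirely, while an empty string in requests keeps the header with an empty value. Setting the value to `None` drops it from the prepared headers, although the underlying HTTP stack may still add a User-Agent of its own, so if the empty-string version still returns 403, a browser-like User-Agent string is another thing to try.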