HTTP request works with curl but fails with 403 in Python requests

December 13, 2023

I’m trying to download the RSS feed from "https://www.straitstimes.com/news/singapore/rss.xml". I have the following Python script:

import requests

r = requests.get('https://www.straitstimes.com/news/singapore/rss.xml')

for k, v in r.headers.items():
    print("{}: {}".format(k, v))
print(r.content)

When I run this, I get the following response:

Cache-Control: max-age=0, no-cache, no-store
Content-Type: text/html
Date: Wed, 13 Dec 2023 03:06:00 GMT
Expires: Wed, 13 Dec 2023 03:05:59 GMT
Referrer-Policy: no-referrer-when-downgrade
Server: ECD (sgc/56B1)
Set-Cookie: sph_user_country=SG;Path=/;
X-EC-Security-Audit: 403
x-vmg-version: v10.5.70
Content-Length: 345
b'<?xml version="1.0" encoding="iso-8859-1"?>\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\t<head>\n\t\t<title>403 - Forbidden</title>\n\t</head>\n\t<body>\n\t\t<h1>403 - Forbidden</h1>\n\t</body>\n</html>\n'

When I fetch the same URL with curl using the following command (forcing HTTP/1.1 and removing the User-Agent and Accept headers from the request), I get the XML just fine. What am I doing wrong with requests?

curl https://www.straitstimes.com/news/singapore/rss.xml -v --http1.1 -H 'User-Agent:' -H 'Accept:'
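A useful first step is to compare what each client actually sends. The sketch below builds the same GET request that requests.get() would send, but stops before sending it, so the default headers requests adds can be printed and compared with curl -v. The helper name default_request_headers is hypothetical; Session.prepare_request and requests.Request are real requests APIs.

```python
import requests

URL = "https://www.straitstimes.com/news/singapore/rss.xml"


def default_request_headers(url: str) -> dict:
    """Return the headers requests would attach to a plain GET, without sending it."""
    with requests.Session() as session:
        # prepare_request merges the session's default headers into the request,
        # which is the same merge requests.get(url) performs internally.
        prepared = session.prepare_request(requests.Request("GET", url))
        return dict(prepared.headers)


if __name__ == "__main__":
    for name, value in default_request_headers(URL).items():
        print(f"{name}: {value}")
```

On a typical install this lists a User-Agent of the form python-requests/<version>, plus Accept: */*, Accept-Encoding and Connection: keep-alive, none of which the curl command above sends.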

Solution:

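The difference is in the headers, not the HTTP version: requests already uses HTTP/1.1, but unlike your curl command it sends default headers such as User-Agent: python-requests/<version> and Accept: */*, and the server appears to reject the request because of them. Overriding both headers, as the curl command does, returns the feed: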

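import requests

# Override requests' default User-Agent (python-requests/<version>) and
# Accept (*/*) headers, which the server appears to reject, with empty values
headers = {
    'User-Agent': '',
    'Accept': ''
}

url = 'https://www.straitstimes.com/news/singapore/rss.xml'
r = requests.get(url, headers=headers)
print(r.status_code)
if r.status_code == 200:
    print(r.text)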
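Note that setting a header to '' still sends it, just with an empty value. To drop a default header entirely, which is what curl's -H 'User-Agent:' does, requests lets you set its value to None. The sketch below, assuming you want the request to mirror the curl command more closely, builds the request without sending it so the result can be checked. The names OMIT_DEFAULTS and prepare_feed_request are hypothetical, and it is an assumption that the server accepts an omitted header as readily as the empty one used above.

```python
import requests

URL = "https://www.straitstimes.com/news/singapore/rss.xml"

# A value of None tells requests to remove that default header entirely,
# mirroring curl's -H 'User-Agent:' and -H 'Accept:'.
OMIT_DEFAULTS = {"User-Agent": None, "Accept": None}


def prepare_feed_request(session: requests.Session, url: str) -> requests.PreparedRequest:
    """Build the GET request without sending it, so its headers can be inspected."""
    return session.prepare_request(requests.Request("GET", url, headers=OMIT_DEFAULTS))


if __name__ == "__main__":
    with requests.Session() as session:
        prepared = prepare_feed_request(session, URL)
        # Only the remaining defaults (e.g. Accept-Encoding, Connection) are left.
        print(dict(prepared.headers))
```

To actually fetch the feed, send the prepared request with session.send(prepared), or simply call requests.get(URL, headers=OMIT_DEFAULTS).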