I’m using Python for a web scraping project, and I came across this URL that downloads an XML file to my PC.
Is there a way to access the XML file that’s downloaded when you click the link? I’m OK with saving the XML locally if that’s the only way, but I have no idea how to do so.
I’ve tried using the requests module, but all I get is a byte string:
import requests
r = requests.get(
"https://fnet.bmfbovespa.com.br/fnet/publico/downloadDocumento?id=465601"
)
print(r.content)
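Getting a byte string from r.content is expected. It is the raw response body, and for a file download those bytes are the file itself, so saving it just means writing them to disk in binary mode. The following is a minimal sketch that assumes the body is the XML. The helper names `filename_from_headers` and `save_bytes` and the fallback name `download.xml` are hypothetical. Reading the name from `Content-Disposition` only works if the server actually sends that header.

```python
import os
from email.message import Message
from typing import Mapping


def filename_from_headers(headers: Mapping[str, str], default: str = "download.xml") -> str:
    """Return the filename suggested by a Content-Disposition header, or `default`.

    Hypothetical helper (not part of requests); `headers` can be `r.headers`.
    """
    msg = Message()
    msg["Content-Disposition"] = headers.get("Content-Disposition", "")
    # get_filename() reads the filename parameter of Content-Disposition (None if absent)
    name = msg.get_filename()
    # Drop any directory part a server might send, so we never write outside the cwd
    name = os.path.basename(name) if name else ""
    return name or default


def save_bytes(content: bytes, path: str) -> int:
    """Write raw response bytes (e.g. r.content) to `path`; returns the number of bytes written."""
    with open(path, "wb") as f:  # binary mode: no decoding, so the file is byte-identical
        return f.write(content)


# Usage with the response `r` from the code above:
# path = filename_from_headers(r.headers)
# save_bytes(r.content, path)
```

Writing in binary mode avoids decoding altogether. Any encoding declared inside the XML stays intact for whatever parser reads the file later.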
Solution:
You need to send browser-like request headers to download from that specific site.
Here is how I did it:
import requests

filename = "file.xml"
url = "https://fnet.bmfbovespa.com.br/fnet/publico/downloadDocumento?id=465601"

# Browser-like headers; without them this site may not return the file.
# requests sets the Host header itself, so it is not listed here.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    # Only advertise encodings requests can always decode: "br" needs the optional
    # brotli package, otherwise r.content could stay compressed.
    'Accept-Encoding': 'gzip, deflate',
    'Connection': 'keep-alive',
    'DNT': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Sec-GPC': '1',
    'Upgrade-Insecure-Requests': '1',
}

r = requests.get(url, headers=headers, timeout=30)
r.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page

# r.content is the raw body as bytes, so write it in binary mode
with open(filename, "wb") as file:
    file.write(r.content)
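Once the file is saved, the standard-library `xml.etree.ElementTree` module can read it back. It can also parse `r.content` directly without going through the disk. This is a minimal sketch that assumes the response body really is well-formed XML. The helper names `parse_xml_bytes` and `parse_xml_file` are hypothetical, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_xml_bytes(content: bytes) -> ET.Element:
    """Parse raw XML bytes (e.g. r.content) and return the root element."""
    # Passing bytes (not a decoded str) lets the parser use the encoding
    # declared in the XML prologue, e.g. encoding="ISO-8859-1".
    return ET.fromstring(content)


def parse_xml_file(path: str) -> ET.Element:
    """Parse a saved XML file (e.g. "file.xml") and return the root element."""
    return ET.parse(path).getroot()


if __name__ == "__main__":
    # Made-up sample standing in for r.content
    sample = b'<?xml version="1.0" encoding="UTF-8"?><doc><item id="1">a</item></doc>'
    root = parse_xml_bytes(sample)
    for child in root:
        print(child.tag, child.attrib, child.text)
```

With the real download you would call `parse_xml_bytes(r.content)` or `parse_xml_file("file.xml")` and then walk the tree with `find`/`findall` according to the document's actual structure.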