Scrapy does not capture all of the markup that comes back in the response

June 22, 2023

I’m trying to capture the markup returned in HTTP responses, but I only get part of it. For some reason the output is cut off in the middle. What could be causing this? Here is my code:

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.log import configure_logging


class StackOverflowSpider(scrapy.Spider):
    
    name = 'stackoverflow'
    allowed_domains = ['stackoverflow.com']
    start_urls = ['https://stackoverflow.com/questions/tagged/python?tab=newest&page=1&pagesize=15']
    first_request_done = False
    
    def start_requests(self):
        if not self.first_request_done:
            self.first_request_done = True
            for url in self.start_urls:
                yield scrapy.Request(url=url, callback=self.parse, dont_filter=True)
            
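    def parse(self, response):
        # Header values are bytes, so the fallback must be bytes as well.
        content_type = response.headers.get('Content-Type', b'')
        if response.status == 200 and content_type.startswith(b'text/html'):
            # response.text decodes the body using the detected encoding.
            html = response.text
            print(html)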
    

configure_logging()
process = CrawlerProcess(settings={
    'LOG_ENABLED': False,
    'DOWNLOAD_DELAY': 1,
    'CONCURRENT_REQUESTS': 1
})
process.crawl(StackOverflowSpider)
process.start(stop_after_crawl=False)

Solution:

Scrapy is receiving the full response; this is just the Python print function not properly flushing the output. You can demonstrate this by splitting the page content into lines and printing them one at a time, or by writing the contents to a file and viewing the full output there.

For example, you can try this to print it out line by line:

def parse(self, response):
    for line in response.text.splitlines():
        print(line)
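If splitting on lines is not convenient (for example, when the markup is minified into one very long line), a similar effect can be sketched by writing the text in fixed-size pieces and flushing after each one. This is only an illustration: `write_in_chunks` and its `chunk_size` default are hypothetical names and values chosen here, not part of Scrapy.

```python
import io
import sys


def write_in_chunks(text, stream=None, chunk_size=8192):
    """Write text to a stream in fixed-size chunks, flushing after each one.

    Hypothetical helper: returns the total number of characters written.
    """
    if stream is None:
        stream = sys.stdout
    written = 0
    for start in range(0, len(text), chunk_size):
        chunk = text[start:start + chunk_size]
        stream.write(chunk)
        stream.flush()  # push each chunk out instead of relying on buffering
        written += len(chunk)
    return written


if __name__ == "__main__":
    # Stand-in for response.text; written to an in-memory stream for the demo.
    sample = "<html>" + "x" * 20000 + "</html>"
    buffer = io.StringIO()
    count = write_in_chunks(sample, stream=buffer)
    print(count == len(sample), buffer.getvalue() == sample)
```

Inside the spider, the same helper could be called as `write_in_chunks(response.text)` in place of `print(html)`.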

Or, if you want to write the contents to a file:

def parse(self, response):
    with open('response.html', "wt", encoding="utf8") as htmlfile:
        htmlfile.write(response.text)
    ...
    ...
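To confirm that the markup itself is complete without reading through the whole output, one rough check is to look at its length and whether it ends with a closing `</html>` tag. The sketch below assumes the page is a regular HTML document that ends with that tag; `summarize_markup` is a hypothetical helper, not a Scrapy function.

```python
def summarize_markup(html):
    """Return (character_count, ends_with_closing_html_tag) for a markup string.

    Hypothetical helper: a heuristic only, since it assumes the document
    finishes with a closing </html> tag.
    """
    stripped = html.rstrip().lower()
    return len(html), stripped.endswith("</html>")


if __name__ == "__main__":
    complete = "<html><body>hi</body></html>\n"
    truncated = "<html><body>hi"
    print(summarize_markup(complete))
    print(summarize_markup(truncated))
```

In the spider's `parse` method this could be called as `summarize_markup(response.text)`. If it reports a closing tag, the response arrived in full and the cut-off is happening when the text is displayed.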

scrapy

by MR

Published June 22, 2023
