Getting scrapy and pytest to work with AsyncioSelectorReactor

To reproduce my issue, use the following versions:

  • python 3.12.1
  • scrapy 2.11.2
  • pytest 8.2.1

In bookspider.py I have:

from typing import Iterable

import scrapy
from scrapy.http import Request


class BookSpider(scrapy.Spider):
    name = None

    def start_requests(self) -> Iterable[Request]:
        yield scrapy.Request("https://books.toscrape.com/")

    def parse(self, response):
        books = response.css("article.product_pod")
        for book in books:
            yield {
                "name": self.name,
                "title": book.css("h3 a::text").get().strip(),
            }

In test_bookspider.py I have:

import json
import os

from pytest_twisted import inlineCallbacks
from scrapy.crawler import CrawlerRunner
from twisted.internet import defer

from bookspider import BookSpider


@inlineCallbacks
def test_bookspider():
    runner = CrawlerRunner(
        settings={
            "REQUEST_FINGERPRINTER_IMPLEMENTATION": "2.7",
            "FEEDS": {"books.json": {"format": "json"}},
            "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
            # "TWISTED_REACTOR": "twisted.internet.selectreactor.SelectReactor",
        }
    )
    yield runner.crawl(BookSpider, name="books")

    with open("books.json", "r") as f:
        books = json.load(f)
    assert len(books) >= 1
    assert books[0]["name"] == "books"
    assert books[0]["title"] == "A Light in the ..."

    os.remove("books.json")

    defer.returnValue(None)

With "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor" uncommented I get the following error:

Exception: The installed reactor (twisted.internet.selectreactor.SelectReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor)

With "TWISTED_REACTOR": "twisted.internet.selectreactor.SelectReactor" uncommented my test passes.

Can anyone explain this behaviour and more broadly how to test CrawlerRunner or CrawlerProcess with pytest?

Solution:

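If you use pytest-twisted, you need to tell it to install the matching reactor by passing --reactor=asyncio to your pytest command. Otherwise it installs the default reactor (SelectReactor, as the error message shows) before your test runs, and CrawlerRunner then finds that the installed reactor does not match the one requested in TWISTED_REACTOR. See https://github.com/pytest-dev/pytest-twisted#using-the-plugin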
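
As a rough sketch of the same thing driven from Python rather than the shell, the option can be passed through pytest.main. The helper names asyncio_reactor_args and run_tests are made up for illustration, and the sketch assumes pytest and pytest-twisted are installed and that the test file is called test_bookspider.py as in the question:

```python
def asyncio_reactor_args(test_path: str) -> list[str]:
    """Build the pytest arguments that request the asyncio reactor."""
    # --reactor is a command-line option registered by the pytest-twisted plugin.
    # "asyncio" asks the plugin to install the asyncio-based reactor up front.
    return ["--reactor=asyncio", test_path]


def run_tests(test_path: str = "test_bookspider.py") -> int:
    """Run pytest in-process with the asyncio reactor requested."""
    import pytest  # imported lazily; requires pytest and pytest-twisted

    return int(pytest.main(asyncio_reactor_args(test_path)))
```

This should be equivalent to running pytest --reactor=asyncio test_bookspider.py on the command line. To avoid typing the flag each time, it can also go into the addopts setting of your pytest configuration file.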

On "how to test CrawlerRunner or CrawlerProcess with pytest?":

You shouldn't use CrawlerProcess in pytest tests, because it starts and stops the reactor itself, and a Twisted reactor cannot be restarted within the same process. If you really need to test code that uses CrawlerProcess, write tests that run each CrawlerProcess invocation in its own process.
