This middleware adds several settings that configure how Scrapy works with Crawlera.
Unique Crawlera API Key provided for authentication.
Crawlera instance URL. It varies depending on whether you acquired a private or dedicated instance. If Crawlera did not provide you with a private instance URL, you do not need to specify it.
Number of consecutive bans from Crawlera necessary to stop the spider.
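The settings above are plain Scrapy settings. A minimal `settings.py` sketch, assuming the standard scrapy-crawlera setting names (the key and values below are placeholders, not real credentials or recommendations):

```python
# settings.py -- minimal sketch, assuming the standard scrapy-crawlera
# setting names; values are placeholders.
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = "<your-api-key>"  # unique API key used for authentication

# Only needed if Crawlera provided you with a private/dedicated instance URL:
# CRAWLERA_URL = "http://myinstance.crawlera.com:8010"

# Stop the spider after this many consecutive bans from Crawlera.
CRAWLERA_MAXBANS = 400
```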
Timeout for processing Crawlera requests. It overrides Scrapy's DOWNLOAD_TIMEOUT setting.

If False (the default), Scrapy's DOWNLOAD_DELAY is set to 0, making the spider crawl faster. If set to True, the middleware will respect the DOWNLOAD_DELAY provided by Scrapy.
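A sketch of how these two settings interact, assuming the standard scrapy-crawlera setting names (the numbers are example values):

```python
# settings.py -- sketch of the timeout/delay settings; values are examples.
CRAWLERA_DOWNLOAD_TIMEOUT = 600  # seconds; overrides Scrapy's DOWNLOAD_TIMEOUT

# Keep Scrapy's DOWNLOAD_DELAY instead of letting the middleware force it to 0:
CRAWLERA_PRESERVE_DELAY = True
DOWNLOAD_DELAY = 0.5  # honored only because CRAWLERA_PRESERVE_DELAY is True
```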
Default headers added only to Crawlera requests. Headers defined in DEFAULT_REQUEST_HEADERS will take precedence as long as the CrawleraMiddleware is placed after the DefaultHeadersMiddleware. Headers set on the requests themselves take precedence over both settings.

This is the default behavior: DefaultHeadersMiddleware's default priority is 400, and we recommend a CrawleraMiddleware priority of 610.
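The ordering can be sketched in `settings.py` as follows; the module paths and the example header are illustrative, not prescriptive:

```python
# settings.py -- sketch of middleware ordering so that DEFAULT_REQUEST_HEADERS
# takes precedence over CRAWLERA_DEFAULT_HEADERS.
DOWNLOADER_MIDDLEWARES = {
    # DefaultHeadersMiddleware runs first (its default priority is 400) ...
    "scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware": 400,
    # ... and CrawleraMiddleware after it, at the recommended priority.
    "scrapy_crawlera.CrawleraMiddleware": 610,
}

CRAWLERA_DEFAULT_HEADERS = {
    # Example header, added only to requests that go through Crawlera.
    "X-Crawlera-Profile": "desktop",
}
```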
Step size used for calculating exponential backoff according to the formula:
random.uniform(0, min(max, step * 2 ** attempt)).
Max value for exponential backoff, as shown in the formula above.
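The formula can be verified with a few lines of Python; `step` and `max_delay` below are example values, not the middleware's defaults:

```python
import random

def backoff_delay(attempt, step=15, max_delay=180):
    """Exponential backoff with jitter, per the documented formula:
    random.uniform(0, min(max, step * 2 ** attempt))."""
    return random.uniform(0, min(max_delay, step * 2 ** attempt))

# The delay grows with each attempt but is always capped by max_delay:
for attempt in range(8):
    delay = backoff_delay(attempt)
    assert 0 <= delay <= min(180, 15 * 2 ** attempt)
```

Because of `min(max, ...)`, even a very large attempt number never produces a delay above `max_delay`, while the `random.uniform` jitter spreads retries out so many spiders do not hammer the same host in lockstep.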
List of HTTP response status codes that warrant enabling Crawlera for the corresponding domain.
When a response with one of these HTTP status codes is received after a request that did not go through Crawlera, the request is retried with Crawlera, and any new request to the same domain is also sent through Crawlera.
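A sketch of this setting, assuming the standard scrapy-crawlera setting name; the status codes listed are typical examples (forbidden, throttled, unavailable), not defaults:

```python
# settings.py -- sketch; status codes are illustrative examples.
# If a request sent WITHOUT Crawlera receives one of these status codes,
# it is retried through Crawlera, and every later request to that domain
# is routed through Crawlera as well.
CRAWLERA_FORCE_ENABLE_ON_HTTP_CODES = [403, 429, 503]
```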