
Pitfalls with custom Spider Middleware in Scrapy

How to fix Scrapy's "TypeError: process_spider_output() got multiple values for argument 'response'"

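In the middlewares.py that Scrapy generated for this project, the methods of the default SpiderMiddleware class are defined without a self parameter. Scrapy calls these methods on a middleware instance, so Python binds the instance to the first parameter (response), and the response that is then passed by keyword collides with it. For example: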

Before the change:
def process_spider_output(response, result, spider):
        # Called with the results returned from the Spider, after
        # it has processed the response.

        # Must return an iterable of Request, dict or Item objects.
        for i in result:
            yield i
           

The fix is simply to add self as the first parameter of every method in the class. That resolves the error for a custom SpiderMiddleware.

Note: you can also override all of these methods in your own custom SpiderMiddleware and make sure the first parameter of each is self. That works just as well.

After the change:
def process_spider_output(self, response, result, spider):
        # Called with the results returned from the Spider, after
        # it has processed the response.

        # Must return an iterable of Request, dict or Item objects.
        for i in result:
            yield i
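
The error can be reproduced without Scrapy at all, which may help to see why the missing self matters. The following is only a minimal sketch: BrokenMiddleware, FixedMiddleware and call_like_framework are hypothetical names made up for illustration, and the keyword-style call is an assumption inferred from the wording of the error message, not a copy of Scrapy's own code.

```python
# Minimal reproduction in plain Python (no Scrapy required).
# All class and function names here are hypothetical, for illustration only.


class BrokenMiddleware:
    # `self` is missing, so the instance itself is bound to `response`.
    def process_spider_output(response, result, spider):
        for i in result:
            yield i


class FixedMiddleware:
    # With `self` in place, the instance no longer occupies `response`.
    def process_spider_output(self, response, result, spider):
        for i in result:
            yield i


def call_like_framework(middleware):
    # Assumption: the caller passes every argument by keyword,
    # which is what the "multiple values for argument" message suggests.
    return list(
        middleware.process_spider_output(
            response="<response>", result=[1, 2], spider="<spider>"
        )
    )


try:
    call_like_framework(BrokenMiddleware())
    error_message = None
except TypeError as exc:
    # Argument binding fails at call time, even for a generator function.
    error_message = str(exc)

fixed_output = call_like_framework(FixedMiddleware())

print(error_message)
print(fixed_output)
```

The broken class fails because the instance and the response keyword both try to fill the same parameter. The fixed class returns the items unchanged.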