A pitfall when writing a custom Spider Middleware in Scrapy

How to fix Scrapy's "TypeError: process_spider_output() got multiple values for argument 'response'"

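The error occurs because, in the middlewares.py that Scrapy generates, the methods of the default SpiderMiddleware class are defined without a self parameter, for example: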

Before the fix:
    def process_spider_output(response, result, spider):  # no 'self' here
        # Called with the results returned from the Spider, after
        # it has processed the response.

        # Must return an iterable of Request, dict or Item objects.
        for i in result:
            yield i
           

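Adding self as the first parameter of every method in the class fixes the problem for a custom SpiderMiddleware. Without self, Python binds the middleware instance to the first parameter (response), so the response passed by keyword becomes a second value for the same argument.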

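Note: alternatively, you can override all of these methods in your own custom SpiderMiddleware, making sure that the first parameter of each one is self. This also resolves the error.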

After the fix:
    def process_spider_output(self, response, result, spider):
        # Called with the results returned from the Spider, after
        # it has processed the response.

        # Must return an iterable of Request, dict or Item objects.
        for i in result:
            yield i
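
The cause can be reproduced without Scrapy. The following is a minimal sketch, not Scrapy's own code: BrokenMiddleware, FixedMiddleware and call_hook are hypothetical names, and call_hook assumes that the framework invokes the hook on a middleware instance using keyword arguments, which is what the error message suggests.

```python
class BrokenMiddleware:
    # 'self' is missing, so the instance itself is bound to 'response'.
    def process_spider_output(response, result, spider):
        for i in result:
            yield i


class FixedMiddleware:
    # With 'self' first, the keyword arguments map to the intended parameters.
    def process_spider_output(self, response, result, spider):
        for i in result:
            yield i


def call_hook(middleware, response, result, spider):
    # Hypothetical stand-in for the framework: call the bound method
    # with keyword arguments and collect whatever it yields.
    return list(
        middleware.process_spider_output(
            response=response, result=result, spider=spider
        )
    )


if __name__ == "__main__":
    try:
        call_hook(BrokenMiddleware(), "resp", [1, 2], None)
    except TypeError as exc:
        print("broken:", exc)
    print("fixed:", call_hook(FixedMiddleware(), "resp", [1, 2], None))
```

In the broken class the instance already occupies response, so the response keyword supplies a second value and Python raises the TypeError at call time. The fixed class passes the results through unchanged.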