Scrapy, AJAX, and Douban: how to handle AJAX data with Scrapy

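I'm building a spider and have run into a problem. I can scrape a set of data from the HTML page, and that data includes the id needed to send an AJAX request. But when I try to combine the AJAX data with the other data taken from the HTML, the result comes out wrong. What is going wrong, and how can I fix it? My code is:

class DoubanSpider(scrapy.Spider):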

    name = "douban"
    allowed_domains = ["movie.douban.com"]
    start_urls = ["https://movie.douban.com/review/best"]

    def parse(self, response):
        for review in response.css(".review-item"):
            rev = Review()
            rev['reviewer'] = review.css("a[property='v:reviewer']::text").extract_first()
            rev['rating'] = review.css("span[property='v:rating']::attr(class)").extract_first()
            rev['title'] = review.css(".main-bd>h2>a::text").extract_first()
            number = review.css("::attr(id)").extract_first()
            f = scrapy.Request(url='https://movie.douban.com/j/review/%s/full' % number,
                               callback=self.parse_full_passage)
            rev['comment'] = f
            yield rev

    def parse_full_passage(self, response):
        r = json.loads(response.body_as_unicode())
        html = r['html']
        yield html