1. Create the project
scrapy startproject shop
2. Code for items.py:
import scrapy

class ShopItem(scrapy.Item):
    # Each field holds a list: all titles / all dates found on the page.
    title = scrapy.Field()
    time = scrapy.Field()
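3. Spider code in shopspider.py:
# -*- coding: UTF-8 -*-
import scrapy

from shop.items import ShopItem

class shopSpider(scrapy.Spider):
    name = "shop"
    allowed_domains = ["news.xxxxxxx.xx.cn"]
    # The original listing omits start_urls; without it (or a start_requests
    # method) the spider has no page to request, so set it to the news list URL.

    def parse(self, response):
        item = ShopItem()
        # Each XPath returns a list with one entry per <li> in the news block.
        item['title'] = response.xpath("//div[@class='txttotwe2']/ul/li/a/text()").extract()
        item['time'] = response.xpath("//div[@class='txttotwe2']/ul/li/font/text()").extract()
        yield item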
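To see what the two XPath expressions in parse() select, here is a minimal sketch that uses only the standard library instead of Scrapy. The HTML fragment is hypothetical (the article does not show the real page), and ElementTree supports only a subset of XPath, so `.//` stands in for the leading `//` and `.text` for the trailing `/text()` step.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the real page structure is not shown in the article.
SAMPLE = """
<html><body>
<div class="txttotwe2">
  <ul>
    <li><a href="/n/1.html">First headline</a><font>(2017-06-16)</font></li>
    <li><a href="/n/2.html">Second headline</a><font>(2017-06-15)</font></li>
  </ul>
</div>
<div class="other"><ul><li><a href="/x">Ignored link</a></li></ul></div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())

# Only <li> entries inside the div with class "txttotwe2" are matched.
titles = [a.text for a in root.findall(".//div[@class='txttotwe2']/ul/li/a")]
times = [f.text for f in root.findall(".//div[@class='txttotwe2']/ul/li/font")]

print(titles)
print(times)
```

Both results are parallel lists, which is why the item ends up holding every title and every date from the page rather than a single news entry.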
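4. Code for pipelines.py (prints the content):
Note: if the extracted lists are printed directly in shopspider.py, Python 2 shows each string as escaped unicode (the list's repr), whereas pipelines.py prints the strings one at a time, so they appear as readable text.
class ShopPipeline(object):
    def process_item(self, item, spider):
        count = len(item['title'])
        print('news count: %d' % count)
        # 'biaoti' is pinyin for "title", 'shijian' for "time".
        # zip pairs each title with its date and stops at the shorter list.
        for title, time in zip(item['title'], item['time']):
            print('biaoti: ' + title)
            print('shijian: ' + time)
        return item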
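Scrapy only calls a pipeline that is enabled in the project's settings.py, a step the article does not show. A minimal sketch, assuming the default layout generated by `scrapy startproject shop`; the value 300 is just a conventional priority (lower numbers run first, and values are customarily kept between 0 and 1000).

```python
# settings.py (fragment): enable the pipeline so process_item() is called
# for every item the spider yields.
ITEM_PIPELINES = {
    "shop.pipelines.ShopPipeline": 300,
}
```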
5. Crawl output:
root@kali:~/shop# scrapy crawl shop --nolog
news count: 40
biaoti: xxx建成國家食品安全示範城市
shijian: (2017-06-16)
biaoti: xxxx考試開始報名
……………………
…………………..
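The dates in the output above come wrapped in parentheses, and titles and dates arrive as two parallel lists. If one record per news entry is more convenient, the lists could be paired and the dates parsed along these lines. This is only a sketch: `parse_time` and `to_records` are hypothetical helper names, the sample headlines are made up, and the `(YYYY-MM-DD)` format is assumed from the single date visible in the output.

```python
from datetime import date, datetime


def parse_time(raw):
    """Convert a scraped string such as '(2017-06-16)' to a date object."""
    return datetime.strptime(raw.strip().strip("()"), "%Y-%m-%d").date()


def to_records(titles, times):
    """Pair the parallel title and time lists into one dict per news entry."""
    return [
        {"title": title.strip(), "date": parse_time(raw)}
        for title, raw in zip(titles, times)
    ]


# Made-up sample data in the same shape as item['title'] and item['time'].
records = to_records(
    ["Headline A", "Headline B"],
    ["(2017-06-16)", "(2017-06-15)"],
)
print(records[0])
```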
Reposted from the 51CTO blog of 老鷹a. Original link: http://blog.51cto.com/laoyinga/1940001