Scraping a job site with Scrapy: a problem encountered when converting items to dict

2018-06-14 12:21:00

Pipelines code:

import json

class TencentJsonPipeline(object):
    def __init__(self):
        # 'wb' opens the file in binary mode; this is what triggers the error
        self.file = open('tencent.json', 'wb')

    def process_item(self, item, spider):
        content = json.dumps(dict(item), ensure_ascii=False) + "\n"
        self.file.write(content)
        return item

    def close_spider(self, spider):
        # Scrapy's hook is close_spider(spider); close_project is never called
        self.file.close()

Error:

self.file.write(content)
TypeError: a bytes-like object is required, not 'str'

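This is a basic encoding/decoding problem. In Python 3, json.dumps returns a str, but a file opened in 'wb' (binary) mode only accepts bytes. The JSON file should be opened with 'w' instead of 'wb', with the encoding set to utf-8.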
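The mismatch can be reproduced outside Scrapy. The following is a minimal sketch, assuming Python 3; the record contents and the temporary file path are made up for illustration and stand in for dict(item) and tencent.json.

```python
import json
import os
import tempfile

# Hypothetical item data, standing in for dict(item) in the pipeline.
record = {"position": "後端工程師", "location": "深圳"}

# json.dumps returns a str in Python 3, even with ensure_ascii=False.
content = json.dumps(record, ensure_ascii=False) + "\n"

# Temporary path used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.json")

# Binary mode rejects str, which reproduces the TypeError shown above.
binary_error = None
with open(path, "wb") as f:
    try:
        f.write(content)
    except TypeError as exc:
        binary_error = str(exc)

# Text mode with an explicit encoding accepts the str as-is.
with open(path, "w", encoding="utf-8") as f:
    f.write(content)

# Read the line back to confirm the non-ASCII text survived the round trip.
with open(path, encoding="utf-8") as f:
    round_trip = json.loads(f.readline())
```

Passing encoding='utf-8' explicitly matters here because ensure_ascii=False leaves non-ASCII characters unescaped, and the platform's default text encoding may not be able to represent them.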

Corrected code:

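import json

class TencentJsonPipeline(object):
    def __init__(self):
        # Text mode with an explicit encoding, so str can be written directly
        self.file = open('tencent.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # ensure_ascii=False keeps non-ASCII characters readable in the output
        content = json.dumps(dict(item), ensure_ascii=False) + "\n"
        self.file.write(content)
        return item

    def close_spider(self, spider):
        # Scrapy's hook is close_spider(spider); close_project is never called
        self.file.close()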

With this change, the pipeline runs correctly.
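If binary mode is preferred for some reason, another option is to keep 'wb' and encode the string to bytes before writing. This is a sketch of that alternative, not the fix used in this post; the class name TencentJsonBytesPipeline and the path parameter are hypothetical, while open_spider and close_spider are the standard Scrapy pipeline hooks.

```python
import json


class TencentJsonBytesPipeline:
    """Hypothetical variant: keeps binary mode and encodes explicitly."""

    def __init__(self, path="tencent.json"):
        # The path parameter is an assumption added to make the sketch testable.
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Open the file when the spider starts rather than at construction time.
        self.file = open(self.path, "wb")

    def process_item(self, item, spider):
        content = json.dumps(dict(item), ensure_ascii=False) + "\n"
        # A binary file needs bytes, so encode the str as UTF-8 first.
        self.file.write(content.encode("utf-8"))
        return item

    def close_spider(self, spider):
        self.file.close()
```

Both approaches produce the same UTF-8 file; text mode simply moves the encoding step into the file object.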

Reference: https://stackoverflow.com/questions/44682018/typeerror-object-of-type-bytes-is-not-json-serializable
