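This note briefly lists the tools you need to install for web scraping. Most of them can be installed with pip; the ones that need a separate installer are covered individually below.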
- Python 3: installing Anaconda is recommended. It sets up Python 3 and Anaconda's bundled packages in one go, which saves a lot of trouble later.
- HTTP request libraries: requests, selenium, chromedriver, geckodriver, phantomjs, aiohttp
- Parsing libraries: lxml, beautifulsoup4, pyquery, tesserocr
- Databases: mysql, mongodb, redis
- Storage libraries: pymysql, pymongo, redis-py, redisdump
- Web libraries: flask, tornado
- App scraping tools: Charles, mitmproxy, appium
- Crawler frameworks: pyspider, scrapy, scrapy-splash, scrapy-redis
- Deployment tools: docker, scrapyd, scrapyd-client, scrapyd api, scrapyrt, gerapy
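The pip package name and the module you import are not always the same (beautifulsoup4 is imported as bs4, redis-py as redis), so a quick way to see which of the Python libraries above are already installed is to look their modules up with importlib. This is a minimal sketch: the IMPORT_NAMES mapping is my own and covers only a subset of the list.

```python
import importlib.util

# pip package name -> top-level module name used in `import`
# (a hypothetical subset of the list above; extend as needed)
IMPORT_NAMES = {
    "requests": "requests",
    "selenium": "selenium",
    "aiohttp": "aiohttp",
    "lxml": "lxml",
    "beautifulsoup4": "bs4",
    "pyquery": "pyquery",
    "tesserocr": "tesserocr",
    "pymysql": "pymysql",
    "pymongo": "pymongo",
    "redis": "redis",
    "flask": "flask",
    "tornado": "tornado",
    "scrapy": "scrapy",
    "scrapy-splash": "scrapy_splash",
    "scrapy-redis": "scrapy_redis",
}


def check_installed(import_names):
    """Return {package: True/False} depending on whether its module can be found."""
    # find_spec returns None for a top-level module that is not installed,
    # without actually importing it.
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in import_names.items()
    }


if __name__ == "__main__":
    for package, ok in check_installed(IMPORT_NAMES).items():
        print(f"{package:15} {'OK' if ok else 'missing'}")
```

Anything reported as missing still needs a pip install.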
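chromedriver/geckodriver:
Download:
In mainland China, chromedriver can only be downloaded from this mirror:
http://npm.taobao.org/mirrors/chromedriver/
For Firefox (geckodriver):
https://github.com/mozilla/geckodriver/releases
Download the version that matches your browser and put it in Python's Scripts folder.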
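Putting the driver in the Scripts folder works because that folder is normally on PATH after a Python or Anaconda install, and Selenium looks for the driver executable on PATH. A small sketch, using only the standard library, to confirm the drivers can actually be found there:

```python
import shutil

DRIVERS = ("chromedriver", "geckodriver")


def find_driver(name):
    """Full path of `name` if it is on PATH, else None.

    On Windows, shutil.which also tries the PATHEXT extensions,
    so "chromedriver" matches chromedriver.exe.
    """
    return shutil.which(name)


def driver_report(names=DRIVERS):
    """Map each driver name to its location on PATH (None if missing)."""
    return {name: find_driver(name) for name in names}


if __name__ == "__main__":
    for name, path in driver_report().items():
        print(f"{name}: {path or 'not found on PATH'}")
```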
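Verify the installation:
from selenium import webdriver
browser = webdriver.Chrome()   # launches Chrome through chromedriver
browser = webdriver.Firefox()  # launches Firefox through geckodriver
If a blank browser window opens, the installation succeeded.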
tesserocr:
Tesseract must be installed first:
http://digi.bib.uni-mannheim.de/tesseract
Pick a build without "dev" in its name.
Then run: pip install tesserocr pillow
MySQL:
https://www.mysql.com/cn/downloads
Then run: pip install pymysql
MongoDB:
https://www.mongodb.com
The author also recommends the GUI client Robo 3T: https://robomongo.org/download
Then run: pip install pymongo
Redis:
https://www.redis.cn
The author also recommends the GUI client Redis Desktop Manager:
https://github.com/uglide/redisdesktopmanager/releases
Then run: pip install redis
To import and export Redis data you also need redis-dump.
First install Ruby from http://www.ruby-lang.org,
then run: gem install redis-dump
Charles:
https://www.charlesproxy.com/download
appium:
https://github.com/appium/appium-desktop/releases
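pyspider:
pycurl has to be installed first. Find the wheel that matches your setup on the page below; for 64-bit Windows with Python 3.7, for example, download
pycurl-7.43.1-cp37-cp37m-win_amd64.whl
https://www.lfd.uci.edu/~gohlke/pythonlibs/#pycurl
and install the downloaded file with pip install pycurl-7.43.1-cp37-cp37m-win_amd64.whl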
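The "cp37-cp37m-win_amd64" part of the filename is the wheel's compatibility tag: Python version, ABI, and platform. The sketch below derives that suffix for the interpreter you are running, so you know which file to pick. It assumes CPython on Windows (the page above hosts Windows wheels) and relies on the fact that the "m" ABI suffix was used up to Python 3.7 and dropped from 3.8 on.

```python
import sys
import sysconfig


def wheel_suffix(major, minor, platform):
    """Build the 'cpXY-cpXY[m]-platform' part of a CPython wheel filename."""
    py_tag = f"cp{major}{minor}"
    # Python 3.7 and earlier carry an 'm' ABI flag (cp37m); 3.8+ do not (cp38).
    abi_tag = py_tag + ("m" if (major, minor) <= (3, 7) else "")
    # sysconfig reports e.g. "win-amd64" or "win32"; wheel tags use underscores.
    plat_tag = platform.replace("-", "_").replace(".", "_")
    return f"{py_tag}-{abi_tag}-{plat_tag}"


def current_wheel_suffix():
    """Suffix matching the interpreter that is running this script."""
    return wheel_suffix(sys.version_info.major, sys.version_info.minor,
                        sysconfig.get_platform())


if __name__ == "__main__":
    print("Look for a pycurl wheel ending in:", current_wheel_suffix() + ".whl")
```

For example, wheel_suffix(3, 7, "win-amd64") gives "cp37-cp37m-win_amd64", which is exactly the tail of the pycurl file named above.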
Scrapy:
First pip install lxml, pyopenssl, twisted and pywin32, then pip install scrapy last.
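The install order above can be scripted. This is a sketch, not an official Scrapy procedure: it calls pip as `python -m pip` through the current interpreter, so the packages land in the same environment (for example Anaconda's) that will run Scrapy, and it adds pywin32 only on Windows, since that package is Windows-only. The names SCRAPY_STEPS and install_in_order are hypothetical.

```python
import subprocess
import sys

# Prerequisites from the text, in order; Scrapy itself goes last.
SCRAPY_STEPS = ["lxml", "pyopenssl", "twisted"]
if sys.platform == "win32":
    SCRAPY_STEPS.append("pywin32")  # Windows-only dependency
SCRAPY_STEPS.append("scrapy")


def pip_install_command(package):
    """pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


def install_in_order(packages):
    """Install packages one by one; check_call raises on the first failure."""
    for package in packages:
        subprocess.check_call(pip_install_command(package))
```

Run install_in_order(SCRAPY_STEPS) to perform the installs; stopping at the first failure makes it obvious which prerequisite broke.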
scrapy-splash:
Splash must be installed first (via Docker); then run pip install scrapy-splash.
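Once the Splash container is running (commonly started with `docker run -p 8050:8050 scrapinghub/splash`), Splash serves an HTTP API on port 8050. Its render.html endpoint returns a page's HTML after JavaScript has run, and scrapy-splash wraps that same API for Scrapy spiders. The sketch below calls it directly with the standard library, assuming Splash is reachable at localhost:8050 and the page is UTF-8 encoded; render_url and fetch_rendered are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SPLASH = "http://localhost:8050"  # assumed address of the Splash container


def render_url(target, wait=0.5, splash=SPLASH):
    """URL of Splash's render.html endpoint for `target`.

    `wait` is how many seconds Splash waits for JavaScript before returning.
    """
    return f"{splash}/render.html?" + urlencode({"url": target, "wait": wait})


def fetch_rendered(target, wait=0.5, splash=SPLASH, timeout=30):
    """Download the JavaScript-rendered HTML of `target` through Splash."""
    with urlopen(render_url(target, wait, splash), timeout=timeout) as resp:
        return resp.read().decode("utf-8")  # assumes a UTF-8 page
```

For example, fetch_rendered("http://example.com") returns the rendered HTML if Splash is up; if the container is not running, urlopen raises a URLError.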
Docker:
https://docs.docker.com/docker-for-windows/install/