TaoBao Crawler


Monday, 20 November 2017, 07:10 PM

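Many people who work on image processing or object recognition and detection share the same frustration: no dataset! I have long struggled with this myself. Taobao is the largest and most concentrated image library available (images found through Baidu are too scattered), yet its images cannot be fetched with a simple crawler. (A crawler of the kind used for the Meizitu site, even with cookie handling added, did not work either.) For this reason I used a Selenium-based crawler that simulates a real browser. This approach is highly compatible but only moderately fast: 5,000 images take about 30 minutes, so a crawl left running overnight should collect enough for our needs. The source code is in the GitHub repository. You can also use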

git clone [email protected]:lucky-ing/TaoBaoCrawler.git

to download it to your local machine.
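The throughput quoted above (5,000 images in about 30 minutes) can be turned into a rough overnight estimate. The sketch below is only arithmetic on that figure: it assumes the rate stays constant and that "overnight" means 8 hours, neither of which comes from the original project.

```python
# Rough estimate only: assumes the quoted rate of 5,000 images per 30 minutes
# stays constant, which a real crawl may not sustain.
IMAGES_PER_BATCH = 5000
MINUTES_PER_BATCH = 30


def estimated_images(hours: float) -> int:
    """Return the number of images expected after `hours` of crawling."""
    rate_per_minute = IMAGES_PER_BATCH / MINUTES_PER_BATCH
    return round(rate_per_minute * hours * 60)


if __name__ == "__main__":
    # 8 hours is an assumed length for "overnight".
    print(estimated_images(8))
```

Under these assumptions, an 8-hour run would yield on the order of 80,000 images.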

Updated 2017-10-20! Breaks through and ends Taobao's site-wide blocking!

Dependencies

Firefox browser

Python 3.0+

selenium

BeautifulSoup

tqdm

Tutorial

# This program drives the Firefox browser, so Firefox must be installed first.
# Basic usage:
# -s specifies the search keyword; any word can be used. This parameter is required.
# (電腦 means "computer".)
python3 main.py -s 電腦
# -f specifies the directory where downloaded images are stored.
python3 main.py -s 電腦 -f ./image
# -n specifies the maximum number of downloads (replace <max_num> with a number).
python3 main.py -s 電腦 -n <max_num>
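For readers who want to see how the three flags above could be handled, here is a minimal sketch using Python's standard `argparse` module. It is not taken from the repository's `main.py`; the destination names (`keyword`, `folder`, `max_num`) and the `./image` default are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the -s, -f and -n flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Taobao image crawler (sketch)")
    # -s: search keyword, required as the tutorial states.
    parser.add_argument("-s", dest="keyword", required=True,
                        help="search keyword (required)")
    # -f: output directory; the default value here is an assumption.
    parser.add_argument("-f", dest="folder", default="./image",
                        help="directory where images are stored")
    # -n: maximum number of downloads; None means "no limit" in this sketch.
    parser.add_argument("-n", dest="max_num", type=int, default=None,
                        help="maximum number of downloads")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-s", "電腦", "-f", "./image", "-n", "100"])
    print(args.keyword, args.folder, args.max_num)
```

Declaring `-n` with `type=int` means a missing or non-numeric value is rejected by the parser before any crawling starts.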


About Selenium

Selenium is an automated testing tool that can drive a browser automatically. Taobao has an anti-crawler strategy, but when requests come from a browser driven by this tool, the Taobao website sends back the page content.
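The idea described above can be sketched as two steps: let Selenium load the page in Firefox, then pull image URLs out of the rendered HTML. This is a sketch under stated assumptions, not the repository's code. The search URL pattern and the function names are hypothetical, and the standard library's `HTMLParser` stands in for BeautifulSoup so that the parsing step runs without extra packages.

```python
from html.parser import HTMLParser
from urllib.parse import quote


class ImageSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


def extract_image_urls(html: str) -> list:
    """Return all <img> src values found in an HTML string, in order."""
    parser = ImageSrcParser()
    parser.feed(html)
    return parser.urls


def fetch_rendered_html(keyword: str) -> str:
    """Load a search page in Firefox and return the rendered HTML."""
    # Imported lazily: this step needs the selenium package and Firefox.
    from selenium import webdriver

    # Assumed search URL; the real crawler may build it differently.
    url = "https://s.taobao.com/search?q=" + quote(keyword)
    driver = webdriver.Firefox()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if loading fails.
        driver.quit()
```

With Firefox and Selenium installed, `extract_image_urls(fetch_rendered_html("電腦"))` would return the image URLs present on the loaded page. A real crawl would likely also need scrolling, pagination and waits, which are omitted here.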

Taobao crawler results

[Images: Taobao crawler results]
