
[雪峰磁針石 Blog] The best AI data collection (web scraping) books of 2018, with downloads

Web Scraping with Python (Chinese edition)

Chinese edition: Python網絡資料采集 - 2016.pdf

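This book uses the concise yet powerful Python language to introduce web scraping, and offers comprehensive guidance on collecting the many kinds of data found on the modern web. Part 1 covers the fundamentals of web scraping: how to request information from a web server with Python, how to do basic processing of the server's response, and how to interact with websites by automated means. Part 2 covers how to use web crawlers to test websites, how to automate tasks, and how to access the web in more ways.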
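The first two steps named above (request a page, then do basic processing on the response) can be sketched with the Python standard library alone. This is only an illustration and is not taken from the book: the function names, the User-Agent string, and the choice to extract the page title are all assumptions.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Basic processing of a response body: return the page title."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch(url, timeout=10):
    """Request a page from a web server and decode the body to text."""
    # The User-Agent value is a made-up example, not a required string.
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `extract_title(fetch(url))` on a page you are allowed to scrape chains the two steps together; keeping them separate means the parsing can be tried on a saved HTML string without touching the network.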

Web Scraping with Python 2nd - 2018.pdf https://github.com/REMitchell/python-scraping

About 2,000 stars on GitHub.

Learning Scrapy (Chinese edition)

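Scrapy is a fast, high-level screen scraping and web crawling framework written in Python, used to crawl websites and extract structured data from their pages. Based on Scrapy 1.0, this book (the Chinese edition of Learning Scrapy) covers the basics of Scrapy and shows how to use Python and third-party APIs to extract and organize data to suit your own needs.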

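The book has 11 chapters, covering: an introduction to Scrapy; understanding HTML and XPath; installing Scrapy and crawling a website; using a spider to populate a database and feed a mobile app; the power of spiders; deploying spiders to the Scrapinghub cloud; Scrapy configuration and management; programming Scrapy; pipeline recipes; understanding Scrapy performance; and distributed crawling with Scrapyd and real-time analytics. The appendix covers installing and troubleshooting the required software.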

The book is intended for software developers, data scientists, and anyone interested in natural language processing and machine learning.
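To give a feel for the "HTML and XPath" idea behind extracting structured data, here is a small sketch. It deliberately does not use Scrapy: it uses the limited XPath subset in the standard library's `xml.etree.ElementTree` as a stand-in, so it only works on well-formed markup. The sample page, the `book` and `year` class names, and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (made up for this example).
SAMPLE_PAGE = """
<html>
  <body>
    <div class="book">
      <h2>Learning Scrapy</h2>
      <span class="year">2016</span>
    </div>
    <div class="book">
      <h2>Web Scraping with Python</h2>
      <span class="year">2018</span>
    </div>
  </body>
</html>
"""


def extract_books(page):
    """Turn each <div class="book"> into a dict of structured fields."""
    root = ET.fromstring(page.strip())
    books = []
    # ".//div[@class='book']" selects matching divs anywhere below the root.
    for node in root.findall(".//div[@class='book']"):
        books.append({
            "title": node.findtext("h2"),
            "year": int(node.findtext("span[@class='year']")),
        })
    return books
```

Real pages are rarely well-formed XML, which is one reason to prefer a scraping framework or a lenient HTML parser over this stand-in for actual projects.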

Learning Scrapy -2016.pdf

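A Chinese e-book edition also exists. It has been taken down from CSDN and similar sites for copyright reasons, but can be found in places such as QQ group 144081101.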

Python 3 scraping basics

Online tutorial: https://github.com/MorvanZhou/easy-scraping-tutorial

About 200 stars.

First web scraper

Tutorial:

https://first-web-scraper.readthedocs.io/en/latest/ https://github.com/ireapps/first-web-scraper/blob/master/docs/index.rst

Practical Web Scraping for Data Science -Best Practices and Examples with Python - 2018.pdf

https://github.com/Apress/practical-web-scraping-for-data-science

Fewer than 100 stars.

This book provides a complete and modern guide to web scraping, using Python as the programming language, without glossing over important details or best practices. Written with a data science audience in mind, the book explores both scraping and the larger context of web technologies in which it operates, to ensure full understanding. The authors recommend web scraping as a powerful tool for any data scientist’s arsenal, as many data science projects start by obtaining an appropriate data set.

Starting with a brief overview on scraping and real-life use cases, the authors explore the core concepts of HTTP, HTML, and CSS to provide a solid foundation. Along with a quick Python primer, they cover Selenium for JavaScript-heavy sites, and web crawling in detail. The book finishes with a recap of best practices and a collection of examples that bring together everything you've learned and illustrate various data science use cases.

Python Web Scraping, 2nd Edition

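Python Web Scraping (2nd Edition) explains how to write web crawlers in Python. It covers an introduction to web crawling, three ways to scrape data from a page, working with cached data, concurrent scraping with multiple threads and processes, scraping content from dynamic pages, interacting with forms, handling CAPTCHAs, and scraping with Scrapy and Portia. It closes with worked examples that apply these techniques to several real websites, to help readers put the techniques into practice.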

Python Web Scraping (2nd Edition) is suitable for readers who have some Python programming experience and are interested in scraping techniques.
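Two of the topics listed above, caching and concurrent scraping with threads, fit together naturally. The following is a minimal sketch under stated assumptions and is not the book's code: `crawl_concurrently` is a hypothetical name, the cache is a plain dict, and the `download` callable (any function that takes a URL and returns its body) is supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_concurrently(urls, download, cache=None, max_workers=4):
    """Download each URL at most once, in parallel threads.

    `download` is any callable mapping a URL to its body. Results found
    in `cache` (a dict) are reused instead of being downloaded again.
    """
    cache = {} if cache is None else cache
    # Drop duplicate URLs (keeping order) and skip anything already cached.
    pending = [url for url in dict.fromkeys(urls) if url not in cache]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `pending`.
        for url, body in zip(pending, pool.map(download, pending)):
            cache[url] = body
    return {url: cache[url] for url in urls}
```

Passing the downloader in as a parameter keeps the threading and caching logic separate from the network code, so the function can be tried with a fake downloader first. A real crawler would also want per-site rate limiting, which this sketch leaves out.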

Python Web Scraping 2nd Edition - 2017.pdf

First edition, in Chinese:

用Python寫網絡爬蟲.pdf https://github.com/kjam/wswp

Fewer than 100 stars.

Python Web Scraping Cookbook - 2018.pdf

Download

Python Web Scraping Cookbook is a solution-focused book that will teach you techniques to develop high-performance Scrapers, and deal with cookies, hidden form fields, Ajax-based sites and proxies. You'll explore a number of real-world scenarios where every part of the development or product life cycle will be fully covered. You will not only develop the skills to design reliable, high-performing data flows, but also deploy your codebase to Amazon Web Services (AWS). If you are involved in software engineering, product development, or data mining or in building data-driven products, you will find this book useful as each recipe has a clear purpose and objective.

Right from extracting data from websites to writing a sophisticated web crawler, the book's independent recipes will be extremely helpful while on the job. This book covers Python libraries, requests, and BeautifulSoup. You will learn about crawling, web spidering, working with AJAX websites, and paginated items. You will also learn how to tackle problems such as 403 errors, working with proxies, scraping images, and LXML.

By the end of this book, you will be able to scrape websites more efficiently and deploy and operate your scraper in the cloud.
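Two of the problems the blurb mentions, 403 errors and proxies, can be sketched with the standard library's `urllib` rather than the `requests` library the book covers. This is an assumption-laden illustration, not a recipe from the book: the function names, the User-Agent string, and the decision to return `None` on a 403 are all made up for the example.

```python
from urllib.error import HTTPError
from urllib.request import ProxyHandler, Request, build_opener

# Example value only. Some servers answer 403 to clients that send no
# browser-like User-Agent, so scrapers often set one explicitly.
USER_AGENT = "Mozilla/5.0 (compatible; example-scraper/0.1)"


def make_request(url, user_agent=USER_AGENT):
    """Build a request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def make_opener(proxy_url=None):
    """Build an opener, routed through `proxy_url` when one is given."""
    if proxy_url is None:
        return build_opener()
    return build_opener(ProxyHandler({"http": proxy_url, "https": proxy_url}))


def fetch_or_none(url, proxy_url=None, timeout=10):
    """Return the response body, or None if the server answers 403."""
    opener = make_opener(proxy_url)
    try:
        with opener.open(make_request(url), timeout=timeout) as response:
            return response.read()
    except HTTPError as error:
        if error.code == 403:
            return None
        raise  # other HTTP errors are left for the caller to handle
```

A 403 can also mean the site simply does not permit automated access, in which case changing headers or proxies is not the right response.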

https://github.com/PacktPublishing/Python-Web-Scraping-Cookbook

References

https://github.com/lorien/awesome-web-scraping/blob/master/python.md

Recommendations for the most useful Python scraping tools

https://www.jianshu.com/p/7da43c16dd87 https://www.zhihu.com/question/41277528