Python: Using the Scrapy Crawler Framework to Scrape Images, Download Them, and Save Them Locally

2022-09-22 18:39:13

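Scrapy documentation (Chinese edition): http://scrapy-chs.readthedocs.io/zh_CN/latest/index.html

Working through the tutorial in the documentation once is basically enough to start using Scrapy:

Before crawling, create a new Scrapy project. Open a terminal and run: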

scrapy startproject BiZhi

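This command creates a BiZhi directory. It contains scrapy.cfg and a BiZhi package with items.py, pipelines.py, settings.py, a spiders/ folder, and a few other files.

Then move into the project from the terminal: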

cd BiZhi

Next, generate a spider named bizhi for the site we want to crawl:

scrapy genspider bizhi pic.netbian.com

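The files we mainly edit are spiders/bizhi.py, items.py, and settings.py.

First, open bizhi.py under the spiders directory and edit it:

1. Below is the complete spider code. It extracts the image URLs and follows the pagination links: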

# -*- coding: utf-8 -*-
import scrapy
from ..items import BizhiItem

class BizhiSpider(scrapy.Spider):
    name = 'bizhi'
    allowed_domains = ['pic.netbian.com']
    start_urls = ['http://pic.netbian.com/']

    def parse(self, response):
        # Get the (relative) src of every image in the picture list
        picture_list = response.xpath('//ul[@class="clearfix"]/li/a//@src').getall()
        for picture in picture_list:
            item = BizhiItem()
            # Build the absolute image URL; the field holds a list of URLs
            item['url'] = [response.urljoin(picture)]
            yield item

        # Get the pagination links once per page, not once per picture
        next_urls = response.xpath('//div[@class="page"]/a/@href').getall()
        for next_href in next_urls:
            if next_href:
                # Build the absolute URL of the next page and parse it with this same callback
                yield scrapy.Request(url=response.urljoin(next_href), callback=self.parse)
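The spider needs absolute URLs for both the image addresses and the pagination links. Prefixing 'http://pic.netbian.com' by hand only works when every path starts with "/". A sturdier option is to resolve each relative path against the URL of the page it came from. Scrapy's `response.urljoin()` does this with the standard library's `urllib.parse.urljoin`.

The stdlib sketch below reproduces that step outside Scrapy. The helper name `absolute_urls` and the example paths are hypothetical.

```python
from urllib.parse import urljoin


def absolute_urls(page_url, hrefs):
    """Resolve relative src/href values against the page URL, skipping empty ones.

    Hypothetical helper: mirrors what response.urljoin() does inside a Scrapy callback.
    """
    return [urljoin(page_url, href) for href in hrefs if href]


# Hypothetical relative paths, shaped like what the XPath expressions return.
srcs = ['/uploads/allimg/example-1.jpg', '', '/uploads/allimg/example-2.jpg']
pages = ['/index_2.html', '/index_3.html']

print(absolute_urls('http://pic.netbian.com/', srcs))
# → ['http://pic.netbian.com/uploads/allimg/example-1.jpg', 'http://pic.netbian.com/uploads/allimg/example-2.jpg']
print(absolute_urls('http://pic.netbian.com/index_2.html', pages))
# → ['http://pic.netbian.com/index_2.html', 'http://pic.netbian.com/index_3.html']
```

Unlike string concatenation, `urljoin` also leaves already-absolute URLs untouched. It also resolves paths that lack a leading slash relative to the current page.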

2. Edit items.py:

3. Edit settings.py:
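The post does not show the settings.py changes either. The spider stores a list of URLs in `item['url']`, which suggests Scrapy's built-in ImagesPipeline is doing the downloading. The fragment below is a sketch under that assumption. The storage path and the download delay are hypothetical values. ImagesPipeline also needs the Pillow package installed.

```python
# settings.py: only the lines related to downloading images are shown.
# Assumption: Scrapy's built-in ImagesPipeline does the downloading (requires Pillow).

# Enable the built-in image pipeline.
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}

# Directory where downloaded images are saved (hypothetical path; change as needed).
IMAGES_STORE = './images'

# The spider puts image URLs in item['url'] instead of the default 'image_urls' field.
IMAGES_URLS_FIELD = 'url'

# Optional politeness delay between requests, in seconds (hypothetical value).
DOWNLOAD_DELAY = 1
```

Pointing `IMAGES_URLS_FIELD` at `url` keeps the spider unchanged. The alternative is to rename the item field to `image_urls`, which the pipeline reads by default.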

4. Finally, run the spider from the project directory:

scrapy crawl bizhi
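Assume settings.py enables Scrapy's built-in ImagesPipeline with an `IMAGES_STORE` directory. In that case, the downloaded images end up in a `full/` subfolder of that directory. By default each file is named after the SHA1 hash of its URL.

The stdlib sketch below reproduces that default naming so you can find the file for a given image. The helper name and the URL are hypothetical. A custom `file_path()` in a pipeline subclass would change the result.

```python
import hashlib


def images_pipeline_path(image_url: str) -> str:
    """Path, relative to IMAGES_STORE, where the default ImagesPipeline saves an image.

    Hypothetical helper mirroring the pipeline's default naming: full/<SHA1 of URL>.jpg
    """
    image_guid = hashlib.sha1(image_url.encode('utf-8')).hexdigest()
    return f'full/{image_guid}.jpg'


# Hypothetical image URL.
print(images_pipeline_path('http://pic.netbian.com/uploads/allimg/example.jpg'))
```

The `.jpg` extension is fixed because ImagesPipeline converts downloaded images to JPEG.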