Crawling a bidding website in one go with CrawlSpider

2023-08-07 02:48:23

Believe it or not, the whole crawl takes fewer than 20 lines of spider code~

cmd:
scrapy startproject toubiao
cd toubiao
scrapy genspider -t crawl gg .com

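# -*- coding: utf-8 -*-
import re

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class GgSpider(CrawlSpider):
    name = 'gg'
    allowed_domains = ['bidchance.com']
    start_urls = ['http://www.bidchance.com/outlinegonggao.html']

    # Each Rule = link extractor (which URLs to pick up), callback (where the
    # response goes), follow (whether to keep extracting links from that page).
    rules = (
        # Announcement detail pages: parse them, do not mine them for more links.
        Rule(LinkExtractor(allow=r'www\.bidchance\.com/info-gonggao-\d+\.html'),
             callback='parse_item', follow=False),
        # Paginated listing pages: no callback, just keep following their links.
        Rule(LinkExtractor(allow=r'http://www\.bidchance\.com/outlinegonggao\d+\.html'),
             follow=True),
    )

    def parse_item(self, response):
        item = {}
        # get(default='') avoids an AttributeError when the title node is missing.
        item["title"] = response.xpath('//div[@class="xlh"]/text()').get(default='').strip()
        # Match any year (not only 2019); store None instead of raising IndexError.
        match = re.search(r'发布日期：(\d{4}年\d{2}月\d{2}日)', response.text)
        item["date"] = match.group(1) if match else None

        # Yield so Scrapy can export the item (it is still logged at DEBUG level).
        yield item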
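To see how the two rules split the site's links, here is a plain-Python sketch that needs no Scrapy install. `route()` and `extract_date()` are hypothetical helpers that only mimic the rule dispatch and the date lookup; the sample URLs and page text are made up, and it assumes detail pages label the date as 发布日期 followed by a full-width colon.

```python
import re
from typing import Optional, Tuple

# The two allow patterns from the rules, with dots escaped so they match literally.
DETAIL_PATTERN = re.compile(r'www\.bidchance\.com/info-gonggao-\d+\.html')
LISTING_PATTERN = re.compile(r'http://www\.bidchance\.com/outlinegonggao\d+\.html')

# Assumed label format on detail pages: 发布日期：YYYY年MM月DD日
DATE_PATTERN = re.compile(r'发布日期：(\d{4}年\d{2}月\d{2}日)')


def route(url: str) -> Optional[Tuple[Optional[str], bool]]:
    """Return (callback, follow) for the first rule whose pattern matches the URL.

    LinkExtractor tests allow patterns with re.search, and CrawlSpider skips
    links already taken by an earlier rule, so rule order decides overlaps.
    Returns None when no rule matches, i.e. no rule requests the link.
    """
    if DETAIL_PATTERN.search(url):
        return ("parse_item", False)  # parse it, do not mine it for more links
    if LISTING_PATTERN.search(url):
        return (None, True)  # no callback, keep following links on this page
    return None


def extract_date(text: str) -> Optional[str]:
    """Return the publication date, or None instead of raising IndexError."""
    match = DATE_PATTERN.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical sample URLs and page text, for illustration only.
    for url in [
        "http://www.bidchance.com/info-gonggao-12345.html",
        "http://www.bidchance.com/outlinegonggao2.html",
        "http://www.bidchance.com/outlinegonggao.html",
    ]:
        print(url, "->", route(url))
    print(extract_date("发布日期：2019年05月21日"))
```

The start page outlinegonggao.html matches neither pattern, which is fine: start_urls are requested directly, and CrawlSpider still applies the rules to that first response, so pagination and detail links are picked up from it.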
