惊了个呆 不到20行爬完~
cmd:
scrapy startproject toubiao
cd toubiao
scrapy genspider -t crawl gg bidchance.com
# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
import re
class GgSpider(CrawlSpider):
    """CrawlSpider that scrapes tender-announcement pages from bidchance.com.

    Detail pages (info-gonggao-<id>.html) are parsed by ``parse_item``;
    listing/pagination pages (outlinegonggao<n>.html) are followed to
    discover more detail links.
    """
    name = 'gg'
    allowed_domains = ['bidchance.com']
    start_urls = ['http://www.bidchance.com/outlinegonggao.html']

    # Rules: (link extractor, callback destination, whether to keep following).
    # Dots in the URL patterns are escaped — the originals used bare '.',
    # which matches any character.
    rules = (
        # Announcement detail pages: parse, do not follow links from them.
        Rule(LinkExtractor(allow=r'www\.bidchance\.com/info-gonggao-(\d+)\.html'),
             callback='parse_item', follow=False),
        # Pagination pages: follow to reach further detail links.
        Rule(LinkExtractor(allow=r'http://www\.bidchance\.com/outlinegonggao\d+\.html'),
             follow=True),
    )

    def parse_item(self, response):
        """Extract the title and publication date from one announcement page.

        Yields a dict item so Scrapy's pipelines / feed exports receive the
        data (the original only printed it and returned nothing).  Missing
        fields become ``None`` instead of raising.
        """
        item = {}
        # extract_first() returns None when the node is absent; guard before strip().
        title = response.xpath('//div[@class="xlh"]/text()').extract_first()
        item["title"] = title.strip() if title else None
        # Year generalized from the hard-coded 2019 so the spider keeps
        # working in other years; re.search avoids IndexError on no match.
        date_match = re.search(r'发布日期:(\d{4}年\d{2}月\d{2}日)', response.text)
        item["date"] = date_match.group(1) if date_match else None
        print(item)
        yield item