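Overview:

This script crawls images from 站長之家 (sc.chinaz.com) and uses BeautifulSoup to parse the HTML.

Each listing page is requested over HTTP, the way a browser would fetch it. Every image that downloads successfully is written to disk as binary data, and the images from each page are stored in that page's own folder.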
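The storage step described above can be sketched on its own, without any network access. This is only a minimal sketch, not the author's code: the helper name `save_image` and its parameters are hypothetical and chosen for illustration.

```python
from pathlib import Path


def save_image(root: str, page: int, name: str, data: bytes) -> Path:
    """Write raw image bytes to <root>/<page>/<name>.jpg and return the path.

    Hypothetical helper for illustration; it is not part of the original script.
    """
    folder = Path(root) / str(page)
    # parents=True also creates the root folder; exist_ok=True allows re-runs.
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{name}.jpg"
    # write_bytes opens the file in binary mode, like open(path, "wb").
    target.write_bytes(data)
    return target
```

Called as `save_image("images", 1, "example", data)`, this would place the file under `images/1/`, which matches the one-folder-per-page layout described above.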

Page 1: http://sc.chinaz.com/tupian/index.html

Page 2: http://sc.chinaz.com/tupian/index_2.html

Page 3: http://sc.chinaz.com/tupian/index_3.html

And so on; the script below iterates over the first 14 pages.
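The URL pattern above can be captured in a small function. This is a sketch under the assumption that the pattern holds for every page; the names `BASE` and `page_url` are hypothetical and do not appear in the original script.

```python
# Base path of the listing pages, taken from the URLs above.
BASE = "http://sc.chinaz.com/tupian/"


def page_url(page: int) -> str:
    """Return the listing URL for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # The first page has no numeric suffix; later pages use index_<n>.html.
    return BASE + ("index.html" if page == 1 else f"index_{page}.html")


if __name__ == "__main__":
    for n in range(1, 4):
        print(page_url(n))
```

Keeping the special case for page 1 in one place avoids repeating the `if`/`else` wherever a page URL is needed.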

Source code

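# @Author: lomtom
# @Date:   2020/2/27 14:22
# @email: [email protected]

# Crawl images from 站長之家 (sc.chinaz.com)
# Parse the HTML with BeautifulSoup
# Fetch each page over HTTP and save every image as binary data

# Page 1: http://sc.chinaz.com/tupian/index.html
# Page 2: http://sc.chinaz.com/tupian/index_2.html
# Page 3: http://sc.chinaz.com/tupian/index_3.html
# Iterates over 14 pages

import os
import requests
from bs4 import BeautifulSoup

def getImage():
    for i in range(1, 15):
        # Create one folder per page; makedirs also creates the parent "images" folder
        download = "images/%d/" % i
        os.makedirs(download, exist_ok=True)
        # Build the page URL; the first page has no numeric suffix
        if i == 1:
            url = "http://sc.chinaz.com/tupian/index.html"
        else:
            url = "http://sc.chinaz.com/tupian/index_%d.html" % i
        # Send the request; a status code of 200 means success
        response = requests.get(url)
        if response.status_code != 200:
            print("Page %d failed with status %d" % (i, response.status_code))
            continue
        # Parse the page with BeautifulSoup (the html5lib parser must be installed)
        bs = BeautifulSoup(response.content, "html5lib")
        # Locate the div that holds the images
        wrap = bs.find("div", attrs={"id": "container"})
        if wrap is None:
            continue
        # Collect the img tags inside that div
        imglist = wrap.find_all("img")
        for img in imglist:
            # Read the image title and link; skip tags missing either attribute
            title = img.get("alt")
            src = img.get("src2")
            if not title or not src:
                continue
            # Write the image bytes to a file
            with open(download + title + ".jpg", "wb") as file:
                file.write(requests.get(src).content)
        print("Page %d finished" % i)

if __name__ == '__main__':
    getImage()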
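The script uses each image's `alt` text directly as the file name. Alt text can contain characters such as `/` or `:` that are not valid in file names, so one possible precaution is to clean the title first. The following is a minimal sketch, not part of the original script: `safe_filename`, its `fallback` parameter, and the replacement character are all assumptions made for illustration.

```python
import re

# Characters that Windows rejects in file names, plus ASCII control characters.
_UNSAFE = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title: str, fallback: str = "untitled") -> str:
    """Return a version of title that is safe to use as a file name."""
    # Replace each unsafe character with an underscore, then trim
    # leading/trailing spaces and dots, which some file systems dislike.
    cleaned = _UNSAFE.sub("_", title).strip(" .")
    # An empty result (for example, an empty alt attribute) gets a fallback name.
    return cleaned or fallback
```

In the script, this would be applied to `title` before building the path passed to `open`. Note that two images with the same cleaned title would still overwrite each other; this sketch does not address that.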

Result

[Image: 【Python crawler in practice】 Batch-crawling images from 站長之家]

Author

1. Author's personal website

2. Author's CSDN page

3. Author's cnblogs page

4. Author's Jianshu page

