Python Crawler 1: Downloading Web Page Source Code

2022-12-18 12:41:06

Source code for a Python crawler function that downloads the source code of a static web page:

import urllib2
def download(url, num_retries = 5):
    '''
    function: download the source code of a web page. If a 5xx error status
    is returned, retry the download, up to num_retries times.
    '''
    print "downloading " , url
    try:
        html = urllib2.urlopen(url).read()
    except urllib2.URLError as e:
        print "download error: " , e.reason
        html = None
        if num_retries > 0:
            if hasattr(e,'code') and 500 <= e.code < 600:
                return download(url, num_retries-1)

    return html

Here url is the address of the web page you want to download, and num_retries is the number of times to retry the download when a 5xx error occurs.
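The function above is written for Python 2 (urllib2 and the print statement). In Python 3, urllib2 was split into urllib.request and urllib.error, so a rough port might look like the sketch below. The opener parameter is a hypothetical addition that is not in the original; it is there only so the retry logic can be tried without network access.

```python
import urllib.error
import urllib.request


def download(url, num_retries=5, opener=urllib.request.urlopen):
    """Download a page; retry on 5xx errors, up to num_retries times.

    `opener` is a hypothetical hook (not part of the original function)
    that defaults to urllib.request.urlopen.
    """
    print("downloading", url)
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.URLError as e:
        print("download error:", e.reason)
        # Only HTTPError (a URLError subclass) carries a status code;
        # plain URLError falls back to 0 and is never retried.
        if num_retries > 0 and 500 <= getattr(e, "code", 0) < 600:
            return download(url, num_retries - 1, opener)
    return None
```

Calling download("http://example.com") returns the page body as bytes, or None if the download fails. As in the original, a non-5xx error such as 404 is not retried.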

For details, see my blog:

Python Crawler 1: Downloading Web Page Source Code
