Python Web Crawler Study Diary: Using urlopen() in urllib

2023-08-05 14:51:17

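The four submodules of urllib

request: the basic HTTP request module
error: the exception module
parse: a utility module that provides URL-handling methods
robotparser: reads a site's robots.txt file to determine whether the site may be crawled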
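As a rough illustration of the two helper modules in the list above, the sketch below uses parse to split and build URLs and robotparser to evaluate crawl rules. It runs entirely offline: the robots.txt rules and the example.com URLs are made up for the demo and are not taken from any real site.

```python
from urllib.parse import urlencode, urlparse
from urllib.robotparser import RobotFileParser

# parse: split a URL into its components
parts = urlparse("https://www.python.org/downloads/?page=2")
print(parts.scheme, parts.netloc, parts.path, parts.query)

# parse: turn a dict into a query string
query = urlencode({"word": "hello", "page": 2})
print(query)

# robotparser: feed in robots.txt lines directly (hypothetical rules),
# instead of downloading them with set_url() and read()
rules = RobotFileParser()
rules.parse(["User-agent: *", "Disallow: /private/"])
blocked = rules.can_fetch("*", "https://example.com/private/data")
allowed = rules.can_fetch("*", "https://example.com/index.html")
print(blocked, allowed)
```

For a real site, one would normally call set_url() with the address of its robots.txt and then read() before asking can_fetch().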

Sending requests

urlopen()

First, use urlopen() to perform the most basic page fetch

import urllib.request

# Fetch the page and decode the response bytes as UTF-8 text
response = urllib.request.urlopen('https://www.python.org')
print(response.read().decode('utf-8'))


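This gives us the HTML source of the Python website

Next, use type(response) to get the type of response

Result: <class 'http.client.HTTPResponse'>

As you can see, an HTTPResponse object is returned. It provides methods such as read(), readinto(), getheader(name) (returns one field of the response headers), getheaders() (returns all response headers), and fileno(), as well as attributes such as msg, version, status (the response status code), reason, debuglevel, and closed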
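To try out a few of these methods and attributes without depending on an external site, here is a minimal sketch that starts a throwaway HTTP server on 127.0.0.1 and inspects the response. HelloHandler and inspect_response are hypothetical names invented for this demo, and the no_proxy line is an assumption to keep the local request away from any proxy configured in the environment.

```python
import os
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: make sure the local demo request bypasses any configured proxy
os.environ["no_proxy"] = "127.0.0.1,localhost"


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves one tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default request logging


def inspect_response(url):
    """Open url and collect a few attributes of the HTTPResponse."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return {
            "type": type(response).__name__,
            "status": response.status,
            "reason": response.reason,
            "content_type": response.getheader("Content-Type"),
            "header_names": [name for name, _ in response.getheaders()],
            "body": response.read().decode("utf-8"),
        }


# Port 0 lets the operating system pick a free port
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
info = inspect_response(f"http://127.0.0.1:{server.server_port}/")
server.shutdown()
server.server_close()
print(info)
```

Pointing inspect_response() at https://www.python.org should work the same way when network access is available.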

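urlopen() also accepts several other parameters

1) The data parameter is optional. Its value must be converted to bytes with bytes(), and when it is supplied the request method becomes POST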

# Practice with the data parameter of urlopen()
import urllib.parse
import urllib.request

data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf-8')
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(response.read())

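The parameters we passed appear in the form field of the response, which shows that the request simulated a form submission and sent the data via POST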
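The switch from GET to POST can be checked without sending anything over the network. The sketch below builds urllib.request.Request objects, which is what urlopen() creates internally when given a URL string in CPython, and asks each for its method. The variable names are only for illustration.

```python
import urllib.parse
import urllib.request

# urlencode() produces a str; urlopen() needs bytes for the request body
form = {"word": "hello"}
data = bytes(urllib.parse.urlencode(form), encoding="utf-8")
print(data)  # b'word=hello'

# No request is sent here; we only inspect which method would be used
with_data = urllib.request.Request("http://httpbin.org/post", data=data)
without_data = urllib.request.Request("http://httpbin.org/get")
print(with_data.get_method())     # POST
print(without_data.get_method())  # GET
```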

2) The timeout parameter

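# The timeout parameter of urlopen(): an exception is raised
# when the request takes longer than the given number of seconds
import socket
import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen('http://httpbin.org/get', timeout=0.1)
except urllib.error.URLError as e:
    if isinstance(e.reason, socket.timeout):
        print("time out")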

Result: if the server does not respond within 0.1 seconds, the program prints time out
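One detail worth knowing: in CPython, a timeout that occurs while connecting or sending is wrapped in URLError, but a timeout that occurs while waiting for the response is raised as a plain socket.timeout. The following sketch handles both cases. fetch_or_timeout is a hypothetical helper, and the "silent server" is just a local socket that listens but never answers, so the demo does not need internet access. The no_proxy line is an assumption to keep the local request off any configured proxy.

```python
import os
import socket
import urllib.error
import urllib.request

# Assumption: make sure the local demo request bypasses any configured proxy
os.environ["no_proxy"] = "127.0.0.1,localhost"


def fetch_or_timeout(url, timeout):
    """Return the decoded body, or 'time out' if the request timed out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.URLError as e:
        # Timeouts while connecting or sending arrive wrapped in URLError
        if isinstance(e.reason, socket.timeout):
            return "time out"
        raise
    except socket.timeout:
        # Timeouts while waiting for or reading the response are not wrapped
        return "time out"


# Hypothetical silent server: it listens but never accepts or replies
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
silent.listen(1)
port = silent.getsockname()[1]

result = fetch_or_timeout(f"http://127.0.0.1:{port}/", timeout=0.5)
silent.close()
print(result)
```

Re-raising other URLError cases keeps real failures, such as a refused connection, visible instead of silently ignoring them.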

The next note introduces the Request class, which is more powerful than urlopen()
