Using urllib

2021-11-10 11:42:16

The urlopen function and the HTTPResponse class

Without further ado, here is the code.

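from urllib.request import urlopen


response = urlopen('http://www.baidu.com')  # sends an HTTP GET request
print(response.closed)  # False: the response is still open
with response:  # the context manager closes the response on exit
    print(1, type(response))  # <class 'http.client.HTTPResponse'>
    print(2, response.status, response.reason)  # status code and reason phrase
    print(3, response.geturl())  # URL actually retrieved, after any redirects
    print(4, response.info())  # response headers
    print(5, response.read())  # response body as bytes


print(response.closed)  # True: closed when the with block exited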

urlopen() returns a response object of class http.client.HTTPResponse, which is a file-like object.

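The with response: statement enters a context manager, which replaces the equivalent try-except-finally boilerplate. The print(response.closed) before the with block prints False, meaning the file-like object has not been closed yet; the print(response.closed) after the block prints True, meaning the object was closed when the block exited.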
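To make the context-manager point concrete, here is a minimal sketch that compares a with block to a hand-written try-finally. It assumes io.BytesIO as a stand-in for the HTTPResponse object so that it runs without network access; the two helper functions are hypothetical names used only for this illustration.

```python
import io


def read_with_context_manager(stream):
    """Read everything from stream; the with block closes it on exit."""
    with stream:
        return stream.read()


def read_with_try_finally(stream):
    """Hand-written equivalent of the with block above."""
    try:
        return stream.read()
    finally:
        stream.close()  # runs even if read() raises


# io.BytesIO is only a stand-in for the response object (assumption).
first = io.BytesIO(b'<html>hello</html>')
second = io.BytesIO(b'<html>hello</html>')

print(first.closed)   # still open before the with block
body_a = read_with_context_manager(first)
print(first.closed)   # closed once the with block has exited

body_b = read_with_try_finally(second)
print(second.closed)  # closed by the finally clause
print(body_a == body_b)
```

Both helpers return the same bytes and leave the stream closed; the with form just gets there with less code.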

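response.geturl() returns the URL that was actually retrieved, which can differ from the requested URL if the server redirected the request.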

response.read() reads the entire response body (here, the full content of the page) and returns it as bytes.
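Since read() hands back bytes, the body usually has to be decoded before it can be treated as text. The sketch below shows one way to do that, assuming the charset is announced in the Content-Type header. The function decode_body and the sample data are hypothetical; an email.message.Message stands in for what response.info() returns (http.client.HTTPMessage is a subclass of it), so the code runs offline.

```python
from email.message import Message


def decode_body(raw, headers, default='utf-8'):
    """Decode raw bytes using the charset named in Content-Type, if any."""
    charset = headers.get_content_charset() or default
    return raw.decode(charset, errors='replace')


# Stand-ins for response.info() and response.read() (assumptions).
sample_headers = Message()
sample_headers['Content-Type'] = 'text/html; charset=gbk'
sample_raw = '<title>百度一下</title>'.encode('gbk')

text = decode_body(sample_raw, sample_headers)
print(type(sample_raw).__name__, type(text).__name__)
print(text)
```

With a real response, the call would look like decode_body(response.read(), response.info()), made inside the with block while the response is still open.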

The Request class and the User-Agent (UA) problem

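Some sites use anti-crawler checks that would stop our crawler from working. To get past them, we use the Request class to set the User-Agent (UA) header of the HTTP request so that the crawler looks like a browser. We borrow the UA of a real browser: find the User-Agent entry among the browser's request headers (marked with a red box in the original screenshot) and copy it.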
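It helps to see what urllib announces when no UA is set. This small sketch inspects the default headers of an opener created with build_opener(); it makes no network request. The variable names are only illustrative.

```python
from urllib.request import build_opener

# A fresh opener carries the headers urllib sends by default.
opener = build_opener()
default_headers = dict(opener.addheaders)

# Typically of the form 'Python-urllib/3.x', which is easy for a site to spot.
print(default_headers['User-agent'])
```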

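In the class definition of Request, the headers parameter (the one the arrow points to in the original screenshot) is a dictionary, that is, a set of key-value pairs.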

from urllib.request import urlopen, Request
from http.client import HTTPResponse


url = 'http://www.baidu.com'  # may be redirected automatically (301/302)
ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
req = Request(url, headers={
    'User-agent': ua
})
response = urlopen(req)  # HTTP GET request; req is passed in place of the URL string

print(response.closed)
with response:  # note the trailing colon
    print(1, type(response))  # an http.client.HTTPResponse object, which is file-like
    print(2, response.status, response.reason)
    print(3, response.geturl())
    print(4, response.info())  # response headers (not the request headers)
    print(5, response.read())
    print(6, response._method)  # names starting with '_' are protected by convention and not meant to be accessed
print(response.closed)
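Before sending anything, the Request object itself can be inspected to check that the UA was stored as intended. The sketch below is offline and uses a shortened, illustrative UA string. One detail worth knowing: Request normalizes header names with str.capitalize(), so the key ends up as 'User-agent' no matter how it was spelled in the dictionary.

```python
from urllib.request import Request

ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'  # shortened, illustrative UA
req = Request('http://www.baidu.com', headers={'User-Agent': ua})

print(req.get_method())              # GET, because no data was supplied
print(req.full_url)
print(req.has_header('User-agent'))  # lookup uses the capitalized key
print(req.has_header('User-Agent'))  # the original spelling is not found
print(req.get_header('User-agent'))
print(req.header_items())
```

Headers can also be added after construction with req.add_header('User-agent', ua), which stores them under the same normalized key.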
