The Python Standard Library: urllib

2023-05-18 01:54:53

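urllib is a Python module for fetching URLs. Its urlopen function offers a very simple interface that can fetch URLs over a variety of protocols. It also offers a slightly more complex interface for common situations such as basic authentication, cookies, and proxies; these are handled by objects called openers and handlers. The examples here are written for Python 2, where these functions live in urllib and urllib2; in Python 3 they moved to urllib.request.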
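As a rough illustration of the opener and handler objects mentioned above, here is a minimal sketch using the Python 3 module urllib.request (an assumption, since the article's own examples are Python 2). The User-Agent string and the data: URL are made up for the demo; the data: URL is used only so the snippet runs without network access.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, ProxyHandler, build_opener

# Handlers customise how an opener fetches URLs.
cookie_jar = CookieJar()                      # stores cookies set by HTTP responses
opener = build_opener(
    ProxyHandler({}),                         # empty mapping: do not use any proxy
    HTTPCookieProcessor(cookie_jar),          # send and record cookies automatically
)

# Hypothetical User-Agent value, sent with every HTTP request made by this opener.
opener.addheaders = [("User-Agent", "urllib-demo/0.1")]

# A data: URL keeps the demo offline; a real http:// URL is opened the same way.
with opener.open("data:text/plain,ok") as resp:
    body = resp.read()

print(body)
```

An opener built this way is used in place of the module-level urlopen whenever a request needs non-default behavior.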

urllib

import urllib
s = urllib.urlopen('http://tieba.baidu.com/p/3606519228')
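print s.read()  # prints the full HTML source of the page

s.readline() # reads one line of the body; returns '' here because read() above already consumed it
s.getcode()  # returns the HTTP status code for HTTP requests: 200 means success, 404 means not found
s.info()     # returns an httplib.HTTPMessage object holding the headers sent by the remote server
s.geturl()   # returns the URL that was actually retrieved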

>>> s = urllib.urlopen('http://www.alwme.com/')
>>> byte = s.read()
>>> print("Retrieved %s bytes from %s" % (len(byte), s.geturl()))
Retrieved 26834 bytes from http://alwme.com/
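The same response methods exist in Python 3 under urllib.request. The sketch below assumes Python 3 and uses a made-up data: URL instead of a real site so that it runs offline; it also shows that reads are sequential, so readline() returns nothing once the body has been consumed.

```python
from urllib.request import urlopen

# Hypothetical two-line document embedded in a data: URL (%0A is a newline).
DEMO_URL = "data:text/plain,first%20line%0Asecond%20line"

with urlopen(DEMO_URL) as s:
    first_line = s.readline()   # first line of the body, as bytes
    rest = s.read()             # everything that has not been read yet
    leftover = s.readline()     # empty: the body is already consumed
    final_url = s.geturl()      # URL that was actually retrieved

print("Retrieved %d bytes from %s" % (len(first_line) + len(rest), final_url))
```

Note that Python 3 returns bytes rather than str, so the body usually needs a decode() call before it is treated as text.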

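geturl() returns the final URL after any redirects. Visiting one address often redirects to another; for example, www.so.com, the domain 360 bought for millions of US dollars, redirects to its new brand address www.haosou.com.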

from urllib2 import Request, urlopen

s = 'http://www.me.com/'
req = Request(s)
response = urlopen(req)

print 'old url is: ' + s
print 'The redirect url is: ' + response.geturl()

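To see geturl() report a redirect without depending on an external site, the following sketch starts a throwaway HTTP server on the loopback interface and follows a 302 from /old to /new. It assumes Python 3 (urllib.request and http.server); the handler class, the paths, and the resolve_redirect helper are hypothetical names invented for this demo.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class RedirectDemoHandler(BaseHTTPRequestHandler):
    """Answers /old with a 302 to /new and serves a tiny page elsewhere."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"new page"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def resolve_redirect(url):
    """Return (final_url, status) after following any redirects."""
    # An empty ProxyHandler bypasses system proxies so the local server is reached directly.
    opener = build_opener(ProxyHandler({}))
    with opener.open(Request(url)) as response:
        return response.geturl(), response.status


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), RedirectDemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

base = "http://127.0.0.1:%d" % server.server_address[1]
old_url = base + "/old"
final_url, status = resolve_redirect(old_url)

server.shutdown()
server.server_close()

print("old url is: " + old_url)
print("The redirect url is: " + final_url)
```

The redirect is followed automatically by the default handlers, so the only visible sign of it is that geturl() differs from the URL that was requested.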

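info() returns a dictionary-like object describing the fetched page, typically the headers sent by the server. It is currently an httplib.HTTPMessage instance and contains headers such as "Content-length" and "Content-type", among others.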

print  urlopen('http://www.so.com').info()

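As a small sketch of what can be read from the object returned by info(), the snippet below pulls out the content type, charset, and length. It assumes Python 3, where info() returns an email.message.Message-style object rather than httplib.HTTPMessage; the describe helper and the data: URL are hypothetical and chosen so the code runs offline.

```python
from urllib.request import urlopen


def describe(url):
    """Summarise a few headers of the response for url."""
    with urlopen(url) as resp:
        headers = resp.info()
        return {
            # Header lookups are case-insensitive.
            "content_type": headers.get_content_type(),
            "charset": headers.get_content_charset(),
            "content_length": int(headers["Content-Length"]),
        }


summary = describe("data:text/plain;charset=utf-8,hello%20world")
print(summary)
```

For a real HTTP response the same calls apply, although a server is free to omit Content-Length, in which case the lookup returns None and the int() conversion above would fail.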

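urlretrieve() returns a two-element tuple (filename, headers). It downloads the HTML file that the URL points to onto your local disk; if filename is not specified, the content is saved to a temporary file.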

Saving to a temporary file:

>>> filename = urllib.urlretrieve('http://www.alwme.com/')
>>> type(filename)
<type 'tuple'>
>>> print filename
('/tmp/tmpaOdE2g', <httplib.HTTPMessage instance at 0x7f1b021e8680>)

Saving to a local file:

>>> filename = urllib.urlretrieve('http://www.alwme.com/',filename='/home/zhg/temptest/alwme.html')
>>> type(filename)
<type 'tuple'>
>>> print filename
('/home/zhg/temptest/alwme.html', <httplib.HTTPMessage instance at 0x7f1b021e8a28>)

urllib.urlcleanup()    # removes the temporary files left behind by urllib.urlretrieve()
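Both modes of urlretrieve(), plus urlcleanup(), can be tried offline with the sketch below. It assumes Python 3, where these functions live in urllib.request and are documented as a legacy interface; the data: URL and the file name page.html are made up for the demo.

```python
import os
import tempfile
from urllib.request import urlcleanup, urlretrieve

# Hypothetical page embedded in a data: URL; it decodes to <html>hi</html>.
DEMO_URL = "data:text/html,%3Chtml%3Ehi%3C%2Fhtml%3E"

# 1) No filename: the content goes to a temporary file.
tmp_path, tmp_headers = urlretrieve(DEMO_URL)
tmp_existed = os.path.exists(tmp_path)

# urlcleanup() deletes the temporary files created by urlretrieve().
urlcleanup()
tmp_exists_after_cleanup = os.path.exists(tmp_path)

# 2) Explicit filename: the content is written where we ask.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "page.html")
    saved_path, headers = urlretrieve(DEMO_URL, filename=target)
    with open(saved_path, "rb") as fh:
        saved_body = fh.read()

print(saved_body)
```

Files saved under an explicit filename are not touched by urlcleanup(); only the temporary ones are removed.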

