Exception Handling in the urllib2 Module

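The most important function in the urllib2 module is urlopen(), which fetches URL (Uniform Resource Locator) resources. urlopen() is not limited to simple cases: it can also fetch resources in more complex situations involving authentication, cookies, proxies, and so on. It supports several protocols, such as http, ftp, and file.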
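The sketch below illustrates the file protocol mentioned above. It is written in Python 3, where urllib2's urlopen() became urllib.request.urlopen(). It writes a throwaway temporary file and reads it back through a file:// URL, so it runs without network access. The file name and contents are arbitrary assumptions.

```python
import pathlib
import tempfile
import urllib.request

# Python 3 note: urllib2.urlopen() corresponds to urllib.request.urlopen().
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "hello.txt"  # hypothetical file name
    path.write_text("hello from a file URL\n", encoding="utf-8")

    file_url = path.as_uri()  # turns the absolute path into a file:///... URL
    with urllib.request.urlopen(file_url) as response:
        body = response.read()  # bytes, just like the body of an HTTP response

print(file_url.split(":", 1)[0], body)
```

The same call with an http:// URL would go through the HTTP handler instead. The calling code does not change.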

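HTTP is a request/response protocol: the client sends a request and the server sends back a response. urllib2 represents an outgoing HTTP request with a Request object. Calling urlopen() sends the request, and its return value is the corresponding response object.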
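The following Python 3 sketch makes the request/response split concrete. In Python 3, urllib2.Request became urllib.request.Request. The sketch builds two Request objects without sending them; the example.com URLs and the form body are placeholders. A Request that carries data is treated as a POST, and one without data is a GET.

```python
import urllib.request

# Building a Request does not touch the network; urlopen(req) would send it.
get_req = urllib.request.Request("http://www.example.com/")
post_req = urllib.request.Request(
    "http://www.example.com/login",          # hypothetical login URL
    data=b"email=yourname&password=password",  # placeholder form body
)

# get_method() infers the HTTP method from whether data was supplied.
print(get_req.get_method(), get_req.host, get_req.selector)
print(post_req.get_method(), post_req.host, post_req.selector)
```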

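Below is the simplest possible test of submitting POST data. In many cases a naive test like this will not actually log in successfully, but it still shows how POST data is used.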

#!/usr/bin/python
# coding=utf-8
import urllib
import urllib2

user = "yourname"
password = "password"
posturl = "http://www.xiami.com/member/login"
postdata = {'email': user,
            'password': password,
            'autologin': '1',
            'submit': '登錄',  # label of the site's "Log in" button
            'type': ''
            }
req = urllib2.Request(posturl)
# urlencode() turns the dict into an "a=1&b=2" string for the request body
postdata = urllib.urlencode(postdata)
# enable cookies, so a session cookie set by the login response is kept
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
# passing data makes this a POST request
response = opener.open(req, postdata)
print response.read()
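The following Python 3 sketch prepares the same kind of login POST offline. It assumes a placeholder example.com login URL instead of the real site. In Python 3, urllib.urlencode() moved to urllib.parse.urlencode(), and the encoded body must be converted to bytes. Keeping an explicit http.cookiejar.CookieJar lets you inspect cookies after a real request. The request itself is deliberately not sent here.

```python
import http.cookiejar
import urllib.parse
import urllib.request

postdata = {"email": "yourname", "password": "password", "autologin": "1"}

# urlencode() builds "key=value&..." text; Python 3 requires bytes for the body.
encoded = urllib.parse.urlencode(postdata).encode("utf-8")

# Keep a reference to the jar so cookies from the login response can be inspected.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Hypothetical login URL; opener.open(req) would send the POST and fill the jar.
req = urllib.request.Request("http://www.example.com/member/login", data=encoded)

print(encoded)
print(req.get_method(), len(jar))  # no request sent, so the jar is still empty
```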

Or use the shorter form directly:

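import urllib
import urllib2

# user and password are defined in the previous example
url = "http://www.example.com/"
datas = {"email": user,
         "password": password}
# passing encoded data as the second argument makes urlopen() send a POST
req = urllib2.Request(url, urllib.urlencode(datas))
response = urllib2.urlopen(req)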

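Some websites do not want to be accessed by programs, or send different content to different browser types. In those cases you may need to modify the HTTP headers so that your program looks like a particular browser. Browsers usually identify themselves through the User-Agent header, so changing User-Agent is normally enough. To do this, pass a dictionary of headers to the Request object.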

import urllib2

url = "http://www.example.com/"
headers = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"}
request = urllib2.Request(url, headers=headers)
response = urllib2.urlopen(request)

You can also use the following code:

import urllib2

url = "http://www.example.com/"
request = urllib2.Request(url)
request.add_header("User-Agent", "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)")
response = urllib2.urlopen(request)
response.close()
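Both ways of setting User-Agent produce the same request. This Python 3 sketch (urllib.request replaces urllib2) builds one Request with a headers dictionary and one with add_header(). It then compares the stored headers without sending anything.

```python
import urllib.request

url = "http://www.example.com/"
ua = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)"

# Way 1: pass a headers dict to the constructor.
req1 = urllib.request.Request(url, headers={"User-Agent": ua})

# Way 2: create the Request first, then call add_header().
req2 = urllib.request.Request(url)
req2.add_header("User-Agent", ua)

# Request normalizes header names with str.capitalize(), so both store "User-agent".
print(req1.header_items())
print(req2.header_items() == req1.header_items())
```

Because of that normalization, use the spelling "User-agent" when reading the header back with get_header().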

When urlopen() cannot handle a response, it raises a URLError exception. HTTPError is a subclass of URLError and is raised only when accessing HTTP URLs.

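URLError is usually raised because there is no network connection (no route to the target server) or because the target server does not exist. In these cases the exception object has a reason attribute describing the cause. For network failures, reason is typically a socket error carrying an (error code, error message) pair.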

#!coding:utf-8
import urllib2

url = "http://www.baidu.com/"
try:
    response = urllib2.urlopen(url)
    print response.read()
except urllib2.URLError, e:
    # e.reason explains why the server could not be reached
    print e.reason
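The example above needs a real network failure to produce a URLError. A reproducible alternative is to open a file:// URL that points at a file which does not exist. In that case urlopen() raises URLError rather than HTTPError, and e.reason holds the underlying OSError. The sketch is in Python 3, where urllib2's exceptions live in urllib.error, and the missing file name is arbitrary.

```python
import pathlib
import tempfile
import urllib.error
import urllib.request

with tempfile.TemporaryDirectory() as tmp:
    # A file URL pointing at a file that was never created (hypothetical name).
    missing_url = (pathlib.Path(tmp) / "no-such-file.txt").as_uri()
    try:
        urllib.request.urlopen(missing_url)
        reason = None
    except urllib.error.HTTPError:
        raise  # not expected: HTTPError is only raised for HTTP URLs
    except urllib.error.URLError as e:
        reason = e.reason  # the OSError that caused the failure
        print("reason:", reason)
```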

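Every HTTP response returned by a server has a status code. Some status codes indicate that the server could not fulfill the request. The default handlers deal with some of these codes for us. For example, if the response is a redirect, urllib2 automatically fetches the content of the page it redirects to. For status codes that urllib2 cannot handle, urlopen() raises an HTTPError exception. Typical examples are 404 (Not Found) and 401 (Unauthorized).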

An HTTPError instance has an integer code attribute holding the error status code returned by the server.

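urllib2's default handlers process redirects (status codes in the 300 range, specifically 301, 302, 303 and 307), and status codes in the 200-299 range indicate success. So the status codes that raise HTTPError are in practice those in the 400-599 range. Other 3xx codes that are not followed, such as 304, also raise HTTPError.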
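The Python 3 sketch below shows these rules without depending on a remote site. It starts a throwaway HTTP server on 127.0.0.1 using the standard http.server module. The three paths and their responses are hypothetical test fixtures:

- /ok returns 200.
- /moved redirects with 302 to /ok.
- Any other path returns 404.

The redirect is followed transparently, while the 404 surfaces as an HTTPError whose code is 404.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /ok -> 200, /moved -> 302 to /ok, else 404."""

    def do_GET(self):
        if self.path == "/ok":
            body = b"hello"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        elif self.path == "/moved":
            self.send_response(302)
            self.send_header("Location", "/ok")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_error(404, "Not Found")

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


# ProxyHandler({}) ignores proxy settings from the environment so 127.0.0.1 is
# reached directly; build_opener() still installs the default redirect/error handlers.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url):
    """Return (status, final URL) on success, or ("HTTPError", code) on failure."""
    try:
        with opener.open(url, timeout=5) as resp:
            return resp.status, resp.geturl()
    except urllib.error.HTTPError as e:
        e.close()
        return "HTTPError", e.code


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: let the OS choose
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

results = {path: fetch(base + path) for path in ("/ok", "/moved", "/missing")}
server.shutdown()
server.server_close()
for path, outcome in results.items():
    print(path, outcome)
```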

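When an error occurs, the server returns an HTTP error code together with an error page. You can treat the HTTPError instance as the returned page. Besides the code attribute, it also has methods such as read(), geturl(), and info().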

#!coding=utf-8
import urllib2

url = "http://www.csdn.net/aderstep"
try:
    response = urllib2.urlopen(url)
except urllib2.HTTPError, e:
    print e.code    # the status code, e.g. 404
    print e.read()  # the body of the error page
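Because HTTPError doubles as a response object, you can read the error page from it just like a normal response. The Python 3 sketch below constructs an HTTPError by hand so that its attributes can be inspected offline. The URL, headers and body are made-up values. urlopen() would create an equivalent object for a real 404.

```python
import email.message
import io
import urllib.error

headers = email.message.Message()
headers["Content-Type"] = "text/html"

# Hand-built error: HTTPError(url, code, msg, hdrs, fp)
err = urllib.error.HTTPError(
    "http://www.example.com/missing", 404, "Not Found", headers,
    io.BytesIO(b"<h1>404 page</h1>"))

print(isinstance(err, urllib.error.URLError))  # HTTPError subclasses URLError
print(err.code, err.reason)                    # status code and message
print(err.geturl(), err.info()["Content-Type"])
body = err.read()                              # body of the error page
print(body)
```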

If you want to handle both URLError and HTTPError in your code, there are two approaches. The first is:

import urllib2

url = "http://www.csdn.com/aderstep"
request = urllib2.Request(url)
try:
    response = urllib2.urlopen(request)
    response.close()
# HTTPError must come before URLError,
# because HTTPError is a subclass of URLError.
# The exceptions urlopen() raises for failed requests are URLError or one of its subclasses.
# If URLError were listed before HTTPError, the HTTPError clause would never run,
# because Python tries except clauses in order, from top to bottom.
except urllib2.HTTPError, e:
    print "The server couldn't fulfill the request."
    print "Error code:", e.code
    if e.code == 404:
        print "Page not found!"
        # do something
    elif e.code == 403:
        print "Access denied!"
    else:
        print "Something happened! Error code:", e.code
    print "Returned content:", e.read()
except urllib2.URLError, err1:
    print "Failed to reach the server."
    print "The reason:", err1.reason
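The ordering rule in the comments can be checked directly. In this Python 3 sketch, two hypothetical helper functions raise the same exceptions against except clauses in opposite orders. The URL and messages are placeholders.

```python
import io
import urllib.error


def catch_http_first(exc):
    """Correct order: the more specific HTTPError clause comes first."""
    try:
        raise exc
    except urllib.error.HTTPError:
        return "HTTPError branch"
    except urllib.error.URLError:
        return "URLError branch"


def catch_url_first(exc):
    """Wrong order: the URLError clause also matches every HTTPError."""
    try:
        raise exc
    except urllib.error.URLError:
        return "URLError branch"
    except urllib.error.HTTPError:  # never reached for an HTTPError
        return "HTTPError branch"


http_err = urllib.error.HTTPError(
    "http://www.example.com/x", 404, "Not Found", {}, io.BytesIO(b""))
url_err = urllib.error.URLError("no route to host")

print(catch_http_first(http_err), "/", catch_http_first(url_err))
print(catch_url_first(http_err), "/", catch_url_first(url_err))
```

With URLError listed first, the HTTPError clause is dead code, because every HTTPError is already caught by the URLError clause above it.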

Alternatively, you can use the following code template, which is also the most commonly used one:
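One common shape for this kind of template catches only URLError and works out what happened by inspecting attributes. An HTTPError carries a status code, while other URLErrors carry only a reason. The sketch below is in Python 3 and is not the author's exact code; the helper names describe() and fetch() are hypothetical. It checks for code first, because in Python 3 an HTTPError also has a reason attribute.

```python
import io
import urllib.error
import urllib.request


def describe(e):
    """Classify a URLError using a single except clause (hypothetical helper)."""
    # Check .code first: HTTPError also has .reason, so testing .reason first
    # would misreport HTTP errors as connection failures.
    if hasattr(e, "code"):
        return "The server couldn't fulfill the request. Error code: %s" % e.code
    if hasattr(e, "reason"):
        return "Failed to reach a server. Reason: %s" % e.reason
    return "Unknown error: %r" % e


def fetch(url):
    """Hypothetical wrapper showing where describe() fits around urlopen()."""
    try:
        response = urllib.request.urlopen(url, timeout=5)
    except urllib.error.URLError as e:
        return describe(e)
    else:
        response.close()  # everything is fine
        return "OK"


http_err = urllib.error.HTTPError(
    "http://www.example.com/x", 404, "Not Found", {}, io.BytesIO(b""))
url_err = urllib.error.URLError("no route to host")
print(describe(http_err))
print(describe(url_err))
```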
