
[Original] Python urllib2/httplib source code

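My impression after reading the source is that urllib2 wraps its work in many layers of code, much of which we never use.

I was browsing Weibo and saw 餘弦 of Knownsec (知道創宇) say that the "Knownsec R&D Skills Table v3.0" was about to be released, so I looked for it on the official site but could not find it. I therefore went back to the "Knownsec R&D Skills Table v2.2" and found an entry I had never noticed, which shows I had not read it carefully before.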

  • Python
    • urllib2
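      • Turn on request/response debugging
        • Edit h.set_debuglevel inside urllib2's do_open
        • Change it to h.set_debuglevel(1); the request and response data, including https, then become clearly visible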

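Editing the source directly is one way to do it, but patching the body of a function is not a good way, so I read through the urllib2 source. One option worth considering is changing the default in AbstractHTTPHandler.__init__(self, debuglevel=0) to 1. The reason is to avoid pitfalls.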
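The same effect is available without touching the installed source, because the handler constructors accept a debuglevel argument. The following is a minimal sketch, assuming Python 2's urllib2 as in this post and falling back to urllib.request (where Python 3 keeps the same API) when urllib2 is missing. The function name build_debug_opener is made up for illustration.

```python
try:
    import urllib2  # Python 2, as used in this post
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same API


def build_debug_opener(level=1):
    """Return an opener whose HTTP(S) handlers print request/response traffic."""
    handlers = [urllib2.HTTPHandler(debuglevel=level)]
    # HTTPSHandler only exists when the interpreter has SSL support.
    if hasattr(urllib2, "HTTPSHandler"):
        handlers.append(urllib2.HTTPSHandler(debuglevel=level))
    # Passing instances makes build_opener skip the default classes they replace.
    return urllib2.build_opener(*handlers)


opener = build_debug_opener()
# Look the handler up by type; a configured proxy could occupy index 0.
http_handler = next(h for h in opener.handlers
                    if isinstance(h, urllib2.HTTPHandler))
print("%s debuglevel=%d" % (type(http_handler).__name__,
                            http_handler._debuglevel))
```

Calling opener.open(...) on the result prints the send/reply/header lines while leaving the library files untouched.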

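    If urllib2 has been imported but not yet used, the first call to urlopen creates the module-level variable _opener (declared with "global _opener").
    If _opener already exists, urlopen calls _opener.open(url, data, timeout).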

    _opener can be built with build_opener(), which returns an instance of the OpenerDirector class.
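The relationship between build_opener(), the module-level _opener, and install_opener() can be checked in a few lines. This is a small sketch under the same assumption as the rest of these notes (urllib2 on Python 2, urllib.request on Python 3); it reads the private name _opener only to illustrate the point and makes no network request.

```python
try:
    import urllib2  # Python 2, as used in this post
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same API

opener = urllib2.build_opener()
is_director = isinstance(opener, urllib2.OpenerDirector)

# install_opener stores the opener in the module-level _opener that urlopen
# consults; without this call, urlopen builds one lazily on first use.
urllib2.install_opener(opener)
installed = urllib2._opener is opener

print("OpenerDirector instance: %s, installed globally: %s"
      % (is_director, installed))
```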

         By default, default_classes contains [ProxyHandler, UnknownHandler, HTTPHandler, HTTPDefaultErrorHandler, HTTPRedirectHandler, FTPHandler, FileHandler, HTTPErrorProcessor].
         If hasattr(httplib, 'HTTPS') is true, HTTPSHandler is appended to default_classes.
         HTTPHandler and HTTPSHandler inherit from AbstractHTTPHandler, so their debug level can be set.
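To see which handlers a default opener actually ends up with, the opener's handlers list can be inspected. A rough sketch, again assuming urllib2 or its Python 3 equivalent urllib.request; the exact list depends on the Python version and on the environment (for example, ProxyHandler is only registered when a proxy is configured), so nothing here should be read as a fixed result.

```python
try:
    import urllib2  # Python 2, as used in this post
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same API

opener = urllib2.build_opener()

# Class names of every handler that add_handler accepted.
handler_names = sorted(type(h).__name__ for h in opener.handlers)

# Handlers that derive from AbstractHTTPHandler carry the debug level.
debuggable = [
    name for name in ("HTTPHandler", "HTTPSHandler")
    if hasattr(urllib2, name)
    and issubclass(getattr(urllib2, name), urllib2.AbstractHTTPHandler)
]

print(handler_names)
print(debuggable)
```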

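       Every handler class in default_classes is instantiated in turn and added to the opener with OpenerDirector.add_handler(handler).
         OpenerDirector.add_handler(handler)
            dir() collects all attribute names of the handler, skipping ["redirect_request", "do_open", "proxy_open"]
            Each name is split at the first underscore into (request type / action). Ex: http_error, http_request, http_response, http_open
            The handler is then added to the lookup table for that action: *_open goes to OpenerDirector.handle_open, *_request to process_request, *_response to process_response, and *_error_* to handle_error (each is a dict whose keys are http, https, ftp, file, etc.; in handle_open each value is a list that can hold several handlers)
            Open question: I do not know why bisect.insort is used to add to the list. All it seems to do is keep the list sorted....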
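On the bisect.insort question: BaseHandler defines __lt__ in terms of the handler_order attribute, so insort keeps each list ordered by handler_order regardless of the order in which handlers are added, and handlers with a lower value are tried first. The sketch below illustrates this with two made-up handler classes (EarlyHandler and LateHandler are hypothetical names, not part of the library).

```python
try:
    import urllib2  # Python 2, as used in this post
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same API


class LateHandler(urllib2.BaseHandler):
    handler_order = 900  # higher value: tried later

    def http_open(self, req):
        return None


class EarlyHandler(urllib2.BaseHandler):
    handler_order = 100  # lower value: tried earlier

    def http_open(self, req):
        return None


opener = urllib2.OpenerDirector()
opener.add_handler(LateHandler())   # added first...
opener.add_handler(EarlyHandler())  # ...but insort places this one ahead

order = [type(h).__name__ for h in opener.handle_open["http"]]
print(order)
```

This ordering is what lets a handler such as ProxyHandler (which uses a low handler_order) run before the plain HTTPHandler.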

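    urllib2.urlopen simply calls _opener.open(self, fullurl, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT)
        Create a Request(fullurl, data) object (fullurl may be a URL string or a Request instance)
        Request.get_type() returns the request type (http, https, etc.) derived from the URL; the result is assigned to protocol

        # Process the request, get the response
        Get the handler list for http from OpenerDirector.process_request and call handler.http_request(request) on each in turn to obtain the request. (In practice the list holds only one handler.)

        OpenerDirector._open(req, data=None)
            OpenerDirector._call_chain(chain, kind, meth_name, *args): chain is the OpenerDirector.handle_open dict, kind is the request type (http, https, etc.), and meth_name is the action to run. It returns the first result that is not None.
            First try OpenerDirector._call_chain(...) on OpenerDirector.handle_open.get("default", []): loop over the handlers, call default_open on each, and return the response if there is one
            Then try OpenerDirector._call_chain(...) with the request's own type from request.get_type(); the method name is "http_open". Return the response if there is one
            Finally try OpenerDirector._call_chain(...) with request type "unknown" and method name "unknown_open". Return the response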
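The lookup order in _open (default_open first, then the protocol's own <type>_open, then unknown_open) can be observed with a stub handler that answers from memory. This is only a sketch: CannedHandler and the example.invalid URL are invented for illustration, and a bare OpenerDirector is used so that no real network handler is involved.

```python
import io

try:
    import urllib2  # Python 2, as used in this post
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same API


class CannedHandler(urllib2.BaseHandler):
    """Hypothetical handler that answers every request from memory."""

    def __init__(self):
        self.http_calls = 0

    def default_open(self, req):
        # Consulted first; a non-None result stops the chain.
        return io.BytesIO(b"canned")

    def http_open(self, req):
        # Would be consulted second, but default_open already answered.
        self.http_calls += 1
        return None


handler = CannedHandler()
opener = urllib2.OpenerDirector()
opener.add_handler(handler)

response = opener.open("http://example.invalid/")
body = response.read()
print("%r, http_open called %d times" % (body, handler.http_calls))
```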

        # Process the response
        Get the handler list for http from OpenerDirector.process_response and call handler.http_response(request, response) on each in turn to obtain the response. (In practice the list holds only one handler.)
        End: open() returns the response.

>>> import urllib2 as url
>>> url
>>> dir(url)
['AbstractBasicAuthHandler', 'AbstractDigestAuthHandler', 'AbstractHTTPHandler', 'BaseHandler', 'CacheFTPHandler', 'FTPHandler', 'FileHandler', 'HTTPBasicAuthHandler', 'HTTPCookieProcessor', 'HTTPDefaultErrorHandler', 'HTTPDigestAuthHandler', 'HTTPError', 'HTTPErrorProcessor', 'HTTPHandler', 'HTTPPasswordMgr', 'HTTPPasswordMgrWithDefaultRealm', 'HTTPRedirectHandler', 'HTTPSHandler', 'OpenerDirector', 'ProxyBasicAuthHandler', 'ProxyDigestAuthHandler', 'ProxyHandler', 'Request', 'StringIO', 'URLError', 'UnknownHandler', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__version__', '_cut_port_re', '_opener', '_parse_proxy', '_safe_gethostbyname', 'addinfourl', 'base64', 'bisect', 'build_opener', 'ftpwrapper', 'getproxies', 'hashlib', 'httplib', 'install_opener', 'localhost', 'mimetools', 'os', 'parse_http_list', 'parse_keqv_list', 'posixpath', 'proxy_bypass', 'quote', 'random', 'randombytes', 're', 'request_host', 'socket', 'splitattr', 'splithost', 'splitpasswd', 'splitport', 'splittag', 'splittype', 'splituser', 'splitvalue', 'sys', 'time', 'unquote', 'unwrap', 'url2pathname', 'urlopen', 'urlparse']
>>> url._opener
>>> url.build_opener
>>> opener = url.build_opener()
>>> opener
>>> dir(opener)
['__doc__', '__init__', '__module__', '_call_chain', '_open', 'add_handler', 'addheaders', 'close', 'error', 'handle_error', 'handle_open', 'handlers', 'open', 'process_request', 'process_response']
>>> opener.handle_open
{'unknown': [], 'http': [], 'https': [], 'file': [], 'ftp': []}
>>> opener.handle_open['http']
[]
>>> dir(opener.handle_open['http'][0])
['__doc__', '__init__', '__lt__', '__module__', '_debuglevel', 'add_parent', 'close', 'do_open', 'do_request_', 'handler_order', 'http_open', 'http_request', 'parent', 'set_http_debuglevel']
>>>
>>> opener.handle_open['http'][0].set_http_debuglevel
>>> opener.handle_open['https'][0].set_http_debuglevel
>>> opener.handle_open['https'][0].set_http_debuglevel(1)
>>> opener.handle_open['http'][0].set_http_debuglevel(1)
>>>
# HTTPHandler always exists; HTTPSHandler is created only when httplib supports https. Both inherit from AbstractHTTPHandler.
#   The key code is HTTPHandler.do_open
#   httplib.HTTPConnection.set_debuglevel(level)
#   httplib.HTTPConnection.request(method, url, body=None, headers={})
#       httplib.HTTPConnection._send_request(method, url, body, headers)
#       httplib.HTTPConnection.endheaders(message_body=None)
#       httplib.HTTPConnection._send_output(message_body=None)
#       httplib.HTTPConnection.send(data)    # data is the buffered headers plus the body
#           httplib.HTTPConnection.connect()    # creates the socket
#               socket.create_connection(address, timeout= , source_address=None)
#               self.sock = socket.create_connection((self.host,self.port), self.timeout, self.source_address)
#           print "send:", repr(data)    # prints the outgoing data
#           self.sock.sendall(data)
#   httplib.HTTPConnection.getresponse()
#       HTTPResponse(sock, debuglevel=0, strict=0, method=None, buffering=False)
#       response = self.response_class(*args, **kwds)    # response_class is HTTPResponse
#       HTTPResponse.begin()
#           HTTPResponse._read_status()
#               print "reply:", repr(line)    # prints the received status line, 'HTTP/1.1 200 OK\r\n'
#               return (version, status, reason)    # Ex: (HTTP/1.0, 200, OK)
#           if status == 100 (CONTINUE):
#               while True:
#                   print "header:", skip    # prints the headers of the interim response, which are skipped
#           msg = HTTPMessage(self.fp, 0)
#           for hdr in HTTPMessage.headers:
#               print "header:", hdr,
#           HTTPResponse._check_close()    # checks the headers to see whether the connection will be closed
#       httplib.HTTPConnection.close()
#           self.sock.close()
#           HTTPResponse.close()
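The httplib part of this call chain can be exercised directly, without urllib2 in between. The sketch below is an assumption-laden illustration rather than anything from the original notes: it starts a throwaway server on 127.0.0.1 (HelloHandler and fetch_with_debug are made-up names) so that no internet access is needed, and it imports http.client and socketserver on Python 3, where httplib and SocketServer were renamed.

```python
import threading

try:  # Python 2 names, as used in this post
    import httplib
    import SocketServer as socketserver
    from BaseHTTPServer import BaseHTTPRequestHandler
except ImportError:  # Python 3 renamed the modules; the calls are the same
    import http.client as httplib
    import socketserver
    from http.server import BaseHTTPRequestHandler


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint so the example needs no internet access."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence the server's own access log


def fetch_with_debug():
    server = socketserver.TCPServer(("127.0.0.1", 0), HelloHandler)
    worker = threading.Thread(target=server.serve_forever)
    worker.daemon = True
    worker.start()
    try:
        port = server.server_address[1]
        conn = httplib.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.set_debuglevel(1)         # enables the send/reply/header prints
        conn.request("GET", "/")       # request -> endheaders -> send
        response = conn.getresponse()  # begin -> _read_status
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch_with_debug()
```

Running it prints the "send:", "reply:" and "header:" lines to stdout; their exact formatting differs slightly between Python 2 and Python 3.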

# >>> opener.handle_open['file'][0].set_http_debuglevel
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# AttributeError: FileHandler instance has no attribute 'set_http_debuglevel'
# >>> opener.handle_open['ftp'][0].set_http_debuglevel
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# AttributeError: FTPHandler instance has no attribute 'set_http_debuglevel'
# >>> opener.handle_open['ftp'][0].set_http_debuglevel(1)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# AttributeError: FTPHandler instance has no attribute 'set_http_debuglevel'
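Since only the handlers derived from AbstractHTTPHandler have set_http_debuglevel, a helper that turns debugging on can probe for the method instead of indexing into handle_open by protocol. A minimal sketch, where enable_http_debug is a hypothetical helper name and urllib.request stands in for urllib2 on Python 3:

```python
try:
    import urllib2  # Python 2, as used in this post
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same API


def enable_http_debug(opener, level=1):
    """Set the debug level on every handler that supports it.

    Returns the class names of the handlers that were changed.
    """
    changed = []
    for handler in opener.handlers:
        setter = getattr(handler, "set_http_debuglevel", None)
        if setter is not None:  # FileHandler, FTPHandler etc. lack it
            setter(level)
            changed.append(type(handler).__name__)
    return changed


opener = urllib2.build_opener()
changed = enable_http_debug(opener)
print(changed)
```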

>>> url.install_opener(opener)
>>> url.urlopen("http://www.baidu.com")
send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.baidu.com\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Wed, 12 Aug 2015 05:40:47 GMT
header: Content-Type: text/html; charset=utf-8
header: Transfer-Encoding: chunked
header: Connection: Close
header: Vary: Accept-Encoding
header: Set-Cookie: BAIDUID=8E25E0CC918EE71717BE5AA3D3472F62:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
header: Set-Cookie: BIDUPSID=8E25E0CC918EE71717BE5AA3D3472F62; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
header: Set-Cookie: PSTM=1439358047; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
header: Set-Cookie: BDSVRTM=0; path=/
header: Set-Cookie: BD_HOME=0; path=/
header: Set-Cookie: H_PS_PSSID=16229_16415_1431_13520_12825_14429_12868_16520_16799_16331_16662_16427_16514_15243_11854_13932_16720; path=/; domain=.baidu.com
header: P3P: CP=" OTI DSP COR IVA OUR IND COM "
header: Cache-Control: private
header: Cxy_all: baidu+ac7f221e1b28f124d4a8cbfc52852314
header: Expires: Wed, 12 Aug 2015 05:40:46 GMT
header: X-Powered-By: HPHP
header: Server: BWS/1.1
header: X-UA-Compatible: IE=Edge,chrome=1
header: BDPAGETYPE: 1
header: BDQID: 0xdc7ecbce0008572d
header: BDUSERID: 0