1. Web crawlers: sending requests with the requests module

[Reposted from: https://www.jianshu.com/u/3fe4aab60ac4]

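A "requests" request means using Python's requests module to simulate a browser request and get back the HTML source of a page.

There are two kinds of simulated browser request: those that need no login or verification, and those that do.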

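I. Requests that need no login or verification

This case is simple: send a single request with the requests module and read the HTML source from the response.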

#!/usr/bin/env python
# -*- coding:utf8 -*-
import requests     # HTTP client library used to simulate browser requests

http = requests.get(url="http://www.iqiyi.com/")    # send an HTTP GET request
http.encoding = "utf-8"                             # decode the response body as UTF-8
neir = http.text                                    # response body as a string
print(neir)

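This prints the HTML source of the page. The excerpt below appears to come from dig.chouti.com, the site used in the rest of this post, rather than from the iqiyi.com URL requested above: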

<!DOCTYPE html>
<html>
<head>
<title>抽屜新熱榜-聚合每日熱門、搞笑、有趣資訊</title>
        <meta charset="utf-8" />
        <meta name="keywords" content="抽屜新熱榜,資訊,段子,圖檔,公衆場合不宜,科技,新聞,節操,搞笑" />

        <meta name="description" content="
            抽屜新熱榜，彙聚每日搞笑段子、熱門圖檔、有趣新聞。它将微網誌、門戶、社群、bbs、社交網站等海量内容聚合在一起，通過使用者推薦生成最熱榜單。看抽屜新熱榜，每日熱門、有趣資訊盡收眼底。
            " />

        <meta name="robots" content="index,follow" />
        <meta name="GOOGLEBOT" content="index,follow" />
        <meta name="Author" content="搞笑" />
        <meta http-equiv="X-UA-Compatible" content="IE=EmulateIE8">
        <link type="image/x-icon" href="/images/chouti.ico" rel="icon"/>
        <link type="image/x-icon" href="/images/chouti.ico" rel="Shortcut Icon"/>
        <link type="image/x-icon" href="/images/chouti.ico" rel="bookmark"/>
    <link type="application/opensearchdescription+xml"
          href="opensearch.xml" title="抽屜新熱榜" rel="search" />
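The request shown earlier assumes that the server answers promptly, returns a success status, and sends UTF-8. A slightly more defensive variant might look like the sketch below. `fetch_html` and its parameters are names invented here for illustration and are not from the original post; `timeout`, `raise_for_status()` and `apparent_encoding` are regular features of requests.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Return the decoded HTML of `url`, raising on HTTP errors.

    Hypothetical helper for illustration; not part of requests.
    """
    http = session if session is not None else requests.Session()
    response = http.get(url, timeout=timeout)  # give up instead of waiting forever
    response.raise_for_status()                # raises requests.HTTPError on 4xx/5xx
    if response.encoding is None or response.encoding.lower() == "iso-8859-1":
        # The headers declared no charset (requests then reports None or
        # its ISO-8859-1 default), so use the encoding detected from the body.
        response.encoding = response.apparent_encoding
    return response.text
```

Calling `print(fetch_html("http://dig.chouti.com/"))` would then play the role of the script above, with the difference that a timeout or an error status raises an exception instead of printing an error page.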

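II. Requests that need login or verification

To fetch pages of this kind, we first need to understand the login flow. Typically, on the user's first visit the server sets a cookie in the browser. When the user then submits the login form, the browser sends that cookie along with the credentials; if the credentials are correct, the server marks that cookie as authorized. From then on, any request for a page that requires login only needs to carry the authorized cookie.

1. First visit the home page and check whether a cookie is set automatically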

#!/usr/bin/env python
# -*- coding:utf8 -*-
import requests     # HTTP client library used to simulate browser requests

### 1. Before logging in, visit the home page to obtain the cookies
i1 = requests.get(
    url="http://dig.chouti.com/",
    headers={'Referer': 'http://dig.chouti.com/'}
)
i1.encoding = "utf-8"                               # decode the response body as UTF-8
i1_cookie = i1.cookies.get_dict()
print(i1_cookie)                                    # print the cookies that were received
# Output: {'JSESSIONID': 'aaaTztKP-KaGLbX-T6R0v', 'gpsd': 'c227f059746c839a28ab136060fe6ebe', 'route': 'f8b4f4a95eeeb2efcff5fd5e417b8319'}

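Cookies were set, as the output shows. If the login credentials are correct, the server will authorize these cookies, and later requests for login-protected pages only need to carry them.

2. Have the program log in so that the cookie gets authorized

First open the login page in a browser, submit any made-up account and password, and read off the login URL and the form fields the login requires from the request the browser sends.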
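Once the URL and field names are known, it can help to build the same request in Python without sending it and compare it with what the browser sent. The sketch below uses `requests.Request(...).prepare()` for that. The phone number, password and cookie value are placeholders, not real data, and the field names are the ones this post uses for dig.chouti.com.

```python
import requests

# Placeholder values: substitute what your own browser session shows.
login_request = requests.Request(
    method="POST",
    url="http://dig.chouti.com/login",
    data={"phone": "86XXXXXXXXXXX", "password": "your-password", "oneMonth": ""},
    headers={"Referer": "http://dig.chouti.com/"},
    cookies={"gpsd": "value-from-step-1"},
)
prepared = login_request.prepare()      # builds the request; nothing is sent

print(prepared.method, prepared.url)    # request line
print(prepared.headers["Content-Type"]) # form encoding chosen by requests
print(prepared.headers["Cookie"])       # cookies as they would be sent
print(prepared.body)                    # URL-encoded form fields
```

If the printed body and headers match the request captured in the browser, the same arguments can be passed to `requests.post()`.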

Log in while carrying the cookies, so that they get authorized:

#!/usr/bin/env python
# -*- coding:utf8 -*-
import requests     # HTTP client library used to simulate browser requests

### 1. Before logging in, visit the home page to obtain the cookies
i1 = requests.get(
    url="http://dig.chouti.com/",
    headers={'Referer': 'http://dig.chouti.com/'}
)
i1.encoding = "utf-8"                               # decode the response body as UTF-8
i1_cookie = i1.cookies.get_dict()
print(i1_cookie)                                    # print the cookies that were received
# Output: {'JSESSIONID': 'aaaTztKP-KaGLbX-T6R0v', 'gpsd': 'c227f059746c839a28ab136060fe6ebe', 'route': 'f8b4f4a95eeeb2efcff5fd5e417b8319'}

### 2. Log in, sending the cookies from step 1; the server authorizes the random token in the cookie
i2 = requests.post(
    url="http://dig.chouti.com/login",              # login URL
    data={                                          # login form fields
        'phone': "8615284816568",
        'password': "279819",
        'oneMonth': ""
    },
    headers={'Referer': 'http://dig.chouti.com/'},
    cookies=i1_cookie                               # send the cookies from step 1
)
i2.encoding = "utf-8"
dluxxi = i2.text
print(dluxxi)                                       # print the server's response to the login
# Output: {"result":{"code":"9999", "message":"", "data":{"complateReg":"0","destJid":"cdu_50072007463"}}}  (login succeeded)
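Rather than reading the printed response by eye, the script could check it. The sketch below assumes the JSON shape of the sample response shown above and assumes that code "9999" means success; both are inferred from that single sample, not from any documentation of the site. `login_succeeded` is a name invented here.

```python
import json


def login_succeeded(body):
    """Return True if a login response body reports success.

    Assumption: the body looks like {"result": {"code": "9999", ...}}
    and "9999" is the success code, as in the sample output of this post.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON at all, for example an HTML error page.
        return False
    if not isinstance(payload, dict):
        return False
    result = payload.get("result")
    return isinstance(result, dict) and result.get("code") == "9999"
```

In the script above this would be used as `if not login_succeeded(i2.text): raise SystemExit("login failed")`, so that step 3 is never attempted with unauthorized cookies.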

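3. A successful login means the server has authorized the cookies, so requests for pages that require login only need to carry them. For example, to fetch a page from the personal center: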

#!/usr/bin/env python
# -*- coding:utf8 -*-
import requests     # HTTP client library used to simulate browser requests

### 1. Before logging in, visit the home page to obtain the cookies
i1 = requests.get(
    url="http://dig.chouti.com/",
    headers={'Referer': 'http://dig.chouti.com/'}
)
i1.encoding = "utf-8"                               # decode the response body as UTF-8
i1_cookie = i1.cookies.get_dict()
print(i1_cookie)                                    # print the cookies that were received
# Output: {'JSESSIONID': 'aaaTztKP-KaGLbX-T6R0v', 'gpsd': 'c227f059746c839a28ab136060fe6ebe', 'route': 'f8b4f4a95eeeb2efcff5fd5e417b8319'}

### 2. Log in, sending the cookies from step 1; the server authorizes the random token in the cookie
i2 = requests.post(
    url="http://dig.chouti.com/login",              # login URL
    data={                                          # login form fields
        'phone': "8615284816568",
        'password': "279819",
        'oneMonth': ""
    },
    headers={'Referer': 'http://dig.chouti.com/'},
    cookies=i1_cookie                               # send the cookies from step 1
)
i2.encoding = "utf-8"
dluxxi = i2.text
print(dluxxi)                                       # print the server's response to the login
# Output: {"result":{"code":"9999", "message":"", "data":{"complateReg":"0","destJid":"cdu_50072007463"}}}  (login succeeded)

### 3. Request a page that requires login, sending the authorized cookies
shouquan_cookie = i1_cookie
i3 = requests.get(
    url="http://dig.chouti.com/user/link/saved/1",
    headers={'Referer': 'http://dig.chouti.com/'},
    cookies=shouquan_cookie                         # send the now-authorized cookies
)
i3.encoding = "utf-8"
print(i3.text)                                      # print the login-protected page

The HTML source of the login-protected page is fetched successfully.
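The three steps pass the cookie dict from one call to the next by hand. requests also offers `requests.Session`, which stores the cookies it receives and attaches them to later requests automatically. The sketch below sends the same three requests through one session. `fetch_saved_links`, `BASE_URL` and the parameter names are invented for illustration. One difference from the script above: a session also keeps any cookies set by the login response itself, whereas the script reuses only the cookies from step 1.

```python
import requests

BASE_URL = "http://dig.chouti.com"   # site used throughout this post


def fetch_saved_links(phone, password, session=None, timeout=10):
    """Log in and return the HTML of the saved-links page.

    Hypothetical helper: the same three requests as above, sent through
    a single session so that cookies are carried along automatically.
    """
    http = session if session is not None else requests.Session()
    http.headers.update({"Referer": BASE_URL + "/"})  # sent with every request

    # 1. Visit the home page; the session stores the cookies it receives.
    http.get(BASE_URL + "/", timeout=timeout)

    # 2. Log in; the stored cookies are attached by the session.
    login = http.post(
        BASE_URL + "/login",
        data={"phone": phone, "password": password, "oneMonth": ""},
        timeout=timeout,
    )
    login.raise_for_status()

    # 3. Request the login-protected page with the same session.
    page = http.get(BASE_URL + "/user/link/saved/1", timeout=timeout)
    page.raise_for_status()
    page.encoding = "utf-8"
    return page.text
```

The optional `session` argument exists mainly so that a caller can reuse an already logged-in session or substitute a stand-in object when testing without network access.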

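Complete code

get(): sends a GET request

encoding attribute: sets the encoding used to decode the response body

cookies.get_dict(): returns the cookies of the response as a dict

post(): sends a POST request

text attribute: the server's response body as a string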

#!/usr/bin/env python
# -*- coding:utf8 -*-
import requests     # HTTP client library used to simulate browser requests

### 1. Before logging in, visit the home page to obtain the cookies
i1 = requests.get(
    url="http://dig.chouti.com/",
    headers={'Referer': 'http://dig.chouti.com/'}
)
i1.encoding = "utf-8"                               # decode the response body as UTF-8
i1_cookie = i1.cookies.get_dict()
print(i1_cookie)                                    # print the cookies that were received
# Output: {'JSESSIONID': 'aaaTztKP-KaGLbX-T6R0v', 'gpsd': 'c227f059746c839a28ab136060fe6ebe', 'route': 'f8b4f4a95eeeb2efcff5fd5e417b8319'}

### 2. Log in, sending the cookies from step 1; the server authorizes the random token in the cookie
i2 = requests.post(
    url="http://dig.chouti.com/login",              # login URL
    data={                                          # login form fields
        'phone': "8615284816568",
        'password': "279819",
        'oneMonth': ""
    },
    headers={'Referer': 'http://dig.chouti.com/'},
    cookies=i1_cookie                               # send the cookies from step 1
)
i2.encoding = "utf-8"
dluxxi = i2.text
print(dluxxi)                                       # print the server's response to the login
# Output: {"result":{"code":"9999", "message":"", "data":{"complateReg":"0","destJid":"cdu_50072007463"}}}  (login succeeded)

### 3. Request a page that requires login, sending the authorized cookies
shouquan_cookie = i1_cookie
i3 = requests.get(
    url="http://dig.chouti.com/user/link/saved/1",
    headers={'Referer': 'http://dig.chouti.com/'},
    cookies=shouquan_cookie                         # send the now-authorized cookies
)
i3.encoding = "utf-8"
print(i3.text)                                      # print the login-protected page

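Note: if the login requires a CAPTCHA, image processing is needed: recognize the code from the CAPTCHA image and add the recognized value to the login fields.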
