Using Requests for Python Web Scraping

2023-06-20 17:04:35

1. Basic usage

Official documentation

https://cn.python-requests.org/zh_CN/latest/

Quick start

https://cn.python-requests.org/zh_CN/latest/user/quickstart.html

Installation

pip install requests -i https://pypi.douban.com/simple

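Type and attributes of the response

Type: requests.models.Response

response.text: the response body decoded to a string (the page source)
response.encoding: read or set the encoding used to decode response.text
response.url: the URL of the request
response.content: the response body as raw bytes
response.status_code: the HTTP status code of the response
response.headers: the response headers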

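import requests

url = 'http://www.baidu.com'

response = requests.get(url=url)

# One type and six attributes
# The Response type
print(type(response))

# Set the encoding used to decode the response body
response.encoding = 'utf-8'

# The page source, returned as a string
print(response.text)

# The URL that was requested
print(response.url)

# The response body as binary data (bytes)
print(response.content)

# The HTTP status code of the response
print(response.status_code)

# The response headers
print(response.headers)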
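The reason for setting response.encoding before reading response.text can be tried without touching a real site. The sketch below is only an illustration: DemoHandler and the tiny local server are hypothetical stand-ins for a website that sends UTF-8 HTML without declaring a charset, in which case Requests falls back to a header-based guess that may not match the actual bytes.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: serves a UTF-8 page without declaring a charset."""

    def do_GET(self):
        body = '<html><body>你好, Requests</body></html>'.encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')  # no charset on purpose
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Port 0 lets the operating system pick any free port.
server = HTTPServer(('127.0.0.1', 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo
response = session.get(f'http://127.0.0.1:{server.server_port}/', timeout=5)

print(response.encoding)      # the encoding Requests inferred from the headers
response.encoding = 'utf-8'   # override it before reading response.text
page = response.text          # str, decoded with the encoding set above
raw = response.content        # bytes, untouched by the encoding setting

print(page)
print(response.status_code)

server.shutdown()
```

response.content is never affected by response.encoding, so when the encoding is unknown it is always possible to fall back to decoding the bytes manually.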

2. GET requests

requests.get()

import requests

url = 'https://www.baidu.com/s'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36'
}
data = {
    'wd': '北京'  # search keyword ("Beijing")
}

# url: the resource path to request
# params: the query-string parameters
response = requests.get(url=url, params=data, headers=headers)

content = response.text

print(content)

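Summary:

Parameters are passed with the params argument.

The parameters do not need to be urlencoded by hand.

There is no need to build a customized request object.

The ? at the end of the resource path is optional.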
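The points about params can be checked without sending anything over the network. The following is a minimal sketch that uses requests.Request(...).prepare() to build the request and inspect the final URL; the variable names are chosen here for illustration only.

```python
import requests

params = {'wd': '北京'}

# Prepare (but do not send) the request so the final URL can be inspected.
without_mark = requests.Request('GET', 'https://www.baidu.com/s', params=params).prepare()
with_mark = requests.Request('GET', 'https://www.baidu.com/s?', params=params).prepare()

# Requests percent-encodes the value itself, so no manual urlencode is needed.
print(without_mark.url)  # https://www.baidu.com/s?wd=%E5%8C%97%E4%BA%AC

# A trailing ? in the base URL makes no difference to the result.
print(with_mark.url == without_mark.url)  # True
```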

3. POST requests

requests.post()

import requests

url = 'https://fanyi.baidu.com/sug'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36'
}

data = {
    'kw': 'eye'
}

# url: the request address
# data: the form parameters sent in the request body
response = requests.post(url=url, data=data, headers=headers)

# Parse the JSON body. json.loads() no longer accepts an encoding argument
# (it was removed in Python 3.9), so use response.json() instead.
obj = response.json()
print(obj)

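Summary:

POST parameters do not need to be encoded or decoded by hand.

POST parameters are passed with the data argument.

There is no need to build a customized request object.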

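How do get and post differ?

GET passes its parameters with params; POST passes them with data.
The remaining points apply to both:
The ? after the resource path can be omitted.
No manual encoding or decoding is needed.
No customized request object is needed.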
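To make the difference concrete, the sketch below prepares one GET and one POST request without sending them and prints where the parameters end up. The URLs are the ones used in the examples of this article; get_req and post_req are names chosen here for illustration.

```python
import requests

get_req = requests.Request(
    'GET', 'https://www.baidu.com/s', params={'wd': 'ip'}
).prepare()
post_req = requests.Request(
    'POST', 'https://fanyi.baidu.com/sug', data={'kw': 'eye'}
).prepare()

# GET: the parameters become part of the URL and the body stays empty.
print(get_req.url)   # https://www.baidu.com/s?wd=ip
print(get_req.body)  # None

# POST: the URL is unchanged and the parameters are form-encoded into the body.
print(post_req.url)                      # https://fanyi.baidu.com/sug
print(post_req.body)                     # kw=eye
print(post_req.headers['Content-Type'])  # application/x-www-form-urlencoded
```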

4. Proxies

To use a proxy, pass the proxies argument with the request. Its value is a dictionary that maps a URL scheme to a proxy address.

import requests

url = 'http://www.baidu.com/s?'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36',
}

data = {
    'wd': 'ip'
}

# Key: the scheme of the target URL; value: the proxy address, scheme included
proxy = {
    'http': 'http://211.65.197.93:80'
}

response = requests.get(url=url, params=data, headers=headers, proxies=proxy)
response.encoding = 'utf-8'
content = response.text

# Save the page so the IP address reported by the site can be inspected
with open('daili.html', 'w', encoding='utf-8') as fp:
    fp.write(content)
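One detail worth knowing is that the keys of the proxies dictionary are matched against the scheme of the target URL, so a dictionary with only an 'http' key is not used for https:// addresses. The sketch below illustrates this with select_proxy, a helper in requests.utils that Requests uses when choosing a proxy; it is not part of the documented top-level API, so treat its use here as an assumption for demonstration. Nothing is sent over the network, and whether the proxy address is actually reachable is not checked.

```python
from requests.utils import select_proxy

# Proxy address taken from the example above; it may no longer be alive.
proxies = {'http': 'http://211.65.197.93:80'}

http_choice = select_proxy('http://www.baidu.com/s', proxies)
https_choice = select_proxy('https://www.baidu.com/s', proxies)

print(http_choice)   # http://211.65.197.93:80
print(https_choice)  # None: no entry matches the https scheme

# Adding an 'https' key covers https:// URLs as well.
both = {**proxies, 'https': 'http://211.65.197.93:80'}
print(select_proxy('https://www.baidu.com/s', both))  # http://211.65.197.93:80
```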
