One Trick a Day: Dramatically Speeding Up requests

2021-07-25 16:54:13

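I built an HTTP endpoint that filters spam. Now I have ten million messages that need to go through this endpoint for spam detection.

At first, my code looked like this: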

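import requests

# Placeholder: the real spam-filter endpoint is not given in this post.
url = 'http://example.com/spam-check'
messages = ['first message', 'second message', 'third message']
for message in messages:
    # A bare requests.post opens a fresh connection on every call.
    resp = requests.post(url, json={'msg': message}).json()
    if resp['trash']:
        print('This is a spam message')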

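A quick timing test shows how slow this approach is: fetching Baidu 100 times took a whopping 20 seconds. With ten million messages to check, that is far too long.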
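A minimal sketch of what such a timing test might look like, assuming it mirrors the Session-based version later in this post but calls `requests.get` directly each time. The helper name `time_plain_get` is made up for illustration.

```python
import time

import requests


def time_plain_get(url, n=100):
    """Fetch `url` n times with bare requests.get; return elapsed seconds."""
    start = time.time()
    for _ in range(n):
        # requests.get builds a throwaway Session internally on every call,
        # so the TCP connection is not reused between iterations.
        requests.get(url).content.decode()
    return time.time() - start
```

Calling `time_plain_get('https://baidu.com')` would reproduce the experiment; the exact number depends heavily on your network.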

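Is there any way to speed this up? Besides the approaches covered in earlier articles (multithreading, aiohttp, or simply switching to Scrapy), you can also have requests keep its connection alive, which cuts the time spent repeating the TCP three-way handshake for every request.

So how do you make requests keep the connection alive? It is actually very simple: just use a Session object.

The modified code: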

import requests
import time

start = time.time()
session = requests.Session()  # one Session, so the underlying connection is reused
for _ in range(100):
    resp = session.get('https://baidu.com').content.decode()
end = time.time()
print(f'Fetched the page 100 times, elapsed: {end - start}')

The result: performance improved significantly. Fetching the page 100 times now takes only 5 seconds.
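The same idea carries over to the spam-check loop from the start of this post. Here is a minimal sketch, assuming the endpoint still accepts `{'msg': ...}` and answers with a JSON object containing a `trash` field; the function name `find_spam` and the 10-second timeout are my own choices for illustration, not part of the original code.

```python
import requests


def find_spam(messages, url):
    """Yield every message the filter flags as spam, sharing one Session."""
    # A single Session lets urllib3's connection pool reuse the connection
    # to the filter service instead of reconnecting for each message.
    with requests.Session() as session:
        for message in messages:
            resp = session.post(url, json={'msg': message}, timeout=10).json()
            if resp['trash']:
                yield message
```

You could then write `for msg in find_spam(messages, url): print(msg)` in place of the original loop.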

The official requests documentation[1] also notes that a Session object keeps connections alive:



The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3’s connection pooling. So if you’re making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).


Excellent news — thanks to urllib3, keep-alive is 100% automatic within a session! Any requests that you make within a session will automatically reuse the appropriate connection!
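To see the connection reuse described in these passages for yourself, here is a small sketch that counts how many TCP connections a throwaway local server receives. Everything in it (the `count_connections` helper, the local server, the default of five requests) is my own illustration rather than anything from the requests docs.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


def count_connections(use_session, n=5):
    """Send n GETs to a local test server; return the TCP connections it saw."""
    opened = []  # one entry per accepted TCP connection

    class Handler(BaseHTTPRequestHandler):
        protocol_version = 'HTTP/1.1'  # let the server honour keep-alive

        def setup(self):
            super().setup()
            opened.append(self.client_address)  # setup() runs once per connection

        def do_GET(self):
            body = b'ok'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the demo quiet

    server = ThreadingHTTPServer(('127.0.0.1', 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f'http://127.0.0.1:{server.server_port}/'
    try:
        if use_session:
            with requests.Session() as session:
                for _ in range(n):
                    session.get(url)
        else:
            for _ in range(n):
                requests.get(url)
    finally:
        server.shutdown()
        server.server_close()
    return len(opened)
```

On a typical setup, `count_connections(True)` should report a single connection, while `count_connections(False)` should report one connection per request.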

References

[1] Official requests documentation
