Adding Proxy Authentication in Scrapy

middlewares.py

from w3lib.http import basic_auth_header


class CustomProxyMiddleware:
    def process_request(self, request, spider):
        # Route the request through the proxy; keep credentials out of this URL.
        request.meta['proxy'] = "https://<PROXY_IP_OR_URL>:<PROXY_PORT>"
        # Send the credentials in the Proxy-Authorization header instead.
        request.headers['Proxy-Authorization'] = basic_auth_header(
            '<PROXY_USERNAME>', '<PROXY_PASSWORD>')

settings.py

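DOWNLOADER_MIDDLEWARES = {
    # Lower order values have process_request() called first, so the custom
    # middleware runs before the built-in HttpProxyMiddleware.
    '<PROJECT_NAME>.middlewares.CustomProxyMiddleware': 350,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 400,
}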

Problems

1. If proxy authentication is configured incorrectly, the response comes back with status code 407:

407 Proxy Authentication Required

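At first I configured the proxy in the following format. Some requests went through, though only after one retry, while others failed outright:

request.meta['proxy'] = "https://<PROXY_USERNAME>:<PROXY_PASSWORD>@<PROXY_IP_OR_URL>:<PROXY_PORT>"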

The correct approach is to send the credentials in this request header:

Proxy-Authorization
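To make the header value concrete, here is a minimal sketch using only the standard library. It shows the general shape of a Basic credential (the word "Basic" followed by base64 of "username:password") and how a proxy URL with embedded credentials could be split into a clean URL plus a header value. The function names build_proxy_auth and split_proxy_url are hypothetical helpers for illustration, not part of Scrapy or w3lib, and the UTF-8 encoding is an assumption. In a real project, basic_auth_header from w3lib remains the simpler choice.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def build_proxy_auth(username, password):
    """Return a Basic Proxy-Authorization header value as bytes.

    Hypothetical helper: encodes "username:password" as UTF-8 (an assumption)
    and base64-encodes the result.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    return b"Basic " + base64.b64encode(credentials)


def split_proxy_url(proxy_url):
    """Split 'scheme://user:pass@host:port' into (clean_url, auth_header).

    auth_header is None when the URL carries no credentials.
    Note: this sketch does not handle bracketed IPv6 hosts.
    """
    parts = urlsplit(proxy_url)
    host = parts.hostname or ""
    netloc = f"{host}:{parts.port}" if parts.port else host
    clean_url = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    if parts.username is None:
        return clean_url, None
    # Credentials inside a URL may be percent-encoded, so decode them first.
    user = unquote(parts.username)
    password = unquote(parts.password or "")
    return clean_url, build_proxy_auth(user, password)
```

For example, build_proxy_auth("user", "pass") yields b"Basic dXNlcjpwYXNz", and split_proxy_url("http://user:pass@127.0.0.1:8080") yields the pair ("http://127.0.0.1:8080", b"Basic dXNlcjpwYXNz"). The first element would go into request.meta['proxy'] and the second into request.headers['Proxy-Authorization'].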

References
  1. Using a custom proxy in a Scrapy spider
  2. Proxy-Authorization