Python Web Scraping: Setting Request Headers and Proxy Parameters in scrapy-splash

Any one of the following three approaches will work.

1. Set the proxy and request headers in the Lua script:

function main(splash, args)
    -- set the proxy for every outgoing request
    splash:on_request(function(request)
        request:set_proxy{
            host = "127.0.0.1",
            port = 8000,
        }
    end)

    -- set the User-Agent header
    splash:set_user_agent("Mozilla/5.0")

    -- set additional custom headers
    splash:set_custom_headers({
        ["Accept"] = "application/json, text/plain, */*"
    })

    splash:go("https://www.baidu.com/")
    return splash:html()
end
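
Before wiring the script into Scrapy, it can be tested on its own by posting it to a running Splash instance's /execute endpoint. A minimal sketch, assuming Splash is listening at http://localhost:8050 (the proxy callback is omitted here):

import requests

# a trimmed copy of the script above, without the proxy callback
lua_source = """
function main(splash, args)
    splash:set_user_agent("Mozilla/5.0")
    splash:go("https://www.baidu.com/")
    return splash:html()
end
"""

# POST the script to Splash's /execute endpoint and print the rendered HTML
resp = requests.post("http://localhost:8050/execute",
                     json={"lua_source": lua_source})
print(resp.text)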

2. Set the proxy in Scrapy

def start_requests(self):
    for url in self.start_urls:
        yield SplashRequest(url,
                            endpoint='execute',
                            args={'wait': 5,
                                  'lua_source': source,
                                  'proxy': 'http://proxy_ip:proxy_port'
                                  })

Request headers are set in Scrapy the same way, by passing them through the headers argument (see the sketch below).
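
A minimal sketch of method 2 with headers added, assuming it lives inside a spider class where `source` holds the Lua script and `self.parse` is the callback; the header values are illustrative. Note that with endpoint='execute', scrapy-splash normally forwards these headers to Splash as splash.args.headers, so the Lua script has to apply them itself, for example with splash:go{url, headers=splash.args.headers}.

from scrapy_splash import SplashRequest

def start_requests(self):
    for url in self.start_urls:
        yield SplashRequest(
            url,
            callback=self.parse,
            endpoint='execute',
            args={'wait': 5,
                  'lua_source': source,
                  'proxy': 'http://proxy_ip:proxy_port'},
            # ordinary Scrapy headers, forwarded to Splash as splash.args.headers
            headers={'User-Agent': 'Mozilla/5.0',
                     'Accept': 'application/json, text/plain, */*'},
        )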

3. Set the proxy in a downloader middleware

class ProxyMiddleware(object):
    def process_request(self, request, spider):
        # proxyServer and proxyAuth must be defined or imported elsewhere (see the sketch below)
        request.meta['splash']['args']['proxy'] = proxyServer
        request.headers["Proxy-Authorization"] = proxyAuth
      
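proxyServer and proxyAuth are not defined in the snippet above. A minimal sketch of one way to supply them and to register the middleware; the module path, credentials, and priority value 543 are placeholders/assumptions:

# middlewares.py -- placeholder proxy address and Basic auth credentials
import base64

proxyServer = "http://proxy_ip:proxy_port"
proxyAuth = "Basic " + base64.b64encode(b"proxy_user:proxy_password").decode()

# settings.py -- register the middleware so it runs before scrapy_splash.SplashMiddleware (725)
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.ProxyMiddleware': 543,
    'scrapy_splash.SplashMiddleware': 725,
}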

References:

  1. using proxy with scrapy-splash
  2. On using scrapy-splash and how to set a proxy IP