Any one of the following three methods will do.
1. Set the proxy and request headers in the Lua script:
function main(splash, args)
    -- Set the proxy for every outgoing request
    splash:on_request(function(request)
        request:set_proxy{
            host = "127.0.0.1",
            port = 8000,
        }
    end)
    -- Set the User-Agent header
    splash:set_user_agent("Mozilla/5.0")
    -- Set custom request headers
    splash:set_custom_headers({
        ["Accept"] = "application/json, text/plain, */*"
    })
    splash:go("https://www.baidu.com/")
    return splash:html()
end
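
The Scrapy examples below pass the Lua script to Splash as a string named source. As a minimal sketch, that string could be built in Python like this. The names LUA_TEMPLATE and build_lua_source are hypothetical helpers, not part of Splash or scrapy-splash, and the script uses args.url (assumed to be supplied by the request) instead of a hard-coded address.

```python
from string import Template

# Lua script template; $host, $port and $user_agent are filled in from Python.
# LUA_TEMPLATE and build_lua_source are illustrative names, not a library API.
LUA_TEMPLATE = Template("""
function main(splash, args)
    splash:on_request(function(request)
        request:set_proxy{host = "$host", port = $port}
    end)
    splash:set_user_agent("$user_agent")
    assert(splash:go(args.url))
    return splash:html()
end
""")


def build_lua_source(host, port, user_agent="Mozilla/5.0"):
    """Return a Lua script string suitable for the 'lua_source' argument."""
    return LUA_TEMPLATE.substitute(
        host=host, port=int(port), user_agent=user_agent
    )


# Assumed local proxy address, used here only as an example value.
source = build_lua_source("127.0.0.1", 8000)
```

Keeping the proxy address out of the Lua text makes it easier to switch proxies per request without editing the script.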
2. Set the proxy in Scrapy
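from scrapy_splash import SplashRequest

# Method of the spider class
def start_requests(self):
    for url in self.start_urls:
        yield SplashRequest(
            url,
            endpoint='execute',
            args={
                'wait': 5,
                'lua_source': source,  # the Lua script as a string
                'proxy': 'http://proxy_ip:proxy_port',
            },
        )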
Request headers are set in Scrapy in the same way, through the headers argument of the request.
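
A rough sketch of how the proxy and the headers could be collected in one place before creating the request. build_splash_kwargs is a hypothetical helper introduced only for illustration; the keys it produces (endpoint, args, headers) are the keyword arguments already used above. With the execute endpoint, the Lua script may still need to apply the forwarded headers itself (for example through args.headers), so treat this as a starting point rather than a complete recipe.

```python
def build_splash_kwargs(lua_source, proxy=None, headers=None, wait=5):
    """Collect the keyword arguments for a SplashRequest in one dict.

    Hypothetical helper: it only assembles plain dictionaries and does not
    import or call scrapy-splash itself.
    """
    args = {"wait": wait, "lua_source": lua_source}
    if proxy:
        # Same 'proxy' entry as in the start_requests example.
        args["proxy"] = proxy
    kwargs = {"endpoint": "execute", "args": args}
    if headers:
        # Copy so later changes to the caller's dict do not leak in.
        kwargs["headers"] = dict(headers)
    return kwargs


kwargs = build_splash_kwargs(
    lua_source="function main(splash, args) return splash:html() end",
    proxy="http://proxy_ip:proxy_port",  # placeholder, as in the text
    headers={"Accept": "application/json, text/plain, */*"},
)
# In a spider: yield SplashRequest(url, **kwargs)
```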
3. Set the proxy in a middleware
class ProxyMiddleware(object):
    def process_request(self, request, spider):
        # proxyServer and proxyAuth are assumed to be defined elsewhere
        request.meta['splash']['args']['proxy'] = proxyServer
        request.headers["Proxy-Authorization"] = proxyAuth
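
The middleware above uses proxyServer and proxyAuth without defining them. One possible way to build them, assuming the proxy expects HTTP Basic credentials, is sketched below. build_proxy_auth is a hypothetical helper, and the address, user name, and password are placeholders.

```python
import base64


def build_proxy_auth(user, password):
    """Return a Basic credential string for the Proxy-Authorization header."""
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return "Basic " + token


proxyServer = "http://proxy_ip:proxy_port"    # placeholder address
proxyAuth = build_proxy_auth("user", "pass")  # placeholder credentials
```

If the proxy uses a different authentication scheme, only build_proxy_auth would need to change; the middleware itself stays the same.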
References: