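Table of contents

Preface
Fixing intermittent failures when scraping Baidu Images
- The problem:
- Changing the headers:
  - Finding your own headers
- We are all still growing; believe in yourself! Sincerely, end.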

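Preface

When I scraped Baidu Images, the crawler worked only some of the time, and failed more often than it succeeded. I have since solved the problem. If you spot any mistakes, please point them out. Many thanks!

Fixing intermittent failures when scraping Baidu Images

The problem:

From the debug output, I saw that the response returned was the following: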

<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="utf-8">
    <title>百度安全驗證</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <meta name="apple-mobile-web-app-capable" content="yes">
    <meta name="apple-mobile-web-app-status-bar-style" content="black">
    <meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0">
    <meta name="format-detection" content="telephone=no, email=no">
    <link rel="shortcut icon" href="https://www.baidu.com/favicon.ico" type="image/x-icon">
    <link rel="icon" sizes="any" mask href="https://www.baidu.com/img/baidu.svg">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge">
    <meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
    <link rel="stylesheet" href="https://ppui-static-wap.cdn.bcebos.com/static/touch/css/api/mkdjump_0635445.css" />
</head>
<body>
    <div class="timeout hide">
        <div class="timeout-img"></div>
        <div class="timeout-title">網絡不給力，請稍後重試</div>
        <button type="button" class="timeout-button">傳回首頁</button>
    </div>
    <div class="timeout-feedback hide">
        <div class="timeout-feedback-icon"></div>
        <p class="timeout-feedback-title">問題回報</p>
    </div>

<script src="https://wappass.baidu.com/static/machine/js/api/mkd.js"></script>
<script src="https://ppui-static-wap.cdn.bcebos.com/static/touch/js/mkdjump_1448d18.js"></script>
</body>
</html>

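So the request had been diverted to Baidu's security verification page (the page title, 百度安全驗證, means "Baidu Security Verification") instead of the image search results.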
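Since the verification page comes back as ordinary HTML, a crawler can check for it before trying to parse results. The following is a minimal sketch, not part of the original code: the function name `looks_like_baidu_verification` is hypothetical, and the markers are simply strings taken from the page shown above (plus the simplified-character form of the title, which is an assumption about what the live site sends).

```python
def looks_like_baidu_verification(html: str) -> bool:
    """Return True if the HTML resembles the verification page shown above."""
    markers = (
        "百度安全驗證",  # page title as it appears in this article
        "百度安全验证",  # same title in simplified characters (assumption)
        "wappass.baidu.com/static/machine/js/api/mkd.js",  # script loaded by that page
    )
    return any(marker in html for marker in markers)


# Usage: pass in the text of a response, e.g. res.text from requests.
sample = "<html><head><title>百度安全驗證</title></head><body></body></html>"
print(looks_like_baidu_verification(sample))
```

A check like this only reports that the request was blocked; it does not get past the verification by itself.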

Changing the headers:

The headers I used before were not enough to avoid the verification page.

I then added the following fields:

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'sec-ch-ua': '"Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36',
}

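Problem solved!

Don't scroll away just yet: browser versions and other details differ from person to person, so the headers that work for you will not necessarily match the ones above.

Finding your own headers

(Using Baidu Images as the example)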

Open the page you want to scrape: Baidu Images.
Press F12 to open the developer tools, then press F5 to reload the page.
Click Network, select the Doc filter, click the entry listed under Name, and open the Headers tab.
Under Request Headers, find the Accept, Accept-Encoding, Accept-Language, Cache-Control, Connection, sec-ch-ua, and User-Agent fields and copy them.
Turn the copied fields into a dictionary.

For example:

Accept-Encoding: gzip, deflate, br

becomes 'Accept-Encoding': 'gzip, deflate, br'
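Rewriting each field by hand is easy to get wrong, so the conversion can also be done in code. The sketch below assumes the fields were copied from the developer tools as plain `Name: value` lines; the helper name `parse_raw_headers` is hypothetical and is not part of the original post.

```python
def parse_raw_headers(raw: str) -> dict:
    """Convert 'Name: value' lines copied from the browser into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ":authority".
        if not line or line.startswith(":"):
            continue
        # Split at the first colon only, so colons inside the value survive.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        headers[name.strip()] = value.strip()
    return headers


copied = """
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7
Cache-Control: max-age=0
"""
headers = parse_raw_headers(copied)
print(headers)
```

The resulting dictionary can be passed to `requests.get` through its `headers` argument in the same way as a hand-written one.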

Part of the Python code (for reference only; versions will differ, so follow the steps above to find your own headers and URL):

import requests

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'sec-ch-ua': '"Google Chrome";v="89", "Chromium";v="89", ";Not A Brand";v="99"',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36',
}

# name is the keyword to search images for; i is the page index (30 results per page).
# Both values below are placeholders for illustration.
name = 'cat'
i = 0

# The original line was garbled around the keyword; 'word=' is assumed to be the query parameter.
url = 'https://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2&fm=detail&lm=-1&hd=&latest=&copyright=&st=-1&sf=2&fmq=1616167633329_R_D&fm=detail&pv=&ic=0&nc=1&z=&se=&showtab=0&fb=0&width=&height=&word=' + name + '&pn=' + str(i * 30)

res = requests.get(url, headers=headers)

Solved!
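Building the search URL by string concatenation is where the quoting tends to go wrong, so here is a hedged alternative using the standard library. It is only a sketch: `build_search_url` is a hypothetical helper, `word` is assumed to be the keyword parameter, and only a few parameters are kept, which may not return the same results as the full query string copied from the browser.

```python
from urllib.parse import urlencode

BASE_URL = "https://image.baidu.com/search/index"


def build_search_url(name: str, page: int, per_page: int = 30) -> str:
    """Build a paged search URL; non-ASCII keywords are percent-encoded."""
    params = {
        "tn": "baiduimage",     # taken from the URL used in this article
        "word": name,           # assumed name of the keyword parameter
        "pn": page * per_page,  # result offset, like str(i*30) in the article
    }
    return BASE_URL + "?" + urlencode(params)


# Print the URLs for the first three pages of results.
for i in range(3):
    print(build_search_url("cat", i))
```

Each URL can then be fetched with `requests.get(url, headers=headers)` as before. If the trimmed parameter set does not behave the same way for you, extend `params` with the fields from your own browser's request.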

We are all still growing; believe in yourself! Sincerely, end.
