Python web scraping: saving the 站長素材 page source with Selenium and Chrome options

2023-08-05 23:25:12

I. 站長素材 (sc.chinaz.com)

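1. Content to scrape

The target is the technology pictures page in the site's HD pictures section. The goal is to save its page source twice, once right after loading and once after scrolling to the bottom.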

2. Code

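from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
import time

# Path to the chromedriver executable
path = r'E:\chromedriver_win32\chromedriver.exe'
# Create a headless (no visible window) browser
chrome_options = Options()
chrome_options.add_argument("--headless")
# Selenium 4 takes the driver path through a Service object; the older
# executable_path= keyword is deprecated and rejected by recent releases
browser = webdriver.Chrome(service=Service(path), options=chrome_options)

# URL of the 站長素材 HD pictures section, technology pictures page
url = 'http://sc.chinaz.com/tupian/kejitupian.html'
browser.get(url)
time.sleep(3)  # wait for the initial page load
# First save of the HTML source (before scrolling)
with open('kejitupian1.html', 'w', encoding='utf8') as fp:
    fp.write(browser.page_source)
# Scroll to the bottom of the page by executing JavaScript
js = 'window.scrollTo(0,document.body.scrollHeight)'
browser.execute_script(js)
time.sleep(3)  # wait for content triggered by the scroll
# Second save of the HTML source (after scrolling)
with open('kejitupian2.html', 'w', encoding='utf8') as fp:
    fp.write(browser.page_source)

# Close the browser
browser.quit()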
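
The script above scrolls once and waits a fixed three seconds. If a page keeps loading more content as it grows, one scroll may not reach the real bottom. The following is a minimal sketch of one way to handle that, under the assumption that the page increases `document.body.scrollHeight` whenever it loads more. `scroll_until_stable`, `pause`, and `max_rounds` are hypothetical names introduced here for illustration; they are not part of Selenium.

```python
import time


def scroll_until_stable(browser, pause=1.0, max_rounds=10):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `browser` is any object that provides Selenium's `execute_script` method.
    Returns the number of scrolls performed.
    """
    last_height = browser.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        rounds += 1
        time.sleep(pause)  # give content triggered by the scroll time to arrive
        new_height = browser.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # the page did not grow, so assume the bottom was reached
        last_height = new_height
    return rounds
```

In the script above, a call such as `scroll_until_stable(browser, pause=3)` could stand in for the single `execute_script` and `time.sleep(3)` pair before the second save. The `max_rounds` cap keeps the loop from running forever on pages that scroll endlessly.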

3. Comparing the results

First capture (saved to kejitupian1.html, before scrolling):

[Image: screenshot of the first capture]

Second capture (saved to kejitupian2.html, after scrolling to the bottom of the page):

[Image: screenshots of the second capture]
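
The two captures can also be compared programmatically instead of by eye. The sketch below uses only the standard library's `html.parser` to list the image sources that appear in the second capture but not in the first. It assumes the images show up as `<img src="...">` tags; how this particular site lazy-loads its images is not verified here, and it may use a different attribute. `ImgSrcCollector`, `image_sources`, and `new_image_sources` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def image_sources(html):
    """Return the list of <img> src values found in an HTML string."""
    collector = ImgSrcCollector()
    collector.feed(html)
    return collector.srcs


def new_image_sources(before_html, after_html):
    """Return, sorted, the src values present only in the second capture."""
    before = set(image_sources(before_html))
    after = set(image_sources(after_html))
    return sorted(after - before)
```

To apply it to the files saved by the script, read kejitupian1.html and kejitupian2.html with `open(..., encoding='utf8')` and pass their contents to `new_image_sources`. An empty result would suggest that scrolling did not add any new `<img src>` entries.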
