Selenium擷取網頁資料

2023-04-23 19:23:03

# coding:utf-8

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
import os


def get_url_html(url):
    # 擷取執行驅動路徑, 驅動放在項目根目錄下, 驅動下載下傳位址:https://chromedriver.storage.googleapis.com/index.html
    driver_path = os.path.dirname(os.path.abspath(__file__)) + os.sep + "chromedriver"

    # 添加選項
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--no-sandbox")
    
    # 啟動webdriver
    session = webdriver.Chrome(executable_path=driver_path, chrome_options=chrome_options)
    
    # 通路url
    session.get(url)
    
    # 通路url後睡3秒,視情況而定
    time.sleep(3)
    
    # 擷取網頁源代碼
    content = session.page_source
    
    # 退出webdriver, 否則會在背景留下chromedriver驅動程序
    session.quit()
    return content

Selenium擷取網頁資料

繼續閱讀

學習軟體測試基礎測試第七天

Zeppelin 配置通路 REST APIApache Zeppelin Configuration REST API

【Torch】最簡潔logging使用指南

筆試面試題目：滑動視窗(二)

27. Remove Element(清單)題目代碼

資料結構與算法（27）——排序（二）

sort()函數到底是怎樣進行數字排序的

Dijkstra--簡易版（最短路徑）

GitHub連夜封殺！這份阿裡 10W 字内部 Java 字面試手冊到底有多強？

Cloud Studio初體驗

使用 ctypes 進行 Python 和 C 的混合程式設計

【python】【資料處理】畫多元資料分布圖

【python】netconf協定對接管理裝置

「Python 網絡自動化」NETCONF —— Python 使用 NETCONF 管理配置 H3C 網絡裝置

在python中建立excel并寫入

hdu7108哈希