Python Web Scraping with Selenium: A Worked Example

2023-08-05 22:25:25

Python web scraping with Selenium

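Today I finally got started with Selenium. As a test, I used the click() method on the Baidu homepage, with the goal of scraping all of the Baidu hot searches.

Besides the 6 titles shown on the first page, the script has to simulate a click on the "換一換" (refresh) button with click() to get the remaining 3 pages (4 pages in total).

The heavily commented code: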

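from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# Target URL
url = 'https://www.baidu.com/'
# Start Firefox through geckodriver. The path is a raw string so that the
# backslashes are not read as escapes; Selenium 4 takes the path via Service.
driver = webdriver.Firefox(service=Service(r'D:\develop\geckodriver.exe'))
# Load the page
driver.get(url)
# Implicit wait of 10 s for element lookups
driver.implicitly_wait(10)
number = []


def repeat():
    k = 'count'
    # The Baidu homepage puts hot-search ranks 1, 2, 3 at list positions 1, 3, 5,
    # so the loop needs a step of 2
    for i in range(1, 7, 2):
        element_odd = driver.find_element(By.CSS_SELECTOR, 'li.hotsearch-item:nth-child(' + str(i) + ')')
        print(element_odd.text)
    # Same idea for positions 2, 4, 6; this way the console shows 1 2 3 4 5 6
    # rather than an interleaved order like 135246
    for j in range(2, 8, 2):
        element_even = driver.find_element(By.CSS_SELECTOR, 'li.hotsearch-item:nth-child(' + str(j) + ')')
        print(element_even.text)
        if j == 6:
            # After each page, append the string 'count' to the list
            number.append(k)
            if len(number) <= 5:
                number.append(k)
                # Locate the "換一換" (refresh) button with a CSS selector and simulate a click
                driver.find_element(By.CSS_SELECTOR, '#hotsearch-refresh-btn > i:nth-child(1)').click()
                # Recurse. Two strings are added per clicked page, so the list
                # grows past 5 on the fourth page and the recursion stops there.
                repeat()


# Run repeat()
repeat()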
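
The recursion and the counter list above can also be written as a plain loop. The following is only a sketch of that idea, not the author's code: the function name collect_hot_searches, the pages parameter, and the constants are made up for illustration, and the selectors are copied from the script above. It takes the driver as an argument and passes the locator strategy as the string "css selector", which is the value of Selenium's By.CSS_SELECTOR.

```python
# By.CSS_SELECTOR in Selenium is the plain string "css selector"; using the
# string directly keeps this sketch free of imports.
CSS = "css selector"

# nth-child positions in display order, as described in the script's comments:
# ranks 1-3 sit at positions 1, 3, 5 and ranks 4-6 at positions 2, 4, 6.
CHILD_ORDER = (1, 3, 5, 2, 4, 6)
REFRESH_BUTTON = "#hotsearch-refresh-btn > i:nth-child(1)"


def collect_hot_searches(driver, pages=4):
    """Return the hot-search titles from `pages` pages, in display order.

    `driver` is assumed to be a Selenium WebDriver that has already
    loaded the Baidu homepage.
    """
    titles = []
    for page in range(pages):
        for n in CHILD_ORDER:
            item = driver.find_element(CSS, f"li.hotsearch-item:nth-child({n})")
            titles.append(item.text)
        # Click the refresh button between pages, but not after the last one
        if page < pages - 1:
            driver.find_element(CSS, REFRESH_BUTTON).click()
    return titles
```

With the driver from the script above, this would be called as collect_hot_searches(driver) after driver.get(url). Returning a list instead of printing makes it easier to save or check the result; for 4 pages it makes 3 clicks, the same as the recursive version.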

Output:

[Image: console output of the run]

How to find an element's CSS selector:

[Image: locating the CSS selector in the browser]
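
The item selectors in the script all follow one pattern, li.hotsearch-item:nth-child(n), where n depends on the hot-search rank. As a small illustration, the rank-to-position rule from the script's comments can be put into a helper. The function names child_index and selector_for_rank are hypothetical, and the mapping assumes the page layout is as the script describes it (ranks 1 to 3 at odd positions, ranks 4 to 6 at even ones).

```python
def child_index(rank):
    """Map a hot-search rank (1-6) to its nth-child position in the list."""
    if not 1 <= rank <= 6:
        raise ValueError("rank must be between 1 and 6")
    # Ranks 1-3 occupy the odd positions, ranks 4-6 the even ones
    return 2 * rank - 1 if rank <= 3 else 2 * (rank - 3)


def selector_for_rank(rank):
    """Build the CSS selector string for the item with the given rank."""
    return f"li.hotsearch-item:nth-child({child_index(rank)})"


print([child_index(r) for r in range(1, 7)])  # [1, 3, 5, 2, 4, 6]
print(selector_for_rank(4))                   # li.hotsearch-item:nth-child(2)
```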
