Scraping Wikipedia Information with Selenium

Selenium drives the Firefox browser to collect the data.

Environment: JetBrains PyCharm 2018.2.2 x64 (IDE), Python 3.6

Installing the software and packages

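Install the Firefox browser (use the default installation path);
Download geckodriver (the official WebDriver for Firefox) from https://github.com/mozilla/geckodriver/releases and place geckodriver.exe in the Python root directory; otherwise Selenium cannot find the driver and raises an error;
Install the selenium package in PyCharm;
Test whether the installation succeeded: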

from selenium import webdriver

browser = webdriver.Firefox()

If Firefox opens automatically, the configuration is working.
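If the browser does not open, one thing worth checking is whether the geckodriver executable can actually be found. The sketch below uses only the standard library and assumes the driver is expected on the system PATH (which holds for the Python root directory only if that directory was added to PATH during installation). The helper name `find_driver` is made up for this illustration.

```python
import shutil


def find_driver(name="geckodriver"):
    """Return the full path of the driver executable if it is on PATH, else None."""
    return shutil.which(name)


path = find_driver()
if path is None:
    print("geckodriver was not found on PATH")
else:
    print("geckodriver found at", path)
```

If the check reports that the driver is missing, moving geckodriver.exe into a directory that is on PATH is usually enough.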

Scraping Wikipedia data from a fixed page

First, inspect the structure of the page with the browser's web developer tools.
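To make the idea of locating content by its position in the page structure concrete, here is a minimal sketch that runs without a browser. It uses the standard library's ElementTree, which supports only a subset of XPath, and a made-up, simplified fragment (`SAMPLE_PAGE` is hypothetical and not the real page markup). The expression mirrors the `//*[@id='overview']/p` path used later with Selenium.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page fragment (not the real Wikiwand markup).
SAMPLE_PAGE = """
<html>
  <body>
    <div id="overview">
      <p>First overview paragraph.</p>
      <p>Second overview paragraph.</p>
    </div>
    <div id="plot">
      <p>Unrelated paragraph.</p>
    </div>
  </body>
</html>
"""


def overview_paragraphs(markup):
    """Return the text of every <p> directly under the element with id='overview'."""
    root = ET.fromstring(markup)
    # Any descendant element whose id is 'overview', then its <p> children.
    nodes = root.findall(".//*[@id='overview']/p")
    return [node.text for node in nodes]


print(overview_paragraphs(SAMPLE_PAGE))
```

Only the two paragraphs inside the `overview` element are returned; the paragraph under the other `div` is ignored because its parent does not match the `id` predicate.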


Then open the browser with webdriver, load the URL with the get() function, locate the elements by their XPath, and extract the data.

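from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()

url = "https://www.wikiwand.com/zh-hans/"
word = "國王與我"  # page title to look up ("The King and I")
browser.get(url + word)
# Scroll to the bottom of the page so that lazily loaded content is rendered
browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
# Locate the <p> elements; find_elements returns a list of WebElement objects
title = browser.find_elements(By.XPATH, "//*[@id='overview']/p")
introduction = browser.find_elements(By.XPATH, "/html/body/div[2]/div[1]/article/div/section[1]/p")
# Print the visible text of each matched element
print([e.text for e in title], '\n', [e.text for e in introduction], '\n')
# Close the browser when finished
browser.quit()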
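The XPath lookups return WebElement objects, and the visible text of each one is available through its `.text` attribute. Once those texts are collected into a plain list of strings, some light cleanup is usually needed before printing or saving. The following is a minimal sketch under that assumption; the `raw` list is made-up sample data and `clean_paragraphs` is a hypothetical helper, not part of Selenium.

```python
def clean_paragraphs(texts):
    """Strip surrounding whitespace and drop paragraphs that end up empty."""
    cleaned = [t.strip() for t in texts]
    return [t for t in cleaned if t]


# Made-up sample standing in for [e.text for e in elements].
raw = ["  First paragraph. \n", "", "   ", "Second paragraph."]
print("\n".join(clean_paragraphs(raw)))
```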

Reference tutorial:

[1]: https://yq.aliyun.com/articles/26033
