Web Scraping Notes: Installing and Using selenium (Part 1)

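I. Setting up the environment
- 1. Download the Chrome browser driver
  - (1) Check the Chrome version
  - (2) Download the matching Chrome driver
    URL: https://chromedriver.storage.googleapis.com/index.html
- 2. Learning to use selenium
  - (1) Install selenium with pip install selenium -i <mirror URL>
  - (2) Start coding
- 3. Locating page elements
  - (1) Locate by ID
  - (2) Locate by CLASS
  - (3) Locate by NAME
  - (4) Locate by TAG_NAME
  - (5) Locate by XPATH
  - (6) Locate by CSS selector
  - (7) Locate by link text (exact match)
  - (8) Locate by partial link text (fuzzy match)
- 4. Working with form elements and other operations
  - (1) Typing text, clearing text, clicking
  - (2) Behavior chains
  - (3) Action chains
  - (4) Click operations (more on behavior chains)
- 5. Waits in behavior chains (Explicit Waits)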

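About selenium
- Selenium is a tool for testing web applications. Selenium tests run directly in the browser, just as a real user would operate it. Supported browsers include IE (7 to 11), Firefox, Safari, Google Chrome, Opera, and Edge.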

I. Setting up the environment

1. Download the Chrome browser driver

(1) Check the Chrome version

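(2) Download the matching Chrome driver

URL: https://chromedriver.storage.googleapis.com/index.html
Open the page and find the driver version that matches the Chrome version you just looked up, as shown in the figure. Note that this index only lists drivers up to Chrome 114; drivers for newer Chrome releases are published on the Chrome for Testing site instead.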

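Next, unzip the downloaded archive to get chromedriver.exe and copy it into the Python installation directory (or any other directory on PATH). The file is a standalone executable, so there is nothing further to install.
Open a cmd command prompt and type chromedriver. If it prints output like the figure below, the driver is set up correctly.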

That covers driver installation for Chrome; drivers for other browsers can be downloaded in the same way.
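
To double-check the two steps above from Python, a minimal sketch might look like the following. It assumes the version strings have already been read off by hand (for example from chrome://version and from the output of chromedriver --version). The helper names find_on_path, major_version and same_major_version are hypothetical and not part of selenium, and the version strings in the demo are made-up sample values.

```python
import shutil


def find_on_path(executable="chromedriver"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


def major_version(version):
    """Extract the leading number from a dotted version string such as '114.0.5735.90'."""
    return int(version.strip().split(".")[0])


def same_major_version(chrome_version, driver_version):
    """Chrome and its driver are expected to share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


if __name__ == "__main__":
    path = find_on_path()
    print("chromedriver found at:", path if path else "not on PATH")
    # Made-up sample version strings, used only to exercise the comparison.
    print(same_major_version("114.0.5735.199", "114.0.5735.90"))
```

If find_on_path() returns None, the driver file is probably not in a directory listed in PATH.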

2. Learning to use selenium

(1) Install selenium with pip install selenium -i <mirror URL> (the -i option selects a package index mirror)
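
After the install finishes, it can be handy to confirm that the package is visible to the interpreter you will actually run. The following is a small sketch using only the standard library; installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("selenium")
    if version is None:
        print("selenium is not installed in this interpreter")
    else:
        print("selenium", version)
```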

(2) Start coding

The code below was typed in VS Code. When you define the browser, autocompletion pops up the available browser classes.

Running the program at this point opens a browser window for 3 seconds. No page has been loaded yet.

Continue writing the code and run it to open the Baidu home page, as shown below.

Then fetch the source code of the Baidu home page, as shown below.

The complete code for the steps above follows; the browser is closed at the end.

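# -*- coding:utf-8 -*-
# pip install selenium -i <mirror URL>


from selenium import webdriver
import time

# Open the Chrome browser
driver = webdriver.Chrome()

# Open Baidu in Chrome
url = 'https://www.baidu.com'
driver.get(url)

# Maximize the browser window
driver.maximize_window()

# Get the page source (page_source is a property, so no parentheses)
response = driver.page_source
print(response)


time.sleep(5)

# quit() closes every window and also ends the chromedriver process
driver.quit()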
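
One thing the script above does not guard against is an exception raised before the browser is closed, which would leave the window open. A possible variation wraps the work in try/finally. In this sketch fetch_page_source and its make_driver parameter are hypothetical names, and FakeDriver is a stand-in so the snippet runs without a browser.

```python
def fetch_page_source(url, make_driver):
    """Open url with a freshly created driver and return the page source.

    make_driver is any zero-argument callable returning an object with
    get(), page_source and quit(), for example selenium's webdriver.Chrome.
    """
    driver = make_driver()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Runs whether or not get() raised, so the browser never lingers.
        driver.quit()


class FakeDriver:
    """Stand-in with the three members used above (an assumption for the demo)."""

    quit_calls = 0  # counts how many times any FakeDriver was shut down

    def __init__(self):
        self.page_source = ""

    def get(self, url):
        self.page_source = "<html><!-- " + url + " --></html>"

    def quit(self):
        FakeDriver.quit_calls += 1


print(fetch_page_source("https://www.baidu.com", FakeDriver))
```

With selenium installed, passing webdriver.Chrome in place of FakeDriver should run the same function against a real browser.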

3. Locating page elements

(1) Locate by ID

driver.find_element(By.ID, "kw")

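(2) Locate by CLASS

driver.find_element(By.CLASS_NAME, "s_ipt")

(3) Locate by NAME

driver.find_element(By.NAME, "wd")

(4) Locate by TAG_NAME

driver.find_element(By.TAG_NAME, "div")
Note: an HTML page is built from tags, and many elements share the same tag name, so a tag locator rarely identifies a single element. It is generally not recommended.

(5) Locate by XPATH

driver.find_element(By.XPATH, '//*[@id="su"]').click()
Copy the element's XPath from the browser's developer tools to get //*[@id="su"], then paste it into the code above.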

The code is as follows:

# -*- coding:utf-8 -*-

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

# Open the Chrome browser
driver = webdriver.Chrome()

# Open Baidu in Chrome
driver.get('https://www.baidu.com')

# Locate the search box by CLASS (its class value is "s_ipt") and type into it
driver.find_element(By.CLASS_NAME, 's_ipt').send_keys("大家好")

# Locate the "百度一下" (search) button by XPATH and click it
driver.find_element(By.XPATH, '//*[@id="su"]').click()

time.sleep(5)

After running it, the search is carried out automatically, with the result shown below.

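To see what the expression //*[@id="su"] selects without starting a browser, the same attribute predicate can be tried on a small static snippet. This sketch uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath and needs well-formed XML, whereas selenium hands the expression to the browser's own XPath engine. The markup is a made-up simplification, not Baidu's real page.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed stand-in for a search form.
SNIPPET = """
<html>
  <body>
    <form>
      <input id="kw" name="wd" class="s_ipt" />
      <input id="su" type="submit" value="search" />
    </form>
  </body>
</html>
"""

root = ET.fromstring(SNIPPET)

# ".//*[@id='su']" is the ElementTree spelling of //*[@id="su"]:
# any element, at any depth, whose id attribute equals "su".
button = root.find(".//*[@id='su']")
print(button.tag, button.attrib["type"])

# The same predicate form works for other attributes, such as name.
box = root.find(".//input[@name='wd']")
print(box.attrib["id"])
```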

(6) Locate by CSS selector

driver.find_element(By.CSS_SELECTOR, "#su").click()
The line above achieves the same result.

(7) Locate by link text (exact match)

driver.find_element(By.LINK_TEXT, "在希望的田野上")

(8) Locate by partial link text (fuzzy match)

driver.find_element(By.PARTIAL_LINK_TEXT, "希望")
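
The difference between the last two strategies comes down to whole-string versus substring matching on a link's visible text. The sketch below models that rule on a plain list of strings. It is only an illustration of the matching behavior, not selenium's implementation, and both the function names and the sample link texts are made up.

```python
def match_link_text(link_texts, wanted):
    """Exact match: the whole visible text must equal wanted."""
    return [text for text in link_texts if text.strip() == wanted]


def match_partial_link_text(link_texts, wanted):
    """Fuzzy match: wanted only has to appear somewhere in the text."""
    return [text for text in link_texts if wanted in text]


# Made-up visible texts of three links on a page.
links = ["在希望的田野上", "希望工程", "新聞"]

print(match_link_text(links, "希望"))          # no link is exactly this text
print(match_partial_link_text(links, "希望"))  # every link containing this text
```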

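4. Working with form elements and other operations

(1) Typing text, clearing text, clicking

# These are methods of a located element, e.g. element = driver.find_element(By.ID, "kw")
# Type text
element.send_keys('python')
# Clear the contents of the input box
element.clear()
# Click with the mouse
element.click()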
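
Because send_keys typically appends to whatever the field already contains, a common pattern is to call clear() first. A minimal sketch of that pattern follows. replace_text is a hypothetical helper name, and FakeInput is a stand-in for a located element so the snippet runs without a browser.

```python
def replace_text(element, text):
    """Empty the input box, then type the new text into it."""
    element.clear()
    element.send_keys(text)


class FakeInput:
    """Stand-in that mimics how typing appends to an input's current value."""

    def __init__(self, value=""):
        self.value = value

    def clear(self):
        self.value = ""

    def send_keys(self, text):
        self.value += text


box = FakeInput("old query")
box.send_keys("python")       # typing without clearing keeps the old text
appended = box.value
print(appended)

replace_text(box, "python")   # clearing first leaves only the new text
print(box.value)
```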

(2) Behavior chains

Some page interactions in selenium take several steps. In that case you can use the mouse action chain class ActionChains to queue the steps and run them together.

# -*- coding:utf-8 -*-

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains

# Open the Chrome browser
driver = webdriver.Chrome()

# Open Baidu in Chrome
driver.get('https://www.baidu.com')


# Locate the search box
inputtag = driver.find_element(By.ID, "kw")

# Locate the "百度一下" (search) button
submittag = driver.find_element(By.ID, "su")

# Create the behavior chain
actions = ActionChains(driver)

# Queue: move to the search box and type into it
actions.move_to_element(inputtag)
actions.send_keys_to_element(inputtag, 'python')

# Queue: move to the submit button and click it
actions.move_to_element(submittag)
actions.click(submittag)

# Run all queued steps together
actions.perform()
time.sleep(5)
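
The key point in the example above is that the calls on the chain only queue steps, and nothing happens in the browser until perform() is called. The toy class below models that queue-then-run pattern in plain Python. MiniChain and its methods are hypothetical names and a deliberate simplification, not how ActionChains is implemented.

```python
class MiniChain:
    """Toy model of a chain: methods queue steps, perform() runs them in order."""

    def __init__(self):
        self._queue = []
        self.log = []

    def move_to(self, name):
        self._queue.append(lambda: self.log.append("move to " + name))
        return self  # returning self is what allows a.b().c() chaining

    def click(self, name):
        self._queue.append(lambda: self.log.append("click " + name))
        return self

    def perform(self):
        for step in self._queue:
            step()
        # This toy empties its queue after running. Check the behavior of your
        # selenium version before reusing a real chain object.
        self._queue = []


chain = MiniChain()
chain.move_to("kw").click("su")
queued_log = list(chain.log)
print(queued_log)   # nothing has run yet

chain.perform()
print(chain.log)    # the steps have run in the order they were queued
```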

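(3) Action chains

List of ActionChains methods

click(on_element=None): click the left mouse button
click_and_hold(on_element=None): press the left mouse button without releasing it
context_click(on_element=None): click the right mouse button
double_click(on_element=None): double-click the left mouse button
drag_and_drop(source, target): drag the source element onto the target element and release
key_down(value, element=None): press a key and hold it
key_up(value, element=None): release a key
move_to_element(to_element): move the mouse to an element
perform(): run all the actions queued in the chain
release(on_element=None): release the left mouse button on an element
send_keys(*keys_to_send): send keys to the element that currently has focus
send_keys_to_element(element, *keys_to_send): send keys to the given element
drag_and_drop_by_offset(source, xoffset, yoffset): drag the source element by the given offset and release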

An example of moving the mouse.

# -*- coding:utf-8 -*-

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains

driver = webdriver.Chrome()
driver.get('http://sahitest.com/demo/mouseover.htm')

# Locate the "Write on hover" input, again with XPath
display = driver.find_element(By.XPATH, '//input[@value="Write on hover"]')
# Locate the "Blank on hover" input
hide = driver.find_element(By.XPATH, '//input[@value="Blank on hover"]')

action = ActionChains(driver)
time.sleep(3)

# Hover over the first input
action.move_to_element(display).perform()
time.sleep(3)

# Hover over the second input
action.move_to_element(hide).perform()
time.sleep(3)

The result is shown in the figure.

Examples of several ways to drag with the mouse.

# -*- coding:utf-8 -*-

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains

driver = webdriver.Chrome()
driver.get("http://sahitest.com/demo/dragDropMooTools.htm")

# The element to drag
dragger = driver.find_element(By.XPATH, '//div[@id="dragger"]')

# The four drop targets
item1 = driver.find_element(By.XPATH, '//html/body/div[2]')
item2 = driver.find_element(By.XPATH, '//html/body/div[3]')
item3 = driver.find_element(By.XPATH, '//html/body/div[4]')
item4 = driver.find_element(By.XPATH, '//html/body/div[5]')


action = ActionChains(driver)

# Drag and drop in a single call
action.drag_and_drop(dragger, item1).perform()
time.sleep(3)
# Press and hold on the dragger, then release on item2
action.click_and_hold(dragger).release(item2).perform()
time.sleep(3)
# Press and hold on the dragger, move to item3, then release
action.click_and_hold(dragger).move_to_element(item3).release().perform()
time.sleep(3)
# Drag and drop in a single call again, this time onto item4
action.drag_and_drop(dragger, item4).perform()
time.sleep(3)
driver.quit()

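The result is that a blue box is dragged to a destination every 3 seconds (the pause set by time.sleep(3)), as shown in the figure.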

(4) Click operations (more on behavior chains)

Demo site: http://sahitest.com/demo/clicks.htm

# -*- coding:utf-8 -*-

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains

driver = webdriver.Chrome()
driver.get("http://sahitest.com/demo/clicks.htm")

# The single-click button
click_one = driver.find_element(By.XPATH, '//input[@value="click me"]')

# The double-click button
click_dbl = driver.find_element(By.XPATH, '//input[@value="dbl click me"]')

# The right-click button
click_rgt = driver.find_element(By.XPATH, '//input[@value="right click me"]')

# One behavior chain that performs a click, a double-click and a right-click
ActionChains(driver).click(click_one).double_click(click_dbl).context_click(click_rgt).perform()
time.sleep(5)

The result is shown in the figure.

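Note: moving the mouse and dragging are action chains, while a series of clicks is a behavior chain. Both are built with the same ActionChains class.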

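5. Waits in behavior chains (Explicit Waits)

Example code.

# -*- coding:utf-8 -*-
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.baidu.com")

# A recursive function: wait for the search box and retry on timeout
# (the retries limit is an addition so the recursion cannot run forever)
def search(retries=3):
    try:
        # Wait up to 10 seconds for the element with id "kw" to be present in the DOM
        search_box = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "kw")))
        search_box.send_keys("大家好")
        time.sleep(10)
    except TimeoutException:
        # Re-enter the function to try again, giving up once the retries run out
        if retries <= 0:
            raise
        return search(retries - 1)

if __name__ == '__main__':
    search()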
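
An explicit wait is essentially a polling loop: call a condition repeatedly until it returns something truthy or the timeout expires. The sketch below shows that idea in plain Python so it can be run without a browser. wait_until, its parameters and element_appears are hypothetical names, and WebDriverWait itself offers more, such as ignoring selected exceptions while polling.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() every poll seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError once timeout seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Made-up condition that only succeeds on the third call,
# standing in for an element that appears after a delay.
calls = {"count": 0}


def element_appears():
    calls["count"] += 1
    return "element" if calls["count"] >= 3 else None


result = wait_until(element_appears, timeout=2.0, poll=0.05)
print(result, "after", calls["count"], "checks")
```

Compared with a fixed time.sleep, a loop like this returns as soon as the condition holds and only spends the full timeout when the element never shows up.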