
python selenium: scraping phonetic transcriptions of words from Baidu Translate

A small Python Selenium crawler

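Main flow: read the words from an Excel file, use Selenium to look up each word's phonetic transcription on Baidu Translate, then write the results to a CSV file.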

Installing Selenium and configuring the environment are skipped here.

Open Baidu Translate in Chrome and wait until the baidu_translate_input element has loaded

browser = webdriver.Chrome()
url = "https://fanyi.baidu.com/?aldtype=85#en/zh/"
browser.get(url)
WebDriverWait(browser, 1000).until(EC.presence_of_all_elements_located((By.ID, 'baidu_translate_input')))
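Conceptually, an explicit wait like the one above keeps re-checking a condition until it succeeds or the timeout expires. The sketch below illustrates that polling idea in plain Python. `wait_until` is a hypothetical helper written for illustration, not part of Selenium, and the 0.5-second default interval is chosen to mirror WebDriverWait's default polling frequency.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success. (Hypothetical helper, not Selenium API.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)
```

The real WebDriverWait raises Selenium's own TimeoutException rather than the built-in TimeoutError. A timeout of 1000 seconds, as used in this script, means a missing element can stall the run for over 16 minutes before the error appears.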

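Open the Excel file and get the number of rows in the word sheet (note that xlrd 2.0 and later reads only .xls files, so opening an .xlsx file this way requires an older xlrd release)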

excelfile = xlrd.open_workbook(r'F:\studytest\word.xlsx')
sheet = excelfile.sheet_by_name("單詞")  # the sheet holding the words is named "單詞"
cnt = sheet.nrows
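The script reads every row of the first column as-is, so blank cells (which xlrd returns as empty strings) and numeric cells (returned as floats) would also be typed into the translation box. A minimal sketch of filtering them out first follows. `clean_words` is a hypothetical helper, not part of the original script, and it assumes the words live in the first column.

```python
def clean_words(values):
    """Keep only non-empty text cells, with surrounding whitespace removed."""
    words = []
    for value in values:
        # Numeric or empty cells are not words, so skip anything that is not text
        if isinstance(value, str):
            word = value.strip()
            if word:
                words.append(word)
    return words
```

With xlrd, the input could come from `sheet.col_values(0)`, for example `words = clean_words(sheet.col_values(0))`.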

Write the header row to the CSV file

with open(r'F:\studytest\result.csv', 'a', encoding='utf-8', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(("單詞", "音标"))  # header row: word, phonetic transcription
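Because the file is opened in append mode ('a'), the header row is written again on every run, so repeated runs leave duplicate header lines in result.csv. One possible way around this, sketched below, is to write the header only when the file is missing or empty. `append_rows` and its English default header are assumptions made for illustration, not the author's code.

```python
import csv
import os


def append_rows(path, rows, header=("word", "phonetic")):
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", encoding="utf-8", newline="") as csvfile:
        writer = csv.writer(csvfile)
        if need_header:
            writer.writerow(header)
        writer.writerows(rows)
```

Calling `append_rows(path, [(words, phonetic)])` inside the loop would then replace both the header block and the per-word write.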

Locate baidu_translate_input and type in the word

browser.find_element(By.ID, 'baidu_translate_input').send_keys(mystr)

Click the translate button

browser.find_element(By.ID, 'translate-button').click()

Get the phonetic transcription

phonetic = browser.find_element(By.CLASS_NAME, 'dictionary-spell').text  # phonetic transcription

Full code

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @version  : Python 3.7.3
# @Time     : 2019/7/24 20:13

import xlrd
import time
import csv

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url = "https://fanyi.baidu.com/?aldtype=85#en/zh/"

browser.get(url)
WebDriverWait(browser, 1000).until(EC.presence_of_all_elements_located((By.ID, 'baidu_translate_input')))

excelfile = xlrd.open_workbook(r'F:\studytest\word.xlsx')
sheet = excelfile.sheet_by_name("單詞")  # the sheet holding the words is named "單詞"
cnt = sheet.nrows

with open(r'F:\studytest\result.csv', 'a', encoding='utf-8',newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(("單詞", "音标"))  # header row: word, phonetic transcription

for i in range(cnt):
    mystr = sheet.cell(i, 0).value
    # find_element(By...) replaces the find_element_by_* helpers, which were removed in Selenium 4
    browser.find_element(By.ID, 'baidu_translate_input').send_keys(mystr)
    browser.find_element(By.ID, 'translate-button').click()
    WebDriverWait(browser, 1000).until(EC.presence_of_all_elements_located((By.CLASS_NAME, 'trans-left')))
    try:
        words = browser.find_element(By.CLASS_NAME, 'strong').text  # word
        phonetic = browser.find_element(By.CLASS_NAME, 'dictionary-spell').text  # phonetic transcription
        print("%s   %s" % (words, phonetic))

        data = (words, phonetic)
        with open(r'F:\studytest\result.csv', 'a', encoding='utf-8', newline='') as csvfile:
            writer = csv.writer(csvfile)
            writer.writerow(data)
    except Exception:
        # Skip this word if the lookup or the write fails, e.g. no phonetic transcription on the page
        pass

    time.sleep(1)
    browser.find_element(By.ID, 'baidu_translate_input').clear()

browser.close()
browser.quit()
print("Done. Check the output folder for the results.")

           