Python crawler: scraping JD.com product information with selenium

2023-04-12 15:43:57


1. First, a screenshot of the result (I was lazy and only scraped 4 pages)


2. JD's URL: https://www.jd.com/

3. Here I turn off image loading to speed up scraping; Headless mode, which runs without opening a browser window, also works
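The setup code for this step is not shown above, so here is a minimal sketch of what it might look like. It assumes Chrome is the browser (the post does not say which one is used); `image_free_prefs` and `make_browser` are hypothetical helper names, and the 10-second timeout is an arbitrary choice.

```python
def image_free_prefs():
    """Chrome preferences that block image loading (2 means "block")."""
    return {"profile.managed_default_content_settings.images": 2}


def make_browser(headless=False, timeout=10):
    """Build a Chrome driver plus a WebDriverWait (assumption: Chrome is used)."""
    # Imported here so the helper above stays usable without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    # Skip images to make pages load faster.
    options.add_experimental_option("prefs", image_free_prefs())
    if headless:
        # Run without opening a browser window.
        options.add_argument("--headless")

    browser = webdriver.Chrome(options=options)
    wait = WebDriverWait(browser, timeout)
    return browser, wait
```

Calling `browser, wait = make_browser()` would give the `browser` and `wait` objects that the page-turning function in step 5 relies on.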

4. First locate the search box and use selenium to simulate a click (it turns out JD does not require a login to view product information)
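The code for this step is not included either, so the following is only a sketch of one way to do it. The selectors `#key` and `#search button`, the function name `search`, and the keyword argument are all assumptions made for illustration; check them against the live page before relying on them. The waits use plain lambdas, which works because `WebDriverWait.until` keeps retrying while `find_element` raises `NoSuchElementException`.

```python
CSS = "css selector"  # the string value of selenium's By.CSS_SELECTOR


def search(browser, wait, keyword,
           input_css="#key",              # assumed selector for the search box
           button_css="#search button"):  # assumed selector for the search button
    """Open JD, type a keyword into the search box and click the search button."""
    browser.get("https://www.jd.com/")
    # Wait until the search box is present in the page.
    box = wait.until(lambda d: d.find_element(CSS, input_css))
    box.clear()
    box.send_keys(keyword)
    # Wait for the button, then simulate the click.
    button = wait.until(lambda d: d.find_element(CSS, button_css))
    button.click()
```

With a real driver this would be called as `search(browser, wait, "python")`, where `browser` is a selenium WebDriver and `wait` is a `WebDriverWait` bound to it.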


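5. Once on the first results page, write the page-turning function first. Each page holds 60 products in total, and the last 30 only load after scrolling to the bottom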


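import random
import time

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# browser (the WebDriver) and wait (a WebDriverWait) are assumed to exist
# from the earlier steps; prase_html is the data-extraction function.


def next_page(page_number):
    try:
        # Scroll to the bottom of the current page
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(random.randint(1, 3))  # random delay
        # "Next page" button
        button = wait.until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, '#J_bottomPage > span.p-num > a.pn-next > em'))
        )
        button.click()  # turn the page
        # Wait until the first 30 products have loaded
        wait.until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#J_goodsList > ul > li:nth-child(30)"))
        )
        # Scroll to the bottom so the last 30 products load
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Wait until all 60 products have loaded
        wait.until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#J_goodsList > ul > li:nth-child(60)"))
        )
        # Confirm the page turn succeeded: the highlighted page number matches page_number
        wait.until(
            EC.text_to_be_present_in_element((By.CSS_SELECTOR, "#J_bottomPage > span.p-num > a.curr"), str(page_number))
        )
        html = browser.page_source  # grab the page source
        prase_html(html)  # call the function that extracts the data
    except TimeoutException:
        # selenium's waits raise TimeoutException, not the built-in TimeoutError
        return next_page(page_number)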
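The function above retries itself without limit when a wait times out, so a page that never loads would recurse until Python gives up. As a hedged alternative, the loop below drives the page turns and caps the retries per page. `crawl_pages`, `turn_page`, `max_retries` and `retry_on` are hypothetical names, and the sketch assumes a variant of the page-turning function that lets the timeout propagate instead of catching it.

```python
def crawl_pages(turn_page, last_page, max_retries=3, retry_on=(Exception,)):
    """Call turn_page(n) for n = 2..last_page with a bounded number of retries.

    Page 1 is assumed to be the page the search itself lands on.
    Returns the list of page numbers that were turned successfully.
    """
    done = []
    for page_number in range(2, last_page + 1):
        for _attempt in range(max_retries):
            try:
                turn_page(page_number)
            except retry_on:
                continue  # try the same page again
            done.append(page_number)
            break  # this page worked, move on to the next one
    return done
```

For the 4 pages mentioned in step 1, this would be called as `crawl_pages(turn_page, 4, retry_on=(TimeoutException,))`, with `TimeoutException` imported from `selenium.common.exceptions`.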
