Python Crawler in Practice 2: Scraping the Top 500 Books on Dangdang's 30-Day Best-Reviewed List with BeautifulSoup

2023-08-05 23:00:26

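Everything here is the same as in the previous article, except that this time there are no long regular expressions to write. The previous installment is at https://blog.csdn.net/u010376229/article/details/114042780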

This time we need BeautifulSoup. With just a little study of it you can skip writing regular expressions, and the resulting code is clearer as well.
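For readers new to BeautifulSoup, here is a minimal sketch of the three calls the scraper relies on: `find`, `find_all`, and the `.string` / `.get()` accessors. The HTML snippet is a made-up stand-in shaped like one entry of the list, not markup copied from the real page, and the built-in `html.parser` is used so the example runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only imitates the structure of the real list.
SAMPLE_HTML = """
<ul class="bang_list">
  <li>
    <div class="list_num">1.</div>
    <div class="name"><a title="Example Book">Example</a></div>
    <div class="price"><span>10.00</span></div>
  </li>
  <li>
    <div class="list_num">2.</div>
    <div class="name"><a title="Another Book">Another</a></div>
    <div class="price"><span>20.00</span></div>
  </li>
</ul>
"""


def parse_sample(html):
    soup = BeautifulSoup(html, "html.parser")
    # find() returns the first match, find_all() returns every match below it.
    items = soup.find("ul", class_="bang_list").find_all("li")
    return [
        {
            # .string gives the text of a tag that has a single text child.
            "rank": li.find("div", class_="list_num").string,
            # .get() reads an attribute; .a is shorthand for the first <a> inside.
            "title": li.find("div", class_="name").a.get("title"),
            "price": li.find("div", class_="price").span.string,
        }
        for li in items
    ]


books = parse_sample(SAMPLE_HTML)
```

Each lookup names the tag and class it wants, which is what makes this style easier to read than one long regular expression.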

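from bs4 import BeautifulSoup  # the 'lxml' parser used below also requires the lxml package

# get_html() is not defined here; it is assumed to be the page-download helper from the previous article
def get_books_info_of_current_page(page):
    html = get_html("http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-" + str(page))
    soup = BeautifulSoup(html, 'lxml')
    lis = soup.find("ul", class_="bang_list").find_all("li")  # all <li> elements under <ul class="bang_list">
    get_book_info_and_write_to_txt(lis)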

def get_book_info_and_write_to_txt(lis):
    for li in lis:
        publisher_info = li.find("div", class_="publisher_info")  # look this up once instead of twice
        book_info = {
            "range": li.find('div', class_="list_num").string,  # rank on the list
            "img": li.find("div", class_="pic").a.img.get("src"),
            "title": li.find("div", class_="name").a.get("title"),
            "recommend": li.find("div", class_="star").find("span", class_="tuijian").string,
            "author": publisher_info.a.get("title") if publisher_info.a else "無",  # "無" means "none"
            "price": li.find("div", class_="price").span.string
        }
        write_item_to_file(book_info)  # not defined here; assumed to come from the previous article
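The listing above calls `get_html` and `write_item_to_file`, which this article does not show. The sketch below is one possible stand-in for each, not the code from the previous article: the `gbk` default encoding, the `User-Agent` header, the `books.txt` file name, and the JSON-lines output format are all assumptions made for illustration.

```python
import json
import urllib.request


def get_html(url, encoding="gbk", timeout=10):
    """Download a page and return its text (hypothetical stand-in).

    The "gbk" default is an assumption about the site's encoding.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # errors="replace" keeps one bad byte from aborting the whole page.
        return response.read().decode(encoding, errors="replace")


def write_item_to_file(item, path="books.txt"):
    """Append one record to a text file as a JSON line (hypothetical stand-in)."""
    with open(path, "a", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese titles readable in the output file.
        f.write(json.dumps(item, ensure_ascii=False) + "\n")
```

Opening the file in append mode means each call adds one line, so the file should be deleted between runs if duplicates are not wanted.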

However, this approach is slower: fetching the 500 records takes about 14 s, whereas the regex version needs only about 10 s.
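A comparison like this can be measured with a small timing wrapper. The sketch below is an assumption about how such a measurement might be set up, not the method used for the figures above. `timed` and `crawl_all` are hypothetical names, and the default of 25 pages assumes 20 books per page, which the article does not state.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def crawl_all(fetch_page, pages=25):
    """Call fetch_page for page numbers 1..pages (25 is an assumed page count)."""
    for page in range(1, pages + 1):
        fetch_page(page)
    return pages


# Demonstration with a stand-in that only records the page numbers.
visited = []
_, seconds = timed(crawl_all, visited.append, pages=3)
```

In the real crawler the call would look like `timed(crawl_all, get_books_info_of_current_page)`. A single run also includes network time, so averaging several runs of each version gives a fairer comparison.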
