[Crawler] Scraping the Weibo Hot Search Top 50 with Python

2023-05-17 05:36:18

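1. Overview

The Weibo hot search list is a handy source of trending topics, and the web version at https://s.weibo.com/top/summary can be viewed without logging in. That means we can fetch the page with requests and tidy the results into structured data. Please be considerate: do not scrape carelessly and add load to Weibo's servers. The code below is for learning purposes only.

2. Code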
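One way to act on the "do not add load" advice is to reuse a recently fetched page instead of requesting it again. The following is only a minimal sketch under stated assumptions: the helper name `make_cached_fetcher`, the 60-second default interval, and the injectable `clock` parameter are all hypothetical choices made for illustration and are not part of the original post.

```python
import time


def make_cached_fetcher(fetch, min_interval=60.0, clock=time.monotonic):
    """Wrap fetch(url) so each URL hits the server at most once per min_interval seconds.

    fetch, min_interval and clock are illustrative parameters (assumptions),
    not something the original article defines.
    """
    cache = {}  # url -> (time of last real fetch, result of that fetch)

    def cached_fetch(url):
        now = clock()
        hit = cache.get(url)
        if hit is not None and now - hit[0] < min_interval:
            # Still fresh: reuse the stored result without contacting the server.
            return hit[1]
        result = fetch(url)
        cache[url] = (now, result)
        return result

    return cached_fetch
```

With requests, this could be used as `polite_get = make_cached_fetcher(lambda u: requests.get(u, timeout=10).text)`, so that repeated calls within the interval return the cached page text.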

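from lxml import etree
import requests


def get_weibo_top():
    """Fetch the Weibo hot search page and return its entries as a list of dicts."""
    url = "https://s.weibo.com/top/summary?cate=realtimehot"
    # A timeout keeps the script from hanging if the server does not respond.
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    html = etree.HTML(response.text)
    # Each <tr> in the table body is one row of the hot search list.
    nodes = html.xpath("//div[@class='data']/table/tbody/tr")
    all_hot_list = []
    # Skip the first row; the rank check below also drops any row without a numeric rank.
    for tr_node in nodes[1:]:
        rank = tr_node.xpath('./td[1]/text()')
        if not rank or not rank[0].strip().isdigit():
            continue
        keyword = tr_node.xpath('./td[2]/a/text()')
        search_nums = tr_node.xpath('./td[2]/span/text()')
        href = tr_node.xpath('./td[2]/a/@href')
        # xpath() returns a list, so guard against missing cells before indexing.
        if not keyword or not href:
            continue
        hot_object = {
            "rank_top": rank[0].strip(),
            "keyword": keyword[0].strip(),
            "search_nums": search_nums[0].strip() if search_nums else "",
            "search_url": "https://s.weibo.com" + href[0],
        }
        all_hot_list.append(hot_object)
    return all_hot_list


if __name__ == '__main__':
    print(get_weibo_top())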
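
Since the goal is to fetch the list and organize it, the returned dicts can be turned into CSV text for saving or further processing. This is a hedged sketch, not part of the original code: the helper name `hot_list_to_csv` is hypothetical, and the sample row (including its URL) is made-up placeholder data rather than real Weibo output. Only the four field names come from `get_weibo_top()`.

```python
import csv
import io

# Field names match the keys of the dicts built in get_weibo_top().
FIELDS = ["rank_top", "keyword", "search_nums", "search_url"]


def hot_list_to_csv(hot_list):
    """Serialize a list of hot search dicts to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(hot_list)
    return buffer.getvalue()


# Placeholder row for illustration only; not real scraped data.
sample = [
    {
        "rank_top": "1",
        "keyword": "example topic",
        "search_nums": "123456",
        "search_url": "https://s.weibo.com/weibo?q=example",
    }
]
csv_text = hot_list_to_csv(sample)
print(csv_text)
```

In practice the sample list would be replaced by the result of `get_weibo_top()`, and the text could be written to a file opened with `encoding="utf-8"`.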
