Crawling the 中大 Weibo account with multiple coroutines (post content plus repost, like, and comment counts)

2023-06-25 13:56:39

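This is the concurrent version of the earlier post "Weibo crawling (Python): scraping the content and the comment, repost, and like counts of the first 100 posts from the 中大 Weibo account".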

Code

from gevent import monkey

# Patch the standard library before importing requests so that its sockets cooperate with gevent.
monkey.patch_all(select=False)

import gevent
import requests
from pyquery import PyQuery as pq

headers = {
    'Host': 'm.weibo.cn',
    'Referer': 'https://m.weibo.cn/u/1892723783?uid=1892723783&luicode=10000011&lfid=1076031892723783&featurecode=20000320',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
    'X-DevTools-Emulate-Network-Conditions-Client-Id': 'A20EA5B172E6DC82709D213A40AD0E8F'
}


def get_page(page):
    url = 'https://m.weibo.cn/api/container/getIndex?uid=1892723783&luicode=10000011&lfid=1076031892723783&featurecode=20000320&type=uid&value=1892723783&containerid=1076031892723783&page=%d' % page
    try:
        res = requests.get(url, headers=headers)
        if res.status_code == 200:
            return res.json()
    except requests.ConnectionError as e:
        print("Error", e.args)


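def parse_page(json):
    if json:
        # 'data' or 'cards' may be missing if the API returns an error payload.
        items = (json.get('data') or {}).get('cards') or []
        for item in items:
            item = item.get('mblog')
            if not item:
                # Skip cards that do not carry a post.
                continue
            weibo = {}
            # The post text is HTML; pyquery strips the tags.
            weibo['text'] = pq(item.get('text')).text()
            weibo['attitudes'] = item.get('attitudes_count')
            weibo['comments'] = item.get('comments_count')
            weibo['reposts'] = item.get('reposts_count')
            yield weibo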


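def oper(page):
    global data
    json = get_page(page)
    results = parse_page(json)
    count = 0
    for res in results:
        # Key each post by page and position so the output can be put back in order.
        # The factor 10 assumes roughly 10 posts per page.
        data[page * 10 + count] = '\n'.join(
            [res['text'], '[Comments: ' + str(res['comments']) + ' Reposts: ' + str(res['reposts']) + ' Likes: ' + str(
                res['attitudes']) + ']\n\n'])
        count += 1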


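if __name__ == '__main__':
    data = {}
    # Pages 1 to 10 are an assumption, chosen to cover roughly the first 100 posts.
    gevent.joinall([gevent.spawn(oper, page) for page in range(1, 11)])
    with open('weibo.txt', 'w', encoding='utf-8') as f:
        # Coroutines finish in arbitrary order, so write the posts sorted by key.
        f.write(''.join(data[key] for key in sorted(data)))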
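
Because the coroutines may finish in any order, the integer keys are what restore the original post order when the file is written. Here is a minimal, standard-library-only sketch of that idea. The names `POSTS_PER_PAGE`, `store_page`, and `assemble` are hypothetical helpers introduced for illustration, and the page contents are made up rather than fetched from Weibo.

```python
import random

POSTS_PER_PAGE = 10  # assumption: each page returns at most 10 posts


def store_page(data, page, posts):
    """Store one page of posts under keys that encode page and position."""
    for count, text in enumerate(posts):
        data[page * POSTS_PER_PAGE + count] = text


def assemble(data):
    """Join the stored posts in key order, regardless of insertion order."""
    return ''.join(data[key] for key in sorted(data))


if __name__ == '__main__':
    # Made-up pages standing in for API responses.
    pages = {page: ['p%d-%d\n' % (page, i) for i in range(3)] for page in range(1, 4)}
    order = list(pages)
    random.shuffle(order)  # simulate coroutines finishing in arbitrary order
    data = {}
    for page in order:
        store_page(data, page, pages[page])
    print(assemble(data))
```

If a page could ever return more than `POSTS_PER_PAGE` posts, the keys of neighbouring pages would collide; using a tuple key such as `(page, count)` would avoid that at the cost of a slightly different lookup.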
