Downloading a complete Sina Weibo album

2023-07-02 03:16:39

Open Sina Weibo, log in, and open yz's album.

Open Chrome's developer tools and, in the Sources panel, click + New snippet. Paste in the following:

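// Ask for the interval (in seconds) between scroll passes.
var timeout = Number(prompt("Set timeout (seconds):"));
var count = 0;

function reload() {
  // Schedule the next pass, then scroll to the bottom so more photos load.
  setTimeout(reload, 1000 * timeout);
  count++;
  console.log('Auto-refreshing every (' + timeout + ') seconds, refresh count: ' + count);
  window.scrollTo(0, document.body.scrollHeight);
}

if (timeout > 0) {
  setTimeout(reload, 1000 * timeout);
} else {
  // No valid interval was entered: just reload the current page.
  location.replace(location.href);
}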

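Right-click the snippet and choose Run. Wait until the whole album has loaded, then go to the Elements panel and use Copy element on

body

Save the copied markup as yz.txt.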
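Before downloading anything, it can help to check that the saved file really contains the photo groups. The sketch below uses only the standard library and assumes, as the download script does, that each group is a `<ul group_id="...">` element containing `<img>` tags. The names `GroupCounter` and `count_images_per_group` are made up for this illustration.

```python
from html.parser import HTMLParser


class GroupCounter(HTMLParser):
    """Counts <img> tags inside each <ul group_id="..."> element."""

    def __init__(self):
        super().__init__()
        self.counts = {}
        self._stack = []  # group_id (or None) for every currently open <ul>

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            gid = dict(attrs).get("group_id")
            self._stack.append(gid)
            if gid is not None:
                self.counts.setdefault(gid, 0)
        elif tag == "img":
            # Attribute the image to the innermost enclosing group, if any.
            for gid in reversed(self._stack):
                if gid is not None:
                    self.counts[gid] += 1
                    break

    def handle_endtag(self, tag):
        if tag == "ul" and self._stack:
            self._stack.pop()


def count_images_per_group(markup):
    """Return a dict mapping each group_id to its number of images."""
    parser = GroupCounter()
    parser.feed(markup)
    parser.close()
    return parser.counts
```

Calling `count_images_per_group(open('yz.txt', encoding='utf-8').read())` and printing the result gives one entry per group; an empty dict suggests the page was copied before the album finished loading, or that the markup differs from what is assumed here.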

Then run the following script:

import datetime
import os

import requests
from lxml import etree

# Parse the saved <body> markup of the album page.
html = etree.parse('yz.txt', etree.HTMLParser(encoding='utf-8'))

# Every <ul group_id="..."> holds one group of photos.
group_ids = html.xpath('//ul/@group_id')
curr_time = datetime.datetime.now()

for group_id in group_ids:
    path = str(group_id)
    # Prefix the current year when the group name does not contain one.
    if "年" not in path:
        path = str(curr_time.year) + "年" + path
    os.makedirs(path, exist_ok=True)
    print(path)

    srcs = html.xpath('//ul[@group_id="' + str(group_id) + '"]//img/@src')
    for src in srcs:
        link = str(src)
        # Add the scheme when the src does not already start with https:.
        if not link.startswith('https:'):
            link = 'https:' + link
        # Ask for the large image instead of the thumbnail.
        link = link.replace("/thumb300/", "/large/")
        print(link)
        # verify=False disables TLS certificate verification.
        response = requests.get(link, verify=False)
        fn = link[link.rfind('/') + 1:]
        # Images from 2009 and 2010 may lack an extension; treat them as JPEG.
        if path.startswith('2010') or path.startswith('2009'):
            if ".jpg" not in fn:
                fn += ".jpg"
        file_name = path + '/' + fn
        with open(file_name, "wb") as f:
            f.write(response.content)
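The URL handling inside the loop can also be pulled out into small helpers, which makes it easy to try on a few strings before starting a long download. This is a sketch of a slightly stricter variant, not the script's exact behaviour: it only prepends `https:` to protocol-relative links (those starting with `//`) and drops any query string from the file name. The function names and the `img.example.com` host are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def to_large_url(src):
    """Turn a thumbnail src into an absolute URL for the large image."""
    link = src
    # Protocol-relative links ("//host/path") need a scheme before requesting.
    if link.startswith("//"):
        link = "https:" + link
    # Same substitution as in the script: thumbnail path -> large path.
    return link.replace("/thumb300/", "/large/")


def file_name_from_url(link):
    """Return the last path component of the URL, without any query string."""
    return posixpath.basename(urlsplit(link).path)


if __name__ == "__main__":
    url = to_large_url("//img.example.com/thumb300/abc.jpg")
    print(url, file_name_from_url(url))
```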

For now the script only saves images; saving videos can be added later.

Some very old images turn out to have no file extension.
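Rather than assuming every extensionless file is a JPEG, the extension could be guessed from the first bytes of the downloaded content. The sketch below checks the well-known signatures of JPEG, PNG, GIF and WebP files; `guess_extension` and `with_extension` are hypothetical helpers, and whether old Weibo images are ever anything other than JPEG is not something the original post establishes.

```python
import os


def guess_extension(data):
    """Guess an image extension from the leading bytes, or '' if unknown."""
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return ".gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return ".webp"
    return ""


def with_extension(fn, data):
    """Append a guessed extension only when the name has none."""
    if os.path.splitext(fn)[1]:
        return fn
    return fn + guess_extension(data)
```

In the download loop this would replace the year check, for example `fn = with_extension(fn, response.content)`.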

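I also tried the approach from the article below. It works, but it cannot fetch everything: it only gets 200 pages.

Python crawler: batch-scraping Weibo images (without cookies)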
