Scraping Douban Meizi (豆瓣妹子) images with python3

2023-08-07 03:22:04

__author__ = 'NFD'
# -*- coding:UTF-8 -*-

import urllib.request
import os
import re
import time
from bs4 import BeautifulSoup

webheader = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}

img_index = 1

def processDouban(page_url):
    list_url = page_url

    #print(all_data)

if __name__ == '__main__':
    pageIndex = 2525
    index = 1
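    # Create the output directory if it does not exist yet; open() below
    # fails when the directory is missing.
    os.makedirs('doubanimages', exist_ok=True)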
    while pageIndex > 0:
        pageUrl = 'http://www.dbmeinv.com/?pager_offset='+ str(pageIndex)
        try:
            list_req = urllib.request.Request(url=pageUrl, headers=webheader)
            list_Page=urllib.request.urlopen(list_req)
            all_data = list_Page.read().decode('utf-8')
            current_soup = BeautifulSoup(all_data, 'html.parser')
            current_list = current_soup.find_all('img',{'class':'height_min'})
            for img in current_list:
                # Pause between requests to avoid hammering the server.
                time.sleep(1)
                print(time.strftime("%H:%M:%S     ") + 'Processing image: ' + img['src'])
                try:
                    req = urllib.request.Request(img['src'], headers=webheader)
                    webPage = urllib.request.urlopen(req)
                    data = webPage.read()
                    # Open the file only after the download succeeded, and let
                    # the with-block flush and close it.
                    with open('doubanimages/' + str(index) + '.jpg', "wb") as file:
                        file.write(data)
                except Exception:
                    print('Failed to save image')
                finally:
                    # Advance the file counter whether or not the download
                    # worked, as the original code did.
                    index += 1
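        except Exception as exc:
            # Report the failed page instead of swallowing the error silently.
            print('Failed to process page ' + str(pageIndex) + ': ' + str(exc))
        finally:
            # Move on to the previous page whether or not this one succeeded.
            pageIndex -= 1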
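
The download-and-save step in the loop above could also be pulled out into a small helper. The following is only a sketch under stated assumptions: the names `fetch_bytes` and `save_image`, the `timeout` value, and the injectable `fetch` parameter are hypothetical and not part of the original script. It catches `OSError`, which covers `urllib.error.URLError` and `HTTPError`, rather than every exception.

```python
import os
import urllib.request

# Same User-Agent header the script above sends.
webheader = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}


def fetch_bytes(url, headers=webheader, timeout=10):
    """Download url and return the raw response body (hypothetical helper)."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def save_image(src, index, out_dir='doubanimages', fetch=fetch_bytes):
    """Fetch src and write it to out_dir/<index>.jpg.

    Returns the written path, or None if the download failed.
    The fetch parameter is an assumption added so the helper can be
    exercised without network access.
    """
    os.makedirs(out_dir, exist_ok=True)
    try:
        data = fetch(src)
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        print('Failed to download image:', exc)
        return None
    path = os.path.join(out_dir, str(index) + '.jpg')
    # The with-block closes the file even if write() raises.
    with open(path, 'wb') as f:
        f.write(data)
    return path
```

Inside the loop, the body of the inner `try` would then reduce to something like `save_image(img['src'], index)`. Because the file is opened only after the download succeeds, a failed request leaves no empty `.jpg` behind.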
