Scraping Douban movie information with Python 3 (top 500)

2023-06-11 11:34:10
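The script below gets its movie list from a single request whose options are packed into a query string. The opaque `tag` value is simply 豆瓣高分 ("Douban top rated") percent-encoded as UTF-8. This is a minimal sketch that rebuilds the same URL with the standard library's `urllib.parse.urlencode`, so the tag or page size can be changed without hand-encoding. `build_search_url` is a hypothetical helper. Reading `page_start` as a paging offset is an assumption based on its name, and Douban may change or restrict this unofficial endpoint at any time.

```python
from urllib.parse import urlencode, unquote

# Base endpoint used by the script for the ranked movie list.
BASE = "https://movie.douban.com/j/search_subjects"


def build_search_url(tag: str, limit: int, start: int = 0) -> str:
    """Hypothetical helper: build the list URL.

    urlencode percent-encodes non-ASCII values as UTF-8, which is how the
    script's hard-coded tag parameter was produced.
    """
    params = {
        "type": "movie",
        "tag": tag,
        "sort": "rank",
        "page_limit": limit,
        "page_start": start,  # assumption: offset into the ranked list
    }
    return BASE + "?" + urlencode(params)


url = build_search_url("豆瓣高分", 1000)
print(url)
# Decoding the script's tag parameter recovers the original Chinese text.
print(unquote("%E8%B1%86%E7%93%A3%E9%AB%98%E5%88%86"))  # 豆瓣高分
```

Only the URL is built here; no request is sent.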

import json
import operator

import requests


class Spider(object):
    def __init__(self):
        # Send a browser-like User-Agent; the default requests User-Agent may be refused
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'
        }

    def getHtml(self, url):
        # Fetch a URL and return the body decoded as UTF-8; raise on HTTP errors
        res = requests.get(url, headers=self.headers, timeout=10)
        res.raise_for_status()
        return res.content.decode("utf-8")

    def handleInfo(self):
        # The "tag" parameter is the URL-encoded form of 豆瓣高分 ("Douban top rated")
        html = self.getHtml("https://movie.douban.com/j/search_subjects?type=movie&tag=%E8%B1%86%E7%93%A3%E9%AB%98%E5%88%86&sort=rank&page_limit=1000&page_start=0")
        dic = json.loads(html)
        movies = dic["subjects"]

        # Fetch the directors and lead actors of each movie
        print("Crawling...")
        for movie in movies:
            movieDetail = json.loads(self.getHtml(
                "https://movie.douban.com/j/subject_abstract?subject_id=%s" % movie["id"]))["subject"]
            movie["directors"] = movieDetail["directors"]
            movie["actors"] = movieDetail["actors"]
            # "rate" arrives as a string; convert it so sorting is numeric
            movie["rate"] = float(movie["rate"])

        # reverse=True sorts in descending order (the default is False)
        sorted_movies = sorted(movies, key=operator.itemgetter('rate'), reverse=True)
        # Write the 10 highest-rated movies. The file is opened once in append mode;
        # slicing avoids an IndexError when fewer than 10 movies are returned, and
        # '\n' (not '\r\n') avoids doubled carriage returns in text mode on Windows
        with open("moviesInfoDetail.txt", "a", encoding="utf-8") as f:
            for movie in sorted_movies[:10]:
                f.write(str(movie) + '\n')
        print("Done. Results saved to moviesInfoDetail.txt")


if __name__ == "__main__":
    spider = Spider()
    spider.handleInfo()
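The core of `handleInfo` is a parse, convert, sort pipeline. The sketch below isolates that pipeline so it runs without network access. `SAMPLE` is a made-up payload shaped like the `subjects` list the script reads, and `top_n_by_rate` is a hypothetical helper, not part of the original script.

```python
import json
import operator

# Hypothetical payload shaped like the "subjects" list returned by the list
# endpoint; the titles and rates are invented for illustration only.
SAMPLE = json.dumps({
    "subjects": [
        {"id": "1", "title": "Movie A", "rate": "8.9"},
        {"id": "2", "title": "Movie B", "rate": "9.6"},
        {"id": "3", "title": "Movie C", "rate": "9.1"},
    ]
})


def top_n_by_rate(payload: str, n: int) -> list:
    """Parse the JSON, convert string rates to float, return the n highest."""
    movies = json.loads(payload)["subjects"]
    for movie in movies:
        movie["rate"] = float(movie["rate"])  # "9.6" -> 9.6 so sorting is numeric
    ranked = sorted(movies, key=operator.itemgetter("rate"), reverse=True)
    return ranked[:n]  # slicing never raises if fewer than n movies exist


top = top_n_by_rate(SAMPLE, 2)
print([m["title"] for m in top])  # ['Movie B', 'Movie C']
```

The float conversion matters: compared as strings, `"10.0"` ranks below `"9.5"` because strings are compared character by character and `"1"` sorts before `"9"`.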

Reference: http://www.facesjoy.cn/article/2019/10/20/9.html
