Scraping song names from NetEase Cloud Music with Python

2023-03-08 02:49:33

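Until now I had always written my scripts straight from top to bottom; this time I finally tried using functions in a scraper.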

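URL: https://music.163.com/#/artist?id=9272. The goal is to scrape the names of the 50 songs on this page. Analyzing the URL: the NetEase Cloud Music home page is https://music.163.com, so each artist evidently has a corresponding id, which means the id has to be passed into the URL as a parameter. The browser-identifying headers, on the other hand, stay the same for every request (the key to writing a scraper is the fight against anti-scraping measures, so it is worth building the habit of constructing request headers).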
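
To make "passing the id into the URL" concrete, here is a minimal sketch that builds the artist URL with the standard library only. The helper name build_artist_url is hypothetical and not part of the original script; the script itself achieves the same effect by handing a params dict to requests.get().

```python
from urllib.parse import urlencode

BASE_URL = "https://music.163.com/artist"

def build_artist_url(artist_id):
    """Return the artist page URL for the given artist id (hypothetical helper)."""
    # urlencode turns {'id': 9272} into the query string 'id=9272'
    return f"{BASE_URL}?{urlencode({'id': artist_id})}"

print(build_artist_url(9272))  # https://music.163.com/artist?id=9272
```

Note that the request goes to /artist rather than /#/artist: everything after # is a fragment that the browser never sends to the server.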

import requests
import re

url='https://music.163.com/artist'

def get_html(url):

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.63 Safari/537.36'
    }
    params = {
        'id': '9272'
    }

    # Send a GET request with requests.get(); params is appended to the URL as ?id=9272
    response = requests.get(url, headers=headers, params=params)
    html = response.text
    print(type(html))   # <class 'str'>
    return html

def parse_html(html):

    # Example of a target tag: <a href="/song?id=287035" target="_blank" rel="external nofollow" >遇見</a>
    # Raw string so that \? and \d reach the regex engine unchanged
    pattern = re.compile(r'<a href="/song\?id=\d+" target="_blank" rel="external nofollow" >(.*?)</a>')
    names = pattern.findall(html)
    print(len(names))
    for name in names:
        print(name)

if __name__ == '__main__':
    html = get_html(url)
    # encoding='utf-8' avoids the error "'gbk' codec can't encode character '\xa9'"
    with open("out.txt", "w", encoding='utf-8') as f:
        f.write(html)      # save the HTML to a text file for inspection
    parse_html(html)
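
The regex step can be tried out without touching the network. The sketch below is a variation on parse_html, not the original: it returns the matches instead of printing them, and it assumes the song links have the form <a href="/song?id=NNN" ...>title</a> while letting the other attributes vary. The name extract_songs and the sample HTML are made up for illustration.

```python
import re

# Assumption: song links look like <a href="/song?id=NNN" ...>title</a>;
# any attributes after href are allowed to differ.
SONG_LINK = re.compile(r'<a href="/song\?id=(\d+)"[^>]*>(.*?)</a>')

def extract_songs(html):
    """Return a list of (song_id, name) tuples found in html."""
    return SONG_LINK.findall(html)

# Hand-written sample, not real page output
sample = (
    '<ul>'
    '<li><a href="/song?id=287035" target="_blank">遇見</a></li>'
    '<li><a href="/song?id=1">Test Song</a></li>'
    '<li><a href="/album?id=5">Not a song</a></li>'
    '</ul>'
)

for song_id, name in extract_songs(sample):
    print(song_id, name)
```

Returning a list rather than printing makes the function easy to check against a saved copy of the page, such as the out.txt written by the script.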

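Note that Chrome tends to keep loading successive pages in the same tab, so the page source you open may belong to a previously visited page. It is best to open the page you are scraping in a new tab (this problem held me up for a long time on this scraper).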
