Beginner Python: Scraping a Baidu Baike Entry

2023-08-07 03:23:24

The goal is to scrape the value attributes from the revision history list of the Angelababy entry.

First attempt: fetching the page

# -*- coding: utf-8 -*-
# Python 2 script: fetch the history-list page and print the raw HTML.
import urllib2

page = 1
url = 'https://baike.baidu.com/historylist/Angelababy/1509275#page' + str(page)
try:
    request = urllib2.Request(url)
    response = urllib2.urlopen(request)
    print response.read()
except urllib2.URLError as e:
    # HTTPError (a subclass of URLError) carries an HTTP status code.
    if hasattr(e, "code"):
        print e.code
    # URLError carries the reason for the failure.
    if hasattr(e, "reason"):
        print e.reason

Output:

[Screenshot: the fetched page HTML printed to the console]

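As the output shows, the script fetched the full content of the page. The next step is to extract only the value attributes we want.

Extracting the target content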
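
The script above targets Python 2, where the `urllib2` module exists. In Python 3 its functionality lives in `urllib.request` and `urllib.error`. A minimal sketch of the same fetch step in Python 3 might look like the following. The `fetch_page` helper and its `timeout` parameter are illustrative names that do not appear in the original script, and how the live site responds is not verified here.

```python
from urllib.error import URLError
from urllib.parse import urldefrag
from urllib.request import Request, urlopen


def fetch_page(url, timeout=10):
    """Return the decoded body of `url`, or None if the request fails.

    Hypothetical helper: a Python 3 counterpart of the urllib2 script.
    """
    try:
        with urlopen(Request(url), timeout=timeout) as response:
            # Assumes the page is UTF-8 encoded, as the original script does.
            return response.read().decode("utf-8")
    except URLError as e:
        # HTTPError (a subclass of URLError) adds .code to the exception.
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
        return None


page = 1
url = "https://baike.baidu.com/historylist/Angelababy/1509275#page" + str(page)

# Everything after '#' is a URL fragment. It is not part of the HTTP request,
# so the server's response does not depend on it.
base_url, fragment = urldefrag(url)
print(base_url)  # https://baike.baidu.com/historylist/Angelababy/1509275
print(fragment)  # page1
```

Calling `fetch_page(base_url)` would perform the actual request. The sketch stops short of that so it can run without network access.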

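[Screenshot: HTML source of the rows to scrape]

Every row we want to scrape has the same format, as the screenshot shows. The HTML looks like this: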

<tr>
    <td class="checkBox">
        <input type="checkbox" value="128140635">
    </td>
      .
      .
      .
</tr>

So we match it with the following regular expression:

pattern = re.compile('<tr>.*?<td class="checkBox">.*?<input.*?value="(.*?)">.*?</td>.*?</tr>',re.S)
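
To see what this pattern captures, here is a small sketch that runs it against a hand-written sample instead of the live page. The first row reuses the value from the snippet above. The second row, its value `128000001`, and the placeholder `<td>...</td>` cells are made up for illustration. The code is written for Python 3.

```python
import re

# Sample rows modelled on the article's snippet. The second value and the
# placeholder <td>...</td> cells are invented for this example.
sample_html = """
<tr>
    <td class="checkBox">
        <input type="checkbox" value="128140635">
    </td>
    <td>...</td>
</tr>
<tr>
    <td class="checkBox">
        <input type="checkbox" value="128000001">
    </td>
    <td>...</td>
</tr>
"""

pattern = re.compile(
    r'<tr>.*?<td class="checkBox">.*?<input.*?value="(.*?)">.*?</td>.*?</tr>',
    re.S,  # let '.' match newlines, since each row spans several lines
)

# findall returns only the captured group: the text inside value="...".
values = pattern.findall(sample_html)
print(values)  # ['128140635', '128000001']
```

The non-greedy `.*?` keeps each match inside a single `<tr>` row. A greedy `.*` would run to the last `</tr>` in the document and yield a single match.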

The complete code:

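# -*- coding: utf-8 -*-
# Python 2 script: fetch the history-list page and print each checkbox value.
import re
import urllib2

page = 1
url = 'https://baike.baidu.com/historylist/Angelababy/1509275#page' + str(page)
try:
    request = urllib2.Request(url)
    response = urllib2.urlopen(request)
    content = response.read().decode('utf-8')
    # re.S lets '.' match newlines, because each <tr> spans several lines.
    pattern = re.compile('<tr>.*?<td class="checkBox">.*?<input.*?value="(.*?)">.*?</td>.*?</tr>', re.S)
    # findall returns the captured group, i.e. the value attribute of each row.
    items = pattern.findall(content)
    for item in items:
        print item
except urllib2.URLError as e:
    # HTTPError (a subclass of URLError) carries an HTTP status code.
    if hasattr(e, "code"):
        print e.code
    # URLError carries the reason for the failure.
    if hasattr(e, "reason"):
        print e.reason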

The scraping results:

[Screenshot: the extracted value strings printed to the console]
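
The regular expression works because every row has the same layout, but it can break if the markup changes, for example if the attribute order differs. As an alternative sketch, which is not the approach this article uses, Python 3's standard `html.parser` module can collect the same values by walking the tags. The class name `CheckboxValueParser` and the sample string fed to it are illustrative assumptions.

```python
from html.parser import HTMLParser


class CheckboxValueParser(HTMLParser):
    """Collect the value of every <input> inside a <td class="checkBox">."""

    def __init__(self):
        super().__init__()
        self.in_checkbox_cell = False  # True while inside a checkBox cell
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td":
            # Track whether the current cell is the checkbox cell.
            self.in_checkbox_cell = attrs.get("class") == "checkBox"
        elif tag == "input" and self.in_checkbox_cell and "value" in attrs:
            self.values.append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_checkbox_cell = False


# Hand-written sample: one checkbox cell and one unrelated input to be skipped.
parser = CheckboxValueParser()
parser.feed(
    '<tr><td class="checkBox"><input type="checkbox" value="128140635"></td>'
    '<td><input value="ignored"></td></tr>'
)
print(parser.values)  # ['128140635']
```

In a real run, the string passed to `feed` would be the decoded page content instead of the sample.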

Further reading

Personal blog of Cui Qingcai (崔慶才)
