Python: extracting the text between <span></span> tags

2023-03-05 15:46:42

# Python: extracting the text between <span></span> tags
html = """
<div>
    <span class='red'>item1</span>
    <div>
        <span id='s1'>item2</span>
    </div>
</div>
"""
# Method 1: use Scrapy's Selector
from scrapy.selector import Selector

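# Scrapy selectors support both CSS and XPath. The examples below use CSS selectors.
# If you know jQuery from front-end work, the selector syntax will look familiar
# (the ::text pseudo-element is a Scrapy extension that selects text nodes).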
t1 = Selector(text=html).css('span.red::text').extract()  # a class is selected with a dot
print(t1)  # ['item1']
t2 = Selector(text=html).css('span::text').extract()  # text of every span
print(t2)  # ['item1', 'item2']
t3 = Selector(text=html).css('span#s1::text').extract()  # an id is selected with #
print(t3)  # ['item2']
t4 = Selector(text=html).css('div>div>span::text').extract()  # span that is a child of a div nested in a div
print(t4)  # ['item2']

# Method 2: use bs4 (Beautiful Soup)
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
s1 = soup.find('span', attrs={"class": "red"})  # first span whose class is "red" (a Tag; call s1.get_text() for its text)
s2 = soup.find_all("span")  # every span in the document
result = [span.get_text() for span in s2]
print(result)  # ['item1', 'item2']
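
Both methods above need a third-party package. As a rough sketch only, and not something the original post covers, the same span text can be collected with the standard library's html.parser module. The names SpanTextParser and span_texts are made up for this illustration. The sketch also assumes that nested spans should be merged into their outermost span, which differs from what find_all("span") returns.

```python
from html.parser import HTMLParser

html = """
<div>
    <span class='red'>item1</span>
    <div>
        <span id='s1'>item2</span>
    </div>
</div>
"""


class SpanTextParser(HTMLParser):
    """Hypothetical helper: collects the text found inside <span> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # number of <span> tags currently open
        self.texts = []  # one entry per outermost <span>

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            if self.depth == 0:
                self.texts.append("")  # start a new entry for this span
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text outside any span (including whitespace between tags) is ignored.
        if self.depth > 0:
            self.texts[-1] += data


def span_texts(markup):
    """Return the text of every outermost <span> in markup."""
    parser = SpanTextParser()
    parser.feed(markup)
    parser.close()
    return parser.texts


print(span_texts(html))  # ['item1', 'item2']
```

This version cannot filter by class or id the way the CSS selectors do; the attrs argument of handle_starttag would have to be inspected for that.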

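1. Extracting the content between <td></td> tags with a regular expression

For example, given <td class="label">行政相對人名稱:</td>, extract the label text 行政相對人名稱: ("Name of the administrative counterparty:").

import re

# text holds the HTML of the page
name = re.findall(r'<td class="label">(.*?)</td>', text)[0]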
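
Indexing the result of re.findall with [0] raises IndexError when the pattern matches nothing. A slightly more defensive sketch uses re.search and checks for None. The sample row and the helper name first_label are assumptions made for this illustration, not taken from a real page.

```python
import re

# Hypothetical sample row; a real page will contain more markup around it.
text = '<tr><td class="label">行政相對人名稱:</td><td class="value">Example Co.</td></tr>'

# Non-greedy group stops at the first closing </td>; re.S lets "." span newlines.
LABEL_RE = re.compile(r'<td class="label">(.*?)</td>', re.S)


def first_label(markup):
    """Return the text of the first label cell, or None if there is none."""
    match = LABEL_RE.search(markup)
    return match.group(1).strip() if match else None


print(first_label(text))                    # 行政相對人名稱:
print(first_label('<p>no table here</p>'))  # None
```

A regular expression is fine for a fixed, known snippet like this one. For pages whose markup varies, the Selector or BeautifulSoup approaches shown earlier are usually more robust.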
