A summary of common methods in Python's re regular expression module

2023-08-07 03:32:01

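1. Matching

Regular expressions (the re module) come up often when scraping web pages with Python. Different expressions match different kinds of characters. The most common constructs are: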

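(1) ‘.’ matches any single character (except a newline)

(2) ‘\’ is the escape character

(3) ‘[a-zA-Z0-9]’ matches any single uppercase letter, lowercase letter, or digit

(4) ‘[^abc]’ matches any character other than a, b, or c; a leading ‘^’ inside the brackets negates the set

(5) The pipe ‘|’ denotes alternation: ‘python|perl’ matches only python or perl

(6) (pattern)* allows the pattern to repeat 0 or more times

(7) (pattern)+ allows the pattern to repeat 1 or more times

(8) (pattern){m,n} allows the pattern to repeat m to n times

(9) (.+) matches one or more characters (greedy)

Example: i**like**you

(10) (.+?) matches one or more characters (non-greedy)

(11) group(1, 2, ...) returns the text matched by the given subpatterns

(12) start(group) returns the start position, end(group) the end position, and span(group) the (start, end) pair

(13) \d matches one digit, equivalent to [0-9] for ASCII text

(14) \D matches one non-digit, equivalent to [^0-9] for ASCII text

(15) \s matches any whitespace character, equivalent to [ \t\n\r\f\v] for ASCII text

(16) \S matches any non-whitespace character, equivalent to [^ \t\n\r\f\v] for ASCII text

(17) \w matches any one digit, letter, or underscore, equivalent to [a-zA-Z0-9_] for ASCII text

(18) \W matches any one character that is not a digit, letter, or underscore, equivalent to [^a-zA-Z0-9_] for ASCII text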

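Example:

import re

string = '* i ** like ** you *'

# Greedy: (.+) takes everything between the first and the last '*'
stri = re.compile(r'\*(.+)\*')
# Non-greedy: (.+?) stops at the nearest following '*'
stri2 = re.compile(r'\*(.+?)\*')

print(stri.findall(string))

print(stri2.findall(string))

stri3 = re.search(r'\*(.+?)\*\*(.+?)\*\*(.+?)\*', string)

print(stri3.groups())                # all captured groups
print(stri3.group(1, 3))             # groups 1 and 3 only
print([stri3.start(), stri3.end()])  # bounds of the whole match

Output:

[' i ** like ** you ']
[' i ', ' like ', ' you ']
(' i ', ' like ', ' you ')
(' i ', ' you ')
[0, 20]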
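
The character classes, alternation, repetition counts, and per-group positions from the list above can be tried out in the same way. The following is only a minimal sketch; the sample strings and variable names are made up for illustration and do not come from the original example.

```python
import re

# Hypothetical sample text, chosen only for illustration
text = 'order 42, item_7\tqty 3'

digits = re.findall(r'\d+', text)   # runs of digits
words = re.findall(r'\w+', text)    # runs of letters, digits, underscores
parts = re.split(r'\s+', text)      # split on runs of whitespace

# Alternation: match either 'python' or 'perl'
langs = re.findall(r'python|perl', 'perl, python, ruby')

# {m,n}: between 2 and 3 repetitions of 'a'
runs = re.findall(r'a{2,3}', 'a aa aaa aaaa')

# Negated set: every character that is not a, b, or c
others = re.findall(r'[^abc]', 'a1b2c3')

# start/end/span accept a group number
m = re.search(r'(\d+)-(\d+)', 'range 10-25 ok')

print(digits, words, parts)
print(langs, runs, others)
print(m.span(), m.span(1), m.start(2), m.end(2))
```

Passing no argument to span() describes the whole match, while span(1) and span(2) describe the individual groups.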

2. Functions

compile, search, match, split, findall, sub, escape

For what each one does, see https://www.cnblogs.com/dyfblog/p/5880728.html

Example:

s = '''first line
second line
third line'''

# match only matches at the start of the string, so nothing is found
print(re.match(r'i\w+', s))
# output> None

# search places no restriction on where the match starts
print(re.search(r'i\w+', s))
# output> <re.Match object; span=(1, 5), match='irst'>

print(re.search(r'i\w+', s).group())
# output> irst

print(re.findall(r'\w+', s))
# output> ['first', 'line', 'second', 'line', 'third', 'line']

s = '''first 111 line
second 222 line
third 333 line'''

# Split on runs of digits
print(re.split(r'\d+', s))
# output> ['first ', ' line\nsecond ', ' line\nthird ', ' line']

# \.+ matches nothing here, so the result is a list holding the whole string
print(re.split(r'\.+', s))
# output> ['first 111 line\nsecond 222 line\nthird 333 line']

# maxsplit parameter: split at most once
print(re.split(r'\d+', s, maxsplit=1))
# output> ['first ', ' line\nsecond 222 line\nthird 333 line']
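
The example above does not exercise sub, escape, or compile. The following is a rough sketch of how they might be used; the input strings and variable names are assumptions made for illustration.

```python
import re

s = 'first 111 line\nsecond 222 line'

# sub: replace every run of digits with a placeholder
masked = re.sub(r'\d+', '#', s)

# count limits how many replacements are made
masked_once = re.sub(r'\d+', '#', s, count=1)

# escape: make a literal string safe to embed in a pattern,
# so that '.' and '*' are matched as ordinary characters
literal = 'a.b*c'
escaped = re.escape(literal)
found = re.findall(escaped, 'a.b*c axb*c a.b*c')

# compile: build the pattern once and reuse it
word = re.compile(r'\w+')
first_word = word.match(s).group()

print(masked)
print(masked_once)
print(escaped, found, first_word)
```

Without re.escape, the literal 'a.b*c' would be interpreted as a pattern and would not match itself as written.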

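3. String methods

split: splits a string into a list

strip: returns the string with whitespace removed from both ends

By default the characters removed are whitespace, including ‘\n’, ‘\t’, ‘\r’, and ‘ ’

lstrip: removes leading whitespace

rstrip: removes trailing whitespace

join: concatenates a sequence of strings; the inverse of split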
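
A short sketch of these string methods working together; the sample string is made up for illustration.

```python
# Hypothetical input with whitespace on both sides
raw = '  \tfirst,second,third \r\n'

stripped = raw.strip()        # whitespace removed from both ends
left = raw.lstrip()           # leading whitespace removed only
right = raw.rstrip()          # trailing whitespace removed only

fields = stripped.split(',')  # split on a separator
rejoined = ','.join(fields)   # join reverses the split

print(repr(stripped), repr(left), repr(right))
print(fields, rejoined == stripped)
```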
