[Python Basics] 05: Python 文

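I. File Systems and Files

1. File systems and files

A file system is the set of methods and data structures an OS uses to keep track of the files on a disk or partition; in other words, it is the way files are organized on the disk

A computer file is a stream of data stored on some long-term or temporary storage device and managed by the computer's file system

In short:

A file is a named storage area in the computer that is managed by the OS

On Linux, a file is treated as a sequence of bytes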

2. Architecture of the Linux file system components

3. Opening files in Python

The built-in function open() opens a file and creates a file object

Syntax:

open(name[,mode[,bufsize]])

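open() accepts three arguments: the file name, the mode, and the buffer size

open() returns a file object

mode: the mode in which the file is opened

bufsize: defines the output buffering

0 means unbuffered

1 means line-buffered

a negative value means the system default is used

a positive value means a buffer of approximately that size is used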

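4. File open modes

Simple modes:

r: read-only; opens the file for reading, positioned at the start of the file

open('/var/log/message.log', 'r')

w: write; opens the file for writing only (no reading), positioned at the start of the file; any existing contents are discarded

a: append; opens the file for writing, positioned at the end of the file

Adding "+" after the mode enables both input and output,

e.g. r+, w+, and a+

Adding "b" after the mode opens the file in binary mode,

e.g. rb, wb+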
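
The modes listed above can be compared side by side on a scratch file. This is a minimal sketch written for Python 3 (the sessions in this post use Python 2.7, where the same mode strings apply); the temporary directory and the file name modes_demo.txt are made up for the illustration.

```python
import os
import shutil
import tempfile

# Hypothetical scratch file inside a fresh temporary directory.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "modes_demo.txt")

# 'w': write-only; creates the file, or empties it if it already exists.
f = open(path, "w")
f.write("first line\n")
f.close()

# 'a': write-only; data is added at the end, existing contents are kept.
f = open(path, "a")
f.write("second line\n")
f.close()

# 'r': read-only; starts at the beginning of the file.
f = open(path, "r")
after_append = f.read()
f.close()

# 'r+': read and write; the file is not emptied on open.
f = open(path, "r+")
f.write("FIRST")              # overwrites the first five characters in place
f.seek(0)
after_update = f.read()
f.close()

# 'rb': binary mode; in Python 3 read() then returns bytes instead of str.
f = open(path, "rb")
raw = f.read()
f.close()

# 'w' again: the old contents are gone as soon as the file is opened.
f = open(path, "w")
f.close()
size_after_truncate = os.path.getsize(path)

shutil.rmtree(workdir)        # clean up the scratch directory
```

The point to notice is that w destroys existing data on open, whereas a and r+ keep it.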

In [4]: file.
file.close       file.isatty      file.read        file.tell
file.closed      file.mode        file.readinto    file.truncate
file.encoding    file.mro         file.readline    file.write
file.errors      file.name        file.readlines   file.writelines
file.fileno      file.newlines    file.seek        file.xreadlines
file.flush       file.next        file.softspace   


In [6]: f1=open('/etc/passwd','r')

In [7]: f1
Out[7]: <open file '/etc/passwd', mode 'r' at 0x21824b0>

In [8]: print f1
<open file '/etc/passwd', mode 'r' at 0x21824b0>

In [9]: type(f1)
Out[9]: file



In [10]: f1.next
Out[10]: <method-wrapper 'next' of file object at 0x21824b0>

In [11]: f1.next()   # a file is also an iterable object
Out[11]: 'root:x:0:0:root:/root:/bin/bash\n'

In [12]: f1.next()
Out[12]: 'bin:x:1:1:bin:/bin:/sbin/nologin\n'


In [22]: f1.close()      # close the file

In [23]: f1.next()      # once the file is closed, no more data can be read
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-4a9d57471e88> in <module>()
----> 1 f1.next()

ValueError: I/O operation on closed file


In [50]: f1=open('/etc/passwd','r')

In [51]: f1.
f1.close       f1.isatty      f1.readinto    f1.truncate
f1.closed      f1.mode        f1.readline    f1.write
f1.encoding    f1.name        f1.readlines   f1.writelines
f1.errors      f1.newlines    f1.seek        f1.xreadlines
f1.fileno      f1.next        f1.softspace   
f1.flush       f1.read        f1.tell        

In [51]: f1.readl
f1.readline   f1.readlines  

In [51]: f1.readline
Out[51]: <function readline>

In [52]: f1.readline()
Out[52]: 'root:x:0:0:root:/root:/bin/bash\n'

In [53]: f1.readlines()
Out[53]: 
['bin:x:1:1:bin:/bin:/sbin/nologin\n',
 'daemon:x:2:2:daemon:/sbin:/sbin/nologin\n',
 'adm:x:3:4:adm:/var/adm:/sbin/nologin\n',
 'lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin\n',
 'sync:x:5:0:sync:/sbin:/bin/sync\n',
 'shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown\n',
 'halt:x:7:0:halt:/sbin:/sbin/halt\n',
 'mail:x:8:12:mail:/var/spool/mail:/sbin/nologin\n',
 'uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin\n',
 'operator:x:11:0:operator:/root:/sbin/nologin\n',
 'games:x:12:100:games:/usr/games:/sbin/nologin\n',
 'gopher:x:13:30:gopher:/var/gopher:/sbin/nologin\n',
 'ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin\n',
 'nobody:x:99:99:Nobody:/:/sbin/nologin\n',
 'dbus:x:81:81:System message bus:/:/sbin/nologin\n',
 'vcsa:x:69:69:virtual console memory owner:/dev:/sbin/nologin\n',
 'saslauth:x:499:76:"Saslauthd user":/var/empty/saslauth:/sbin/nologin\n',
 'postfix:x:89:89::/var/spool/postfix:/sbin/nologin\n',
 'haldaemon:x:68:68:HAL daemon:/:/sbin/nologin\n',
 'sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin\n']

In [54]:

The file read pointer:

In [57]: f1.tell()    # current position of the pointer in the file; the return value is the number of bytes read so far
Out[57]: 949


In [69]: help(f1.seek)


Help on built-in function seek:

seek(...)
    seek(offset[, whence]) -> None.  Move to new file position.  
  # offset: the byte offset; whence: where the offset is measured from: 0 = start of the file, 1 = current position, 2 = end of the file; the default is 0
    
    Argument offset is a byte count.  Optional argument whence defaults to
    0 (offset from start of file, offset should be >= 0); other values are 1
    (move relative to current position, positive or negative), and 2 (move
    relative to end of file, usually negative, although many platforms allow
    seeking beyond the end of a file).  If the file is opened in text mode,
    only offsets returned by tell() are legal.  Use of other offsets causes
    undefined behavior.
    Note that not all file objects are seekable.

In [72]: f1.seek(0)          # whence is not given, so it defaults to 0: offset 0 from the start of the file

In [73]: f1.tell()
Out[73]: 0
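
The three whence values described in the help text can be tried on a small file. The sketch below is written for Python 3 and opens the file in binary mode, because Python 3 text files only accept offsets previously returned by tell(); the ten-byte file content is invented for the example.

```python
import os
import tempfile

# Hypothetical ten-byte scratch file.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "rb") as f:
    f.seek(4)                    # whence defaults to 0: 4 bytes from the start
    from_start = f.read(2)       # reads b"45", pointer is now at 6
    f.seek(-3, os.SEEK_CUR)      # whence=1: 3 bytes back from the current position
    from_current = f.read(1)     # reads b"3"
    f.seek(-2, os.SEEK_END)      # whence=2: 2 bytes before the end of the file
    from_end = f.read()          # reads b"89"
    final_position = f.tell()    # the pointer now sits at the end of the file

os.remove(path)
```

os.SEEK_CUR and os.SEEK_END are simply the named forms of the integers 1 and 2.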


In [29]: help(file.read)

Help on method_descriptor:

read(...)
    read([size]) -> read at most size bytes, returned as a string.
    
    If the size argument is negative or omitted, read until EOF is reached.
    Notice that when in non-blocking mode, less data than what was requested
    may be returned, even if no size parameter was given.
In [82]: f1.read(10)        # returns a string of at most 10 bytes
Out[82]: 'root:x:0:0'

In [83]: f1.tell()
Out[83]: 10


In [88]: f1.name
Out[88]: '/etc/passwd'

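5. File methods

A file object keeps track of the state of the file it has open; its tell() method returns the current position within that file

read(): reads the file into a single string; the number of bytes to read can also be given

readline(): reads the next line into a string, including the line terminator

readlines(): reads all lines of the file into a list of strings, one string per line

write(aString): writes a byte string to the file

writelines(aList): writes all the strings in a list to the file

f.isatty(): whether the file is a terminal device

f.truncate(): truncates the file to at most the given number of bytes

Note:

read() and the other read methods return the line terminators together with the data

write() does not add a line terminator automatically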
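
The two notes above are easy to check on a scratch file. This is a small sketch for Python 3 (the method names are the same as in the Python 2.7 sessions of this post); the file contents are invented for the example.

```python
import os
import tempfile

# Hypothetical scratch file.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

f = open(path, "w")
f.write("alpha")                        # write() adds no line terminator ...
f.write("\n")                           # ... so it has to be written explicitly
f.writelines(["beta\n", "gamma\n"])     # writelines() adds none either
f.close()

f = open(path, "r")
first = f.readline()                    # next line, terminator included
rest = f.readlines()                    # remaining lines as a list
f.seek(0)
position_after_seek = f.tell()
head = f.read(3)                        # at most 3 characters
tail = f.read()                         # everything up to EOF
f.close()

os.remove(path)
```

Because the terminators are kept, callers usually strip them with rstrip("\n") before further processing.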

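6. File object attributes

The with statement

Available since Python 2.5 (there it requires from __future__ import with_statement; it is enabled by default from 2.6)

Used for operations that need a paired open and close

The opened object is closed automatically

Syntax:

with open_expr as obj:
    statements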

In [90]: f = open("/tmp/passwd","r+")

In [91]: f.closed
Out[91]: False

In [92]: with open("/tmp/passwd","r+") as f:
    pass
   ....: 

In [93]: f.closed
Out[93]: True
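
The session above only shows the normal exit path. The following sketch (Python 3; the scratch file and the simulated error are assumptions made for the illustration) suggests that the file is also closed when the block raises, and shows the try/finally form that with roughly replaces.

```python
import os
import tempfile

# Hypothetical scratch file.
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "w") as f:
    f.write("data\n")
closed_after_block = f.closed           # closed on normal exit

try:
    with open(path, "r") as g:
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass
closed_after_error = g.closed           # closed even though the block raised

# Roughly equivalent manual form of a with block:
h = open(path, "r")
try:
    content = h.read()
finally:
    h.close()                           # runs whether or not read() fails

os.remove(path)
```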

II. Text processing in Python

1. Basic string handling

1) Splitting and joining strings

str.split(): split

str.rsplit(): split starting from the right

In [11]: s1="xie xiao jun"


In [13]: help(s1.split)         


Help on built-in function split:

split(...)
    S.split([sep [,maxsplit]]) -> list of strings   # sep is the separator (default: any whitespace); maxsplit is the maximum number of splits
    
    Return a list of the words in the string S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator and empty strings are removed
    from the result.


In [12]: s1.spli
s1.split       s1.splitlines  

In [12]: s1.split()
Out[12]: ['xie', 'xiao', 'jun']

In [16]: s1.split("",2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-f3d385e69a09> in <module>()
----> 1 s1.split("",2)

ValueError: empty separator

In [17]: s1.split(" ",2)
Out[17]: ['xie', 'xiao', 'jun']

In [18]: s1.split(" ",1)
Out[18]: ['xie', 'xiao jun']

In [26]: s1.split("x")
Out[26]: ['', 'ie ', 'iao jun']

In [27]: s1
Out[27]: 'xie xiao jun'

In [28]: s1.split("i")
Out[28]: ['x', 'e x', 'ao jun']

In [29]: s1.split("n")
Out[29]: ['xie xiao ju', '']



In [35]: s1.rsplit
s1.rsplit

In [37]: s1.rsplit()                # with enough splits allowed, rsplit and split give the same result
Out[37]: ['xie', 'xiao', 'jun']

In [38]: s1.rsplit(" ",1)          # when maxsplit limits the number of splits, rsplit splits from the right
Out[38]: ['xie xiao', 'jun']

In [39]: s1.split(" ",1)
Out[39]: ['xie', 'xiao jun']

Joining with join() and +

In [57]: s1
Out[57]: 'xie xiao jun'

In [58]: s2
Out[58]: 'aa bb cc dd ee'

In [59]: ",".join(s1)
Out[59]: 'x,i,e, ,x,i,a,o, ,j,u,n'

In [60]: "-".join(s1)
Out[60]: 'x-i-e- -x-i-a-o- -j-u-n'

In [67]: 'xie' + "jun"
Out[67]: 'xiejun'
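
Passing a plain string to join() treats every character as an element, which is why the output above is split letter by letter. In practice join() is more often given a list of strings, as in this short sketch (Python 3; the sample values are taken from or modeled on the session above).

```python
s1 = "xie xiao jun"

words = s1.split()                       # list of words
dashed = "-".join(words)                 # separator goes between the words
round_trip = " ".join(words)             # split() followed by join() restores s1

# join() only accepts strings, so other types must be converted first.
numbers = [1, 2, 3]
csv_line = ",".join(str(n) for n in numbers)

try:
    ",".join(numbers)                    # a list of ints is rejected
    join_error = None
except TypeError as exc:
    join_error = type(exc).__name__
```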

2) String formatting

Placeholder substitution: %s, %d or %i, %f

In [75]: "adfas%s" % "hello"
Out[75]: 'adfashello'

In [76]: "adfas %s" % "hello"
Out[76]: 'adfas hello'

In [77]: "adfas %s " % "hello"
Out[77]: 'adfas hello '

In [78]: "adfas %s %s%s" % "hello"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-78-97faba8d8356> in <module>()
----> 1 "adfas %s %s%s" % "hello"

TypeError: not enough arguments for format string

In [80]: "adfas %s %s%s" % ("hello","A","B")  # the number of placeholders must match the number of tuple elements
Out[80]: 'adfas hello AB'
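
The session above only exercises %s. A brief sketch of the numeric placeholders listed earlier follows (Python 3; the variable names and values are invented for the example).

```python
name, uid, load = "root", 0, 0.4567

# %s for strings, %d for integers, %.2f for a float with two decimals.
line = "user=%s uid=%d load=%.2f" % (name, uid, load)

# A width pads the field; a leading minus sign left-aligns it.
padded = "%5d|%-5s|" % (42, "ab")

# A literal percent sign is written as %%.
percent = "%d%%" % 90

# Placeholders can also be filled by name from a dictionary.
by_name = "%(user)s:%(uid)d" % {"user": "bin", "uid": 1}
```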

3) Searching strings

str.find(): search

In [90]: help(s1.find)

Help on built-in function find:

find(...)
    S.find(sub [,start [,end]]) -> int
    
    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.
    
    Return -1 on failure.
    
In [102]: s1.find("i")       # index of the first occurrence
Out[102]: 1    

In [101]: s1.find("i",4,8)
Out[101]: 5

4) Replacing in strings

str.replace()

In [104]: s1.replace("x","X")
Out[104]: 'Xie Xiao jun'

In [105]: s1.replace("x","X",1)
Out[105]: 'Xie xiao jun'

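5) str.strip(): removes whitespace from both ends of the string

str.rstrip(): removes whitespace from the right end only

str.lstrip(): removes whitespace from the left end only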

In [21]: s2 = ' xie xiao jun '

In [22]: s2
Out[22]: ' xie xiao jun '

In [23]: s2.st
s2.startswith  s2.strip       

In [23]: s2.strip()
Out[23]: 'xie xiao jun'

In [24]: s2.r
s2.replace     s2.rindex      s2.rpartition  s2.rstrip      
s2.rfind       s2.rjust       s2.rsplit      

In [24]: s2.rstrip()
Out[24]: ' xie xiao jun'

In [25]: s2.l
s2.ljust   s2.lower   s2.lstrip  

In [25]: s2.lstrip()
Out[25]: 'xie xiao jun '

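III. The os module

Directories are not file objects; they belong to the file system. To work with the file system, use the os module

Commonly used functions in the os module:

1. Directories

getcwd(): returns the current working directory

chdir(): changes the working directory

chroot(): sets the root directory of the current process

listdir(): lists the names of all entries in the given directory

mkdir(): creates the given directory

makedirs(): creates nested directories

rmdir(): removes a directory

removedirs(): removes nested directories (the leaf directory and its empty parents)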

In [1]: import os

In [4]: help(os.mkdir)


Help on built-in function mkdir in module posix:

mkdir(...)
    mkdir(path [, mode=0777])
    
    Create a directory.

In [2]: os.mkdir('/tmp/test')

In [3]: ls /tmp
passwd  vgauthsvclog.txt.0  yum_save_tx-2016-09-02-17-11cyWWR1.yumtx
test/   vmware-root/        yum_save_tx-2016-09-21-23-45jB1DoO.yumtx

In [6]: os.getcwd()
Out[6]: '/root'

In [7]: os.c
os.chdir          os.chroot         os.confstr        os.curdir
os.chmod          os.close          os.confstr_names  
os.chown          os.closerange     os.ctermid        

In [8]: os.chdir('/tmp')

In [9]: os.getcwd()
Out[9]: '/tmp'

In [10]: os.stat('test')
Out[10]: posix.stat_result(st_mode=16877, st_ino=522528, st_dev=2050L, st_nlink=2, st_uid=0, st_gid=0, st_size=4096, st_atime=1474959686, st_mtime=1474959686, st_ctime=1474959686)

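2. Files

mkfifo(): creates a named pipe (FIFO)

mknod(): creates a filesystem node such as a device file

remove(): deletes a file

unlink(): deletes a file (the same operation as remove())

rename(): renames

stat(): returns file status information; works for both files and directories

symlink(): creates a symbolic link

utime(): updates timestamps

tmpfile(): creates and opens (w+b) a new temporary file

walk(): directory tree generator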

In [49]: g1=os.walk('/tmp')

In [50]: g1.
g1.close       g1.gi_frame    g1.next        g1.throw       
g1.gi_code     g1.gi_running  g1.send        

In [50]: g1.next
Out[50]: <method-wrapper 'next' of generator object at 0x24f0050>

In [51]: g1.next()
Out[51]: 
('/tmp',
 ['x', 'test1', 'vmware-root', 'test', '.ICE-unix'],
 ['test2',
  'yum_save_tx-2016-09-02-17-11cyWWR1.yumtx',
  'vgauthsvclog.txt.0',
  'passwd',
  'yum_save_tx-2016-09-21-23-45jB1DoO.yumtx'])
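
Each item produced by os.walk() is a (dirpath, dirnames, filenames) tuple, and the generator descends into the subdirectories by itself. The sketch below (Python 3) builds a small throwaway tree and collects every file in it; the directory and file names are invented for the example.

```python
import os
import shutil
import tempfile

# Build a hypothetical tree: root/top.txt, root/a/mid.txt, root/a/b/leaf.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b"))
for rel in ("top.txt",
            os.path.join("a", "mid.txt"),
            os.path.join("a", "b", "leaf.txt")):
    with open(os.path.join(root, rel), "w") as f:
        f.write("x")

found = []
for dirpath, dirnames, filenames in os.walk(root):
    dirnames.sort()                      # sorting in place fixes the visiting order
    for filename in sorted(filenames):
        full = os.path.join(dirpath, filename)
        found.append(os.path.relpath(full, root))

# Normalize separators so the result looks the same on every platform.
found_posix = [p.replace(os.sep, "/") for p in found]

shutil.rmtree(root)
```

Using a for loop avoids calling next() by hand and ends cleanly when the generator is exhausted.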

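3. Access permissions

access(): tests whether the current process (using its real uid/gid) has the given access to a path

chmod(): changes permissions

chown(): changes the owner and group

umask(): sets the default permission mask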

In [66]: os.a
os.abort   os.access  os.altsep  

In [66]: os.access('/root',0)
Out[66]: True

In [67]: os.access('/root',100)
Out[67]: False
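
The second argument of os.access() is normally built from the constants os.F_OK, os.R_OK, os.W_OK and os.X_OK rather than from bare integers; the 0 used above is the value of os.F_OK (existence test). A minimal sketch for Python 3, using a scratch file created just for the check:

```python
import os
import tempfile

# Hypothetical scratch file owned by the current user.
fd, path = tempfile.mkstemp()
os.close(fd)

exists = os.access(path, os.F_OK)                  # does the path exist?
readable = os.access(path, os.R_OK)                # may we read it?
read_write = os.access(path, os.R_OK | os.W_OK)    # modes combine with |
missing = os.access(path + ".does-not-exist", os.F_OK)

os.remove(path)
```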

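4. File descriptors

open(): low-level system function; opens a file

read():

write():

5. Device files

makedev(): composes a device number from a major and a minor device number

major(): extracts the major device number

minor(): extracts the minor device number

IV. The os.path module

os.path is a submodule of the os module

It handles path management, i.e. manipulation of the path strings themselves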

In [5]: os.path
Out[5]: <module 'posixpath' from '/usr/local/python27/lib/python2.7/posixpath.pyc'>
In [3]: os.path.
os.path.abspath                     os.path.join
os.path.altsep                      os.path.lexists
os.path.basename                    os.path.normcase
os.path.commonprefix                os.path.normpath
os.path.curdir                      os.path.os
os.path.defpath                     os.path.pardir
os.path.devnull                     os.path.pathsep
os.path.dirname                     os.path.realpath
os.path.exists                      os.path.relpath
os.path.expanduser                  os.path.samefile
os.path.expandvars                  os.path.sameopenfile
os.path.extsep                      os.path.samestat
os.path.genericpath                 os.path.sep
os.path.getatime                    os.path.split
os.path.getctime                    os.path.splitdrive
os.path.getmtime                    os.path.splitext
os.path.getsize                     os.path.stat
os.path.isabs                       os.path.supports_unicode_filenames
os.path.isdir                       os.path.sys
os.path.isfile                      os.path.walk
os.path.islink                      os.path.warnings
os.path.ismount

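1. Path-related functions

basename(): returns the base name of a path

dirname(): returns the directory name of a path

join(): joins path components into one path

split(): returns a (dirname(), basename()) tuple

splitext(): returns a (filename, extension) tuple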

In [6]: os.path.basename('/tmp/passwd')
Out[6]: 'passwd'

In [7]: os.path.dirname('/tmp/passwd')
Out[7]: '/tmp'

In [8]: os.listdir('/tmp')
Out[8]: 
['x',
 'test2',
 'yum_save_tx-2016-09-02-17-11cyWWR1.yumtx',
 'test1',
 'vmware-root',
 'vgauthsvclog.txt.0',
 'passwd',
 'test',
 '.ICE-unix',
 'yum_save_tx-2016-09-21-23-45jB1DoO.yumtx']

In [9]: for filename in os.listdir('/tmp'):print os.path.join("/tmp",filename)
/tmp/x
/tmp/test2
/tmp/yum_save_tx-2016-09-02-17-11cyWWR1.yumtx
/tmp/test1
/tmp/vmware-root
/tmp/vgauthsvclog.txt.0
/tmp/passwd
/tmp/test
/tmp/.ICE-unix
/tmp/yum_save_tx-2016-09-21-23-45jB1DoO.yumtx


In [24]: os.path.split('/etc/sysconfig/network')
Out[24]: ('/etc/sysconfig', 'network')
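
splitext() is listed above but not shown in the session. The sketch below (Python 3) uses the posixpath module directly, which is what os.path resolves to on Linux as the os.path output earlier indicates, so the results do not depend on the platform; the file names are invented for the example.

```python
import posixpath

path = posixpath.join("/etc", "sysconfig", "network.conf")
directory, name = posixpath.split(path)           # (dirname, basename)
stem, extension = posixpath.splitext(name)        # extension keeps its dot

# Only the last extension is split off.
archive = posixpath.splitext("backup.tar.gz")

# A leading dot alone does not count as an extension.
hidden = posixpath.splitext(".bashrc")

# An absolute component discards everything joined before it.
reset = posixpath.join("/tmp", "/etc", "passwd")
```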

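2. File information

getatime(): returns the file's last access time

getctime(): returns the file's ctime (on Unix, the time of the last metadata change)

getmtime(): returns the file's last modification time

getsize(): returns the file's size

3. Queries

exists(): whether the given path exists

isabs(): whether the given path is absolute

isdir(): whether the path is a directory

isfile(): whether the path exists and is a regular file

islink(): whether the path exists and is a symbolic link

ismount(): whether the path is a mount point

samefile(): whether two paths refer to the same file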
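
A short sketch of the information and query functions, run against a throwaway directory (Python 3; the file name data.txt and its five-byte content are assumptions made for the example):

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()                      # hypothetical scratch directory
file_path = os.path.join(root, "data.txt")
with open(file_path, "w") as f:
    f.write("12345")

checks = {
    "exists": os.path.exists(file_path),
    "isfile": os.path.isfile(file_path),
    "file_is_dir": os.path.isdir(file_path),
    "root_is_dir": os.path.isdir(root),
    "isabs": os.path.isabs(file_path),
    "size": os.path.getsize(file_path),        # size in bytes
    "missing": os.path.exists(os.path.join(root, "nope")),
    "samefile": os.path.samefile(file_path, os.path.join(root, ".", "data.txt")),
}
mtime = os.path.getmtime(file_path)            # seconds since the epoch, as a float

shutil.rmtree(root)
```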

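V. The pickle module

When a Python program writes objects to a file or reads them back, a conversion tool is needed to turn the objects into strings

This provides persistent storage for objects

Storing objects in a file:

pickle module:

marshal:

Storing objects in a DB:

DBM interface (requires loading a third-party interface):

shelve module: both serializes objects and stores them in a DB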

In [31]: l1=[1,2,3,"4",'abc']

In [34]: f1=open('/tmp/test2','a+')

In [36]: s1="xj"

In [37]: f1.write(s1)

In [40]: cat /tmp/test2

In [42]: f1.close()

In [43]: cat /tmp/test2
xj

In [47]: print l1
[1, 2, 3, '4', 'abc']

In [57]: f1.write(l1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-57-83ae8c8c88d4> in <module>()
----> 1 f1.write(l1)

TypeError: expected a character buffer object      # a character buffer object (a string) was expected

pickle module:

In [58]: import pickle

In [61]: help(pickle.dump)

Help on function dump in module pickle:

dump(obj, file, protocol=None)


[root@Node3 tmp]# cat test2
hello

In [77]: pickle.dump(l1,f1)     # l1 and f1 were defined earlier; f1 must be an open file

In [78]: f1.flush()

[root@Node3 tmp]# cat test2
hello                          
(lp0
I1
aI2
aI3
aS'4'
p1
aS'abc'
p2

In [105]: l2=pickle.load(f2)

In [106]: l2
Out[106]: [1, 2, 3, '4', 'abc']
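
A complete dump/load round trip can be sketched as follows. This version is written for Python 3, where the file must be opened in binary mode ("wb"/"rb") because pickle data is bytes; the scratch file name is an assumption, and the list is the same l1 used in the session above.

```python
import os
import pickle
import tempfile

original = [1, 2, 3, "4", "abc"]

# Hypothetical scratch file for the pickled data.
fd, path = tempfile.mkstemp(suffix=".pkl")
os.close(fd)

with open(path, "wb") as f:
    pickle.dump(original, f)             # serialize the list into the file

with open(path, "rb") as f:
    restored = pickle.load(f)            # rebuild an equal object from the file

# dumps()/loads() do the same without a file, using a bytes object.
as_bytes = pickle.dumps(original)
from_bytes = pickle.loads(as_bytes)

os.remove(path)
```

The restored list is equal to the original but is a new object. Pickle data from untrusted sources should not be loaded, since unpickling can execute arbitrary code.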

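VI. Regular expressions in Python

A file is an iterable object and is iterated line by line

A regular expression is a special sequence of characters that makes it easy to check whether a string matches a given pattern.

Python has included the re module since version 1.5; it provides Perl-style regular expression patterns.

The re module gives Python full regular expression support.

The compile function builds a regular expression object from a pattern string and optional flag arguments. The object has a set of methods for regular expression matching and substitution.

The re module also provides functions that do exactly the same as these methods; they take a pattern string as their first argument.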
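
The relationship between a compiled pattern object and the module-level functions can be sketched like this (Python 3; the pattern and the passwd-style sample lines are chosen for the illustration):

```python
import re

line = "root:x:0:0:root:/root:/bin/bash"

# Compile once, then call methods on the pattern object.
pattern = re.compile(r"(\w+):x:(\d+)")
via_object = pattern.match(line).groups()

# The module-level function takes the pattern string as its first argument.
via_module = re.match(r"(\w+):x:(\d+)", line).groups()

# A compiled object is convenient when the same pattern is applied many times.
lines = [line, "bin:x:1:1:bin:/bin:/sbin/nologin", "# comment"]
users = [m.group(1) for m in map(pattern.match, lines) if m]
```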

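1. Regular expression metacharacters in Python

Largely the same as extended regular expressions in bash:

., [], [^]

Square brackets denote a character set, e.g. [a-z], [abc]

Character-class escapes such as \d can be used inside the brackets

Inside the brackets the metacharacter . is just a literal dot

[0-9], \d (any digit), \D (any non-digit)

[0-9a-zA-Z_], \w, \W

\s: any whitespace character: [ \n\t\f\v\r], \S

*, ?, +, {m}, {m,n}, {0,n}, {m,}

^, $, \b

|, (), \nn

(*|+|?|{})?: quantifiers are greedy by default; adding ? after a repetition metacharacter makes it non-greedy

Capturing | grouping

Positional capture: (...)

Named capture: (?P<name>...) # a Python extension

Assertions

A test performed just before or just after the current match position in the target string, without consuming any characters

Lookahead assertions: (?=...) positive, (?!...) negative

Lookbehind assertions: (?<=...) positive, (?<!...) negative

? introduces a lookahead, ?< introduces a lookbehind

= makes the assertion positive, ! makes it negative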
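
The constructs listed above can be tried on short strings. This is an illustrative sketch for Python 3; all sample strings are invented for the example.

```python
import re

html = "<b>one</b><b>two</b>"
greedy = re.findall(r"<b>.*</b>", html)       # .* takes as much as it can
lazy = re.findall(r"<b>.*?</b>", html)        # .*? stops at the first </b>

# Named groups are read back by name.
m = re.match(r"(?P<user>\w+):x:(?P<uid>\d+)", "root:x:0:0:root:/root:/bin/bash")
named = m.groupdict()

# \1 refers back to whatever the first group matched.
doubled = re.search(r"(\w)\1", "heelo world").group()

prices = "cost 30 USD, tax 5 USD, code 77"
followed_by_usd = re.findall(r"\d+(?= USD)", prices)          # positive lookahead
not_followed = re.findall(r"\b\d+\b(?! USD)", prices)         # negative lookahead
after_tax = re.findall(r"(?<=tax )\d+", prices)               # positive lookbehind
not_after_cost = re.findall(r"(?<!cost )\b\d+", prices)       # negative lookbehind
```

None of the assertions consume text: " USD" and "tax " never appear in the results.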

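2. Commonly used functions in the re module

re.match(): matches at the start of the string; returns a match object

Attributes:

string

pos

endpos

Methods:

group(): returns the matched string, or the string matched by a given group

groups(): returns a tuple of the contents of the parenthesized groups

start()

end()

re.search(): the first match found; returns a match object

re.findall(): all matches, returned as a list

re.finditer(): all matches, returned as an iterator of match objects

re.split("m", str): splits str using m as the separator; returns a list

re.sub(): substitution; returns a string

re.subn(): returns a tuple

flags:

I or IGNORECASE: ignore case

M or MULTILINE: makes ^ and $ match at every line # not used much

A or ASCII: ASCII-only matching (Python 3 only)

U or UNICODE: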

In [251]: import re

In [252]: re.
re.DEBUG        re.S            re.compile      re.search
re.DOTALL       re.Scanner      re.copy_reg     re.split
re.I            re.T            re.error        re.sre_compile
re.IGNORECASE   re.TEMPLATE     re.escape       re.sre_parse
re.L            re.U            re.findall      re.sub
re.LOCALE       re.UNICODE      re.finditer     re.subn
re.M            re.VERBOSE      re.match        re.sys
re.MULTILINE    re.X            re.purge        re.template


In [262]: re.match('a','abc')
Out[262]: <_sre.SRE_Match at 0x319b3d8>       # returns a match object

In [263]: match=re.match('a',"abc")

In [264]: match.         # attributes and methods of the match object
match.end        match.groupdict  match.pos        match.start
match.endpos     match.groups     match.re         match.string
match.expand     match.lastgroup  match.regs       
match.group      match.lastindex  match.span     


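In [264]: match.string     # the string that was matched against
Out[264]: 'abc'

In [266]: match.re
Out[266]: re.compile(r'a')    # the pattern used; it is compiled automatically when matching

In [268]: match.group()     # group is a method; it returns the matched string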
Out[268]: 'a'

In [269]: match=re.match('a.',"abc")

In [270]: match.group()
Out[270]: 'ab'

In [271]: match.groups()   # returns all captured groups as a tuple (empty here, since the pattern has no groups)
Out[271]: ()



In [58]: str1="heelo world"

In [59]: re.search("(l(.))",str1)
Out[59]: <_sre.SRE_Match at 0x1603580>

In [60]: mat1=re.search("(l(.))",str1)

In [61]: mat1.
mat1.end        mat1.groupdict  mat1.pos        mat1.start
mat1.endpos     mat1.groups     mat1.re         mat1.string
mat1.expand     mat1.lastgroup  mat1.regs       
mat1.group      mat1.lastindex  mat1.span       

In [62]: help(mat1.group)


Help on built-in function group:

group(...)
    group([group1, ...]) -> str or tuple.
    Return subgroup(s) of the match by indices or names.
    For 0 returns the entire match.

In [63]: mat1.group()  # the entire match
Out[63]: 'lo'

In [66]: mat1.group(0)  # the entire match
Out[66]: 'lo'

In [67]: mat1.group(1)  # the first group; excludes anything matched outside the parentheses
Out[67]: 'lo'

In [68]: mat1.group(2)  # the second group (contents of the second pair of parentheses)
Out[68]: 'o'

In [69]: mat1.group(3)  # there is no third group
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-69-fe309512a255> in <module>()
----> 1 mat1.group(3)

IndexError: no such group


In [77]: mat1.groups()    # returns a tuple of the captured groups; excludes anything matched outside the parentheses
Out[77]: ('lo', 'o')

In [78]: mat1.groups(0)
Out[78]: ('lo', 'o')

In [79]: mat1.groups(1)
Out[79]: ('lo', 'o')

In [80]: mat1.groups(2)
Out[80]: ('lo', 'o')

In [81]: mat1.groups(3)
Out[81]: ('lo', 'o')

In [89]: re.findall("(l(.))",str1)
Out[89]: [('lo', 'o'), ('ld', 'd')]

In [146]: for mat in re.findall("(o(.))",str1):print mat
('o ', ' ')
('or', 'r')

In [148]: for mat in re.finditer("(o(.))",str1):print mat
<_sre.SRE_Match object at 0x1603938>
<_sre.SRE_Match object at 0x16039c0>


In [150]: for mat in re.finditer("(o(.))",str1):print mat.group()
o 
or

In [151]: for mat in re.finditer("(o(.))",str1):print mat.groups()
('o ', ' ')
('or', 'r')





In [114]: str2               
Out[114]: 'hellO wOrld'

In [120]: mat2=re.findall("(l(o))",str2,re.I)    # ignore case

In [121]: mat2
Out[121]: [('lO', 'O')]

In [122]: mat2=re.findall("(l(o))",str2,)

In [123]: mat2
Out[123]: []


In [282]: match.start()   # position where the match starts
Out[282]: 0

In [283]: match.end()     # position where the match ends
Out[283]: 2

In [299]: match.pos     # position where the search started
Out[299]: 0

In [300]: match.endpos   # position where the search ended
Out[300]: 3


In [2]: url="www.magedu.com"

In [3]: re.search("m",url)
Out[3]: <_sre.SRE_Match at 0x14f7098>

In [5]: mat=re.search("m",url)

In [6]: mat
Out[6]: <_sre.SRE_Match at 0x14f7100>

In [7]: mat.
mat.end        mat.group      mat.lastgroup  mat.re         mat.start
mat.endpos     mat.groupdict  mat.lastindex  mat.regs       mat.string
mat.expand     mat.groups     mat.pos        mat.span       

In [8]: mat.group()
Out[8]: 'm'


In [10]: re.findall("m",url)
Out[10]: ['m', 'm']

In [11]: re.finditer("m",url)
Out[11]: <callable-iterator at 0x162f510>

In [12]: mat1=re.fi
re.findall   re.finditer  

In [12]: mat1=re.finditer("m",url)

In [13]: mat1.next()
Out[13]: <_sre.SRE_Match at 0x1626e68>

In [14]: mat1.next()
Out[14]: <_sre.SRE_Match at 0x1626ed0>

In [15]: mat1.next()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-15-e0f232c7f87c> in <module>()
----> 1 mat1.next()

StopIteration: 


In [19]: re.split(".",url)    # the dot must be escaped
Out[19]: ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '']

In [20]: re.split("\.",url)
Out[20]: ['www', 'magedu', 'com']

In [30]: f1=open("/tmp/passwd","r+")

In [31]: re.split(":",f1.readline())
Out[31]: ['root', 'x', '0', '0', 'root', '/root', '/bin/bash\n']

re.sub():

In [34]: help(re.sub)


Help on function sub in module re:

sub(pattern, repl, string, count=0, flags=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used.


In [35]: url
Out[36]: 'www.magedu.com'

In [37]: re.sub("ma","MA",url)
Out[38]: 'www.MAgedu.com'

In [35]: re.sub("m","M",url)
Out[35]: 'www.Magedu.coM'

In [36]: re.sub("m","M",url,1)
Out[36]: 'www.Magedu.com'

In [37]: re.sub("m","M",url,2)
Out[37]: 'www.Magedu.coM'

In [39]: re.subn("m","M",url,3)    # also reports how many substitutions were made
Out[39]: ('www.Magedu.coM', 2)

In [169]: re.sub("M","S",url,count=2,flags=re.I)
Out[169]: 'www.Sagedu.coS'

In [170]: re.sub("M","S",url,count=2)
Out[170]: 'www.magedu.com'
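
The help text above mentions that repl may also be a callable and that backslash escapes in a replacement string are processed. Both can be sketched briefly (Python 3; the helper name upper_first is made up for the example, and url is the same string as in the session):

```python
import re

url = "www.magedu.com"

def upper_first(match):
    """Hypothetical helper: return the matched text in upper case."""
    return match.group(0).upper()

# repl as a callable: it is called once per match with the match object.
shouted = re.sub(r"m", upper_first, url)

# repl as a string: \1, \2, \3 insert the text of the captured groups.
swapped = re.sub(r"(\w+)\.(\w+)\.(\w+)", r"\3.\2.\1", url)

# subn() returns the new string together with the number of replacements.
new_url, count = re.subn(r"\.", "/", url)
```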

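Difference between re.match and re.search

re.match only matches at the beginning of the string; if the start of the string does not fit the regular expression, the match fails and the function returns None. re.search scans the whole string until it finds a match.

Example:

#!/usr/bin/python
import re

line = "Cats are smarter than dogs"

# The r prefix makes the pattern a raw string, so backslashes are not treated as
# escapes: in a raw string \n is the two characters \ and n, not a newline.
# Regular expressions use \ heavily, so patterns are best written as raw strings.
matchObj = re.match(r'dogs', line, re.M|re.I)
if matchObj:
    print "match --> matchObj.group() : ", matchObj.group()
else:
    print "No match!!"

matchObj = re.search(r'dogs', line, re.M|re.I)
if matchObj:
    print "search --> matchObj.group() : ", matchObj.group()
else:
    print "No match!!"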

The example above produces:

No match!!
search --> matchObj.group() :  dogs

Search and replace

Python's re module provides re.sub for replacing matches in a string.

Syntax:

re.sub(pattern, repl, string, count=0)

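The returned string is obtained by replacing the leftmost non-overlapping matches of the pattern in the string. If the pattern is not found, the string is returned unchanged.

The optional argument count is the maximum number of matches to replace; count must be a non-negative integer. The default value 0 means all matches are replaced.

Example:

#!/usr/bin/python
import re

phone = "2004-959-559 # This is Phone Number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print "Phone Num : ", num

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print "Phone Num : ", num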

The example above produces:

Phone Num :  2004-959-559
Phone Num :  2004959559

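Regular expression modifiers: optional flags

A regular expression can take optional flags that control how matching is done. Several flags can be combined with bitwise OR (|); for example, re.I | re.M sets both the I and M flags:

Modifier	Description
re.I	Makes matching case-insensitive
re.L	Makes matching locale-aware
re.M	Multiline matching; affects ^ and $
re.S	Makes . match any character, including a newline
re.U	Interprets characters according to the Unicode character set. This flag affects \w, \W, \b, \B.
re.X	Allows a more flexible layout so that the regular expression can be written more readably.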
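
The effect of the individual flags can be sketched on a three-line string (Python 3; the sample text and the date pattern are invented for the example):

```python
import re

text = "Root line\nroot again\nother"

no_flags = re.findall(r"^root", text)                  # case-sensitive, ^ only at the very start
ignore_case = re.findall(r"^root", text, re.I)         # re.I: "Root" now matches
multi_line = re.findall(r"^root", text, re.I | re.M)   # re.M: ^ also matches after each newline

dot_default = re.search(r"line.root", text)            # . does not match "\n" by default
dot_all = re.search(r"line.root", text, re.S).group()  # re.S: . matches "\n" as well

# re.X: whitespace and comments inside the pattern are ignored.
year_month_re = re.compile(r"""
    (\d{4})     # year
    -
    (\d{2})     # month
""", re.X)
year_month = year_month_re.search("released 2016-09").groups()
```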