Python Study Notes – Sequences (2): Strings

Tags (space-separated): python strings sequences

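Strings

A string is one of Python's collection types (a collection type is a single object made up of a group of Python objects).

Strings belong to a particular kind of collection called a sequence. The objects in a sequence are arranged in a definite order.

A Python string is an object whose content is a sequence of individual characters.

Note: strings are immutable.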

>>> str1 = "Hello World"
>>> print str1
Hello World
>>> id(str1)

>>> str1 = "Hello Python"
>>> print str1
Hello Python
>>> id(str1)

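Assigning a new string to str1 does not modify the original string; it creates a new object. The variable name is the same, but it now refers to a different object, as the different values returned by id() show.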

>>> str1[0] = 'h'
Traceback (most recent call last):
  File "<pyshell#24>", line 1, in <module>
    str1[0] = 'h'
TypeError: 'str' object does not support item assignment

Trying to modify str1 in place raises an error. This restriction exists partly for efficiency: because strings cannot change, the Python interpreter can handle them faster.
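
The two behaviours described so far can be put side by side. This is a minimal sketch, assuming Python 3 (the sessions in these notes are Python 2); the names greeting and alias are made up for illustration. Rebinding a name leaves the old string object untouched, while item assignment raises TypeError.

```python
# Python 3 sketch: strings are immutable, but names can be rebound.
greeting = "Hello World"
alias = greeting                  # a second name for the same object

greeting = "Hello Python"         # rebinding: greeting now names a new object
same_object = greeting is alias   # alias still names the old string

try:
    greeting[0] = "h"             # in-place modification is not allowed
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(alias, "|", greeting, "|", same_object)
print(error_message)
```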

>>> print str1[1]
e
>>> for i in str1:
    print i,

H e l l o   P y t h o n
>>>

Because a string is a sequence, its characters can be accessed by index. Slicing works as well.

>>> print str1[:]
Hello Python
>>> print str1[1:7]
ello P
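
A few more index and slice forms may help here. This is a sketch assuming Python 3; the variable names are illustrative only. It shows negative indexes, a start:stop slice, and the optional step.

```python
text = "Hello Python"

first = text[0]             # index 0 is the first character
last = text[-1]             # negative indexes count from the end
middle = text[1:7]          # start is inclusive, stop is exclusive
every_other = text[::2]     # a step of 2 takes every second character
reversed_text = text[::-1]  # a step of -1 walks the string backwards

print(first, last, middle, every_other, reversed_text)
```

None of these operations changes text; each one returns a new string.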

Creating strings

Strings in Python can be created in the following ways:

1. With the string constructor str

>>> str("String")
'String'

2. With single quotes (') or double quotes ("), that is, as a literal

>>> 'String'
'String'
>>> "String"
'String'

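Note: when creating strings with quotes, be careful about mixing single and double quotes. The opening and closing quotes must match, and a quote of the same kind inside the string ends it early.

# Mixing single and double quotes
>>> "It's a String"
"It's a String"
# Mismatched quotes -> Error
>>> 'It's a String'
SyntaxError: invalid syntax
>>>

For more on single, double, and triple quotes, see "Differences between single, double, and triple quotes in Python".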

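3. The raw string operator

In a raw string, every character is taken literally: backslashes do not introduce escape sequences for special or non-printable characters.

This makes many string operations simpler and more convenient, for example writing regular expressions.

The r prefix is the raw string operator. It may be upper or lower case, but it must come immediately before the opening quote of the string.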

>>> '\n'
'\n'
>>> print '\n'


>>> r'\n'
'\\n'
>>> print r'\n'
\n

Raw strings are used most often with file paths and regular expressions; those will be covered in a later note.
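
As a small preview of those two uses, here is a sketch assuming Python 3. The regular expression, the sample sentence, and the Windows-style path are all made up for illustration.

```python
import re

normal = "\n"    # one character: a newline
raw = r"\n"      # two characters: a backslash and the letter n
lengths = (len(normal), len(raw))

# In a regular expression, the raw form avoids doubling every backslash.
pattern = re.compile(r"\d+\.\d+")
match = pattern.search("version 3.11 released")
found = match.group(0) if match else None

# A hypothetical Windows-style path; without r, "\n" and "\t" would be escapes.
path = r"C:\new_folder\table.txt"

print(lengths, found, path)
```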

String operators

1. The + and * operators (overloaded operators)

Operator	Purpose
+	Concatenates strings
*	Repeats a string

>>> str1 = "Hello "
>>> str2 = "World!"  
>>> str3 = str1 + str2  # + concatenates str1 and str2
>>> print str3
Hello World!
>>> str4 = str3 * 3    # * repeats str3 three times
>>> print str4
Hello World!Hello World!Hello World!
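
One common use of the two operators together is drawing a rule under a heading. This is only a sketch, assuming Python 3; the heading text is made up.

```python
title = "Report"
rule = "-" * len(title)        # repeat "-" once per character in title
header = title + "\n" + rule   # concatenate the pieces with +

print(header)
```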

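2. The in and not in operators

Operator	Purpose
in	Membership test: checks whether a sequence of characters occurs in the string
not in	Checks that something does not occur in the string

# the in operator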
>>> str1 = "Hello World"
>>> 'o' in str1
True
>>> 'b' in str1
False
>>> 'rl' in str1
True
>>> 'Hl' in str1
False
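# the not in operator
>>> str1 = "Hello World"
>>> a not in str1    # note the difference between a (a name) and 'a' (a string)
Traceback (most recent call last):
  File "<pyshell#58>", line 1, in <module>
    a not in str1
NameError: name 'a' is not defined
>>> 'a' not in str1
True

With a single character, in checks whether that character occurs in the string and returns True or False accordingly. With several characters, it checks whether they occur in the string contiguously and in that order, that is, as a substring.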
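
The substring rule is easy to check directly. This sketch assumes Python 3; the list of log lines is sample data invented for the example.

```python
text = "Hello World"

has_o = "o" in text         # a single character that is present
has_rl = "rl" in text       # "rl" appears contiguously inside "World"
has_hl = "Hl" in text       # "H" and "l" both occur, but never side by side
lacks_a = "a" not in text   # equivalent to: not ("a" in text)

# A typical use: keep only the lines that contain a given substring.
lines = ["error: disk full", "ok", "error: timeout"]
errors = [line for line in lines if "error" in line]

print(has_o, has_rl, has_hl, lacks_a, errors)
```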

String comparison

1. Comparing single characters

>>> 'a' > 'B'
True
>>> 'a' == 'A'
False
>>> 'a' < ' '
False

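Single characters are compared by their ASCII codes. You can check this with ord(), which converts a character to its ASCII value (chr() does the reverse).

Function	Purpose
ord()	Converts a character to its ASCII value
chr()	Converts an ASCII value to a character

>>> ord('a'),ord('B')
(97, 66)
>>> ord('a'),ord('A')
(97, 65)
>>> ord('a'),ord(' ')
(97, 32)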
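
The two functions are inverses of each other, which the following sketch checks (assuming Python 3, where ord() returns a Unicode code point that matches ASCII for these characters).

```python
codes = [ord(ch) for ch in "aB "]              # character -> code
round_trip = "".join(chr(c) for c in codes)    # code -> character

# In ASCII, each uppercase letter sits 32 below its lowercase form.
upper_a = chr(ord("a") - 32)

print(codes, repr(round_trip), upper_a)
```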

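2. Comparing multi-character strings

>>> 'abc' > 'abd'
False
>>> 'abc' < 'abcd'
True

Multi-character strings are also compared by ASCII code. The idea is to walk both strings in parallel, comparing the characters at the same position, until two different characters are found.

- Start at index 0 in both strings.
- Compare the two characters at the current position.
  - If they are equal, advance both indexes by 1 and continue with the next comparison.
  - If they differ, the result of comparing these two characters is the result of the string comparison.
- If one string ends while all compared characters are equal, the longer string is greater.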
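
The steps above can be sketched as a function. This is an illustration only, assuming Python 3; compare_strings is a hypothetical helper, not how the interpreter implements comparison internally. It returns -1, 0, or 1.

```python
def compare_strings(left, right):
    """Compare two strings position by position (hypothetical helper)."""
    index = 0
    # Walk both strings in parallel while both still have characters.
    while index < len(left) and index < len(right):
        a, b = ord(left[index]), ord(right[index])
        if a != b:
            # The first differing pair decides the result.
            return -1 if a < b else 1
        index += 1
    # Every shared position was equal: the longer string is greater.
    if len(left) == len(right):
        return 0
    return -1 if len(left) < len(right) else 1


print(compare_strings("abc", "abd"), compare_strings("abc", "abcd"))
```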

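3. Comparing with a built-in function

The cmp() function compares strings by their ASCII values, just like the operators.

cmp(x, y) returns a negative number when x < y, 0 when x == y, and a positive number when x > y.

>>> str1 = 'abc'
>>> str2 = 'bcd'
>>> str3 = '123'
>>> cmp(str1,str2)
-1
>>> cmp(str2,str3)
1
>>> cmp(str3,str1)
-1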
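
cmp() exists only in Python 2. For readers on Python 3, a commonly used stand-in is sketched below; the function defined here is a replacement written for this example, not a built-in.

```python
def cmp(x, y):
    """Python 3 stand-in for Python 2's cmp(): returns -1, 0 or 1."""
    # True and False behave as 1 and 0, so the subtraction yields the sign.
    return (x > y) - (x < y)


str1, str2, str3 = "abc", "bcd", "123"
results = (cmp(str1, str2), cmp(str2, str3), cmp(str3, str1))
print(results)
```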

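String transformations

1. Changing letter case

1.1 Built-in string methods

Method	Purpose
lower()	Converts the string to lowercase
upper()	Converts the string to uppercase
swapcase()	Swaps upper and lower case
capitalize()	Capitalizes the first letter and lowercases the rest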

>>> str1 = 'hello Python!'
>>> print str1
hello Python!
>>> str1.lower()
'hello python!'
>>> str1.upper()
'HELLO PYTHON!'
>>> str1.swapcase()
'HELLO pYTHON!'
>>> str1.capitalize()
'Hello python!'

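1.2 string module functions:

string.capwords(s)   # capitalizes the first letter of each word
            # It first splits s with split(), then capitalizes each word with capitalize(), and finally joins the words with join()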

>>> import string
>>> str1 = 'hello Python!'
>>> print(string.capwords(str1))
Hello Python!
>>> str1 = 'hello-Python!'
>>> print(string.capwords(str1))
Hello-python!
>>> str1 = 'hEllO Python'
>>> print(string.capwords(str1))
Hello Python
>>> str1 = 'hEllO-Python'
>>> print(string.capwords(str1))
Hello-python

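As the session above shows, capwords(str1) splits the string on whitespace with split(), capitalizes the first letter of each piece and lowercases the remaining letters, and then joins the pieces back together with join().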
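
That split / capitalize / join pipeline can be written out by hand. This is a sketch assuming Python 3; capwords_sketch is a hypothetical name, and it only covers the default case where no separator argument is passed.

```python
import string


def capwords_sketch(text):
    """Rebuild string.capwords(text) from split, capitalize and join."""
    return " ".join(word.capitalize() for word in text.split())


samples = ["hello Python!", "hello-Python!", "hEllO Python", "hEllO-Python"]
pairs = [(capwords_sketch(s), string.capwords(s)) for s in samples]

# str.title() behaves differently: it also capitalizes after a hyphen.
titled = "hello-Python!".title()

print(pairs, titled)
```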

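String conversion

1. Converting strings to other types

The following functions live in the string module (Python 2 only; the built-ins int(), long() and float() do the same job).

Function	Meaning
string.atoi(s[, base])	Converts s to an int. base defaults to 10. If base is 0, the base is chosen from the prefix of s, so s may look like 012 or 0x23. If base is 16, a leading 0x or 0X is accepted but not required
string.atol(s[, base])	Converts s to a long
string.atof(s)	Converts s to a float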
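
The string module functions in this table do not exist in Python 3, so the sketch below uses the built-ins int() and float() instead, which take the same kind of base argument. Note one Python 3 difference: with base 0, octal must be written 0o12 rather than 012.

```python
decimal = int("42")             # base defaults to 10
auto_hex = int("0x23", 0)       # base 0: the 0x prefix selects base 16
auto_octal = int("0o12", 0)     # base 0: the 0o prefix selects base 8
explicit_hex = int("23", 16)    # base 16 without a prefix
prefixed_hex = int("0x23", 16)  # base 16 with the optional prefix
as_float = float("3.5")

print(decimal, auto_hex, auto_octal, explicit_hex, prefixed_hex, as_float)
```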

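2. Encoding and decoding strings

Inside Python, text is represented as unicode. So when converting between encodings, unicode usually serves as the intermediate form: first decode the string from its source encoding into unicode, then encode the unicode into the target encoding.

Method	Meaning
S.encode([encoding,[errors]])	Converts a unicode string to a string in the given encoding
S.decode([encoding,[errors]])	Converts a string in the given encoding to unicode

This part will be covered in more detail later.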
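
Until then, here is a short sketch of the decode-then-encode round trip, assuming Python 3, where str is always unicode text and bytes holds encoded data. The sample word and the choice of UTF-8 and Latin-1 are arbitrary.

```python
text = "caf\u00e9"                     # unicode text ending in e-acute

utf8_bytes = text.encode("utf-8")      # e-acute takes two bytes in UTF-8
# Convert UTF-8 to Latin-1 by going through unicode in the middle.
latin1_bytes = utf8_bytes.decode("utf-8").encode("latin-1")
round_trip = latin1_bytes.decode("latin-1")

# The errors argument controls what happens to unencodable characters.
ascii_replaced = text.encode("ascii", "replace")
ascii_ignored = text.encode("ascii", "ignore")

print(utf8_bytes, latin1_bytes, round_trip, ascii_replaced, ascii_ignored)
```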

String formatting

>>> str1 = 'Hello Python'
>>> print "format String : %s" %(str1)
format String : Hello Python
>>> '{0} {1}'.format('Hello','Python')
'Hello Python'
>>> str1 = '{0},{1}'
>>> str1.format('Hello','Python')
'Hello,Python'

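1. Common format specifiers

Specifier	Meaning
%s	String
%d	Decimal integer
%f	Floating point, decimal notation
%e	Floating point, exponential notation

General form of a format specifier:

%[(name)][flags][width][.precision]code

2. Conversion types

Specifier	Meaning
%c	Single character (accepts an integer code or a one-character string)
%s	String (converts any Python object with str)
%r	String (converts any Python object with repr)
%d (%i)	Signed decimal integer
%u	Obsolete; identical to %d
%o	Signed octal integer
%x	Signed hexadecimal integer (lowercase)
%X	Signed hexadecimal integer (uppercase)
%e	Floating point in exponential notation
%E	Floating point in exponential notation, with E instead of e
%f (%F)	Floating point in decimal notation
%g	Floating point, using %e or %f depending on the magnitude of the value
%G	Floating point, like %g but using %E for the exponential form
%p	Pointer (memory address in hexadecimal); a C printf specifier that Python's % operator does not support
%n	Stores the number of characters written so far; a C printf specifier that Python's % operator does not support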
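
To see how the flags, width, and precision fields combine, here is a sketch assuming Python 3 (the % operator works the same way on these cases in Python 2). The values being formatted are made up.

```python
name, count, ratio = "Python", 42, 3.14159

# "-" left-aligns, the number is the minimum width, ".3" is the precision.
row = "%-8s|%5d|%8.3f" % (name, count, ratio)

# A (name) key looks the value up in a dictionary instead of a tuple.
named = "%(lang)s %(major)d" % {"lang": "Python", "major": 3}

padded = "%05d" % count   # "0" flag: pad with zeros
signed = "%+d" % count    # "+" flag: always show the sign
hexes = "%x %X" % (255, 255)
exponent = "%e" % 1234.5
quoted = "%r and %s" % ("a", "a")
char = "%c" % 65

print(row, named, padded, signed, hexes, exponent, quoted, char, sep="\n")
```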

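Other string operations

1. String concatenation:

- Automatic concatenation of adjacent literals
- The + operator
- s.__add__(s1)
- s.join(iterable)

>>> 'Hello ''Python'          # adjacent literals are concatenated automatically
'Hello Python'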
>>> str1 = 'Hello'
>>> str1 = str1.__add__('Python')
>>> print str1
HelloPython
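>>> words = ['Hello','Python']
>>> str1 = ' '.join(words)  # join the list into a string
>>> print str1
Hello Python

Building on the code above, we can "modify" a string by slicing it and concatenating the pieces into a new string.

>>> str1 = "HeLlo"
>>> print str1
HeLlo
>>> str1 = str1[:2] + "l" + str1[3:]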
>>> print str1
Hello
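
The slice-and-concatenate trick generalizes to any position. This is a sketch assuming Python 3; replace_at is a hypothetical helper name.

```python
def replace_at(text, index, new_char):
    """Return a copy of text with the character at index replaced."""
    return text[:index] + new_char + text[index + 1:]


fixed = replace_at("HeLlo", 2, "l")

# join() accepts any iterable of strings, so numbers must be converted first.
words = ["Hello", "Python"]
sentence = " ".join(words)
csv_line = ",".join(str(n) for n in [1, 2, 3])

print(fixed, sentence, csv_line)
```

When many pieces are combined, join() is usually preferred over repeated +, because each + builds a new intermediate string.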

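2. Splitting strings

Method	Meaning
S.split([sep, [maxsplit]])	Splits S into a list using sep as the separator. When sep is omitted, S is split on runs of whitespace. maxsplit is the maximum number of splits; when it is omitted, the whole string is split
S.rsplit([sep, [maxsplit]])	Like split(), but splits starting from the end of S
S.splitlines([keepends])	Splits S into a list at line boundaries. keepends is a bool: if true, each line keeps its line terminator; otherwise the terminators are dropped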

>>> str1 = "Hello World, Hello Python!"
>>> str1.split()
['Hello', 'World,', 'Hello', 'Python!']
>>> str1.split("o")
['Hell', ' W', 'rld, Hell', ' Pyth', 'n!']
>>> str1.split("o", 3)
['Hell', ' W', 'rld, Hell', ' Python!']
>>> str1.rsplit("o", 3)
['Hello W', 'rld, Hell', ' Pyth', 'n!']
>>> str2 = "Hello\n World,\n Hello\n Python!"
>>> str2.splitlines(True)
['Hello\n', ' World,\n', ' Hello\n', ' Python!']
>>> str2.splitlines(False)
['Hello', ' World,', ' Hello', ' Python!']
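
A few cases that the session above does not cover are sketched here, assuming Python 3. The record string and file name are invented sample data.

```python
# maxsplit=1 keeps any later "=" inside the value.
record = "name=Ada;lang=Python;note=a=b"
fields = dict(item.split("=", 1) for item in record.split(";"))

# No argument splits on runs of whitespace; " " splits on every single space.
whitespace_split = " a  b ".split()
space_split = " a  b ".split(" ")

# rsplit with maxsplit=1 separates only the last extension.
stem_ext = "archive.tar.gz".rsplit(".", 1)

# splitlines() handles both \n and \r\n terminators.
lines = "one\ntwo\r\nthree".splitlines()

print(fields, whitespace_split, space_split, stem_ext, lines)
```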

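3. Searching strings:

These methods check whether a string contains a given substring and act on it accordingly.

Method	Meaning
S.find(substr, [start, [end]])	Returns the index where the first occurrence of substr in S begins, or -1 if S does not contain substr
S.index(substr, [start, [end]])	Same as find(), but raises ValueError when S does not contain substr
S.rfind(substr, [start, [end]])	Returns the index where the last occurrence of substr in S begins, or -1 if S does not contain substr
S.rindex(substr, [start, [end]])	Returns the index where the last occurrence of substr in S begins, and raises ValueError if S does not contain substr
S.count(substr, [start, [end]])	Counts the occurrences of substr in S
S.replace(oldstr, newstr, [count])	Replaces oldstr in S with newstr. count is the number of replacements; when it is omitted, all occurrences are replaced. This is the general form of replacement; some other methods replace specific characters
S.strip([chars])	Removes from both ends of S every character that appears in chars (whitespace when chars is omitted)
S.lstrip([chars])	Removes from the start of S every character that appears in chars
S.rstrip([chars])	Removes from the end of S every character that appears in chars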

>>> str1 = "Hello Python"
>>> print str1.find("P")
6
>>> print str1.find("Py") # index in str1 where the first occurrence of the substring begins
6
>>> print str1.find("a") # returns -1 when the substring is not found
-1
>>> print str1.index
<built-in method index of str object at 0x...>
>>> print str1.index("P")
6
>>> print str1.index("Py") # index in str1 where the first occurrence of the substring begins
6
>>> print str1.index("a") # raises an error when the substring is not found

Traceback (most recent call last):
  File "<pyshell#19>", line 1, in <module>
    print str1.index("a")
ValueError: substring not found
>>> 
>>> str2 = "Hello world"
>>> print str2.count("l")  # number of times the substring occurs in str2
3
>>> print str2.count("a")  # returns 0 when str2 does not contain the substring
0
>>>
>>> print str2.replace("l","a",1)  # replace "l" with "a" in str2 once
Healo world
>>> print str2.replace("l","a") # without a count, all occurrences are replaced
Heaao worad
>>>
>>> str3 = "   Hello World   "
>>> print str3
   Hello World   
>>> print str3.lstrip()  # remove the spaces before "Hello World" in str3
Hello World   
>>> print str3.rstrip()
   Hello World
>>> print str3.strip()
Hello World
>>>   
>>> str3 = '''
Hello World
'''
>>> print str3

Hello World

>>> print str3.strip() # remove the leading and trailing newlines from str3
Hello World
>>> str4 = "---Hello World---"
>>> print str4
---Hello World---
>>> print str4.strip("-") # remove the leading and trailing '-' from str4
Hello World
>>>
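
Two details of these methods are worth a sketch (assuming Python 3): find() and index() differ only in how they report a miss, and the chars argument of strip() is a set of characters, not a prefix or suffix. The helper name locate and the sample domain are made up.

```python
def locate(text, sub):
    """Behave like find() on top of index() (hypothetical helper)."""
    try:
        return text.index(sub)
    except ValueError:
        return -1


text = "Hello Python"
positions = (locate(text, "Py"), text.find("Py"), locate(text, "a"), text.find("a"))
last_o = text.rfind("o")   # searches from the right

# Every leading or trailing character found in "w.moc" is removed,
# regardless of the order in which the characters are listed.
stripped = "www.example.com".strip("w.moc")

print(positions, last_o, stripped)
```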

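4. String test methods

The following are string methods (they are not functions in the string module), and all of them return a bool.

Method	Meaning
S.startswith(prefix[,start[,end]])	Whether S starts with prefix
S.endswith(suffix[,start[,end]])	Whether S ends with suffix
S.isalnum()	Whether all characters are letters or digits, and there is at least one character
S.isalpha()	Whether all characters are letters, and there is at least one character
S.isdigit()	Whether all characters are digits, and there is at least one character
S.isspace()	Whether all characters are whitespace, and there is at least one character
S.islower()	Whether all letters in S are lowercase
S.isupper()	Whether all letters in S are uppercase
S.istitle()	Whether S is title-cased (each word starts with an uppercase letter)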

>>> str1 = "-Hello  world-"
>>> str1.startswith("-")
True
>>> str1.endswith("-")
True
>>> str1.startswith(" ")
False
>>> 
>>> str1.startswith() # an argument is required, otherwise an error is raised
Traceback (most recent call last):
  File "<pyshell#77>", line 1, in <module>
    str1.startswith()
TypeError: startswith() takes at least 1 argument (0 given)
>>>
>>> str1.isalnum() # are all characters letters or digits?
False
>>> str2 = "Hello World"
>>> str2.isalnum()
False
>>> str3 = "HelloWorld"
>>> str3.isalnum()
True
>>> str3 = "HelloWorld123"
>>> str3.isalnum()
True
>>>
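
The test methods combine naturally into a small classifier. This is a sketch assuming Python 3; classify and the sample tokens are invented for illustration. It also shows that startswith() and endswith() accept a tuple of alternatives.

```python
def classify(token):
    """Label a token using the string test methods (hypothetical helper)."""
    if token.isdigit():
        return "number"
    if token.isalpha():
        return "word"
    if token.isalnum():
        return "mixed"
    if token.isspace():
        return "space"
    return "other"   # includes the empty string, for which every test is False


tokens = ["123", "Hello", "Hello123", "  ", "Hello World", ""]
labels = [classify(t) for t in tokens]

# A tuple argument matches if any one of its items matches.
is_data_file = "report.csv".endswith((".csv", ".tsv"))

print(labels, is_data_file)
```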
