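Today, for one reason or another, I found myself writing Python scripts again. Lazy as I am, I rarely do by hand anything a machine can do for me. I have written quite a few Python scripts by now, most of them for text analysis, so I pulled out some of the functions I use most often.
They target Python 3.2 and are the functions I reach for when analyzing large amounts of text.
PS: GetFileFromThisRootDir used to be case-sensitive; this has now been changed.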
1. [File] MyTools.py (~2KB)
__author__ = 'soso_fy'
# coding: utf-8
# Functions I often need when writing Python scripts,
# collected here so I don't have to rewrite them every time.
# Requires Python 3.2 or later.
import os
import codecs
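# Read a text file. Supported encodings: UTF-8 (with or without BOM),
# GB2312, GBK, and UTF-16.
# Returns the file content as a string, or '' if the file cannot be
# read or decoded.
def ReadTextFile(filepath):
    try:
        with open(filepath, 'rb') as f:
            data = f.read()
    except IOError as err:
        # Return early: without this, the code below would fail on an
        # undefined variable when the file cannot be opened.
        print('Error reading file in ReadTextFile:', err)
        return ''
    # A UTF-8 BOM identifies the encoding unambiguously; strip it and decode.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode('utf-8')
    # Otherwise try each candidate in turn. Strict UTF-8 goes first because
    # it rejects most non-UTF-8 input; UTF-16 goes last because it accepts
    # almost any even-length byte string.
    last_error = None
    for encoding in ('utf-8', 'gb2312', 'gbk', 'utf-16'):
        try:
            return data.decode(encoding)
        except UnicodeDecodeError as err:
            last_error = err
    print('Unsupported text file encoding:', last_error)
    return ''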
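# Collect all files under a directory that have one of the given extensions.
# dir: the root directory to search
# ext: a list of extensions without the leading dot, or None to match
#      every file. Example: ['xml', 'java']
def GetFileFromThisRootDir(dir, ext=None):
    allfiles = []
    needExtFilter = ext is not None
    if needExtFilter:
        # Compare extensions in lowercase so matching is case-insensitive.
        ext = [e.lower() for e in ext]
    for root, dirs, files in os.walk(dir):
        for filename in files:
            filepath = os.path.join(root, filename)
            # Lowercase only the extension; the path keeps its original
            # case so it stays valid on case-sensitive file systems.
            extension = os.path.splitext(filepath)[1][1:].lower()
            if not needExtFilter or extension in ext:
                allfiles.append(filepath)
    return allfiles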
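
To show how the two ideas fit together (decoding with an encoding fallback, and filtering files by extension regardless of case), here is a small self-contained sketch. It does not import MyTools.py. The names `decode_with_fallback`, `list_files`, and `demo`, along with the sample file names, are made up for illustration, and the encoding order (UTF-8, then GBK, then UTF-16) is a choice made for this sketch rather than something the original script prescribes.

```python
import codecs
import os
import tempfile


def decode_with_fallback(data, encodings=('utf-8', 'gbk', 'utf-16')):
    """Hypothetical helper: strip a UTF-8 BOM if present, else try each encoding in order."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode('utf-8')
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing matched: signal failure with an empty string.
    return ''


def list_files(root, ext=None):
    """Hypothetical helper: collect paths under root, matching extensions case-insensitively."""
    wanted = None if ext is None else {e.lower() for e in ext}
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Only the extension is lowercased; the path keeps its case.
            extension = os.path.splitext(name)[1][1:].lower()
            if wanted is None or extension in wanted:
                found.append(os.path.join(dirpath, name))
    return found


def demo():
    """Write sample files in different encodings, then find and decode the .txt ones."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, 'a.TXT'), 'wb') as f:
            f.write(codecs.BOM_UTF8 + '文本'.encode('utf-8'))
        with open(os.path.join(root, 'b.txt'), 'wb') as f:
            f.write('分析'.encode('gbk'))
        with open(os.path.join(root, 'c.xml'), 'wb') as f:
            f.write(b'<x/>')
        results = {}
        for path in list_files(root, ['txt']):
            with open(path, 'rb') as f:
                results[os.path.basename(path)] = decode_with_fallback(f.read())
        return results


if __name__ == '__main__':
    print(demo())
```

Calling `demo()` picks up both `a.TXT` and `b.txt` but skips `c.xml`, and each file is decoded with the first encoding that accepts its bytes. Trying UTF-8 before GBK is deliberate here: GBK-encoded Chinese text is almost never valid UTF-8, so a wrong match is unlikely, whereas the reverse order can silently produce garbage.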