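Yesterday I processed a batch of serial-port data text files. They cover many modes, each with a large amount of data. The data comes in groups of ten bytes, and every group starts with 48. To save effort, the text can be processed with Python so that a new line starts at every 48.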
The raw data looks like this:
48 54 05 00 03 00 66 18 00 22 48 54 05 00 03 00 66 18 00 22 48 54 05 00 03 00 66 18 00 22 48 54 04 00 00 00 00 00 00 A2 48 54 05 00 03 00 66 18 00 22 48 54 05 00 03 00 66 18 00 22 48 54 05 00 03 00 66 18 00 22 48 54 04 00 02 00 66 18 00 22
Many files like this were saved.
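Because every frame is described as exactly ten bytes beginning with 48, the stream can also be cut by counting bytes instead of searching for 48. This is a minimal sketch, assuming the text consists of whitespace-separated two-digit hex bytes and that no frame is truncated; `split_frames` is a hypothetical helper, not part of the scripts below. Unlike a search-and-replace on "48", it does not break a frame whose payload happens to contain the byte 0x48 (the second frame in the example is made up to show this).

```python
def split_frames(text, frame_len=10):
    """Group whitespace-separated hex bytes into lines of frame_len bytes.

    Assumes every frame is exactly frame_len bytes long; a short trailing
    group (an incomplete frame) is kept as its own last line.
    """
    tokens = text.split()
    lines = [" ".join(tokens[i:i + frame_len])
             for i in range(0, len(tokens), frame_len)]
    return "\n".join(lines)


raw = "48 54 05 00 03 00 66 18 00 22 48 54 04 00 48 00 00 00 00 A2"
print(split_frames(raw))
# 48 54 05 00 03 00 66 18 00 22
# 48 54 04 00 48 00 00 00 00 A2
```

The trade-off: counting is robust against 0x48 inside the payload, but a single dropped byte shifts every following frame, whereas splitting on 48 resynchronizes at the next frame header.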
1. Processing every file in a folder (suited to large batches of files)
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# filename: exchange_newline_all.py
import os
import re

def newline(file_name):  # rewrite one file in place
    with open(file_name, 'r') as f:
        content = f.read()
    # Turn any whitespace in front of a standalone 48 into one newline,
    # then drop the newline added before the very first 48.
    content = re.sub(r'\s*\b48\b', '\n48', content).lstrip('\n')
    with open(file_name, 'w') as f:
        f.write(content)

while True:
    path = input(r'INPUT THE FILEPATH(eg F:\sscom\dir):')  # folder to process
    if not path:  # empty input ends the loop
        break
    print(path)
    for name in os.listdir(path):  # go through the entries in the folder
        file_x = os.path.join(path, name)
        if os.path.isfile(file_x):  # skip subfolders
            newline(file_x)
            print("FINISHED,OK")
```
Output:
>>> ================================ RESTART ================================
>>>
INPUT THE FILEPATH(eg F:\sscom\dir):F:\tmp\1307\sscom\tmp2\mode_test
F:\tmp\1307\sscom\tmp2\mode_test
FINISHED,OK
FINISHED,OK
FINISHED,OK
FINISHED,OK
FINISHED,OK
FINISHED,OK
FINISHED,OK
FINISHED,OK
FINISHED,OK
INPUT THE FILEPATH(eg F:\sscom\dir):
After processing, the file contents are:
48 54 05 00 03 00 66 18 00 22
48 54 05 00 03 00 66 18 00 22
48 54 05 00 03 00 66 18 00 22
48 54 04 00 00 00 00 00 00 A2
48 54 05 00 03 00 66 18 00 22
48 54 05 00 03 00 66 18 00 22
48 54 05 00 03 00 66 18 00 22
48 54 04 00 02 00 66 18 00 22
Now the data is easy to work with.
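After splitting, a quick check that every line is a complete ten-byte frame starting with 48 catches frames that were cut in the wrong place, for example where a payload byte was 48. This is a minimal sketch; `find_bad_frames` is a hypothetical helper, and the ten-byte frame length is taken from the description of the data above.

```python
def find_bad_frames(text, frame_len=10, start="48"):
    """Return (line_number, line) pairs that are not well-formed frames.

    A line counts as well-formed when it holds exactly frame_len
    whitespace-separated bytes and its first byte equals start.
    Blank lines are ignored.
    """
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if len(tokens) != frame_len or tokens[0] != start:
            bad.append((number, line))
    return bad


processed = ("48 54 05 00 03 00 66 18 00 22\n"
             "48 54 04 00\n"
             "48 54 04 00 02 00 66 18 00 22\n")
print(find_bad_frames(processed))  # [(2, '48 54 04 00')]
```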
2. Processing a single file
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# filename: exchange_newline2.py
import re

def newline(file_name):  # rewrite one file in place
    with open(file_name, 'r') as f:
        content = f.read()
    # Start a new line before every standalone 48, without a leading blank line
    content = re.sub(r'\s*\b48\b', '\n48', content).lstrip('\n')
    with open(file_name, 'w') as f:
        f.write(content)

while True:  # process one file per prompt
    path = input(r'INPUT THE FILEPATH(eg F:\sscom\crt.TXT):')
    if not path:  # empty input ends the loop
        break
    newline(path)
    print("FINISHED,OK")
```
Output:
>>> ================================ RESTART ================================
>>>
INPUT THE FILEPATH(eg F:\sscom\crt.TXT):F:\tmp\1307\sscom\tmp2\mode_test\custom.TXT
FINISHED,OK
INPUT THE FILEPATH(eg F:\sscom\crt.TXT):
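The scripts here overwrite the capture file in place, so a mistake in the pattern destroys the raw data. A more cautious variant writes the result to a new file instead. This sketch assumes a sibling file name such as `custom_split.TXT` is acceptable; `newline_to_copy` and the `_split` suffix are hypothetical names chosen for illustration. The regular expression also absorbs the space in front of each 48, so lines do not end with a trailing blank, and running it again on its own output leaves the text unchanged.

```python
import os
import re


def newline_to_copy(src, suffix="_split"):
    """Write a copy of src with a line break before every standalone 48.

    The original file is left untouched; the result goes to a sibling file
    whose name gets suffix inserted before the extension, e.g.
    custom.TXT -> custom_split.TXT. Returns the new path.
    """
    root, ext = os.path.splitext(src)
    dst = root + suffix + ext
    with open(src, "r") as f:
        content = f.read()
    # Any whitespace before a standalone 48 becomes a single newline;
    # the newline added before the very first 48 is stripped again.
    content = re.sub(r"\s*\b48\b", "\n48", content).lstrip("\n")
    with open(dst, "w") as f:
        f.write(content)
    return dst
```

Usage: `newline_to_copy(r"F:\tmp\1307\sscom\tmp2\mode_test\custom.TXT")` would leave `custom.TXT` as captured and write the split frames to `custom_split.TXT` in the same folder.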
3. Passing the file path on the command line to process a single file
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# filename: exchange_newline1.py
import re
from sys import argv

def newline(file_name):  # rewrite one file in place
    with open(file_name, 'r') as f:
        content = f.read()
    # Start a new line before every standalone 48, without a leading blank line
    content = re.sub(r'\s*\b48\b', '\n48', content).lstrip('\n')
    with open(file_name, 'w') as f:
        f.write(content)

script, path = argv  # usage: python exchange_newline1.py <file>
newline(path)
```
Output:
D:\>cd Python\tmp
D:\Python\tmp>python exchange_newline1.py F:\tmp\1307\sscom\tmp2\mode_test\custom.TXT
D:\Python\tmp>
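Unpacking `script, path = argv` raises a `ValueError` when the path is missing or when more than one path is given. The sketch below uses the standard-library `argparse` module instead, so one call can take any number of files and prints a usage line when none is given. The option layout and messages are assumptions for illustration, not part of the original scripts.

```python
import argparse
import re


def newline(file_name):
    """Rewrite file_name in place with a line break before every standalone 48."""
    with open(file_name, "r") as f:
        content = f.read()
    content = re.sub(r"\s*\b48\b", "\n48", content).lstrip("\n")
    with open(file_name, "w") as f:
        f.write(content)


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Start a new line before every 48 in serial-port capture files.")
    parser.add_argument("files", nargs="*", help="text files to rewrite in place")
    args = parser.parse_args(argv)  # argv=None means sys.argv[1:]
    if not args.files:
        parser.print_usage()  # nothing to do without file arguments
        return 1
    for path in args.files:
        newline(path)
        print("FINISHED,OK", path)
    return 0


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as `exchange_newline_multi.py`, it could be run as `python exchange_newline_multi.py a.TXT b.TXT`, and a shell wildcard can pass a whole folder's worth of files at once.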
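With these scripts most needs are covered. Note, however, that they only handle plain-text files such as .txt; Office files are not supported.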
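Since only plain-text captures can be handled this way, the folder version can be restricted to text files so that an Office document or other binary file in the same folder is never opened and overwritten. This is a minimal sketch with `pathlib`, assuming the captures use a `.txt` extension in any letter case (the example paths above use `.TXT`); `process_folder` is a hypothetical name.

```python
import re
from pathlib import Path


def newline(file_path):
    """Rewrite one file in place with a line break before every standalone 48."""
    text = file_path.read_text()
    file_path.write_text(re.sub(r"\s*\b48\b", "\n48", text).lstrip("\n"))


def process_folder(folder):
    """Process every .txt file (any case) directly inside folder.

    Other files, such as .docx or .xlsx, and subfolders are skipped.
    Returns the names of the processed files in sorted order.
    """
    done = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() == ".txt":
            newline(path)
            done.append(path.name)
    return done
```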