
Removing extra words from a file using Python

Hi, I am learning Python, and out of curiosity I wrote a program that removes extra words from a file.

I am comparing the text in the files 'text1.txt' and 'text2.txt', and based on the words in text1 I delete the extra words from text2.

#! /bin/python

text1 = open('text1.txt', 'r')
text2 = open('text2.txt', 'r')

t_l1 = text1.readlines()
t_l2 = text2.readlines()

# Printing to check if the file 1 contents were read properly.
print 'Printing the file 1 contents:'
w_t1 = []
for i in range(len(t_l1)):
    # extend() collects the words of every line; assigning split() directly
    # would overwrite the list on each pass and keep only the last line.
    w_t1.extend(t_l1[i].split(' '))
for j in range(len(w_t1)):
    print w_t1[j]

# Printing to see if the file 2 contents were read properly.
print 'File 2 contents:'
w_t2 = []
for i in range(len(t_l2)):
    w_t2.extend(t_l2[i].split(' '))
for j in range(len(w_t2)):
    print w_t2[j]

print 'comparing and deleting the excess words.'
w = []  # the list of extra words has to exist before the loop appends to it
i = 1
while i <= len(w_t1):
    if w_t1[i-1] == w_t2[i-1]:
        print w_t1[i-1]
        i += 1
    # All the words of file 1 are in list w_t1 and the words of file 2 are in w_t2.
    # If the word at position i-1 of w_t1 is not the same as the word at the same
    # position of w_t2, that word is deleted from w_t2 and the loop continues
    # without advancing i, so the same position is compared again.
    else:
        w.append(str(w_t2[i-1]))
        del w_t2[i-1]  # delete by position; remove() would delete the first equal word

print 'The extra words are: ' + str(w) + '\n'
print w
print 'The original words are: ' + str(w_t2) + '\n'

print 'The extra values are: '
for item in w:
    print item

# Opening the file out.txt to write the output.
out = open('out.txt', 'w')
out.write(str(w))

# Closing the files.
text1.close()
text2.close()
out.close()

Say text1.txt contains the words "happy birthday dear friend"

and text2.txt reads "happy birthday to you my dear friend".

The program should report the extra words in text2.txt, i.e. "to", "you" and "my".
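To reproduce this example, the two sample files can be written out first so the script runs unchanged. A minimal sketch, using the sentences quoted above:

# Create the two sample files from the example so the script can be run as-is.
with open('text1.txt', 'w') as f:
    f.write('happy birthday dear friend\n')
with open('text2.txt', 'w') as f:
    f.write('happy birthday to you my dear friend\n')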

The program above works fine, but what if I have to do this for a file with millions of words or millions of lines? Checking every single word doesn't seem like a good idea. Is there a predefined Python function for this?
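There is no single built-in that does exactly this, but for a plain word-count comparison the standard library's collections.Counter can be treated as a multiset and subtracted in roughly linear time. A minimal sketch, assuming word order does not matter and both files fit in memory (extra_words is a hypothetical helper name, not part of the question's code):

from collections import Counter

def extra_words(path1, path2):
    # Count every word in each file; Counter behaves like a multiset here.
    with open(path1) as f1, open(path2) as f2:
        count1 = Counter(f1.read().split())
        count2 = Counter(f2.read().split())
    # Multiset difference: words (with multiplicity) in file 2 that are not in file 1.
    return list((count2 - count1).elements())

print(extra_words('text1.txt', 'text2.txt'))

If the position of each word matters, as in the while loop above, the standard library's difflib module can compare the two word lists as sequences instead of counting them.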

P.S. Please excuse me if this is a poor question; I am still learning Python, and I will stop asking questions like this soon.