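I recently started learning Python and wrote a simple crawler that downloads the images from the course descriptions on imooc (慕課網) and saves them locally. Here is the experimental code: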
# -*- coding: utf-8 -*-
"""
Spyder Editor
"""
import re
import os
import urllib.request  # written for Python 3.6
# Target page to crawl
f_source = urllib.request.urlopen('http://www.imooc.com/course/list')
mybytes = f_source.read()
mystr = mybytes.decode('utf8')
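# The greedy pattern can run past the first ".jpg" on a line,
# so each match is cut just after its first ".jpg" below.
result = re.findall(r'http:.+\.jpg', mystr)
# Print details of the first match and check the string slicing
print(len(result[0]))
print(result[0].index('.jpg'))
print(result[0][:result[0].index('.jpg') + 4])
l = []
for i in result:
    l.append(i[:i.index('.jpg') + 4])
print(l)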
# Fetch each image URL and save the data to a local file
k = 0
for url in l:
    f = open('F:\\python_test\\%d.jpg' % k, 'wb+')
    rep = urllib.request.urlopen(url)
    f.write(rep.read())
    f.close()
    k += 1
print('success')
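The slicing step in the script is only needed because `.+` is greedy and can swallow several URLs on one line. A minimal sketch of an alternative, assuming a non-greedy pattern that stops at quotes, angle brackets, and whitespace is acceptable for this page. The helper name `extract_jpg_urls` and the sample HTML are made up for illustration and are not taken from the site:

```python
import re

# Hypothetical helper; the name and the sample HTML below are illustrative only.
# The character class keeps a match inside one attribute value, and +? stops
# at the first ".jpg", so no slicing is needed afterwards.
JPG_URL = re.compile(r'http:[^\s"<>]+?\.jpg')

def extract_jpg_urls(html):
    """Return every http: URL ending in .jpg, in order of appearance."""
    return JPG_URL.findall(html)

sample = ('<img src="http://img.example.com/a.jpg">'
          '<img src="http://img.example.com/b.jpg">')
urls = extract_jpg_urls(sample)
print(urls)
```

In the script above, `extract_jpg_urls(mystr)` could then stand in for the `re.findall` call and the loop that builds `l`.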
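While experimenting, I found that when the file is written through Python's built-in os module, the data read from the file-like object returned by urllib.request.urlopen(url) appears to be incomplete. The code is as follows: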
# -*- coding: utf-8 -*-
"""
Spyder Editor
"""
import re
import os  # needed for os.open / os.write / os.close below
import urllib.request
f_source = urllib.request.urlopen('http://www.imooc.com/course/list')
mybytes = f_source.read()
mystr = mybytes.decode('utf8')
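result = re.findall(r'http:.+\.jpg', mystr)
print(len(result[0]))
print(result[0].index('.jpg'))
print(result[0][:result[0].index('.jpg') + 4])
l = []
for i in result:
    # Cut each greedy match just after its first ".jpg"
    l.append(i[:i.index('.jpg') + 4])
print(l)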
k = 0
# Write the file through the os module
for url in l:
    f = os.open('F:\\python_test\\%d.jpg' % k, os.O_CREAT | os.O_RDWR)
    rep = urllib.request.urlopen(url)
    iter_f = iter(rep)
    for line in iter_f:
        os.write(f, line)
    os.close(f)
    k += 1
print('success')
If anyone knows why this happens, I would appreciate an explanation.
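One possible explanation, not verified on the original machine: the `F:\` path suggests Windows, where a descriptor from `os.open` is in text mode unless `os.O_BINARY` is passed. In text mode every `\n` byte is written as `\r\n`, which corrupts image data, whereas `open(..., 'wb+')` in the first script is already binary. Separately, `os.write` returns the number of bytes it actually wrote, which may be fewer than requested. The sketch below assumes both points matter. The helper name `write_all` and the sample bytes are hypothetical stand-ins, not part of the original script:

```python
import os
import tempfile

# O_BINARY exists only on Windows; fall back to 0 on other platforms.
O_BINARY = getattr(os, 'O_BINARY', 0)

def write_all(path, chunks):
    """Write an iterable of bytes chunks to path using os-level calls."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC | O_BINARY)
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            # os.write may write fewer bytes than requested, so loop
            # until the whole chunk has been written.
            while len(view) > 0:
                written = os.write(fd, view)
                view = view[written:]
    finally:
        os.close(fd)

# Stand-in for image data; real code would pass the urlopen() response instead.
data = [b'\xff\xd8\n\r\n', b'\x00\x1a\n\xff\xd9']
target = os.path.join(tempfile.mkdtemp(), 'demo.jpg')
write_all(target, data)
with open(target, 'rb') as fh:
    saved = fh.read()
print(saved == b''.join(data))
```

In the second script, the inner loop would become `write_all('F:\\python_test\\%d.jpg' % k, rep)`, since the response object can be iterated as chunks of bytes.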