Saving data scraped with Python to a spreadsheet: storing Python-scraped data in an Excel workbook

2023-03-08 03:02:03

First, analyze the structure of the web page whose content will be scraped.
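As a rough illustration of what "analyzing the structure" means here, the sketch below parses a small HTML fragment and lists the elements that demo.py looks for. Only the class names `book-item-name` and `author` come from demo.py; the fragment itself, its text, and its `href` values are invented for illustration, and the real page may be laid out differently. The sketch uses Python's built-in `html.parser` instead of `lxml` so that it needs no extra parser installed.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: only the class names "book-item-name" and "author"
# come from demo.py; the surrounding markup and the text are invented.
SAMPLE_HTML = """
<ul>
  <li class="book-item">
    <a class="book-item-name" href="/book/1">Sample Book One</a>
    <a class="author" href="/user/1">Author A</a>
  </li>
  <li class="book-item">
    <a class="book-item-name" href="/book/2">Sample Book Two</a>
    <a class="author" href="/user/2">Author B</a>
  </li>
</ul>
"""

# "html.parser" ships with Python; demo.py uses the third-party "lxml" parser.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Collect the visible text of every matching <a> element.
titles = [a.get_text(strip=True) for a in soup.find_all("a", class_="book-item-name")]
authors = [a.get_text(strip=True) for a in soup.find_all("a", class_="author")]

print(titles)
print(authors)
```

Running the same two `find_all` calls against the real page source shows which elements carry the data and whether the two lists line up one to one.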

demo.py:

import requests  # requests is an HTTP library
import re
from openpyxl import workbook  # used to write the Excel workbook
from openpyxl import load_workbook  # used to read an Excel workbook
from bs4 import BeautifulSoup as bs  # bs: parses the document and exposes the data to be scraped
import os
import io
import sys

sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8')  # change the default encoding of standard output

# Fetch the page with requests.get() and parse it with bs4:

def getData(src):
    # requests.get(src) returns a Response object; .content gives the body as bytes (binary data).
    # As on the front end, requests come in GET, POST, etc.: http://www.cnblogs.com/ranxf/p/7808537.html
    html = requests.get(src).content
    soup = bs(html, 'lxml')  # the lxml parser turns the bytes into a tree that mirrors the page's HTML structure
    print(soup)
    global ws  # the worksheet created in the __main__ block
    Name = []
    Introductions = []
    introductions = soup.find_all("a", class_="book-item-name")
    nameList = soup.find_all("a", class_="author")
    print(nameList)
    for name in nameList:
        print(name.text)
        Name.append(name.text)
    for introduction in introductions:
        Introductions.append(introduction.text)
    # zip() stops at the shorter list, so unequal lengths cannot raise IndexError
    for name, introduction in zip(Name, Introductions):
        ws.append([name, introduction])

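if __name__ == '__main__':
    # Test: read an existing Excel workbook
    # wb = load_workbook('test.xlsx')  # load an existing Excel workbook
    # a_sheet = wb['Sheet1']  # get the worksheet object by name
    # for row in a_sheet.rows:  # iterate over the rows
    #     for cell in row:  # every cell in the row
    #         print(cell.value, end=' ')

    # Create an Excel workbook and write data to it
    wb = workbook.Workbook()  # create the workbook object
    ws = wb.active  # get the currently active worksheet
    # Write the header row to the sheet, as a list!
    # The labels mean "character name" and "vote count"; the rows written by getData hold the author and the book title.
    ws.append(['角色名字', '票數'])
    src = 'http://www.lrts.me/book/category/3058'
    getData(src)
    wb.save('qinshi.xlsx')  # after all the data has been written, save as filename.xlsx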
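getData depends on a module-level `ws` worksheet and mixes parsing with writing. One possible variant, shown here only as a sketch, separates the two steps and passes the worksheet in explicitly. The function names `extract_rows` and `write_rows`, the header labels, and the HTML string are hypothetical; only the two class names are taken from demo.py. The built-in `html.parser` is used in place of `lxml`.

```python
from bs4 import BeautifulSoup
from openpyxl import Workbook


def extract_rows(html):
    """Return (author, title) pairs found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    authors = [a.get_text(strip=True) for a in soup.find_all("a", class_="author")]
    titles = [a.get_text(strip=True) for a in soup.find_all("a", class_="book-item-name")]
    # zip() stops at the shorter list, so a missing element cannot raise IndexError.
    return list(zip(authors, titles))


def write_rows(ws, rows):
    """Append each (author, title) pair as one worksheet row."""
    for author, title in rows:
        ws.append([author, title])


# Hypothetical markup; the real page may be structured differently.
html = (
    '<a class="book-item-name">Sample Book</a><a class="author">Author A</a>'
    '<a class="book-item-name">Book Without Author</a>'
)

wb = Workbook()
ws = wb.active
ws.append(["author", "title"])  # hypothetical header labels
write_rows(ws, extract_rows(html))
print(ws.max_row)
```

Note that `zip` silently drops unmatched items, so if a book on the page lacks an author link the pairs can become misaligned. Selecting each book's container element first and reading both fields from it would be more robust, but that depends on the actual page markup.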

Run: python demo.py

Result: a file named qinshi.xlsx is generated.
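To check what ended up in the generated file, the workbook can be read back with `load_workbook`. The sketch below writes a small workbook to a temporary file first so that it runs without qinshi.xlsx being present; the file name `example.xlsx` and the row contents are invented for illustration.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Write a small workbook to a temporary file so the sketch does not depend on
# qinshi.xlsx already existing; the row contents here are invented.
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
wb = Workbook()
ws = wb.active
ws.append(["author", "title"])
ws.append(["Author A", "Sample Book"])
wb.save(path)

# Reload the file and read every row back as a tuple of cell values.
loaded = load_workbook(path)
rows = list(loaded.active.iter_rows(values_only=True))
for row in rows:
    print(row)
```

Pointing `load_workbook` at 'qinshi.xlsx' instead of the temporary path would list the rows that demo.py saved.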
