Scraping data for all Shanghai and Shenzhen stocks with Python and generating an Excel file

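I. Requirements

1. For every stock listed on the Shanghai and Shenzhen exchanges, collect its stock code, stock name, high, low, limit-up price, limit-down price, turnover rate, amplitude, volume, and related fields.

2. Store the collected data in an Excel file: the field names form the header row, each stock occupies one row, and each cell holds one value.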

The program's output is shown below:

[Figure: program output]

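II. Analysis and algorithm design

1. Choosing the site to scrape

The site was chosen according to three criteria:

① it lists all Shanghai and Shenzhen stocks;

② its robots protocol (robots.txt) permits non-commercial crawlers;

③ the stock data is present in the page's static HTML source rather than generated by JavaScript, so a plain HTTP request is enough to retrieve it.

Based on these three criteria, we chose Gucheng (股城網, https://hq.gucheng.com/).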
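Criterion ② can be checked programmatically. The sketch below uses the standard library's urllib.robotparser on a made-up robots.txt body; the rules and the `is_allowed` helper are illustrative assumptions, not Gucheng's actual robots.txt. For a live check, `RobotFileParser.set_url(...)` followed by `read()` would fetch the real file instead.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
"""

print(is_allowed(SAMPLE_ROBOTS, "my-crawler", "https://hq.gucheng.com/SZ000001/"))
print(is_allowed(SAMPLE_ROBOTS, "my-crawler", "https://hq.gucheng.com/admin/x"))
```

With these sample rules the first URL is permitted and the second is not.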

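2. Choosing the tools

This example uses Python together with the following libraries (requests, BeautifulSoup4, and xlwt are third-party packages; re and time ship with the standard library):

Library	What it does and how it is used here
requests	HTTP client module that can fetch HTML; used here to download pages from Gucheng.
BeautifulSoup4	Parses, traverses, and maintains a "tag tree" (for data such as HTML or XML); used here to parse each page and extract the stock data.
re	Regular-expression module for quickly checking whether a string matches a given pattern; used here to find strings in stock-code format and extract the codes.
xlwt	Writes Excel (.xls) spreadsheets from Python; used here to store the scraped data.
time	Provides functions for working with dates and times of day, as a thin wrapper over the C runtime library; used here to measure how long the program runs.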
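To make the role of re concrete, here is a small sketch of the pattern the crawler relies on: `S`, then `H` or `Z`, then six digits. The `extract_code` helper and the sample href values are invented for illustration.

```python
import re

# "SH" or "SZ" followed by six digits, e.g. SZ000001.
STOCK_CODE = re.compile(r"S[HZ]\d{6}")


def extract_code(href):
    """Return the first stock code found in href, or None if there is none."""
    match = STOCK_CODE.search(href)
    return match.group(0) if match else None


print(extract_code("https://hq.gucheng.com/SZ000001/"))   # SZ000001
print(extract_code("https://hq.gucheng.com/about.html"))  # None
```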

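3. Implementation steps

The task is carried out in five steps:

① send an HTTP request to the target site and retrieve the HTML text;

② collect all stock codes into a list, which is then used to build the URL of each stock's page. On Gucheng, a single stock's page has the URL format "https://hq.gucheng.com/<stock code>/"; for example, the URL for Ping An Bank (平安銀行) is https://hq.gucheng.com/SZ000001/;

③ fetch and parse each stock's page, storing the extracted data in a dictionary;

④ write the stock data to a TXT file;

⑤ convert the TXT file to Excel.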
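Step ② can be sketched as follows. The `build_stock_urls` helper is hypothetical; it assumes the URL format described above and additionally drops duplicate codes, in case the list page links to the same stock more than once, which the script itself does not do.

```python
def build_stock_urls(codes, base_url="https://hq.gucheng.com/"):
    """Map each distinct stock code to its page URL, keeping first-seen order."""
    unique_codes = dict.fromkeys(codes)  # dicts preserve insertion order
    return [f"{base_url}{code}/" for code in unique_codes]


urls = build_stock_urls(["SZ000001", "SH600000", "SZ000001"])
print(urls)
```

Deduplicating up front avoids fetching the same page twice and writing the same stock into the output more than once.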

# CrawGuchengStocks.py
import requests
from bs4 import BeautifulSoup
import re     # regular expressions, used later to extract stock codes
import xlwt   # xlwt, used to write the Excel file
import time   # time, used to measure the crawler's total running time
 

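def getHTMLText(url, code="utf-8"):  # fetch the HTML text of a page
    try:
        r = requests.get(url, timeout=30)  # time out instead of hanging forever
        r.raise_for_status()
        r.encoding = code
        return r.text
    except requests.RequestException:
        return ""   # an empty string signals failure to the caller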
        
def getStockList(lst, stockURL):          # collect the list of stock codes
    html = getHTMLText(stockURL, "GB2312")
    soup = BeautifulSoup(html, 'html.parser')
    a = soup.find_all('a')      # list of all <a> tags
    for i in a:
        try:
            href = i.attrs['href']       # the stock code is embedded in the href attribute
            lst.append(re.findall(r"S[HZ]\d{6}", href)[0])
        except (KeyError, IndexError):   # no href, or the href holds no stock code
            continue
            
def getStockInfo(lst, stockURL, fpath):
    count = 0
    # lst = [item.lower() for item in lst]  Gucheng URLs use upper case, so no lower-casing is needed
    for stock in lst:
        url = stockURL + stock + "/"  # URL of a single stock's page
        html = getHTMLText(url)       # fetch that page's HTML
        try:
            if html == "":            # fetch failed: count it and move on to the next stock
                raise ValueError("empty response")
            infoDict = {}             # one stock's data is kept in one dictionary
            soup = BeautifulSoup(html, 'html.parser')  # parse the stock's page
            stockInfo = soup.find('div',attrs={'class':'stock_top clearfix'})
            # On Gucheng, a stock's data sits in the div with class 'stock_top clearfix';
            # soup.find returns the first such div
            name = stockInfo.find_all(attrs={'class':'stock_title'})[0]
            # the 'stock_title' element inside stockInfo holds the stock name and code
            infoDict["股票代碼"] = name.text.split("\n")[2]          # key: stock code
            infoDict.update({'股票名稱': name.text.split("\n")[1]})  # key: stock name
            # Splitting name.text on newlines gives a list: item 1 is the name, item 2 the code.
            # Splitting on spaces instead would break names that contain a space:
            # "萬 科A" would yield the name "萬" and the code "科A".

            keyList = stockInfo.find_all('dt')
            valueList = stockInfo.find_all('dd')
            # the remaining fields are stored in dt/dd tags; find_all returns lists
            for i in range(len(keyList)):
                key = keyList[i].text
                val = valueList[i].text
                infoDict[key] = val
                # store each field name and value as a key-value pair

            with open(fpath, 'a', encoding='utf-8') as f:
                f.write( str(infoDict) + '\n' )
                # write each stock's data as one line of the file
                count = count + 1
                print("\rFetched, progress: {:.2f}%".format(count*100/len(lst)),end="")
        except Exception:
            count = count + 1
            print("\rFailed, progress: {:.2f}%".format(count*100/len(lst)),end="")
            continue
 
def get_txt(): # save the scraped data to a TXT file
    stock_list_url = 'https://hq.gucheng.com/gpdmylb.html'
    stock_info_url = 'https://hq.gucheng.com/'
    output_file = '\\檔案\\中大\\Python\\練習項目\\MOOC python爬蟲\\GuChengStockInfoTest.txt'
    slist=[]
    getStockList(slist, stock_list_url)
    getStockInfo(slist, stock_info_url, output_file)

def T_excel(file_name,path):   # convert the TXT file to an Excel file
    file = xlwt.Workbook(encoding='utf-8', style_compression=0)
    # Creating a Workbook object amounts to creating an Excel file.
    # Workbook takes encoding and style_compression parameters;
    # encoding='utf-8' lets Chinese text be written to the sheet.
    sheet = file.add_sheet('stockinfo')
    line_num = 0    # row 0 is reserved for the header

    # Header row: code, name, high, low, open, previous close,
    # limit-up, limit-down, turnover rate, amplitude, volume, turnover amount,
    # inner volume, outer volume, order ratio, change %, P/E (dynamic), P/B,
    # float market cap, total market cap
    title = ['股票代碼', '股票名稱', '最高', '最低', '今開', '昨收', 
             '漲停', '跌停', '換手率', '振幅', '成交量', '成交額',
             '内盤', '外盤', '委比', '漲跌幅', '市盈率(動)', '市淨率',
             '流通市值', '總市值']
    for i in range(len(title)):
        sheet.write(0, i, title[i])

    with open(file_name,"rt",encoding='utf-8') as fo:   # closes the file when done
        for line in fo:
            stock_txt = eval(line)   # each line is the str() of a dict written by getStockInfo
            line_num += 1    # one Excel row per line of the TXT file
            # values are written in dict order, which is assumed to match the header
            values = list(stock_txt.values())
            for i in range(len(values)):
                sheet.write(line_num,i,values[i])  # write into row line_num
    file.save(path)   # save the workbook to path

def main():
    start = time.perf_counter()
    get_txt()
    txt = "\\檔案\\中大\\Python\\練習項目\\MOOC python爬蟲\\GuChengStockInfoTest.txt"
    excelname = '\\檔案\\中大\\Python\\練習項目\\MOOC python爬蟲\\GuChengStockInfoTest.xls'
    T_excel(txt,excelname)
    time_cost = time.perf_counter() - start
    print("Crawl finished. File saved to:\n{}\nTotal time: {:.2f}s".format(excelname,time_cost))

if __name__ == "__main__":
    main()
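
One possible refinement of the TXT-to-Excel step, sketched under assumptions: parse each line with `ast.literal_eval`, which only accepts Python literals, instead of `eval`, and place each value under the header it belongs to rather than relying on dictionary order. The `rows_from_lines` helper, the English keys, and the sample values are hypothetical and not part of the script above.

```python
import ast


def rows_from_lines(lines, header):
    """Turn lines holding dict literals into rows aligned with header."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = ast.literal_eval(line)  # accepts literals only, never runs code
        # Missing keys become empty cells, so columns never shift.
        rows.append([record.get(key, "") for key in header])
    return rows


# Made-up sample lines in the same "one dict per line" format.
sample_lines = [
    "{'code': 'SZ000001', 'name': 'Ping An Bank', 'high': '10.5'}\n",
    "{'name': 'Example Co', 'code': 'SH600000'}\n",
]
for row in rows_from_lines(sample_lines, ["code", "name", "high"]):
    print(row)
```

Each returned row could then be written cell by cell with `sheet.write(row_index, col_index, value)`, as `T_excel` does.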
