Goal: to scrape share-reduction announcements for listed companies from East Money (eastmoney.com). A look at the information page shows it is a single-page listing, so there is no need for Scrapy, or even BeautifulSoup; it is enough to inspect the URL the page requests when it refreshes its data. Compared with cninfo (Juchao), East Money is even simpler: it uses a plain GET request, and all the parameters are spelled out clearly in a JS file returned when the page loads, almost as if it were designed to be scraped. There is also no limit on the pagenum parameter, which makes bulk retrieval even easier. The only tricky part is encoding/decoding; it is not clear why converting according to the page's declared encoding still causes problems.
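To make the request structure easier to read, the query string captured from the page can be expressed as an explicit params dict. This is just an illustrative sketch (the parameter names are copied from the URL used in the script below; I have not checked which of them are actually required):

import requests

params = {
    "pagesize": 500,          # rows per page
    "page": 1,                # page number; the server does not seem to cap it
    "js": "var UWExJjvK",     # name of the JS variable the server wraps the data in
    "sortRule": -1,
    "sortType": "BDJZ",
    "tabid": "jjc",           # tab id exactly as it appears in the captured URL
}
r = requests.get("http://data.eastmoney.com/DataCenter_V3/gdzjc.ashx", params=params)
r.encoding = r.apparent_encoding  # the page is GB2312; let requests detect it
print(r.text[:200])               # JS payload: var UWExJjvK={pages:...,data:[...]}

The full script follows.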
# -*- coding: GB2312 -*-
import requests
import csv
import time
import codecs

# Fetch the data
def getHTML(url):
    response = requests.get(url)
    print(response.apparent_encoding)  # detected page encoding: GB2312
    # requests does not pick the encoding up from the headers, so set it
    # explicitly; hard-coding 'GB2312' also seems to work
    response.encoding = response.apparent_encoding
    return response.text
# Parse the data and write it out
def writeFile(writer, res):
    # The response is JS, not JSON: var TbrNdpvg={pages:401,data:[...]}
    startid = res.find("[")
    endid = res.find("]")
    content = res[startid + 1:endid]   # contents of the data array
    records = content.split("\",")     # split on the quote closing each record
    records = [rec.replace('"', '') for rec in records]
    print(len(records))
    for rec in records:
        fields = rec.split(',')
        # Append a tab so Excel does not render the stock code in scientific notation
        fields[0] = fields[0] + '\t'
        writer.writerow(fields)
# Fetch 40 pages; open the CSV once, outside the loop
csvFile = codecs.open(r'C:\yangnan\csvFile2.csv', 'a', encoding='utf8')  # codecs.open() prevents garbled output
writer = csv.writer(csvFile)
for i in range(40):
    url = ("http://data.eastmoney.com/DataCenter_V3/gdzjc.ashx?pagesize=500&page=" + str(i)
           + "&js=var%20UWExJjvK&param=&sortRule=-1&sortType=BDJZ&tabid=jjc&code=&name=&rt=50815994")
    print(url)
    res = getHTML(url)
    time.sleep(1)  # wait one second between requests
    writeFile(writer, res)
csvFile.close()
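The find("[")/find("]") slicing above works but is brittle. Since the payload is JS rather than strict JSON (the keys are unquoted, so json.loads will not accept it directly), a somewhat sturdier alternative is to isolate the data array with a regular expression and pull out each quoted record. A minimal sketch, not part of the original script (parse_records is a name I made up):

import re

def parse_records(res):
    # Isolate the data:[...] array, then grab each double-quoted record.
    m = re.search(r'data:\[(.*?)\]', res)
    if not m:
        return []
    return [rec.split(',') for rec in re.findall(r'"([^"]*)"', m.group(1))]

# Drop-in usage with the writer from above:
# for fields in parse_records(res):
#     writer.writerow(fields)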