AI Financial Investment: Batch-Downloading the Prospectuses of Public REITs on the Shenzhen Stock Exchange

2024-06-15 07:13:00

Open the prospectus page for public REITs on the Shenzhen Stock Exchange website, press F12, and inspect the Network tab to find the real request address: https://reits.szse.cn/api/disc/announcement/annList?random=0.3555675437003616


{
  "announceCount": 39,
  "data": [
    {
      "id": "80bc99a7-8a04-4803-b42a-d9cca1e6c5d5",
      "annId": 1220300147,
      "title": "ChinaAMC Commercial REIT: Update on the Prospectus of ChinaAMC Commercial Asset Closed-end Infrastructure Securities Investment Fund",
      "content": null,
      "publishTime": "2024-06-08 00:00:00",
      "attachPath": "/disc/disk03/finalpage/2024-06-08/a77d6a34-c4eb-4dcf-9b16-7c2ce856ebdd.PDF",
      "attachFormat": "PDF",
      "attachSize": 6265,
      "secCode": [
        "180601"
      ],
      "secName": [
        "ChinaAMC China Resources Commercial REIT"
      ],
      "bondType": null,
      "bigIndustryCode": null,
      "bigCategoryId": null,
      "smallCategoryId": null,
      "channelCode": null,
      "_index": "ows_disclosure-20180825"
    },
    ...
  ]
}

The response is JSON data, and the PDF path is here: "/disc/disk03/finalpage/2024-06-08/a77d6a34-c4eb-4dcf-9b16-7c2ce856ebdd.PDF",

Open the download page and check its URL: https://disc.static.szse.cn/disc/disk03/finalpage/2024-06-08/a77d6a34-c4eb-4dcf-9b16-7c2ce856ebdd.PDF

So the path has to be prefixed with "https://disc.static.szse.cn" to get the full download URL.
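
The prefixing step can be sketched in a few lines of Python. This is only an illustration: the helper name build_pdf_links is hypothetical, and the sample dictionary is trimmed to the two fields that matter here, with the title shortened.

```python
BASE_URL = "https://disc.static.szse.cn"

def build_pdf_links(response_json):
    """Return (title, full_url) pairs for every item that has an attachPath."""
    links = []
    # "data" may be missing or null, so fall back to an empty list.
    for item in response_json.get("data") or []:
        path = item.get("attachPath")
        if path:
            links.append((item.get("title", "unknown_title"), BASE_URL + path))
    return links

# Trimmed-down stand-in for the real response (title shortened for the example).
sample = {
    "announceCount": 39,
    "data": [
        {
            "title": "ChinaAMC Commercial REIT: Update on the Prospectus",
            "attachPath": "/disc/disk03/finalpage/2024-06-08/a77d6a34-c4eb-4dcf-9b16-7c2ce856ebdd.PDF",
        }
    ],
}

for title, url in build_pdf_links(sample):
    print(title, "->", url)
```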

Enter the following prompt in DeepSeek:

You are a Python programming expert. Write a Python script that performs the following steps:

Request URL:

https://reits.szse.cn/api/disc/announcement/annList?random=0.3555675437003616

Request Method:

POST

Status Code:

200 OK

Remote Address:

58.251.50.138:443

Referrer Policy:

strict-origin-when-cross-origin

Request Payload:

{"seDate":["",""],"channelCode":["reits-xxpl"],"bigCategoryId":["directions"],"pageSize":50,"pageNum":1}

Request Headers:

Accept:

application/json, text/javascript, */*; q=0.01

Accept-Encoding:

gzip, deflate, br, zstd

Accept-Language:

zh-CN,zh; q=0.9,en; q=0.8

Connection:

keep-alive

Content-Length:

104

Content-Type:

application/json

Host:

reits.szse.cn

Origin:

https://reits.szse.cn

Referer:

https://reits.szse.cn/disclosure/index.html

Sec-Ch-Ua:

"Google Chrome"; v="125", "Chromium"; v="125", "Not.A/Brand"; v="24"

Sec-Ch-Ua-Mobile:

?0

Sec-Ch-Ua-Platform:

"Windows"

Sec-Fetch-Dest:

empty

Sec-Fetch-Mode:

cors

Sec-Fetch-Site:

same-origin

User-Agent:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36

X-Request-Type:

ajax

X-Requested-With:

XMLHttpRequest

Get the response returned by the server, which is nested JSON data;

Locate the value of the "title" key in each item under the "data" key, which is the title of the PDF file;

Locate the value of the "attachPath" key in each item under the "data" key, which is the path of the PDF file, and prepend "https://disc.static.szse.cn" to it to form the complete PDF download URL;

Download the PDF file and save it to the folder: F:\AI自媒体内容\AI炒股\REITs

Note: Print progress information at every step;

The PDF title may contain special characters that are not allowed in Windows file names, so sanitize the title before using it as the file name;

After each PDF file is downloaded, pause for a random 3 to 6 seconds.
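
The file-name requirement in the prompt is the one most likely to trip up a script, so here is a minimal sketch of one way to handle it. The function name safe_filename, the 150-character limit, and the "untitled" fallback are assumptions made for illustration, not part of the original prompt or of the generated script.

```python
import re

# Characters that Windows forbids in file names, plus ASCII control characters.
_ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

def safe_filename(title, max_len=150):
    """Turn an announcement title into a name that Windows will accept."""
    # Replace each forbidden character with an underscore and cap the length.
    name = _ILLEGAL_CHARS.sub("_", title)[:max_len]
    # Windows also rejects names that end with a dot or a space.
    name = name.strip(" .")
    return name or "untitled"

print(safe_filename("ChinaAMC Commercial REIT: Update on the Prospectus"))
```

Spaces are left alone on purpose: they are legal in Windows file names, and replacing them would make the saved titles harder to read.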

Source code:

import os
import random
import re
import sys
import time

import requests

# Request URL copied from the browser's Network panel
url = "https://reits.szse.cn/api/disc/announcement/annList?random=0.3555675437003616"

# Request headers copied from the browser.
# "Host" is left out: requests derives it from the URL, and a value that
# includes "http://" is not a valid Host header.
# "Accept-Encoding" is left out so that requests only advertises the
# compression formats it can actually decode.
headers = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Accept-Language": "zh-CN,zh; q=0.9,en; q=0.8",
    "Connection": "keep-alive",
    "Content-Type": "application/json",
    "Origin": "https://reits.szse.cn",
    "Referer": "https://reits.szse.cn/disclosure/index.html",
    "Sec-Ch-Ua": '"Google Chrome"; v="125", "Chromium"; v="125", "Not.A/Brand"; v="24"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"Windows"',
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "X-Request-Type": "ajax",
    "X-Requested-With": "XMLHttpRequest",
}

# Request payload: first page, up to 50 prospectus announcements
payload = {
    "seDate": ["", ""],
    "channelCode": ["reits-xxpl"],
    "bigCategoryId": ["directions"],
    "pageSize": 50,
    "pageNum": 1,
}

# Folder the PDFs are saved to; create it if it does not exist yet
save_dir = r"F:\AI自媒体内容\AI炒股\REITs"
os.makedirs(save_dir, exist_ok=True)

# Send the POST request
response = requests.post(url, headers=headers, json=payload, timeout=30)

# Check the response status code
if response.status_code == 200:
    print("Request succeeded, status code: 200 OK")
else:
    print(f"Request failed, status code: {response.status_code}")
    sys.exit(1)

# Parse the JSON response
data = response.json()

# Check whether the response contains a list of announcements
if "data" in data and isinstance(data["data"], list):
    for item in data["data"]:
        # Get the PDF title
        pdf_title = item.get("title", "unknown_title")
        print(f"PDF title: {pdf_title}")

        # Get the PDF path and build the full download URL
        pdf_url = item.get("attachPath", "")
        if pdf_url:
            pdf_url = "https://disc.static.szse.cn" + pdf_url
            print(f"PDF URL: {pdf_url}")

            # Replace characters that are illegal in Windows file names
            pdf_title = re.sub(r'[<>:"/\\|?*]', '_', pdf_title)

            # Build the save path
            save_path = os.path.join(save_dir, f"{pdf_title}.pdf")

            # Download the PDF file
            pdf_response = requests.get(pdf_url, timeout=60)
            if pdf_response.status_code == 200:
                with open(save_path, 'wb') as f:
                    f.write(pdf_response.content)
                print(f"PDF saved to: {save_path}")
            else:
                print(f"Failed to download PDF, status code: {pdf_response.status_code}")

            # Pause for a random 3-6 seconds before the next file
            time.sleep(random.uniform(3, 6))
else:
    print("No data found")
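
The script requests only page 1 with a pageSize of 50, which is enough for the 39 announcements reported in "announceCount" above. If the list ever grows past one page, something like the following sketch could walk through all pages. It assumes that "announceCount" is the total number of matching announcements across all pages, which the sample response suggests but does not confirm. The names iter_announcements and fetch_page are hypothetical.

```python
def iter_announcements(fetch_page, page_size=50):
    """Yield every announcement item, requesting further pages as needed.

    fetch_page(page_num, page_size) must return the parsed JSON dict
    for a single page of results.
    """
    page_num = 1
    while True:
        page = fetch_page(page_num, page_size)
        items = page.get("data") or []
        yield from items
        # Assumption: announceCount is the total across all pages.
        total = page.get("announceCount") or 0
        # Stop on an empty page or once the last page has been read.
        if not items or page_num * page_size >= total:
            break
        page_num += 1
```

In the script above, fetch_page would be a small function that copies the payload, sets "pageNum" and "pageSize", sends it with requests.post, and returns response.json(). Keeping the HTTP call outside the generator means the paging logic can be checked with fake pages, without contacting the exchange's server.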
