Open the prospectus page for public REITs on the Shenzhen Stock Exchange website, press F12 to inspect the network requests, and find the real data address: https://reits.szse.cn/api/disc/announcement/annList?random=0.3555675437003616
{
  "announceCount": 39,
  "data": [
    {
      "id": "80bc99a7-8a04-4803-b42a-d9cca1e6c5d5",
      "annId": 1220300147,
      "title": "ChinaAMC Commercial REIT: Update on the Prospectus of ChinaAMC Commercial Asset Closed-end Infrastructure Securities Investment Fund",
      "content": null,
      "publishTime": "2024-06-08 00:00:00",
      "attachPath": "/disc/disk03/finalpage/2024-06-08/a77d6a34-c4eb-4dcf-9b16-7c2ce856ebdd.PDF",
      "attachFormat": "PDF",
      "attachSize": 6265,
      "secCode": [
        "180601"
      ],
      "secName": [
        "ChinaAMC China Resources Commercial REIT"
      ],
      "bondType": null,
      "bigIndustryCode": null,
      "bigCategoryId": null,
      "smallCategoryId": null,
      "channelCode": null,
      "_index": "ows_disclosure-20180825"
    },
The response is JSON, and the PDF path is in the "attachPath" field: "/disc/disk03/finalpage/2024-06-08/a77d6a34-c4eb-4dcf-9b16-7c2ce856ebdd.PDF",
Open the download page and check its URL: https://disc.static.szse.cn/disc/disk03/finalpage/2024-06-08/a77d6a34-c4eb-4dcf-9b16-7c2ce856ebdd.PDF
So the path is relative, and the prefix to prepend is "https://disc.static.szse.cn".
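The prefixing step can be tried out in a few lines of Python. This is a minimal sketch, not part of the original script: the helper name build_pdf_url and the constant PDF_HOST are made up for illustration, and the sample record is trimmed down to the two fields used here.

```python
from urllib.parse import urljoin

# Host that serves the PDF attachments, as seen in the download page URL.
PDF_HOST = "https://disc.static.szse.cn"


def build_pdf_url(attach_path: str) -> str:
    """Join the static host with the relative attachPath from the JSON."""
    return urljoin(PDF_HOST, attach_path)


# Trimmed-down copy of one record from the response shown above.
sample = {
    "data": [
        {
            "title": "ChinaAMC Commercial REIT: Update on the Prospectus of "
                     "ChinaAMC Commercial Asset Closed-end Infrastructure "
                     "Securities Investment Fund",
            "attachPath": "/disc/disk03/finalpage/2024-06-08/"
                          "a77d6a34-c4eb-4dcf-9b16-7c2ce856ebdd.PDF",
        }
    ]
}

# Build one full download URL per record.
pdf_urls = [build_pdf_url(item["attachPath"]) for item in sample["data"]]
print(pdf_urls[0])
```

Plain string concatenation works just as well here because attachPath always starts with a slash in the sample; urljoin is used only so that a missing or doubled slash does not produce a malformed URL.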
Enter the following prompt in DeepSeek:
You are a Python programming expert. Write a Python script that follows these steps:
Request URL:
https://reits.szse.cn/api/disc/announcement/annList?random=0.3555675437003616
Request Method:
POST
Status Code:
200 OK
Remote Address:
58.251.50.138:443
Referrer Policy:
strict-origin-when-cross-origin
Request Payload:
{"seDate":["",""],"channelCode":["reits-xxpl"],"bigCategoryId":["directions"],"pageSize":50,"pageNum":1}
Request Headers:
Accept:
application/json, text/javascript, */*; q=0.01
Accept-Encoding:
gzip, deflate, br, zstd
Accept-Language:
zh-CN,zh; q=0.9,en; q=0.8
Connection:
keep-alive
Content-Length:
104
Content-Type:
application/json
Host:
reits.szse.cn
Origin:
https://reits.szse.cn
Referer:
https://reits.szse.cn/disclosure/index.html
Sec-Ch-Ua:
"Google Chrome"; v="125", "Chromium"; v="125", "Not.A/Brand"; v="24"
Sec-Ch-Ua-Mobile:
?0
Sec-Ch-Ua-Platform:
"Windows"
Sec-Fetch-Dest:
empty
Sec-Fetch-Mode:
cors
Sec-Fetch-Site:
same-origin
User-Agent:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36
X-Request-Type:
ajax
X-Requested-With:
XMLHttpRequest
Get the response returned by the request, which is nested JSON data;
Locate the value of the "title" key in each item under the "data" key, which is the title of the PDF file;
Locate the value of the "attachPath" key in each item under the "data" key, which is the relative path of the PDF file, and prepend "https://disc.static.szse.cn" to it to form the complete PDF download URL;
Download the PDF file and save it to the folder: F:\AI自媒体内容\AI炒股\REITs
Note: Print progress information at each step;
The title of the PDF file may contain special characters that are not allowed in Windows file names, so these should be replaced before the title is used as the file name;
After each PDF file is downloaded, pause for a random 3-6 seconds before the next one;
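The file-name requirement in the prompt can be tried out on its own. The following is a minimal sketch, assuming the usual set of characters that Windows rejects in file names; the helper name safe_filename and the 150-character limit are illustrative assumptions, not something the site or the prompt specifies.

```python
import re

# Characters Windows does not allow in file names, plus ASCII control characters.
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title: str, max_length: int = 150) -> str:
    """Return a version of title that is safe to use as a Windows file name."""
    # Replace every forbidden character with an underscore.
    cleaned = _ILLEGAL.sub("_", title)
    # Windows also rejects names that end with a dot or a space.
    cleaned = cleaned.strip().rstrip(". ")
    # Keep the name short (assumed limit), then re-trim the new tail.
    cleaned = cleaned[:max_length].rstrip(". ")
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or "untitled"


print(safe_filename("ChinaAMC Commercial REIT: Update on the Prospectus"))
```

Spaces are kept on purpose, since they are legal in Windows file names. Reserved device names such as CON or NUL are not handled in this sketch.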
Source code:
import requests
import json
import os
import time
import random
import re
# Define the request URL and request headers
url = "https://reits.szse.cn/api/disc/announcement/annList?random=0.3555675437003616"
headers = {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    # Only advertise encodings that requests can decode without extra packages
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh; q=0.9,en; q=0.8",
    "Connection": "keep-alive",
    "Content-Type": "application/json",
    # The Host header is a bare host name, without the URL scheme
    "Host": "reits.szse.cn",
    "Origin": "https://reits.szse.cn",
    "Referer": "https://reits.szse.cn/disclosure/index.html",
    "Sec-Ch-Ua": '"Google Chrome"; v="125", "Chromium"; v="125", "Not.A/Brand"; v="24"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"Windows"',
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "X-Request-Type": "ajax",
    "X-Requested-With": "XMLHttpRequest"
}
# Define the payload of the request
payload = {
    "seDate": ["", ""],
    "channelCode": ["reits-xxpl"],
    "bigCategoryId": ["directions"],
    "pageSize": 50,
    "pageNum": 1
}
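# Send the POST request (the timeout keeps the script from hanging forever)
response = requests.post(url, headers=headers, json=payload, timeout=30)
# Check the response status code
if response.status_code == 200:
    print("Request succeeded, status code: 200 OK")
else:
    print(f"Request failed, status code: {response.status_code}")
    # Stop the script with a non-zero exit code
    raise SystemExit(1)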
# Parse the JSON response
data = response.json()
# Check whether the response contains any records
if "data" in data and isinstance(data["data"], list):
    # Make sure the target folder exists before writing to it
    save_dir = "F:\\AI自媒体内容\\AI炒股\\REITs"
    os.makedirs(save_dir, exist_ok=True)
    for item in data["data"]:
        # Get the PDF title
        pdf_title = item.get("title", "unknown_title")
        print(f"PDF title: {pdf_title}")
        # Get the PDF URL
        pdf_url = item.get("attachPath", "")
        if pdf_url:
            pdf_url = "https://disc.static.szse.cn" + pdf_url
            print(f"PDF URL: {pdf_url}")
            # Replace characters that Windows does not allow in file names
            pdf_title = re.sub(r'[<>:"/\\|?*]', '_', pdf_title)
            # Define the save path
            save_path = os.path.join(save_dir, f"{pdf_title}.pdf")
            # Download the PDF file
            pdf_response = requests.get(pdf_url, timeout=60)
            if pdf_response.status_code == 200:
                with open(save_path, 'wb') as f:
                    f.write(pdf_response.content)
                print(f"PDF saved to: {save_path}")
            else:
                print(f"Failed to download the PDF, status code: {pdf_response.status_code}")
            # Pause for a random 3-6 seconds before the next file
            time.sleep(random.uniform(3, 6))
else:
    print("No data found")
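The request above asks only for page 1 with a pageSize of 50, which covers the 39 announcements reported in announceCount. If the list grows past one page, the payloads for the remaining pages could be generated as follows. This is a sketch under two assumptions that the response shown earlier does not confirm: that announceCount is the total across all pages, and that pageNum counts from 1. The helper name build_payloads is hypothetical.

```python
import math


def build_payloads(announce_count: int, page_size: int = 50) -> list:
    """Build one request payload per page needed to cover announce_count items."""
    # Number of pages, rounded up so a partial last page is included.
    pages = math.ceil(announce_count / page_size)
    return [
        {
            "seDate": ["", ""],
            "channelCode": ["reits-xxpl"],
            "bigCategoryId": ["directions"],
            "pageSize": page_size,
            "pageNum": page,  # assumed to be 1-based, as in the captured request
        }
        for page in range(1, pages + 1)
    ]


# With the 39 announcements from the sample response, one page is enough.
for payload in build_payloads(39):
    print(payload["pageNum"], payload["pageSize"])
```

Each payload can then be sent with the same POST request as in the script, keeping a random pause between pages as well as between downloads.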