Python scraper: requests.get() returns HTML that differs from the web page

2023-06-08 01:06:36

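While writing a scraper, the HTML I need is different from the HTML that requests.get returns, so the later BeautifulSoup (bs) parsing keeps failing.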

requests.get() does not get the correct HTML source

import requests
from bs4 import BeautifulSoup

# 1. Fetch the page data
url = 'https://movie.douban.com/top250'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36'
}
response = requests.get(url, headers=headers)

# 2. Parse the data
soup = BeautifulSoup(response.text, 'lxml')

This one doesn't work.
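A first check before handing the response to BeautifulSoup, sketched with a hypothetical helper `check_fetch` (it is not part of requests or bs4): look at the HTTP status code and at whether some text you can see in the browser actually occurs in `response.text`. A non-200 status usually means an error or anti-scraping page rather than the real content; a 200 without the marker suggests the markup is added later in the browser. The marker string below is a placeholder, not a real class name from the page.

```python
def check_fetch(status_code, html, marker):
    """Hypothetical helper: say why fetched HTML might not match the browser view."""
    # Anything other than 200 is usually an error or block page, not the real page
    if status_code != 200:
        return f'HTTP {status_code}: not the normal page'
    # Text the browser shows but the raw HTML lacks was probably added by JavaScript
    if marker not in html:
        return 'HTTP 200, but marker missing from raw HTML'
    return 'HTTP 200, marker found in raw HTML'


# Usage with the request above (network call; the marker is a hypothetical placeholder):
# print(check_fetch(response.status_code, response.text, 'class-seen-in-devtools'))
print(check_fetch(403, '<html></html>', 'class-seen-in-devtools'))  # prints: HTTP 403: not the normal page
```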

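import requests
from bs4 import BeautifulSoup

def getsoup(url):  # assumed helper; the post does not show the author's own getsoup()
    return BeautifulSoup(requests.get(url, headers={'user-agent': 'Mozilla/5.0'}).text, 'lxml')

# Specify the site to crawl
url = 'http://www.360doc.com/index.html?type=36&classid=19'
soup = getsoup(url)
print(soup)
# After all these errors, the soup turns out not to contain the list at all
imgList = soup.select('.c5_ul3>li')  # CSS selector: class c5_ul3 > direct <li> children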
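To tell whether `.c5_ul3>li` fails because of the selector or because the markup is simply not in the response, one option is to count the elements carrying that class in the raw HTML. This is a sketch using only the standard library's `html.parser`; `ClassCounter` and `count_class` are illustrative names. If the count is 0 while the browser's DevTools Elements panel shows the list, the content is most likely inserted by JavaScript after the page loads. requests does not run JavaScript, so it only sees the HTML the server sent, roughly what the browser's "View Page Source" shows.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute includes a given class name."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get('class') or '').split()
        if self.css_class in classes:
            self.count += 1


def count_class(html, css_class):
    parser = ClassCounter(css_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Usage with the page above (network call, not run here):
# print(count_class(response.text, 'c5_ul3'))
print(count_class('<ul class="c5_ul3 box"><li>a</li></ul>', 'c5_ul3'))  # prints: 1
```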

I tried the URL below and it doesn't work either; changing the headers still gives different HTML:

import requests
from bs4 import BeautifulSoup

# 1. Fetch the page data
url = 'https://movie.douban.com/tv'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36'
}
response = requests.get(url, headers=headers)

# 2. Parse the data
soup = BeautifulSoup(response.text, 'lxml')

I can't tell why with this library: some pages work, while others come back wrong.
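If a page such as the TV listing fills in its list with JavaScript, the data often arrives through a separate XHR/fetch request, visible in DevTools > Network (filter by Fetch/XHR). When that request returns JSON, it can usually be requested directly with requests and read without BeautifulSoup. A minimal sketch, assuming a JSON response; `API_URL`, the `items` key and the `title` key are hypothetical placeholders to be replaced with what the Network tab actually shows.

```python
import json


def titles_from_json(payload, list_key='items', title_key='title'):
    """Return the title of each record in a JSON payload.

    list_key and title_key are hypothetical; read the real names from DevTools.
    """
    data = json.loads(payload)
    return [item[title_key] for item in data.get(list_key, [])]


# Usage against a live endpoint (network call; API_URL is hypothetical):
# payload = requests.get(API_URL, headers=headers).text
# print(titles_from_json(payload))
print(titles_from_json('{"items": [{"title": "A"}, {"title": "B"}]}'))  # prints: ['A', 'B']
```

Reading the JSON directly avoids depending on HTML class names that JavaScript generates, which is why this tends to be more stable than parsing the rendered page.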
