
道客巴巴 (doc88.com) crawler

The XPath expressions were worked out with the XPath Helper browser extension.
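
Before running the full crawl, the expression copied from XPath Helper (".//h3[@class='sd-type-title']/a/@href", used in the script below) can be sanity-checked offline with lxml. This is a minimal sketch, assuming a list page such as https://www.doc88.com/list-8308-0-1.html has been saved locally as list.html (a hypothetical filename):

# Minimal sketch: verify the XPath from XPath Helper against a saved list page.
# "list.html" is a hypothetical local copy of a doc88 list page.
from lxml import etree

with open("list.html", encoding="utf-8") as f:
    tree = etree.HTML(f.read())

hrefs = tree.xpath(".//h3[@class='sd-type-title']/a/@href")
print(len(hrefs), hrefs[:3])  # expect one href per document entry on the page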

import time
from selenium import webdriver  # selenium 2.48.0 still supports PhantomJS
from lxml import etree

# List page:     https://www.doc88.com/list-8308-0-1.html
# Document page: https://www.doc88.com/p-9139147359378.html
driver = webdriver.PhantomJS(executable_path=r'C:\Users\wang\Desktop\phantomjs-2.1.1-windows (1)\bin\phantomjs.exe')

file_urls_list = []
for i in range(1, 30):
    time.sleep(3)  # pause between list pages to avoid hammering the site
    url = "https://www.doc88.com/list-8308-0-" + str(i) + ".html"
    driver.get(url=url)
    tree = etree.HTML(driver.page_source)
    # collect every document link on the current list page
    file_urls = tree.xpath(".//h3[@class='sd-type-title']/a/@href")
    file_urls = ["https://www.doc88.com/" + str(href) for href in file_urls]
    file_urls_list.extend(file_urls)
    print(file_urls)
driver.quit()

# save all collected document URLs, keeping only well-formed "/p-..." links
with open("url.txt", "w", encoding="utf-8") as f:
    for u in file_urls_list:
        if len(u) == len("https://www.doc88.com//p-7367816610215.html"):
            f.write(u)
            f.write("\n")
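
As a follow-up, here is a small sketch (assuming url.txt was written as above) that reloads the saved links and removes duplicates before the document pages are fetched in a later stage:

# Minimal sketch: reload url.txt and drop duplicate links collected across list pages.
with open("url.txt", encoding="utf-8") as f:
    urls = [line.strip() for line in f if line.strip()]

unique_urls = sorted(set(urls))
print(len(urls), "saved,", len(unique_urls), "unique")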