天天看点

ajax-hook+ selenium抓取带参数的 Ajax 数据

环境

依赖安装

pip install flask-cors flask selenium      

安装chromedriver

mac下安装selenium+phantomjs+chromedriver

实现代码

1、hook.js

监听 XMLHttpRequest 请求

// 打开链接,复制代码到这里
// https://unpkg.com/[email protected]/dist/ajaxhook.min.js
// https://unpkg.com/axios/dist/axios.min.js


ah.proxy({
  //请求成功后进入
  onResponse: (response, handler) => {
    if (response.config.url.startsWith('/api/movie')) {
      axios.post('http://localhost:5000/receiver/movie', {
        url: window.location.href,
        data: response.response
      })
      console.log(response.response)
      handler.next(response)
    }
  }
})      

2、main.py

驱动chrome

# -*- coding: utf-8 -*-
from selenium import webdriver
import time

browser = webdriver.Chrome()
browser.get('https://dynamic2.scrape.center/')
browser.execute_script(open('hook.js').read())
time.sleep(2)

for index in range(3):
    print('current page', index)
    btn_next = browser.find_element_by_css_selector('.btn-next')
    btn_next.click()
    time.sleep(2)

browser.close()
browser.quit()      

3、server.py

接收数据的服务,可以进一步将数据存入数据库

# -*- coding: utf-8 -*-
import json
from flask import Flask, request, jsonify
from flask_cors import CORS

app = Flask(__name__)
CORS(app)


@app.route('/receiver/movie', methods=['POST'])
def receive():
    content = json.loads(request.data)
    print(content)
    # to something
    return jsonify({'status': True})


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000, debug=True)      

参考

如何用 Hook 实时处理和保存 Ajax 数据