
python uses Selenium to crawl movie paradise to achieve movie freedom

Author: 13 Demon Studios

In previous issues we achieved novel freedom and music freedom, so film-and-television freedom is, of course, indispensable. In this issue, let's see how to use Python to achieve movie freedom.

First, the usual approach: open the Movie Paradise website, then right-click and view the page source.


After a look at the source I was quietly pleased: following the earlier routine, a direct requests call plus XPath parsing should grab the data in no time, and that would be that.

import requests

url = "https://www.dygod.net/html/gndy/china/"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
    "cookie": "guardret=UAQF; guardret=C1c=; Hm_lvt_93b4a7c2e07353c3853ac17a86d4c8a4=1710422041; Hm_lvt_0113b461c3b631f7a568630be1134d3d=1710422041; Hm_lvt_8e745928b4c636da693d2c43470f5413=1710422041; guard=b45b4fbfCWp691; Hm_lpvt_93b4a7c2e07353c3853ac17a86d4c8a4=1710508766; Hm_lpvt_0113b461c3b631f7a568630be1134d3d=1710508766; Hm_lpvt_8e745928b4c636da693d2c43470f5413=1710508766"
}
resp = requests.get(url, headers=headers)
print(resp.text)

This time, with the User-Agent and cookies set, it should have been fine, but running it left me dumbfounded.


The site returned a JavaScript script instead of the page: it has anti-crawling in place, and the data cannot be fetched this way…
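To spot this situation in code rather than by eyeballing the output, a minimal heuristic sketch can check whether the response looks like the anti-crawl script stub instead of a real listing page. The function name `looks_like_js_challenge` and the exact markers checked are my own assumptions, not part of any library or of the site's documented behavior:

```python
def looks_like_js_challenge(html: str) -> bool:
    """Heuristic check: a real Movie Paradise listing page contains
    movie tables, while the anti-crawl response is essentially a bare
    <script> stub with no content tables."""
    lowered = html.lower()
    return "<script" in lowered and "<table" not in lowered


# Example: the requests response above would trip this check.
```

If the check fires, there is no point parsing the response with XPath; that is the cue to fall back to a real browser.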

Since the site won't hand the data to requests, let's change tack and use Selenium's what-you-see-is-what-you-get approach: drive a real browser, open the site, and read the rendered page directly.

First check your Chrome version: enter chrome://version/ in the Chrome address bar.

Then download the ChromeDriver that matches that version:

https://chromedriver.storage.googleapis.com/index.html

Install Selenium:

pip install selenium

(With Selenium 4.6+, the bundled Selenium Manager can download a matching driver automatically, so the manual driver download above may not be needed.)

Use the find_elements method to get all elements matching an XPath:

find_elements(By.XPATH,'//*[@id="header"]/div/div[3]/div[4]/div[2]/div[2]/div[2]/ul/table/tbody/tr[2]/td[2]/b/a')           

Running it returns the list of matching movie-link elements on the listing page.

Using the same method in a loop, we can then open each movie's detail page and grab its download link.
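Before looping over detail pages, note that listing pages often carry site-root-relative hrefs rather than full URLs. A small sketch using the standard library's `urljoin` normalizes either form to an absolute URL; `to_absolute` and the example path are my own illustrative names, and the exact href shape on the site is an assumption:

```python
from urllib.parse import urljoin

# Listing page the hrefs were scraped from (taken from the article).
LIST_URL = "https://www.dygod.net/html/gndy/china/"

def to_absolute(href: str, base: str = LIST_URL) -> str:
    # urljoin leaves already-absolute URLs untouched and resolves
    # site-root-relative hrefs like "/html/gndy/dyzz/....html"
    # against the listing page's origin.
    return urljoin(base, href)
```

This way the loop can pass every href straight to `driver.get()` without caring which form the site used.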

Here is the full Selenium code to fetch the movie list:

from selenium import webdriver
from selenium.webdriver.common.by import By

chrome_options = webdriver.ChromeOptions()
# Disable the "allow notifications" pop-up
prefs = {"profile.default_content_setting_values.notifications": 2}
chrome_options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(options=chrome_options)

url = 'https://www.dygod.net/html/gndy/china/'
driver.get(url)

# find_elements returns every element matching the XPath
contents = driver.find_elements(By.XPATH, '//*[@id="header"]/div/div[3]/div[4]/div[2]/div[2]/div[2]/ul/table/tbody/tr[2]/td[2]/b/a')
for i in contents:
    print(i.text, i.get_attribute('href'))

driver.quit()

You can try extending the code yourself: collect all the download links, save them to a txt file, and feed that file to Thunder (Xunlei)'s batch-download feature. As long as your hard drive is big enough, the whole world fits on it!
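The save-to-txt step suggested above can be sketched in a few lines; the function name `save_links` and the default filename are my own, and "one URL per line" is an assumption about what the batch-import dialog accepts:

```python
def save_links(links, path="movie_links.txt"):
    """Write the collected download links to a text file,
    one URL per line, for batch import into a download manager."""
    with open(path, "w", encoding="utf-8") as f:
        for link in links:
            f.write(link + "\n")
```

Call it with the list gathered in the Selenium loop, e.g. `save_links([a.get_attribute('href') for a in contents])`.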

Okay, we'll see you next time.
