1. How do you bypass webdriver detection in headless Chrome?
Environment:
1. selenium-java
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
<version>3.4.0</version>
</dependency>
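1. Problem description:
When webdriver drives headless Chrome and the target site detects that the browser is webdriver-controlled, the crawler can no longer collect data. How can we get past this browser detection and keep collecting?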
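2. How is a browser identified as webdriver-controlled?
a. In the Chrome console, enter window.navigator.webdriver. It evaluates to true if the browser is controlled by webdriver, and to undefined otherwise.
b. In Java code, if the options used to initialize the webdriver include any one of enable-automation, headless, or remote-debugging-pipe, AutomationControlledEnabled is set to true, and navigator.h then reports webdriver as true. In the snippet below, enable-automation is excluded but --headless is still passed, so the flag remains set.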
ChromeOptions options = new ChromeOptions();
String[] a = { "enable-automation" };
options.setExperimentalOption("excludeSwitches", a);
options.addArguments("--headless");
c. The value of window.navigator.webdriver in the browser comes from the webdriver() method in navigator.h: when AutomationControlledEnabled is true, webdriver is true.
See the Chromium source: https://github.com/chromium/chromium/blob/d7da0240cae77824d1eda25745c4022757499131/third_party/blink/renderer/core/frame/navigator.h
bool webdriver() const {
return RuntimeEnabledFeatures::AutomationControlledEnabled();
}
d. When is AutomationControlledEnabled set to true?
See the Chromium source: https://github.com/chromium/chromium/blob/d7da0240cae77824d1eda25745c4022757499131/content/child/runtime_features.cc
If the launch switches include EnableAutomation, Headless, or RemoteDebuggingPipe, the AutomationControlled feature flag is enabled:
{wrf::EnableAutomationControlled, switches::kEnableAutomation, true},
{wrf::EnableAutomationControlled, switches::kHeadless, true},
{wrf::EnableAutomationControlled, switches::kRemoteDebuggingPipe, true},
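The rule in the table above can be modelled in a few lines of plain Java. This is only a sketch of the behaviour as described in this post for that Chromium commit, not Chromium code; the class and method names (AutomationFlagSketch, switchName, wouldReportWebdriver) are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the rule described above; not Chromium code. */
final class AutomationFlagSketch {
    // The three switches that, per the post, enable AutomationControlled.
    private static final Set<String> TRIGGERS = new HashSet<>(
            Arrays.asList("enable-automation", "headless", "remote-debugging-pipe"));

    /** Normalizes "--headless" or "window-size=1920,1080" to the bare switch name. */
    static String switchName(String arg) {
        String name = arg.replaceFirst("^-+", "");
        int eq = name.indexOf('=');
        return eq >= 0 ? name.substring(0, eq) : name;
    }

    /** True if any launch argument is one of the triggering switches. */
    static boolean wouldReportWebdriver(List<String> launchArgs) {
        for (String arg : launchArgs) {
            if (TRIGGERS.contains(switchName(arg))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // enable-automation is excluded, but --headless is still present.
        System.out.println(wouldReportWebdriver(
                Arrays.asList("--headless", "window-size=1920,1080")));
        // No triggering switch at all.
        System.out.println(wouldReportWebdriver(
                Arrays.asList("window-size=1920,1080")));
    }
}
```

The first call shows why excluding enable-automation alone is not enough for a headless crawler: the headless switch by itself still sets the flag.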
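3. How do you bypass the browser's webdriver detection?
a. Option 1: modify navigator.h so that webdriver returns false and build your own Chromium. This solves the problem at its root.
b. Option 2: run a CDP command that changes the value of webdriver to undefined. However, selenium-java 3.4.0 does not support the executeCdpCommand method, so you need to write your own ChromiumDriver that adds it.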
ChromiumDriver driver = new ChromiumDriver(chromeCaps);
HashMap<String, Object> cdpCmd = new HashMap<String, Object>();
cdpCmd.put("source", "Object.defineProperty(navigator, 'webdriver', {get: () => undefined }); ");
driver.executeCdpCommand("Page.addScriptToEvaluateOnNewDocument", cdpCmd);
JS command: Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
References: https://www.cnblogs.com/scholarscholar/p/14364822.html
https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-addScriptToEvaluateOnNewDocument
c. Option 3: upgrade selenium-java to a beta release. selenium-java 4.0.0-beta supports the executeCdpCommand method, but upgrading to selenium-java 4.0.0 causes many dependency errors that have to be resolved.
<!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
<version>4.0.0-beta-4</version>
</dependency>
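For options 2 and 3, it may help to see what is actually sent. The sketch below builds, with plain Java collections, the parameters for Page.addScriptToEvaluateOnNewDocument and the {"cmd", "params"} body that executeCdpCommand wraps them in. Only the command name, the "source" key, and the "cmd"/"params" keys come from this post; the class and method names (CdpPayloadSketch, params, requestBody) are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; it only shows the shape of the data, it sends nothing. */
final class CdpPayloadSketch {
    static final String COMMAND = "Page.addScriptToEvaluateOnNewDocument";
    static final String HIDE_WEBDRIVER_JS =
            "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});";

    /** Parameters for the CDP command: the script to evaluate in every new document. */
    static Map<String, Object> params(String script) {
        return Collections.<String, Object>singletonMap("source", script);
    }

    /** Body in the {"cmd": ..., "params": ...} shape used by executeCdpCommand. */
    static Map<String, Object> requestBody(String command, Map<String, Object> params) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("cmd", command);
        body.put("params", params);
        return body;
    }

    public static void main(String[] args) {
        System.out.println(requestBody(COMMAND, params(HIDE_WEBDRIVER_JS)));
    }
}
```

Because the script is registered for every new document, it has to be sent once after the driver is created and before the first page is loaded.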
4. selenium-java 3.4.0 does not support the executeCdpCommand method, so write your own ChromiumDriver that adds it
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
<version>3.4.0</version>
</dependency>
![](https://img.laitimes.com/img/__Qf2AjLwojIjJCLyojI0JCLicmbw5iYhZjMzQ2NhRmYhFTYmNjZ4ATZmljZ2UWNhJGNwMDO08CX0JXZ252bj91Ztl2Lc52YucWbp5GZzNmLn9Gbi1yZtl2Lc9CX6MHc0RHaiojIsJye.png)
package com.xxx.selenium;
import java.util.Map;
import org.openqa.selenium.Capabilities;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriverService;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.CommandExecutor;
import org.openqa.selenium.remote.RemoteWebDriver;
import com.google.common.collect.ImmutableMap;
public class ChromiumDriver extends RemoteWebDriver {
public ChromiumDriver(Capabilities capabilities) {
this(new ChromiumDriverCommandExecutor("goog", ChromeDriverService.createDefaultService()), capabilities, ChromeOptions.CAPABILITY);
}
protected ChromiumDriver(CommandExecutor commandExecutor, Capabilities capabilities, String capabilityKey) {
super(commandExecutor, capabilities);
}
/**
* Launches Chrome app specified by id.
*
* @param id Chrome app id.
*/
public void launchApp(String id) {
execute(ChromiumDriverCommand.LAUNCH_APP, ImmutableMap.of("id", id));
}
/**
* Execute a Chrome Devtools Protocol command and get returned result. The
* command and command args should follow
* <a href="https://chromedevtools.github.io/devtools-protocol/">chrome devtools
* protocol domains/commands</a>.
*/
public Map<String, Object> executeCdpCommand(String commandName, Map<String, Object> parameters) {
@SuppressWarnings("unchecked")
Map<String, Object> toReturn = (Map<String, Object>) getExecuteMethod().execute(ChromiumDriverCommand.EXECUTE_CDP_COMMAND,
ImmutableMap.of("cmd", commandName, "params", parameters));
return ImmutableMap.copyOf(toReturn);
}
@Override
public void quit() {
super.quit();
}
}
package com.xxx.selenium;
/**
* Constants for the ChromiumDriver specific command IDs.
*/
final class ChromiumDriverCommand {
private ChromiumDriverCommand() {}
static final String LAUNCH_APP = "launchApp";
static final String GET_NETWORK_CONDITIONS = "getNetworkConditions";
static final String SET_NETWORK_CONDITIONS = "setNetworkConditions";
static final String DELETE_NETWORK_CONDITIONS = "deleteNetworkConditions";
static final String EXECUTE_CDP_COMMAND = "executeCdpCommand";
// Cast Media Router APIs
static final String GET_CAST_SINKS = "getCastSinks";
static final String SET_CAST_SINK_TO_USE = "selectCastSink";
static final String START_CAST_TAB_MIRRORING = "startCastTabMirroring";
static final String GET_CAST_ISSUE_MESSAGE = "getCastIssueMessage";
static final String STOP_CASTING = "stopCasting";
static final String SET_PERMISSION = "setPermission";
}
package com.xxx.selenium;
import static java.util.Collections.unmodifiableMap;
import java.util.HashMap;
import java.util.Map;
import org.openqa.selenium.remote.CommandInfo;
import org.openqa.selenium.remote.http.HttpMethod;
import org.openqa.selenium.remote.service.DriverCommandExecutor;
import org.openqa.selenium.remote.service.DriverService;
/**
* {@link DriverCommandExecutor} that understands ChromiumDriver specific commands.
*
* @see <a href="https://chromium.googlesource.com/chromium/src/+/master/chrome/test/chromedriver/client/command_executor.py">List of ChromeWebdriver commands</a>
*/
public class ChromiumDriverCommandExecutor extends DriverCommandExecutor {
private static Map<String, CommandInfo> buildChromiumCommandMappings(String vendorKeyword) {
String sessionPrefix = "/session/:sessionId/";
String chromiumPrefix = sessionPrefix + "chromium";
String vendorPrefix = sessionPrefix + vendorKeyword;
HashMap<String, CommandInfo> mappings = new HashMap<>();
mappings.put(ChromiumDriverCommand.LAUNCH_APP,
new CommandInfo(chromiumPrefix + "/launch_app", HttpMethod.POST));
String networkConditions = chromiumPrefix + "/network_conditions";
mappings.put(ChromiumDriverCommand.GET_NETWORK_CONDITIONS,
new CommandInfo(networkConditions, HttpMethod.GET));
mappings.put(ChromiumDriverCommand.SET_NETWORK_CONDITIONS,
new CommandInfo(networkConditions, HttpMethod.POST));
mappings.put(ChromiumDriverCommand.DELETE_NETWORK_CONDITIONS,
new CommandInfo(networkConditions, HttpMethod.DELETE));
mappings.put( ChromiumDriverCommand.EXECUTE_CDP_COMMAND,
new CommandInfo(vendorPrefix + "/cdp/execute", HttpMethod.POST));
// Cast / Media Router APIs
String cast = vendorPrefix + "/cast";
mappings.put(ChromiumDriverCommand.GET_CAST_SINKS,
new CommandInfo(cast + "/get_sinks", HttpMethod.GET));
mappings.put(ChromiumDriverCommand.SET_CAST_SINK_TO_USE,
new CommandInfo(cast + "/set_sink_to_use", HttpMethod.POST));
mappings.put(ChromiumDriverCommand.START_CAST_TAB_MIRRORING,
new CommandInfo(cast + "/start_tab_mirroring", HttpMethod.POST));
mappings.put(ChromiumDriverCommand.GET_CAST_ISSUE_MESSAGE,
new CommandInfo(cast + "/get_issue_message", HttpMethod.GET));
mappings.put(ChromiumDriverCommand.STOP_CASTING,
new CommandInfo(cast + "/stop_casting", HttpMethod.POST));
mappings.put(ChromiumDriverCommand.SET_PERMISSION,
new CommandInfo(sessionPrefix + "permissions", HttpMethod.POST));
return unmodifiableMap(mappings);
}
public ChromiumDriverCommandExecutor(String vendorPrefix, DriverService service) {
super(service, buildChromiumCommandMappings(vendorPrefix));
}
}
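To make the mapping for EXECUTE_CDP_COMMAND concrete, here is a small stand-alone sketch of how its path template expands when the vendor keyword is "goog", as passed in the ChromiumDriver constructor. The class name CdpEndpointSketch and the session id "abc123" are made up, and the placeholder substitution is a simplification of what Selenium does internally.

```java
/** Hypothetical illustration of how the command path template expands. */
final class CdpEndpointSketch {
    /** Same concatenation as buildChromiumCommandMappings uses for EXECUTE_CDP_COMMAND. */
    static String cdpExecuteTemplate(String vendorKeyword) {
        String sessionPrefix = "/session/:sessionId/";
        String vendorPrefix = sessionPrefix + vendorKeyword;
        return vendorPrefix + "/cdp/execute";
    }

    /** Substitutes a concrete session id for the :sessionId placeholder. */
    static String resolve(String template, String sessionId) {
        return template.replace(":sessionId", sessionId);
    }

    public static void main(String[] args) {
        // prints "/session/abc123/goog/cdp/execute"
        System.out.println(resolve(cdpExecuteTemplate("goog"), "abc123"));
    }
}
```

A CDP call made through executeCdpCommand is therefore a POST to this path on the local chromedriver service.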
package com.xxx.selenium;
import java.util.HashMap;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.DesiredCapabilities;
public class DriverUtil {
/**
* Returns a ChromiumDriver that can execute CDP commands, which makes it possible to bypass webdriver detection.
* 1.https://intoli.com/blog/not-possible-to-block-chrome-headless/
* 2.https://intoli.com/blog/making-chrome-headless-undetectable/
* 3.https://github.com/chromium/chromium/blob/d7da0240cae77824d1eda25745c4022757499131/third_party/blink/renderer/core/frame/navigator.h
* @return a ChromiumDriver with the navigator.webdriver override already registered
*/
public ChromiumDriver getChromiumDriver() {
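// Set the chromedriver path (kept under the project directory here); the driver launches the local Chrome browser
String driverFilePath = "path/to/chromedriver"; // placeholder: replace with your chromedriver path
if (driverFilePath != null && !driverFilePath.isEmpty()) {
System.setProperty("webdriver.chrome.driver", driverFilePath);
}
// Begin the initial Chrome configuration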
HashMap<String, Object> prefs = new HashMap<String, Object>();
ChromeOptions options = new ChromeOptions();
options.setExperimentalOption("prefs", prefs);
String[] a = { "enable-automation" };
options.setExperimentalOption("excludeSwitches", a);
options.addArguments("--headless");
options.addArguments("window-size=1920,1080");
String ua="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36";
options.addArguments(String.format("--user-agent=%s", ua));
DesiredCapabilities chromeCaps = DesiredCapabilities.chrome();
chromeCaps.setCapability(ChromeOptions.CAPABILITY, options);
// Run a CDP command to change the value of webdriver to undefined
ChromiumDriver driver = new ChromiumDriver(chromeCaps);
HashMap<String, Object> cdpCmd = new HashMap<String, Object>();
cdpCmd.put("source", "Object.defineProperty(navigator, 'webdriver', {get: () => undefined }); ");
driver.executeCdpCommand("Page.addScriptToEvaluateOnNewDocument", cdpCmd);
return driver;
}
}