
Jsoup.parse() vs. Jsoup.parse(), or how does URL detection work in Jsoup?

It (the baseUri) is used by Element#absUrl(), so that you can retrieve the (intended) absolute URL of <a href>, <img src>, <link href>, <script src>, etc.

for (Element link : document.select("a[href]")) {
    // absolute URL of the link, resolved against the document's baseUri
    System.out.println(link.absUrl("href"));
}

This is very useful if you also want to download and/or parse the linked resources.
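For a rough idea of what absUrl() computes, the sketch below resolves a few relative hrefs against a page URL. It uses the JDK's java.net.URI rather than jsoup itself, so jsoup's exact resolution rules (for example, for malformed URLs) may differ. The page URL and hrefs are made-up examples, and toAbsolute is a hypothetical helper that, like absUrl(), returns an empty string when no absolute URL can be produced.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;

// Sketch only: approximates what Element#absUrl() returns, using java.net.URI
// from the JDK instead of jsoup. toAbsolute is a hypothetical helper.
final class AbsUrlSketch {

    // Resolves href against baseUri; returns "" (like absUrl) when the result is not absolute
    // or when either string is not a syntactically valid URI.
    static String toAbsolute(String baseUri, String href) {
        try {
            URI resolved = new URI(baseUri).resolve(new URI(href));
            return resolved.isAbsolute() ? resolved.toString() : "";
        } catch (URISyntaxException e) {
            return "";
        }
    }

    public static void main(String[] args) {
        String page = "https://example.com/docs/page.html"; // hypothetical page URL
        List<String> hrefs = List.of("img/logo.png", "/about", "../index.html",
                "//cdn.example.org/app.js", "https://other.example.net/");
        for (String href : hrefs) {
            System.out.println(href + " -> " + toAbsolute(page, href));
        }
        // With no baseUri (as with parse(html) and no <base href>), a relative href stays unresolved.
        System.out.println("img/logo.png with empty base -> \"" + toAbsolute("", "img/logo.png") + "\"");
    }
}
```

Note how an empty base mirrors parsing without a baseUri: the relative link cannot be made absolute, so the helper returns an empty string.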

In the 2nd parse() version, what does “resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag” mean? What if a <base href> tag never occurs in the page?

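Some (bad) websites may declare a <link> or <script> with a relative URL before the <base> tag. Or, if there is no <base> tag at all, the given baseUri alone is used to resolve the relative URLs of the entire document.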
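The rule in this answer can be modelled roughly as follows. This is a hypothetical, simplified model built on java.net.URI, not jsoup's tree-builder code; effectiveBase and the example URLs are assumptions for illustration, and actual jsoup versions may apply a <base href> differently in edge cases.

```java
import java.net.URI;

// Hypothetical, simplified model of the rule described above: relative URLs that
// appear before a <base href> tag resolve against the baseUri passed to parse(),
// later ones against the <base href> value. Not jsoup's actual implementation.
final class BaseTagModel {

    // baseHref is null (or empty) when no <base href> has been seen (yet) in the document.
    static URI effectiveBase(URI baseUri, String baseHref) {
        if (baseHref == null || baseHref.isEmpty()) {
            return baseUri;
        }
        // A relative <base href> is itself resolved against the page URL.
        return baseUri.resolve(baseHref);
    }

    public static void main(String[] args) {
        URI pageUrl = URI.create("https://example.com/blog/post.html"); // hypothetical
        String baseHref = "https://static.example.com/assets/";          // hypothetical

        // <link href="style.css"> declared before <base href>: uses the page URL.
        System.out.println(effectiveBase(pageUrl, null).resolve("style.css"));
        // <img src="logo.png"> declared after <base href>: uses the base href.
        System.out.println(effectiveBase(pageUrl, baseHref).resolve("logo.png"));
    }
}
```

If the page never declares a <base href>, every call takes the first branch, which matches the answer's point that the given baseUri then serves the whole document.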

What is the purpose of absolute URL detection? Why does Jsoup need to find the absolute URL?

To return the correct URL from Element#absUrl(). This is purely for the end user's convenience; Jsoup does not need it to parse the HTML successfully.

Lastly, but most importantly: Is baseUri the full URL of the HTML page (as phrased in the original documentation), or is it the base URL of the HTML page?

The former. If it were the latter, the documentation would be lying. The baseUri must not be confused with <base href>.
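One way to see why the full page URL is the right thing to pass: standard relative-URL resolution already drops the last path segment of the base, so the page URL works as-is, while a hand-trimmed "base" without its trailing slash can silently point one directory too high. The sketch uses java.net.URI from the JDK, not jsoup, and the URLs are hypothetical.

```java
import java.net.URI;

// Illustration with java.net.URI (not jsoup): passing the full page URL as the base
// versus a hand-trimmed "base URL" that lost its trailing slash.
final class FullUrlVsBase {

    // Resolves href against base and returns the resulting URI as a string.
    static String resolve(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }

    public static void main(String[] args) {
        String href = "img/chart.png";
        // Full URL of the page the HTML was fetched from (hypothetical).
        System.out.println(resolve("https://example.com/reports/2020/summary.html", href));
        // Trimmed "base" without a trailing slash: resolves one directory too high.
        System.out.println(resolve("https://example.com/reports/2020", href));
    }
}
```

The first call yields https://example.com/reports/2020/img/chart.png, the second https://example.com/reports/img/chart.png.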