Building a web image crawler: download every image on a web page and pack them into a zip in about 5 seconds

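We often need shared resources from the Internet, and images are one kind of resource. How do you download the images on a web page in bulk? A page can hold a lot of images, and if you want all of them, right-clicking and saving them one at a time is not practical. This post provides a Java web crawler that grabs the images for you. Fellow programmers who find it useful are welcome to bookmark it.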

I have already published this tool. The address is:

http://www.yzcopen.com/img/imgdown

Prerequisites: you need to know Java development. The core jar is Jsoup, which is easy to find and download online.

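Below is the online tool I built, which puts a graphical interface on the image grabber. Anyone interested can open the address given above and have a look.

The screenshots below show the result of crawling an image site picked at random from the web.

This is the home page of the site to be crawled:

And this is the result, already saved to my local computer:

The implementation code follows: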

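    // Imports used by the listing
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.UnsupportedEncodingException;
    import java.net.HttpURLConnection;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.net.URLEncoder;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;
    import org.jsoup.Connection;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    /**
     * User-Agent string used to imitate a browser request.
     * The methods below reference it as Const.UserAgent.
     */
    public final static String UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.26 Safari/537.36 Core/1.63.6821.400 QQBrowser/10.3.3040.400";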

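    /**
     * Collects every image URL on a page and packs the images into a zip file.
     *
     * @param zfilepath path of the zip file to write
     * @param url       address of the web page
     * @param pp        img attribute that holds the image URL, usually "src"
     * @return true if at least one image URL was found and processed
     */
    public static boolean getImgSrc(String zfilepath, String url, String pp) {
        boolean isb = false;
        // Open a connection with Jsoup and imitate a browser request
        Connection connect = Jsoup.connect(url).timeout(5000);
        connect.header("Connection", "Keep-Alive");
        connect.header("Content-Type", "application/x-www-form-urlencoded");
        connect.header("Accept-Encoding", "gzip, deflate, sdch");
        connect.header("Accept", "*/*");
        connect.header("User-Agent", Const.UserAgent);
        ZipOutputStream out = null;
        try {
            // Fetch and parse the Document
            Document document = connect.ignoreContentType(true).timeout(5000).get();
            // Find all img tags
            Elements imgs = document.getElementsByTag("img");
            File zipfile = new File(zfilepath);
            out = new ZipOutputStream(new FileOutputStream(zipfile));
            int i = 1;
            List<String> listimg = new ArrayList<String>();
            for (Element element : imgs) {
                // The "abs:" prefix resolves the attribute to an absolute URL
                String imgSrc = element.attr("abs:" + pp);
                // Jsoup returns "" when the attribute is missing or cannot be resolved
                if (!imgSrc.isEmpty()) {
                    listimg.add(imgSrc);
                }
            }
            listimg = removeCf(listimg);
            if (listimg != null && listimg.size() > 0) {
                for (int x = 0; x < listimg.size(); x++) {
                    long stime = System.currentTimeMillis();
                    String imgSrc = listimg.get(x);
                    // Print the URL
                    System.out.println(imgSrc);
                    // Download the image into the zip
                    boolean is = downImages(imgSrc, out);
                    long etime = System.currentTimeMillis();
                    float alltime = (float) (etime - stime) / 1000;
                    // Per-image progress record (the listing does not show it being sent)
                    Map<String, String> rest = new HashMap<String, String>();
                    rest.put("img", imgSrc);
                    rest.put("time", (alltime) + "");
                    rest.put("num", i + "");
                    rest.put("status", "true");
                    if (is) {
                        rest.put("http", "success");
                    } else {
                        rest.put("http", "failure");
                    }
                    i++;
                }
                Map<String, String> rest1 = new HashMap<String, String>();
                rest1.put("status", "true");
                rest1.put("msg", "Packaging complete");
                System.out.println("Download complete");
                isb = true;
            } else {
                Map<String, String> rest1 = new HashMap<String, String>();
                rest1.put("status", "true");
                rest1.put("msg", "No data captured; the site may be blocking crawlers");
                // client is not declared in this listing; it comes from elsewhere
                // in the author's project and pushes the message to the web page.
                client.sendEvent("chatevent", rest1);
            }
        } catch (IOException e) {
            e.printStackTrace();
            Map<String, String> rest = new HashMap<String, String>();
            rest.put("status", "false");
        } finally {
            if (out != null) {
                try {
                    out.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return isb;
    }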
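The `abs:` prefix in `element.attr("abs:" + pp)` asks Jsoup to resolve a relative `src` against the page address. As a rough illustration of what that resolution produces, here is a standard-library sketch using `java.net.URI.resolve`. This is not Jsoup's own code; the class name `AbsUrlSketch` and the example.com addresses are made up for the demonstration.

```java
import java.net.URI;

public class AbsUrlSketch {
    /** Resolves a possibly relative src value against the page URL. */
    public static String toAbsolute(String pageUrl, String src) {
        // URI.create rejects illegal characters such as spaces,
        // so real input may need encoding first.
        return URI.create(pageUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        String page = "http://example.com/gallery/index.html";
        // Relative to the page's directory
        System.out.println(toAbsolute(page, "img/a.jpg"));
        // Relative to the site root
        System.out.println(toAbsolute(page, "/static/b.png"));
        // Already absolute, returned unchanged
        System.out.println(toAbsolute(page, "http://cdn.example.com/c.gif"));
    }
}
```

The three calls print `http://example.com/gallery/img/a.jpg`, `http://example.com/static/b.png` and `http://cdn.example.com/c.gif`, which is the kind of absolute address the download step needs.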

    /**
     * Downloads one image and writes it into the zip stream.
     *
     * @param imgUrl    image URL
     * @param outStream zip stream that receives the image
     * @return true if the image was written to the zip
     */
    public static boolean downImages(String imgUrl, ZipOutputStream outStream) {
        boolean is = false;
        // Take the file name from the end of the URL
        String fileName = imgUrl.substring(imgUrl.lastIndexOf('/') + 1);
        try {
            // The file name may contain Chinese characters or spaces, so encode it.
            // URLEncoder turns a space into a plus sign,
            String urlTail = URLEncoder.encode(fileName, "UTF-8");
            // so each plus sign is converted to %20.
            imgUrl = imgUrl.substring(0, imgUrl.lastIndexOf('/') + 1) + urlTail.replaceAll("\\+", "%20");
            // Validate the image format so that dynamically served images are kept.
            // vidImg is not shown in this listing; it returns "" for names to skip.
            fileName = vidImg(fileName);
            if (fileName.equals("")) {
                return is;
            }
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        InputStream in = null;
        try {
            // Build the image URL
            URL url = new URL(imgUrl);
            // Open the connection
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestProperty("User-Agent", Const.UserAgent);
            // Allow 10 seconds to establish the connection
            connection.setConnectTimeout(10 * 1000);
            // Read the response body; readInputStream is not shown in this listing
            in = connection.getInputStream();
            byte[] data = readInputStream(in);
            outStream.putNextEntry(new ZipEntry(fileName));
            outStream.write(data);
            is = true;
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                outStream.closeEntry();
                if (in != null) {
                    in.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return is;
    }
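`downImages` calls two helpers, `readInputStream` and `vidImg`, that the post does not show. The sketch below is one plausible way to fill them in, not the author's code. `readInputStream` is the usual read-until-end loop. `validImageName` is a hypothetical stand-in for `vidImg`: it assumes the check is a simple extension whitelist, and both the whitelist and the query-string handling are my assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class DownloadHelpersSketch {
    // Assumed whitelist; the post does not say which formats vidImg accepts.
    private static final List<String> EXTENSIONS =
            Arrays.asList("jpg", "jpeg", "png", "gif", "bmp", "webp");

    /** Reads the whole stream into a byte array. */
    public static byte[] readInputStream(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    /** Returns a usable file name, or "" when the name does not look like an image. */
    public static String validImageName(String fileName) {
        // Drop a query string such as "?w=200" (assumed behaviour)
        int q = fileName.indexOf('?');
        String name = q >= 0 ? fileName.substring(0, q) : fileName;
        int dot = name.lastIndexOf('.');
        if (dot <= 0 || dot == name.length() - 1) {
            return "";
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(ext) ? name : "";
    }

    public static void main(String[] args) throws IOException {
        byte[] data = readInputStream(
                new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8)));
        System.out.println(data.length);
        System.out.println(validImageName("photo.JPG?w=200"));
        System.out.println(validImageName("index.html").isEmpty());
    }
}
```

The author's `vidImg` is described as also keeping dynamically served images, so it may well accept names that this stricter sketch rejects, such as URLs with no extension at all.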

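    /**
     * Removes duplicate image URLs while keeping their original order.
     *
     * @param list image URLs, possibly with repeats
     * @return the URLs with repeats removed
     */
    public static List<String> removeCf(List<String> list) {
        List<String> listTemp = new ArrayList<String>();
        for (int i = 0; i < list.size(); i++) {
            if (!listTemp.contains(list.get(i))) {
                listTemp.add(list.get(i));
            }
        }
        return listTemp;
    }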
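Two related points on duplicates, sketched under my own naming (`DedupSketch`, `distinctInOrder` and `uniqueEntryName` are not from the post). First, a `LinkedHashSet` gives the same order-preserving result as `removeCf` without the repeated `contains` scan. Second, removing duplicate URLs does not prevent two different URLs from ending in the same file name, and `ZipOutputStream.putNextEntry` throws a `ZipException` for a duplicate entry name, so a small renaming helper can keep both images.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {
    /** Removes repeats while keeping first-seen order. */
    public static List<String> distinctInOrder(List<String> urls) {
        return new ArrayList<>(new LinkedHashSet<>(urls));
    }

    /**
     * Returns fileName if it is unused, otherwise fileName with _1, _2, ...
     * inserted before the extension. The chosen name is recorded in used.
     */
    public static String uniqueEntryName(String fileName, Set<String> used) {
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        String ext = dot > 0 ? fileName.substring(dot) : "";
        String candidate = fileName;
        int n = 1;
        // Set.add returns false when the name is already taken
        while (!used.add(candidate)) {
            candidate = base + "_" + n + ext;
            n++;
        }
        return candidate;
    }

    public static void main(String[] args) {
        System.out.println(distinctInOrder(Arrays.asList("a", "b", "a", "c")));
        Set<String> used = new HashSet<>();
        System.out.println(uniqueEntryName("1.jpg", used));
        System.out.println(uniqueEntryName("1.jpg", used));
    }
}
```

In `downImages`, the name passed to `new ZipEntry(...)` could go through a helper like `uniqueEntryName` first, with one shared `Set` per zip file.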

If you like it, remember to bookmark it.
