Tika is a toolkit for parsing documents: it detects the document type on its own and then uses the appropriate parser (jar) to handle that format.
Today I ran into a requirement: parse the content of files on the network into data.
Step one is a crawler: fetch the page, process it, and extract the file URLs.
The next step would be to download each file locally and parse it following the Tika demo.
But downloading to disk not only takes up storage, it also costs disk I/O.
Since all we need is the data, it is enough to process everything in memory.
So take the stream straight from the HTTP response:

InputStream in = response.getEntity().getContent();
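For context, a minimal sketch of how that response might be obtained, assuming Apache HttpClient 4.x (which the getEntity() call suggests); fileUrl is a placeholder for a URL produced by the crawler step:

import java.io.InputStream;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

CloseableHttpClient client = HttpClients.createDefault();
// fileUrl is a placeholder taken from the crawler step
CloseableHttpResponse response = client.execute(new HttpGet(fileUrl));
InputStream in = response.getEntity().getContent();   // file bytes, never written to disk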
The idea is to have Tika extract both the file's properties (metadata) and its body text, but Tika does this in two passes, and either pass consumes the InputStream, so the same stream cannot simply be handed to both.
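The original Tika calls are not shown here, but the failure mode looks roughly like this; a sketch assuming AutoDetectParser is used for both passes:

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

AutoDetectParser parser = new AutoDetectParser();
Metadata metadata = new Metadata();

// Pass 1: parse once just to collect the metadata (file properties)
parser.parse(in, new BodyContentHandler(-1), metadata, new ParseContext());

// Pass 2: parse again for the body text -- this yields nothing useful,
// because pass 1 has already consumed the InputStream
BodyContentHandler handler = new BodyContentHandler(-1);
parser.parse(in, handler, new Metadata(), new ParseContext());
String bodyText = handler.toString();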
Solution one: copy the InputStream
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

// Copy the whole response stream into memory
// (plain byte copy; NIO can do better if throughput matters;
// exception handling omitted for brevity)
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buffer = new byte[1024];
int len;
while ((len = in.read(buffer)) > -1) {
    baos.write(buffer, 0, len);
}
baos.flush();

// Open new InputStreams over the recorded bytes;
// this can be repeated as many times as needed
InputStream is1 = new ByteArrayInputStream(baos.toByteArray());
InputStream is2 = new ByteArrayInputStream(baos.toByteArray());
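With two independent copies, each Tika pass can be given its own fresh stream; continuing the AutoDetectParser sketch from above:

AutoDetectParser parser = new AutoDetectParser();
Metadata metadata = new Metadata();

// Pass 1: metadata from the first copy
parser.parse(is1, new BodyContentHandler(-1), metadata, new ParseContext());

// Pass 2: body text from the second, untouched copy
BodyContentHandler handler = new BodyContentHandler(-1);
parser.parse(is2, handler, new Metadata(), new ParseContext());
String bodyText = handler.toString();

The trade-off is that the whole file is buffered in memory (once in baos, plus one more copy per toByteArray() call), which is fine for modest documents but worth keeping in mind for very large ones.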
Solution two: wrap the InputStream so it cannot be closed
import java.io.InputStream;
import org.apache.commons.io.input.CloseShieldInputStream;  // Apache Commons IO

InputStream is = getStream();                    // obtain the stream
CloseShieldInputStream csis = new CloseShieldInputStream(is);

// call the bad function that does things it shouldn't (e.g. close the stream)
badFunction(csis);

// happiness follows: the original stream is still open and usable
is.read();
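One caveat: CloseShieldInputStream only swallows the close() call made by the callee; it does not rewind bytes that have already been read. So if the first Tika pass reads the stream to the end, the byte-array copy from solution one (or a mark()/reset()-capable stream) is still needed to get a second pass.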