Java PDF Reading Example

2021-11-08 02:44:18

An example of reading a PDF in Java.

1: Download the attached JAR package.

2: Import the JAR into your project.
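A quick way to confirm that step 2 worked is to ask the JVM whether a class from the JAR can be found. The sketch below is only an illustration: the class name `PdfBoxClasspathCheck` and the helper `isOnClasspath` are made up for this example, and it assumes the attached JAR is the older PDFBox release whose classes live under `org.pdfbox`, as the imports used in this article suggest.

```java
class PdfBoxClasspathCheck {

    // Hypothetical helper: returns true if the named class can be found on the classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, PdfBoxClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the JAR provides this class (taken from the article's imports)
        String pdfBoxClass = "org.pdfbox.pdmodel.PDDocument";
        System.out.println(pdfBoxClass + " available: " + isOnClasspath(pdfBoxClass));
    }
}
```

If this reports that the class is not available, the JAR is probably not on the project's build path yet.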

Write the class as follows:

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.MalformedURLException;
import java.net.URL;

import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;

public class AnalyzaPdfFile {

    public void readPdf(String file) throws Exception {
        // Whether to sort the extracted text by position
        boolean sort = false;
        // PDF file name
        String pdfFile = file;
        // Output text file name
        String textFile = null;
        // Character encoding
        String encoding = "UTF-8";
        // First page to extract
        int startPage = 1;
        // Last page to extract
        int endPage = Integer.MAX_VALUE;
        // Writer that produces the text file
        Writer output = null;
        // The PDF document held in memory
        PDDocument document = null;
        try {
            try {
                // First, load the file that was passed in
                URL url = new URL(pdfFile);
                // Note: load() is given the file name, not the URL
                document = PDDocument.load(pdfFile);
                // Get the PDF file name
                String fileName = url.getFile();
                // Name the new .txt file after the original PDF
                if (fileName.length() > 4) {
                    File outputFile = new File(fileName.substring(0, fileName.length() - 4) + ".txt");
                    textFile = outputFile.getName();
                }
            } catch (MalformedURLException e) {
                // If the argument is not a valid URL, load from the file system.
                // Note: the argument is a file name, not a URL as in earlier versions.
                document = PDDocument.load(pdfFile);
                if (pdfFile.length() > 4) {
                    textFile = pdfFile.substring(0, pdfFile.length() - 4) + ".txt";
                }
            }
            // Output stream that writes to textFile
            output = new OutputStreamWriter(new FileOutputStream(textFile), encoding);
            // PDFTextStripper extracts the text
            PDFTextStripper stripper = new PDFTextStripper();
            // Set whether to sort
            stripper.setSortByPosition(sort);
            // Set the start page
            stripper.setStartPage(startPage);
            // Set the end page
            stripper.setEndPage(endPage);
            // Call PDFTextStripper's writeText to extract and write the text
            stripper.writeText(document, output);
        } finally {
            if (output != null) {
                // Close the output stream
                output.close();
            }
            if (document != null) {
                // Close the PDF document
                document.close();
            }
        }
    }

    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        AnalyzaPdfFile analyzaPdfFile = new AnalyzaPdfFile();
        // Read the contents of a.pdf on drive E
        analyzaPdfFile.readPdf("e:\\a.pdf");
    }
}

After the program finishes, you will find a new file, a.txt, on drive E. This file contains the text extracted from the PDF.
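The reason a.txt ends up next to a.pdf can be sketched without PDFBox at all. A Windows path such as e:\a.pdf does not parse as a URL, so the class takes its file-system branch and derives the output name by swapping the extension. The class `OutputNameSketch` and its two helpers are hypothetical names introduced for this illustration. Unlike the article's code, which simply drops the last four characters, this variant checks for a `.pdf` suffix first.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

class OutputNameSketch {

    // True if the argument parses as a URL with a protocol the JVM knows (http, file, ...).
    static boolean looksLikeUrl(String location) {
        try {
            new URL(location);
            return true;
        } catch (MalformedURLException e) {
            // Plain paths such as e:\a.pdf end up here: "e" is not a known protocol
            return false;
        }
    }

    // Replaces a trailing ".pdf" (any letter case) with ".txt"; otherwise appends ".txt".
    static String toTextFileName(String pdfPath) {
        if (pdfPath.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            return pdfPath.substring(0, pdfPath.length() - 4) + ".txt";
        }
        return pdfPath + ".txt";
    }

    public static void main(String[] args) {
        String input = "e:\\a.pdf";
        System.out.println(looksLikeUrl(input));   // prints false
        System.out.println(toTextFileName(input)); // prints e:\a.txt
    }
}
```

Checking the suffix avoids mangling names that are not PDFs, at the cost of slightly different behavior from the original for inputs without a `.pdf` extension.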
