2. Excerpted from: http://www.codeproject.com/KB/string/pdf2text.aspx

PDFBox is another Java PDF library. It also works out of the box with the original Java version of Lucene (see LucenePDFDocument).

Fortunately, there is a .NET version of PDFBox, built with IKVM.NET; it ships in the bin directory of the PDFBox download package.

Using PDFBox in .NET requires adding references to:


and copying IKVM.Runtime.dll to your application's bin directory.

Using PDFBox to parse PDFs is fairly easy:

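// At the top of the file. These are the PDFBox 0.7.x package names;
// later Apache releases use org.apache.pdfbox instead.
using org.pdfbox.pdmodel;   // PDDocument
using org.pdfbox.util;      // PDFTextStripper

private static string parseUsingPDFBox(string filename)
{
    // Load the PDF from disk
    PDDocument doc = PDDocument.load(filename);
    try
    {
        // Extract the text of all pages as a single string
        PDFTextStripper stripper = new PDFTextStripper();
        return stripper.getText(doc);
    }
    finally
    {
        // Release the document even if text extraction throws
        doc.close();
    }
}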
