Converting doc to PDF with Java

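Word is probably the most mainstream document editor today. Almost everyone uses it, it is powerful, and its user base is huge. It has its problems, though: when a file moves between different programs or operating systems, the formatting can change, which is infuriating. That is why more and more people convert Word files to PDF to keep the layout intact.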

Converting one Word file to PDF is, well, so easy. Ten files is tedious but bearable. But what about 1,000 Word files? At that point the rage goes straight to your head and you feel like smashing the computer.

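So today, on a whim, I wanted to see whether a program could batch-convert docx files to PDF. After looking at the Apache POI Java library and the docx4j component, I chose docx4j for the document handling.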

Enough talk, let's get started:

1. Download the dependencies

Pulling in all the jars that docx4j depends on is pretty painless with Maven:

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-Internal</artifactId>
    <version>8.2.4</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-export-fo</artifactId>
    <version>8.2.4</version>
</dependency>
           

2. Implementation

package com.convert.test;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;

import org.docx4j.Docx4J;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.fonts.Mapper;
import org.docx4j.fonts.PhysicalFonts;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;


public class ConvertTest {
    
    public static void main(String[] args) {
        
        word2pdf("D:\\tran\\2.doc", "D:\\tran\\2.pdf");
        
    }
    
    public static void word2pdf(String source, String target) {
        
        try {
            WordprocessingMLPackage pkg = Docx4J.load(new File(source));

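            // Map the font names used in the document to fonts installed on this
            // machine, so that Chinese text can be rendered in the PDF.
            Mapper fontMapper = new IdentityPlusMapper();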
            fontMapper.put("隸書", PhysicalFonts.get("LiSu"));
            fontMapper.put("宋體", PhysicalFonts.get("SimSun"));
            fontMapper.put("微軟雅黑", PhysicalFonts.get("Microsoft Yahei"));
            fontMapper.put("黑體", PhysicalFonts.get("SimHei"));
            fontMapper.put("楷體", PhysicalFonts.get("KaiTi"));
            fontMapper.put("新宋體", PhysicalFonts.get("NSimSun"));
            fontMapper.put("華文行楷", PhysicalFonts.get("STXingkai"));
            fontMapper.put("華文仿宋", PhysicalFonts.get("STFangsong"));
            fontMapper.put("仿宋", PhysicalFonts.get("FangSong"));
            fontMapper.put("幼圓", PhysicalFonts.get("YouYuan"));
            fontMapper.put("華文宋體", PhysicalFonts.get("STSong"));
            fontMapper.put("華文中宋", PhysicalFonts.get("STZhongsong"));
            fontMapper.put("等線", PhysicalFonts.get("SimSun"));
            fontMapper.put("等線 Light", PhysicalFonts.get("SimSun"));
            fontMapper.put("華文琥珀", PhysicalFonts.get("STHupo"));
            fontMapper.put("華文隸書", PhysicalFonts.get("STLiti"));
            fontMapper.put("華文新魏", PhysicalFonts.get("STXinwei"));
            fontMapper.put("華文彩雲", PhysicalFonts.get("STCaiyun"));
            fontMapper.put("方正姚體", PhysicalFonts.get("FZYaoti"));
            fontMapper.put("方正舒體", PhysicalFonts.get("FZShuTi"));
            fontMapper.put("華文細黑", PhysicalFonts.get("STXihei"));
            fontMapper.put("宋體擴充", PhysicalFonts.get("simsun-extB"));
            fontMapper.put("仿宋_GB2312", PhysicalFonts.get("FangSong_GB2312"));
            fontMapper.put("新細明體", PhysicalFonts.get("SimSun"));
            pkg.setFontMapper(fontMapper);

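            // try-with-resources closes the output stream even if the conversion fails
            try (FileOutputStream out = new FileOutputStream(target)) {
                Docx4J.toPDF(pkg, out);
            }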
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (Docx4JException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
        
    }
    
    
}           

3. Conversion result

SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Using pdbs 420=7mm
Using pdbs 420=7mm           

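A few messages are printed, but they do not affect PDF generation: the SLF4J lines only say that no logging binding was found, so logging falls back to a no-op logger. The generated PDF opens fine and its content is complete. So that is more or less done; all that is left is a for loop that walks through all the documents. Later, though, I found that the number of converted PDFs was 10 short: not every document had been converted successfully.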
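The for loop mentioned above could look roughly like the sketch below. `BatchConverter`, `Converter`, `convertAll` and `pdfTargetFor` are hypothetical names made up for this illustration, not part of docx4j. The conversion itself is passed in as a callback and is assumed to throw on failure; the word2pdf method above only prints the stack trace, so it would have to rethrow for this to work. Returning the list of failed files makes a shortfall in the PDF count visible right away.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BatchConverter {

    /** Hypothetical hook: wraps whatever converts a single file. */
    @FunctionalInterface
    public interface Converter {
        void convert(Path source, Path target) throws Exception;
    }

    /** Replaces the extension of a Word file name with ".pdf". */
    public static Path pdfTargetFor(Path source) {
        String name = source.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return source.resolveSibling(base + ".pdf");
    }

    /**
     * Converts every .doc/.docx file directly inside dir (no recursion) and
     * returns the files whose conversion threw, so failures are not lost.
     */
    public static List<Path> convertAll(Path dir, Converter converter) throws IOException {
        List<Path> sources;
        try (Stream<Path> entries = Files.list(dir)) {
            sources = entries
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String n = p.getFileName().toString().toLowerCase();
                        return n.endsWith(".docx") || n.endsWith(".doc");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }

        List<Path> failed = new ArrayList<>();
        for (Path source : sources) {
            try {
                converter.convert(source, pdfTargetFor(source));
            } catch (Exception e) {
                // Record the failure and keep going with the remaining files.
                failed.add(source);
                System.err.println("Failed: " + source + " (" + e.getMessage() + ")");
            }
        }
        return failed;
    }
}
```

A call such as `BatchConverter.convertAll(Paths.get("D:\\tran"), converter)` would then return the files that still need attention once the run is finished.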

4. Further investigation

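After some digging, I found that 10 of the documents were .doc files, so those were presumably the 10 that failed. I pulled one out and converted it on its own, and sure enough it threw an error: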

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
org.docx4j.openpackaging.exceptions.Docx4JException: This file seems to be a binary doc/ppt/xls, not an encrypted OLE2 file containing a doc/pptx/xlsx
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:612)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:414)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:287)
    at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:265)
    at org.docx4j.openpackaging.packages.WordprocessingMLPackage.load(WordprocessingMLPackage.java:168)
    at org.docx4j.Docx4J.load(Docx4J.java:232)
    at com.convert.test.ConvertTest.word2pdf(ConvertTest.java:26)
    at com.convert.test.ConvertTest.main(ConvertTest.java:19)
This file seems to be a binary doc/ppt/xls, not an encrypted OLE2 file containing a doc/pptx/xlsx
           

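In other words, docx4j found a binary doc/ppt/xls file rather than an encrypted OLE2 file containing a doc/pptx/xlsx. After checking, docx4j does not handle every kind of Word document well; at the very least, it cannot load binary .doc files. Have you run into this problem, and how did you solve it?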
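One way to spot such files before handing them to docx4j is to look at the first bytes of each file instead of trusting the extension. The sketch below assumes only two well-known signatures: a docx is a ZIP archive and starts with `PK\3\4`, while binary .doc files (and, as the error message hints, encrypted Office files) are OLE2 containers that start with `D0 CF 11 E0 A1 B1 1A E1`. `WordFormatSniffer` and its members are names made up for this illustration, not a docx4j API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class WordFormatSniffer {

    public enum Format { OOXML_ZIP, OLE2_CONTAINER, UNKNOWN }

    // OLE2 compound file signature: binary .doc/.xls/.ppt and encrypted Office files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature; an unencrypted .docx is a ZIP archive.
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 3, 4 };

    /** Classifies a file by the leading bytes passed in. */
    public static Format sniff(byte[] header) {
        if (startsWith(header, ZIP_MAGIC)) return Format.OOXML_ZIP;
        if (startsWith(header, OLE2_MAGIC)) return Format.OLE2_CONTAINER;
        return Format.UNKNOWN;
    }

    /** Reads up to 8 bytes from the file and classifies it. */
    public static Format sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[8];
            int n = 0;
            while (n < header.length) {
                int r = in.read(header, n, header.length - n);
                if (r < 0) break; // end of file before 8 bytes were read
                n += r;
            }
            return sniff(Arrays.copyOf(header, n));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Files reported as `OLE2_CONTAINER` could then be set aside for a different tool, and a file with a .doc extension that is really a ZIP-based docx would still be recognised as `OOXML_ZIP`.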