Detecting a File's Encoding in Java

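When a Java program reads a file, Chinese text can come out garbled because of the file's encoding, and sometimes a UTF-8 file has to be converted to an ANSI encoding. The code below determines which encoding a file uses.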

Main JAR: cpdetector.jar

Download: http://cpdetector.sourceforge.net/

jchardet-1.0.jar is also required; without it, detector.add(cpdetector.io.JChardetFacade.getInstance()); fails.

Download: http://www.jarfinder.com/index.php/jars/versioninfo/40297

antlr.jar is needed as well; otherwise detector.add(new ParsingDetector(false)); fails at runtime.

Download: http://www.java2s.com/code/jar/abc/downloadantlrjar.htm

import info.monitorenter.cpdetector.io.ASCIIDetector;
import info.monitorenter.cpdetector.io.CodepageDetectorProxy;
import info.monitorenter.cpdetector.io.JChardetFacade;
import info.monitorenter.cpdetector.io.ParsingDetector;
import info.monitorenter.cpdetector.io.UnicodeDetector;

import java.io.File;
import java.nio.charset.Charset;

/**
 * @author weict
 */
public class JudgeFileCode {

    /**
     * @param args
     */
    public static void main(String[] args) {

        /*------------------------------------------------------------------------
          detector is the detection proxy; it delegates the detection work to
          instances of concrete detector implementations.
          cpdetector ships with some commonly used implementations, whose
          instances can be added with the add method, such as ParsingDetector,
          JChardetFacade, ASCIIDetector and UnicodeDetector.
          detector returns the detected charset on the principle that whichever
          detector first returns a non-null result wins.
        --------------------------------------------------------------------------*/
        CodepageDetectorProxy detector = CodepageDetectorProxy.getInstance();

        /*-------------------------------------------------------------------------
          ParsingDetector can check the encoding of HTML, XML and similar files or
          character streams. The constructor argument indicates whether to print
          details of the detection process; false means do not print.
        ---------------------------------------------------------------------------*/
        // Comment this line out if you want the actual encoding of an XML file
        // rather than the encoding declared inside it.
        detector.add(new ParsingDetector(false));

        /*--------------------------------------------------------------------------
          JChardetFacade wraps jchardet, provided by Mozilla, and can determine the
          encoding of most files. This detector alone is therefore usually enough
          for most projects; if you want more assurance, add further detectors such
          as ASCIIDetector and UnicodeDetector below.
        ---------------------------------------------------------------------------*/
        detector.add(JChardetFacade.getInstance());

        // ASCIIDetector detects ASCII encoding
        detector.add(ASCIIDetector.getInstance());

        // UnicodeDetector detects encodings of the Unicode family
        detector.add(UnicodeDetector.getInstance());

        Charset charset = null;
        File f = new File("path to the file");
        try {
            // File.toURL() is deprecated; go through toURI() instead
            charset = detector.detectCodepage(f.toURI().toURL());
        } catch (Exception ex) {
            ex.printStackTrace();
        }

        if (charset != null) {
            System.out.println(f.getName() + " encoding: " + charset.name());
        } else {
            System.out.println(f.getName() + " unknown");
        }
    }
}
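The "first non-null result wins" rule described in the comments of the program can be illustrated with the JDK alone. The following is only a minimal sketch of that principle, not cpdetector's implementation: the class DetectorChainSketch and its three toy detectors (byBom, byAscii, byStrictUtf8) are hypothetical names invented for this illustration, and they recognize far fewer encodings than the real library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration of a "first non-null result wins" detector chain. */
final class DetectorChainSketch {

    /** Recognizes a UTF-8 or UTF-16 byte order mark; returns null when there is none. */
    static Charset byBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return null;
    }

    /** Returns US-ASCII when every byte is below 0x80, otherwise null. */
    static Charset byAscii(byte[] b) {
        for (byte x : b) {
            if (x < 0) {
                return null; // high bit set, so not pure ASCII
            }
        }
        return StandardCharsets.US_ASCII;
    }

    /** Returns UTF-8 when the bytes decode strictly as UTF-8, otherwise null. */
    static Charset byStrictUtf8(byte[] b) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(b));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    /** Asks each detector in order and returns the first non-null answer. */
    static Charset detect(byte[] data, List<Function<byte[], Charset>> detectors) {
        for (Function<byte[], Charset> detector : detectors) {
            Charset result = detector.apply(data);
            if (result != null) {
                return result;
            }
        }
        return null; // no detector recognized the data
    }

    /** Runs the three toy detectors in a fixed order. */
    static Charset detect(byte[] data) {
        List<Function<byte[], Charset>> chain = List.of(
                DetectorChainSketch::byBom,
                DetectorChainSketch::byAscii,
                DetectorChainSketch::byStrictUtf8);
        return detect(data, chain);
    }
}
```

Because the first answer wins, the order of the detectors matters: here a byte order mark takes precedence over the content checks, in the same way that adding ParsingDetector first lets a declared XML encoding take precedence in the program.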

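The cpdetector detector configured above can examine not only files but also any text input stream, by calling the overloaded form:

charset = detector.detectCodepage(textInputStream, bytesToRead);

Here textInputStream is the text stream to examine and bytesToRead is the number of bytes to read from it for detection. The byte count is chosen by the programmer: the more bytes are read, the more accurate the result, at the cost of more time. Note that the byte count must not exceed the length of the stream.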
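The trade-off between the number of bytes read and the accuracy of the result can be seen in a small JDK-only sketch. It does not call cpdetector: StreamPrefixSketch, peek and guessUtf8 are hypothetical names, and the only question the sketch answers is whether the first maxBytes bytes are valid UTF-8. It assumes Java 11 or later (for InputStream.readNBytes) and a stream that supports mark/reset, so that the bytes used for detection can be read again afterwards.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: guess an encoding from a limited prefix of a stream. */
final class StreamPrefixSketch {

    /** Reads at most maxBytes from the stream and then rewinds it. */
    static byte[] peek(InputStream in, int maxBytes) {
        if (!in.markSupported()) {
            // Wrap the stream in a BufferedInputStream if it cannot be rewound.
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            in.mark(maxBytes);
            byte[] prefix = in.readNBytes(maxBytes); // may be shorter than maxBytes
            in.reset();
            return prefix;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns UTF-8 if the examined prefix is valid UTF-8, otherwise null. */
    static Charset guessUtf8(InputStream in, int maxBytes) {
        byte[] prefix = peek(in, maxBytes);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer out = CharBuffer.allocate(prefix.length);
        // endOfInput = false: a multi-byte character cut off at the end of the
        // prefix is treated as "more input expected", not as an error.
        CoderResult result = decoder.decode(ByteBuffer.wrap(prefix), out, false);
        return result.isError() ? null : StandardCharsets.UTF_8;
    }
}
```

For a stream that starts with eight ASCII letters followed by the bytes 0xD6 0xD0, guessUtf8(in, 4) answers UTF-8 because the four bytes it looked at are valid, while guessUtf8(in, 10) answers null because the larger sample reaches the bytes that are not valid UTF-8. This is the sense in which a larger byte count gives a more reliable answer.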

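A practical example of encoding detection:

Properties files (.properties) are a common way for Java programs to store text; the Struts framework, for example, keeps a program's string resources in properties files. Their content looks like this:

# comment

propertyName=propertyValue

The usual way to read a properties file is: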

Properties prop = new Properties();
// try-with-resources closes the stream even if load() throws
try (FileInputStream ios = new FileInputStream("properties file name")) {
    prop.load(ios);
}

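Reading a properties file with the load method of java.util.Properties is convenient, but if the file contains Chinese text, the values come out garbled. The cause is that load(InputStream) reads the text as a byte stream and has to decode the bytes into strings, and the encoding it uses is ISO-8859-1. That character set covers only ASCII plus Western European (Latin-1) characters and cannot represent Chinese, so an explicit conversion is needed: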

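String value = prop.getProperty("property name");
// Recover the original bytes, then decode them with the file's real encoding
String encValue = new String(value.getBytes("ISO-8859-1"), "actual encoding of the properties file");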

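In the code above, the actual encoding of the properties file can be obtained with the detection method described earlier. Of course, properties files like these are internal to the project, so their encoding can be controlled: if the convention is GBK, the default on Chinese-language Windows, convert with "GBK" directly, and if the convention is UTF-8, convert with "UTF-8" directly. For more flexibility, the encoding of the properties file can be detected automatically with the method described above, which makes the developers' work easier.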
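The conversion step can be tried out with a few lines of JDK-only code. This is a minimal sketch under stated assumptions: PropertiesEncodingSketch and its two methods are hypothetical names, the file content is passed in as a byte array to keep the example self-contained, and the charset argument stands for whatever encoding was agreed on or detected. The second method uses Properties.load(Reader), available since Java 6, as an alternative that decodes the text correctly up front.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/** Hypothetical sketch: two ways to read non-Latin-1 values from a properties file. */
final class PropertiesEncodingSketch {

    /** Loads through the byte-stream overload, then repairs one value by re-decoding it. */
    static String loadThenTranscode(byte[] fileBytes, String key, Charset actual) {
        Properties prop = new Properties();
        try {
            prop.load(new ByteArrayInputStream(fileBytes)); // bytes are decoded as ISO-8859-1
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = prop.getProperty(key);
        if (raw == null) {
            return null;
        }
        // ISO-8859-1 maps every byte to one char, so the original bytes can be recovered.
        return new String(raw.getBytes(StandardCharsets.ISO_8859_1), actual);
    }

    /** Loads through a Reader that already decodes with the actual charset. */
    static String loadWithReader(byte[] fileBytes, String key, Charset actual) {
        Properties prop = new Properties();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(fileBytes), actual)) {
            prop.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return prop.getProperty(key);
    }
}
```

For a UTF-8 file both routes return the same text. The Reader route is probably the safer choice when the encoding is known before loading: in GBK, for example, the second byte of a character can be 0x5C, which the byte-stream route would interpret as a backslash escape before the value can be re-decoded.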
