Detecting a File's Encoding in Java

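When Java reads and processes a file, a mismatch between the file's encoding and the one used to decode it can turn Chinese text into garbage. Sometimes a file also has to be converted from UTF-8 to ANSI (the Windows system code page). The code below determines which encoding a file uses.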

Main jar: cpdetector.jar

Download: http://cpdetector.sourceforge.net/

You also need jchardet-1.0.jar; without it, detector.add(JChardetFacade.getInstance()); fails.

Download: http://www.jarfinder.com/index.php/jars/versioninfo/40297

antlr.jar is required as well; without it, detector.add(new ParsingDetector(false)); fails at run time.

Download: http://www.java2s.com/code/jar/abc/downloadantlrjar.htm

import info.monitorenter.cpdetector.io.ASCIIDetector;
import info.monitorenter.cpdetector.io.CodepageDetectorProxy;
import info.monitorenter.cpdetector.io.JChardetFacade;
import info.monitorenter.cpdetector.io.ParsingDetector;
import info.monitorenter.cpdetector.io.UnicodeDetector;
import java.io.File;
import java.nio.charset.Charset;

/**
 * @author weict
 */
public class JudgeFileCode {

    /**
     * @param args
     */
    public static void main(String[] args) {
        /*
         * detector is a proxy: it hands the detection work to instances of
         * concrete detector classes. cpdetector ships with several common
         * ones, such as ParsingDetector, JChardetFacade, ASCIIDetector and
         * UnicodeDetector, and they are registered through add().
         * The proxy returns the charset of whichever detector is the first
         * to return a non-null result.
         */
        CodepageDetectorProxy detector = CodepageDetectorProxy.getInstance();

        /*
         * ParsingDetector checks the encoding of HTML, XML and similar files
         * or character streams. The constructor argument controls whether
         * details of the detection process are printed; false turns them off.
         */
        // Comment this line out if you want the encoding the XML file is
        // actually stored in rather than the encoding declared inside it.
        detector.add(new ParsingDetector(false));

        /*
         * JChardetFacade wraps jchardet from the Mozilla project and can
         * determine the encoding of most files, so this detector alone is
         * usually enough for most projects. If you want more certainty, add
         * further detectors such as ASCIIDetector and UnicodeDetector below.
         */
        detector.add(JChardetFacade.getInstance());
        // ASCIIDetector detects ASCII.
        detector.add(ASCIIDetector.getInstance());
        // UnicodeDetector detects the Unicode family of encodings.
        detector.add(UnicodeDetector.getInstance());

        Charset charset = null;
        File f = new File("file path");
        try {
            // File.toURL() is deprecated; go through toURI() instead.
            charset = detector.detectCodepage(f.toURI().toURL());
        } catch (Exception ex) {
            ex.printStackTrace();
        }

        if (charset != null) {
            System.out.println(f.getName() + " encoding: " + charset.name());
        } else {
            System.out.println(f.getName() + " unknown");
        }
    }
}
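The "first non-null result wins" rule can be illustrated without cpdetector at all. The sketch below is not how cpdetector's detectors work internally; it swaps in a much simpler technique, strict decoding with the JDK's own CharsetDecoder, and only mirrors the chaining idea. The class name FirstMatchCharsetGuess and the candidate list are made up for the example, and it assumes the JDK at hand provides the GBK charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of cpdetector.
class FirstMatchCharsetGuess {

    // Tries each candidate in order and returns the first one that decodes
    // the bytes without a malformed or unmappable sequence, or null if none does.
    static Charset guess(byte[] data, List<Charset> candidates) {
        for (Charset candidate : candidates) {
            try {
                candidate.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(data));
                return candidate;
            } catch (CharacterCodingException e) {
                // This candidate rejected the bytes; fall through to the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumed to be available in this JDK
        List<Charset> candidates =
                Arrays.asList(StandardCharsets.US_ASCII, StandardCharsets.UTF_8, gbk);

        String sample = "\u7f16\u7801"; // two Chinese characters
        System.out.println(guess(sample.getBytes(StandardCharsets.UTF_8), candidates));
        System.out.println(guess(sample.getBytes(gbk), candidates));
    }
}
```

Order matters in such a chain: strict charsets (ASCII, UTF-8) go first, and permissive ones go last, because a single-byte charset like ISO-8859-1 accepts any byte sequence and would always "win".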

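The detector in the code above is not limited to files; it can also detect the encoding of any text input stream through an overload of the same method:

charset = detector.detectCodepage(text input stream to examine, number of bytes to read from it);

The byte count is chosen by the programmer. The more bytes are read, the more reliable the result, at the cost of more time. Note that the byte count must not exceed the length of the stream.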
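To make the "read a limited number of bytes from a stream" idea concrete, here is a minimal sketch that peeks at the first few bytes and then rewinds the stream so the caller can still read it from the start. It is not cpdetector's implementation: it only looks for a byte order mark, and the class name BomSniffer is invented for the example. It does show why a stream handed to a detector generally has to support mark/reset, and why the byte budget cannot be larger than what the stream holds.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of cpdetector.
class BomSniffer {

    // Reads at most 3 bytes, rewinds the stream, and returns the charset
    // indicated by a byte order mark, or null if no known mark is present.
    static Charset sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] head = new byte[3];
        in.mark(head.length);
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // the stream is shorter than the byte budget
            }
            n += r;
        }
        in.reset(); // leave the stream positioned at its first byte again

        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        // FF FE also starts the UTF-32LE mark; this sketch ignores UTF-32.
        if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        // Wrapping in BufferedInputStream adds mark/reset to streams that lack it.
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println(sniff(in));
    }
}
```

Files without a byte order mark return null here, which is exactly the case where a statistical detector such as JChardetFacade is needed.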

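A concrete use case for encoding detection:

Property files (.properties) are a common way to store text in Java programs; the Struts framework, for example, keeps an application's string resources in property files. Their content looks like this:

# comment

property name=property value

The usual way to read a property file is:

FileInputStream ios = new FileInputStream("property file name");

Properties prop = new Properties();

prop.load(ios);

ios.close();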

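Reading a property file with the load method of java.util.Properties is convenient, but if the file contains Chinese text, the values come out garbled. The reason is that load reads the text as a byte stream and then has to decode those bytes into strings, and the encoding it uses is ISO-8859-1. That charset is a single-byte superset of ASCII and cannot represent Chinese, so the values have to be transcoded explicitly:

String value = prop.getProperty("property name");

String encValue = new String(value.getBytes("ISO-8859-1"), "actual encoding of the property file");

The actual encoding of the property file in the code above can be obtained with the detection method described earlier. Of course, property files like these are internal to the project, so their encoding is under our control. If the convention is GBK, the default on Chinese-language Windows, transcode with "GBK" directly; if the convention is UTF-8, transcode with "UTF-8". For more flexibility, the encoding of the property file can be detected automatically with the method above, which saves developers some work.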
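The transcoding step can be tried end to end with a small in-memory example. The sketch below uses only the JDK; the class name PropertiesEncodingDemo, the key greeting and the sample value are made up for illustration, and UTF-8 stands in for whatever encoding the detector reported. It also shows an alternative that the text above does not cover: since Java 6, Properties.load accepts a Reader, so the file can be decoded with the right charset up front.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; names and sample data are illustrative only.
class PropertiesEncodingDemo {

    // Undoes the ISO-8859-1 decoding done by load(InputStream) and decodes
    // the recovered bytes with the file's actual charset instead.
    static String recode(String value, Charset actual) {
        return new String(value.getBytes(StandardCharsets.ISO_8859_1), actual);
    }

    // Decodes the content with the given charset while loading,
    // so no transcoding is needed afterwards.
    static Properties load(byte[] content, Charset actual) throws IOException {
        Properties prop = new Properties();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(content), actual)) {
            prop.load(reader);
        }
        return prop;
    }

    public static void main(String[] args) throws IOException {
        String expected = "\u4f60\u597d"; // a two-character Chinese greeting
        byte[] content = ("greeting=" + expected + "\n").getBytes(StandardCharsets.UTF_8);

        Properties raw = new Properties();
        raw.load(new ByteArrayInputStream(content)); // bytes are decoded as ISO-8859-1
        String garbled = raw.getProperty("greeting");

        System.out.println(expected.equals(garbled));
        System.out.println(expected.equals(recode(garbled, StandardCharsets.UTF_8)));
        System.out.println(expected.equals(
                load(content, StandardCharsets.UTF_8).getProperty("greeting")));
    }
}
```

Where possible, the Reader variant is probably the safer choice. With multi-byte encodings such as GBK, the second byte of a character can be 0x5C, which load(InputStream) would interpret as a backslash escape before the transcoding step ever sees it.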
