XML read error: Invalid byte 1 of 1-byte UTF-8 sequence

http://blog.csdn.net/chenyanbo/article/details/6866941

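Put simply, if this error appears while you are parsing XML produced by someone else, the most likely cause is that the file was not actually saved in UTF-8 when it was generated.

On Chinese-language Windows, Java's default encoding is GBK. So even though the XML declaration says the file is UTF-8, the file is actually written as GBK. That is why XML files generated and declared as GBK or GB2312 parse correctly, while files declared as UTF-8 are rejected by the XML parser.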
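
The mismatch can be checked at the byte level. The following is a minimal sketch, not part of the original article: the class name Utf8Check and the method isValidUtf8 are made up for illustration. It runs a strict UTF-8 decoder over the two bytes GBK uses for the character U+4E2D (0xD6 0xD0) and over the three bytes UTF-8 uses for the same character (0xE4 0xB8 0xAD). The first GBK byte looks like the start of a 2-byte UTF-8 sequence, but the second byte is not a valid continuation byte, which is the kind of input a UTF-8 XML reader rejects.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    /** Returns true only if the bytes form a well-formed UTF-8 sequence. */
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] gbkBytes = {(byte) 0xD6, (byte) 0xD0};               // U+4E2D encoded as GBK
        byte[] utf8Bytes = {(byte) 0xE4, (byte) 0xB8, (byte) 0xAD}; // U+4E2D encoded as UTF-8
        System.out.println(isValidUtf8(gbkBytes));  // prints false
        System.out.println(isValidUtf8(utf8Bytes)); // prints true
    }
}
```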

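Solutions:

1. The simplest fix is to change <?xml version="1.0" encoding="UTF-8"?> to <?xml version="1.0" encoding="gbk"?>.

2. Or open the XML file and save it again with the character set changed to UTF-8.

3. Rewrite the XML file in code before parsing it.

4. When reading with dom4j, read the file through an I/O stream and set the character encoding explicitly.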
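
Options 2 and 3 both amount to re-encoding the file so that its bytes match the declaration. A minimal sketch of that idea follows, assuming the file's real encoding is known (GBK in the scenario described here) and that its declaration already says UTF-8. XmlTranscoder, transcode and rewriteAsUtf8 are illustrative names, not part of dom4j.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class XmlTranscoder {
    /** Decodes the bytes with the charset they were really written in, then re-encodes them. */
    public static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    /** Rewrites a file in place as UTF-8, assuming its bytes are really in actualCharset. */
    public static void rewriteAsUtf8(Path file, Charset actualCharset) throws IOException {
        byte[] original = Files.readAllBytes(file);
        Files.write(file, transcode(original, actualCharset, StandardCharsets.UTF_8));
    }
}
```

A call such as XmlTranscoder.rewriteAsUtf8(Paths.get("catalog.xml"), Charset.forName("GBK")) would convert a file in place (the path is only an example). Keep a backup first, because decoding with the wrong source charset silently corrupts the text.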

------------------------------------------------------------------------------------------------------------------------------

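dom4j and Chinese text: encoding problems

Reposted from: http://zhonghuaweixu.blog.163.com/blog/static/11793205920106693820542/

Problem description

When using dom4j, you may sometimes find that an XML file cannot be saved correctly in UTF-8. The symptom is that Chinese text appears garbled after saving (if it is not garbled, the encoding setting did not take effect before saving, and the file was written in the local GBK or GB2312 encoding). Reading the file back then fails with errors like the following: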

Invalid byte 2 of 2-byte UTF-8 sequence. Nested exception: Invalid byte 2 of 2-byte UTF-8 sequence.

Invalid byte 1 of 1-byte UTF-8 sequence. Nested exception: Invalid byte 1 of 1-byte UTF-8 sequence.

The same error in its localized Chinese form: 2 位元組 UTF-8 序列的無效位元組 2。 Nested exception: 2 位元組 UTF-8 序列的無效字

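The dom4j sample code for creating an XML document looks like this:

    // Write out the XML document
    try
    {
        // FileWriter encodes with the platform default charset, not the declared one
        XMLWriter output = new XMLWriter(new FileWriter(new File("data/catalog.xml")));
        output.write(document);
        output.close();
    }
    catch (IOException e)
    {
        System.out.println(e.getMessage());
    }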

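Root cause

The code above writes the file through a FileWriter, and that is why the encoding comes out wrong. dom4j cannot control the encoding of a Writer it is handed, because the bytes are produced by the Writer itself, and a FileWriter created this way (before Java 11 it had no constructor taking a charset) always uses the platform default encoding. On Chinese-language Windows that default is GBK (JDK 18 and later default to UTF-8 instead). So although the XML declaration says UTF-8, the file is actually written as GBK. This is why files generated and declared as GBK or GB2312 parse correctly, while files declared as UTF-8 are rejected by the XML parser.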
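
The difference between an implicit and an explicit encoding can be seen without dom4j. The sketch below is an illustration only (WriterEncodingDemo and writeWith are hypothetical names): it writes one character through an OutputStreamWriter with a stated charset and returns the resulting bytes. A FileWriter created without a charset behaves like the same writer with Charset.defaultCharset() filled in, so its output depends on the machine it runs on.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterEncodingDemo {
    /** Writes text through a Writer bound to the given charset and returns the bytes produced. */
    public static byte[] writeWith(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray(); // the writer was closed above, so everything is flushed
    }

    public static void main(String[] args) {
        // This is the charset a plain FileWriter would use on this machine.
        System.out.println("platform default: " + Charset.defaultCharset());
        // U+4E2D takes three bytes in UTF-8 (E4 B8 AD), whatever the platform default is.
        System.out.println(writeWith("\u4e2d", StandardCharsets.UTF_8).length); // prints 3
    }
}
```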

How to fix it

First, look at how dom4j handles the encoding:

    public XMLWriter(OutputStream out) throws UnsupportedEncodingException {
        //System.out.println("In OutputStream");
        this.format = DEFAULT_FORMAT;
        this.writer = createWriter(out, format.getEncoding());
        this.autoFlush = true;
        namespaceStack.push(Namespace.NO_NAMESPACE);
    }

    public XMLWriter(OutputStream out, OutputFormat format) throws UnsupportedEncodingException {
        //System.out.println("In OutputStream,OutputFormat");
        this.format = format;
        this.writer = createWriter(out, format.getEncoding());
        this.autoFlush = true;
        namespaceStack.push(Namespace.NO_NAMESPACE);
    }

    protected Writer createWriter(OutputStream outStream, String encoding)
            throws UnsupportedEncodingException {
        return new BufferedWriter(
            new OutputStreamWriter(outStream, encoding)
        );
    }

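As the code shows, dom4j does nothing elaborate about encoding; it relies entirely on what Java itself provides. So when generating an XML file with dom4j, do not construct the XMLWriter from a Writer such as FileWriter. Construct it from an OutputStream subclass instead, so that dom4j creates the Writer itself with the encoding taken from the OutputFormat. In the code above, that means replacing FileWriter with FileOutputStream. The revised code:

    // Write out the XML document
    try
    {
        OutputFormat outFmt = new OutputFormat("\t", true);
        outFmt.setEncoding("UTF-8");
        XMLWriter output = new XMLWriter(new FileOutputStream(filename), outFmt);
        output.write(document);
        output.close();
    }
    catch (IOException e)
    {
        System.out.println(e.getMessage());
    }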
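
The same principle (give the serializer a byte stream plus an encoding, so that the declaration and the bytes agree) can be tried with nothing but the JDK. The sketch below deliberately swaps dom4j's XMLWriter for the JDK's own Transformer so that it runs without extra libraries; it is not the dom4j API. Utf8RoundTrip, serialize and parseText are illustrative names. It serializes a small document as UTF-8 bytes and parses those bytes back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class Utf8RoundTrip {
    /** Builds a one-element document around the text and serializes it as UTF-8 bytes. */
    public static byte[] serialize(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();
            Element root = doc.createElement("catalog");
            root.setTextContent(text);
            doc.appendChild(root);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            // A byte stream target lets the serializer do the encoding itself.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    /** Parses XML bytes and returns the text content of the root element. */
    public static String parseText(byte[] xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, written as escapes
        System.out.println(text.equals(parseText(serialize(text)))); // prints true
    }
}
```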

How do you read it back?

    public List extractXMLText(File inputXml, String node)
    {
        List texts = null;
        try
        {
            // Parse the XML document with SAXReader, which lives in the org.dom4j.io package.
            // inputXml is a java.io.File created from the XML file.
            SAXReader saxReader = new SAXReader();
            saxReader.setEncoding("UTF-8");
            Document document = saxReader.read(inputXml);
            texts = document.selectNodes(node); // get the list of sentence nodes
        }
        catch (DocumentException e)
        {
            System.out.println(e.getMessage());
        }
        return texts;
    }
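
Reading a file whose declaration is wrong (option 4 in the list of solutions: read through an I/O stream with an explicit encoding) can be sketched as follows. To stay free of external dependencies the example uses the JDK's DocumentBuilder rather than dom4j; dom4j's SAXReader has a read(Reader) overload that can be fed the same kind of InputStreamReader. ExplicitCharsetParse and rootText are illustrative names. When the parser receives characters rather than bytes, the decoding has already been done by the Reader, so the encoding declaration in the file is disregarded.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExplicitCharsetParse {
    /** Decodes the bytes with actualCharset, parses the result, and returns the root element's text. */
    public static String rootText(byte[] xmlBytes, Charset actualCharset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(xmlBytes), actualCharset)) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader)); // character stream: declaration is not used for decoding
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        // Declared as UTF-8 but encoded as GBK, as in the failure described in this article.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>\u4e2d\u6587</a>";
        byte[] mislabeled = xml.getBytes(gbk);
        System.out.println("\u4e2d\u6587".equals(rootText(mislabeled, gbk)));
    }
}
```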
