Forcing the response character encoding when using Apache HttpClient 4.0

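A couple of days ago a script of mine that calls an HTTP service broke. On closer inspection, it turned out that the service was sending these response headers: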

HTTP/1.1 200 OK
Server: xxxxxxxxxx
Content-Type: text/html; charset=utf-8
Connection: close
Content-Length:2014

The response headers declare a Content-Type with charset=utf-8, but the text in the response body is actually GBK-encoded. That broke the request script I had written.
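
A minimal sketch of the mismatch, using only the JDK: it encodes a short string as GBK and then decodes the bytes both ways. The sample text and the variable names are made up for illustration and do not come from the original script.

```groovy
// Hypothetical sample text (two Chinese characters), written with Unicode
// escapes so the script does not depend on the source file's encoding.
def original = '\u4e2d\u6587'

// What the service actually sends: GBK-encoded bytes.
byte[] body = original.getBytes('GBK')

// What a client gets if it trusts the charset=utf-8 declaration.
def decodedAsDeclared = new String(body, 'UTF-8')

// What the client has to do for this particular service.
def decodedAsGbk = new String(body, 'GBK')

println "decoded as declared (utf-8): ${decodedAsDeclared}"
println "decoded as GBK:              ${decodedAsGbk}"
```

These GBK bytes are not valid UTF-8, so the first decode yields U+FFFD replacement characters instead of the original text.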

The script depends on Apache HttpClient as follows:

pom.xml:

<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <version>4.0</version>
</dependency>
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpcore</artifactId>
  <version>4.0.1</version>
</dependency>

The original script used [url=http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/impl/client/DefaultHttpClient.html]DefaultHttpClient[/url] to send the request and, with the help of [url=http://hc.apache.org/httpcomponents-core-ga/httpcore/apidocs/org/apache/http/util/EntityUtils.html]EntityUtils[/url], implemented its own [url=http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/client/ResponseHandler.html]ResponseHandler[/url], similar to [url=http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/impl/client/BasicResponseHandler.html]BasicResponseHandler[/url]. It looked like this:

import org.apache.http.client.HttpResponseException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

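def httpClient = new DefaultHttpClient();

// Builds a ResponseHandler that decodes the body with the given charset
// when the response does not declare one itself.
def makeResponseHandler(charset) {
  { response ->
      def statusLine = response.statusLine;
      // Treat any status of 300 or above as a failure.
      if (statusLine.statusCode >= 300) {
        throw new HttpResponseException(statusLine.statusCode, statusLine.reasonPhrase);
      }

      def entity = response.entity;
      // charset is only a default here: a charset in Content-Type takes precedence.
      entity ? EntityUtils.toString(entity, charset) : null;
  } as ResponseHandler
}

// requestUrl is assumed to be defined elsewhere in the script.
def httpGet = new HttpGet(requestUrl);
def responseBody = httpClient.execute(httpGet, makeResponseHandler('GBK'));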

The HTTP service I was originally calling returned no Content-Type header in its response, so calling [url=http://hc.apache.org/httpcomponents-core-ga/httpcore/apidocs/org/apache/http/util/EntityUtils.html#toString(org.apache.http.HttpEntity)]EntityUtils.toString(entity, defaultCharset)[/url] this way was enough to specify the character encoding used to decode the response body.

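The problem is that the service now sends an incorrect Content-Type, and EntityUtils.toString(entity, defaultCharset) gives the charset in Content-Type priority over defaultCharset. As a result, the script above can no longer force the character encoding.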
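
This precedence can be reproduced without a server. Below is a minimal sketch, assuming httpcore 4.0.1 is available on the classpath (pulled in here with Groovy's @Grab, which needs network access on first run). The sample text and the makeEntity helper are hypothetical and only stand in for a real response body.

```groovy
@Grab(group='org.apache.httpcomponents', module='httpcore', version='4.0.1')
import org.apache.http.entity.ByteArrayEntity
import org.apache.http.util.EntityUtils

// Hypothetical sample text, written with Unicode escapes.
def original = '\u4e2d\u6587'

// Hypothetical helper: a GBK-encoded body with an optional Content-Type.
def makeEntity = { String contentType ->
  def entity = new ByteArrayEntity(original.getBytes('GBK'))
  if (contentType != null) {
    entity.setContentType(contentType)
  }
  entity
}

// No Content-Type on the entity: the default charset passed in is used.
def noHeader = EntityUtils.toString(makeEntity(null), 'GBK')

// Wrong Content-Type: its charset wins over the default passed in.
def wrongHeader = EntityUtils.toString(makeEntity('text/html; charset=utf-8'), 'GBK')

println "no Content-Type:    ${noHeader == original}"
println "wrong Content-Type: ${wrongHeader == original}"
```

Only the first call returns the original text; in the second, the 'GBK' argument is silently overridden by the declared utf-8.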

So what to do? The most direct approach, of course, is to get hold of the response body as a byte array and then decode it however you like:

import org.apache.http.client.HttpResponseException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

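def httpClient = new DefaultHttpClient();

// Builds a ResponseHandler that always decodes the body with the given charset,
// ignoring whatever charset the Content-Type header declares.
def makeResponseHandler(charset) {
  { response ->
      def statusLine = response.statusLine;
      if (statusLine.statusCode >= 300) {
        throw new HttpResponseException(statusLine.statusCode, statusLine.reasonPhrase);
      }

      def entity = response.entity;
      // Read the raw bytes so that we decide how they are decoded.
      def bytes = entity ? EntityUtils.toByteArray(entity) : null;
      // Compare with null explicitly: an empty byte array is false under
      // Groovy truth, which would turn an empty body into null.
      bytes != null ? new String(bytes, charset) : null;
  } as ResponseHandler
}

// requestUrl is assumed to be defined elsewhere in the script.
def httpGet = new HttpGet(requestUrl);
def responseBody = httpClient.execute(httpGet, makeResponseHandler('GBK'));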

Is there a better way? I don't know; I'm still not familiar enough with HttpClient.

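Ideally, of course, the team providing the HTTP service would fix the response header, but that means going through all sorts of tedious processes, and a tool I'm working on couldn't wait, so I had to hack around it =_=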
