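A couple of days ago a script of mine that calls an HTTP service broke. On closer inspection, it turned out the service was sending these response headers: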
HTTP/1.1 200 OK
Server: xxxxxxxxxx
Content-Type: text/html; charset=utf-8
Connection: close
Content-Length:2014
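The response headers declare a Content-Type that specifies charset=utf-8, but the text in the response body is actually GBK-encoded. That is what broke the request script I had written.
The script depends on Apache HttpClient, declared as follows: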
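To make the mismatch concrete, here is a minimal standalone sketch in plain Java (no HttpClient involved). The class name CharsetMismatchDemo, the decode helper, and the sample text are all made up for illustration; it only shows what happens when GBK bytes are decoded with the charset the header claims versus the charset the body really uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Hypothetical helper: decode raw body bytes with the given charset,
    // the way an HTTP client would after choosing an encoding.
    public static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String original = "\u4e2d\u6587"; // arbitrary sample text ("Chinese" written in Chinese)
        byte[] body = original.getBytes(gbk); // what the server actually sends

        // Trusting the header (charset=utf-8) garbles the text.
        String trusted = decode(body, StandardCharsets.UTF_8);
        // Forcing GBK recovers it.
        String forced = decode(body, gbk);

        System.out.println(original.equals(trusted)); // prints false
        System.out.println(original.equals(forced));  // prints true
    }
}
```

The GBK bytes are not valid UTF-8, so the UTF-8 decode yields replacement characters rather than the original text.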
pom.xml:
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <version>4.0</version>
</dependency>
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpcore</artifactId>
  <version>4.0.1</version>
</dependency>
The original script used [url=http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/impl/client/DefaultHttpClient.html]DefaultHttpClient[/url] to send the request and, with the help of [url=http://hc.apache.org/httpcomponents-core-ga/httpcore/apidocs/org/apache/http/util/EntityUtils.html]EntityUtils[/url], implemented its own [url=http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/client/ResponseHandler.html]ResponseHandler[/url] similar to [url=http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/impl/client/BasicResponseHandler.html]BasicResponseHandler[/url], something like this:
import org.apache.http.client.HttpResponseException
import org.apache.http.client.ResponseHandler
import org.apache.http.client.methods.HttpGet
import org.apache.http.impl.client.DefaultHttpClient
import org.apache.http.util.EntityUtils

def httpClient = new DefaultHttpClient()

// Build a ResponseHandler that decodes the body as text, using the given
// charset as the default when the response does not name one.
def makeResponseHandler(charset) {
    return { response ->
        def statusLine = response.statusLine
        // Same policy as BasicResponseHandler: status 300 and above is a failure.
        if (statusLine.statusCode >= 300) {
            throw new HttpResponseException(statusLine.statusCode, statusLine.reasonPhrase)
        }
        def entity = response.entity
        // charset is only a default here; a charset in Content-Type takes priority.
        entity ? EntityUtils.toString(entity, charset) : null
    } as ResponseHandler
}

// requestUrl is assumed to be defined earlier in the script.
def httpGet = new HttpGet(requestUrl)
def responseBody = httpClient.execute(httpGet, makeResponseHandler('GBK'))
Originally the HTTP service returned no Content-Type header at all, so calling [url=http://hc.apache.org/httpcomponents-core-ga/httpcore/apidocs/org/apache/http/util/EntityUtils.html#toString(org.apache.http.HttpEntity)]EntityUtils.toString(entity, defaultCharset)[/url] this way was enough to specify the character encoding used to decode the response body.
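The problem is that the service now sends an incorrect Content-Type, and EntityUtils.toString(entity, defaultCharset) gives the charset in Content-Type priority over defaultCharset. With that, the script above can no longer force a particular character encoding.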
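The precedence described above can be modelled in a few lines of plain Java. This is only a simplified sketch of the behaviour, not HttpCore's actual code: the class CharsetPrecedenceSketch and both helper methods are hypothetical, the Content-Type parsing is deliberately naive (it ignores quoting), and the ISO-8859-1 last resort reflects my understanding of HttpCore 4.0's default.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class CharsetPrecedenceSketch {
    // Hypothetical helper: pull the charset parameter out of a Content-Type
    // value such as "text/html; charset=utf-8"; returns null if there is none.
    public static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return param.substring("charset=".length()).trim();
            }
        }
        return null;
    }

    // Model of the precedence: header charset first, then the caller's
    // default, then ISO-8859-1 as a last resort.
    public static String decodeBody(byte[] body, String contentType, String defaultCharset) {
        String charset = charsetFromContentType(contentType);
        if (charset == null) {
            charset = defaultCharset;
        }
        if (charset == null) {
            charset = "ISO-8859-1";
        }
        return new String(body, Charset.forName(charset));
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // arbitrary sample text
        byte[] body = original.getBytes(Charset.forName("GBK"));

        // No Content-Type: the caller's default (GBK) is used.
        System.out.println(original.equals(decodeBody(body, null, "GBK"))); // prints true
        // Wrong Content-Type: the header wins and the default is ignored.
        System.out.println(original.equals(
                decodeBody(body, "text/html; charset=utf-8", "GBK"))); // prints false
    }
}
```

The second call is the situation the script ran into: passing 'GBK' no longer has any effect once the header names a charset.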
So what to do? The most obvious fix is to get hold of the response body as a byte array and then decode it however I like:
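import org.apache.http.client.HttpResponseException
import org.apache.http.client.ResponseHandler
import org.apache.http.client.methods.HttpGet
import org.apache.http.impl.client.DefaultHttpClient
import org.apache.http.util.EntityUtils

def httpClient = new DefaultHttpClient()

// Build a ResponseHandler that always decodes the body with the given
// charset, ignoring whatever charset the Content-Type header declares.
def makeResponseHandler(charset) {
    return { response ->
        def statusLine = response.statusLine
        // Same policy as BasicResponseHandler: status 300 and above is a failure.
        if (statusLine.statusCode >= 300) {
            throw new HttpResponseException(statusLine.statusCode, statusLine.reasonPhrase)
        }
        def entity = response.entity
        // Read the raw bytes so no charset detection happens.
        def bytes = entity ? EntityUtils.toByteArray(entity) : null
        // Note: an empty body is falsy in Groovy, so it also yields null.
        bytes ? new String(bytes, charset) : null
    } as ResponseHandler
}

// requestUrl is assumed to be defined earlier in the script.
def httpGet = new HttpGet(requestUrl)
def responseBody = httpClient.execute(httpGet, makeResponseHandler('GBK'))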
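I wonder whether there is a better way. I still don't know HttpClient well enough.
Ideally the team providing the HTTP service would fix the response header, but that means going through all sorts of tedious processes, and a tool I'm working on can't wait, so I had to hack around it =_=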