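I ran into a problem while querying with a GET request:

When the request parameters contain Chinese characters, they come out garbled.

Even after I configured Spring's CharacterEncodingFilter, the text was still garbled.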

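Cause: Tomcat 7 and earlier decode the URL parameters of a GET request as ISO-8859-1 by default (Tomcat 8 and later default to UTF-8), which garbles multi-byte text. Both CharacterEncodingFilter and

 request.setCharacterEncoding("UTF-8");

only affect the body of a POST request.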

The rest of this post traces the path from how a GET request encodes its URL to how Tomcat decodes it.

Solutions

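Either change the encoding Tomcat uses for GET parameters from its ISO-8859-1 default to UTF-8, or re-decode the parameter in application code.

To change the Tomcat setting, open conf/server.xml and add URIEncoding="utf-8" to the <Connector port="8082" protocol="HTTP/1.1" element.

To re-decode in code, turn the parameter back into a byte array using ISO-8859-1, then build a string from those bytes as UTF-8: userName = new String(userName.getBytes("ISO-8859-1"), "UTF-8");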
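
To see why the re-decoding workaround recovers the text, the following sketch simulates the faulty decode and then reverses it. The class name MojibakeDemo and the sample value are made up for illustration, and it assumes the client sent UTF-8 bytes that the container decoded as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Simulates a container that decodes UTF-8 bytes as ISO-8859-1.
    static String garble(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the faulty decode: ISO-8859-1 maps every byte to exactly one
    // char, so getBytes(ISO_8859_1) yields the original UTF-8 bytes again.
    static String recover(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters, 3 UTF-8 bytes each
        String garbled = garble(original);
        System.out.println(garbled.length());                  // 6: one char per byte
        System.out.println(recover(garbled).equals(original)); // true
    }
}
```

The round trip is only safe when the container really decoded with ISO-8859-1. If URIEncoding is already UTF-8, the parameter arrives intact, and re-decoding it this way replaces every non-Latin-1 character with ?.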

How is a URL encoded?

Reference: About URL Encoding (關于URL編碼)

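Generally, a URL may contain only English letters, Arabic numerals, and certain punctuation marks; no other characters or symbols are allowed. For example, there are URLs made of English letters, such as "http://www.abc.com", but none made of Greek letters, such as "http://www.aβγ.com" (read as alpha-beta-gamma.com). This is a hard requirement of the network standard RFC 1738.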

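This means that any Chinese characters in a URL must be encoded before use. The trouble is that RFC 1738 does not specify a concrete encoding method and leaves the choice to the application (the browser). As a result, "URL encoding" has become a confusing area.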

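Different operating systems, browsers, and page character sets can produce completely different encodings. In testing, most current browsers encode with UTF-8. To stay compatible with all browsers, though, you can use a JavaScript function: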

encodeURI()

encodeURI() is the JavaScript function actually intended for encoding URLs.

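It is designed to encode an entire URL, so besides the common symbols it also leaves unencoded other characters that have a special meaning in a URL, such as "; / ? : @ & = +". After encoding, it outputs each character in its UTF-8 form, with every byte written in hexadecimal and prefixed with %.

Its decoding counterpart is decodeURI().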
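
The "UTF-8 bytes, each prefixed with %" rule can be reproduced by hand. The sketch below is a simplified stand-in and not a port of encodeURI(): the class and method names are hypothetical, and it percent-encodes every byte of its input, so it is only meant to be fed non-ASCII text.

```java
import java.nio.charset.StandardCharsets;

class PercentEncodeDemo {
    // Percent-encodes every byte of the UTF-8 form of the input.
    // Unlike encodeURI(), it does not leave ASCII letters or reserved
    // characters untouched.
    static String percentEncodeUtf8(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            // Mask to 0..255 so negative bytes format as two hex digits.
            sb.append('%').append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+4E2D U+6587 become three UTF-8 bytes each.
        System.out.println(percentEncodeUtf8("\u4e2d\u6587")); // %E4%B8%AD%E6%96%87
    }
}
```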

How does Tomcat decode?

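A GET request carries its parameters in URL-encoded form, while a POST request is decoded according to the encoding of the request body itself.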

Encoding for the GET method

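In the Tomcat source, look at the convertURI method of org.apache.catalina.connector.CoyoteAdapter. It uses the URIEncoding configured on

<Connector port="8082" protocol="HTTP/1.1">

in conf/server.xml as the charset for converting the bytes sent by the client into characters. When URIEncoding is not set, ISO-8859-1 is used.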

protected void convertURI(MessageBytes uri, Request request)
    throws Exception {

    ByteChunk bc = uri.getByteChunk();
    int length = bc.getLength();
    CharChunk cc = uri.getCharChunk();
    cc.allocate(length, -1);
    // Configured URIEncoding; if null, fall through to the ISO-8859-1 fast path below
    String enc = connector.getURIEncoding();

    if (enc != null) {
        B2CConverter conv = request.getURIConverter();
        try {
            if (conv == null) {
                conv = new B2CConverter(enc, true);
                request.setURIConverter(conv);
            } else {
                conv.recycle();
            }
        } catch (IOException e) {
            log.error("Invalid URI encoding; using HTTP default");
            connector.setURIEncoding(null);
        }
        if (conv != null) {
            try {
                conv.convert(bc, cc, true);
                uri.setChars(cc.getBuffer(), cc.getStart(), cc.getLength());
                return;
            } catch (IOException ioe) {
                // Should never happen as B2CConverter should replace
                // problematic characters
                request.getResponse().sendError(
                        HttpServletResponse.SC_BAD_REQUEST);
            }
        }
    }

    // Default encoding: fast conversion for ISO-8859-1
    byte[] bbuf = bc.getBuffer();
    char[] cbuf = cc.getBuffer();
    int start = bc.getStart();
    for (int i = 0; i < length; i++) {
        cbuf[i] = (char) (bbuf[i + start] & 0xff);
    }
    uri.setChars(cbuf, 0, length);
}
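
The effect of the URIEncoding choice can be observed outside Tomcat. The sketch below uses the JDK's java.net.URLDecoder as a stand-in for the connector's byte-to-char conversion; it is not Tomcat's code path, and the class name and sample value are assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UriDecodeDemo {
    // Decodes a percent-encoded value with the given charset name,
    // which plays the role of the connector's URIEncoding setting.
    static String decode(String encoded, String uriEncoding) {
        try {
            return URLDecoder.decode(encoded, uriEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + uriEncoding, e);
        }
    }

    public static void main(String[] args) {
        String encoded = "%E4%B8%AD%E6%96%87"; // UTF-8 bytes of U+4E2D U+6587
        // Decoding with the charset the browser used restores the text.
        System.out.println(decode(encoded, "UTF-8").equals("\u4e2d\u6587")); // true
        // Decoding as ISO-8859-1 turns each of the six bytes into its own char.
        System.out.println(decode(encoded, "ISO-8859-1").length());          // 6
    }
}
```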

Character encoding for the POST method

If the request's character encoding is set in the servlet's doPost method or in a filter, that setting takes precedence.

Setting the encoding on the request

public void doPost(HttpServletRequest request, HttpServletResponse response)
        throws IOException, ServletException {
    // Must be called before getParameter, getParameterNames,
    // or getParameterValues is invoked
    request.setCharacterEncoding("UTF-8");
}

Configuring a filter in web.xml

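<filter>
    <filter-name>SetCharacterEncoding</filter-name>
    <filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
    <init-param>
      <param-name>encoding</param-name>
      <param-value>UTF-8</param-value>
    </init-param>
</filter>
<!-- The filter only runs for requests matched by a filter-mapping -->
<filter-mapping>
    <filter-name>SetCharacterEncoding</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>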

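If neither of the above is configured, Tomcat reads the Content-Type HTTP header and uses the value of its charset parameter as the character encoding for the POST body.

For example, given
Content-Type: application/x-www-form-urlencoded; charset=utf-8
the POST character encoding is UTF-8.

If the Content-Type header carries no charset, the default ISO-8859-1 is used.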
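
The lookup order described above can be summarized in a few lines. This is a hypothetical helper written for illustration, not Tomcat's implementation; the class name, method name, and parsing details are assumptions.

```java
import java.util.Locale;

class PostCharsetDemo {
    // Mirrors the order described in the text:
    // 1. an encoding set explicitly on the request (servlet or filter),
    // 2. the charset parameter of the Content-Type header,
    // 3. the ISO-8859-1 default.
    static String resolveCharset(String explicitEncoding, String contentType) {
        if (explicitEncoding != null) {
            return explicitEncoding;
        }
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String value = p.substring("charset=".length()).trim();
                    // Strip optional surrounding double quotes.
                    if (value.length() > 1 && value.startsWith("\"") && value.endsWith("\"")) {
                        value = value.substring(1, value.length() - 1);
                    }
                    if (!value.isEmpty()) {
                        return value;
                    }
                }
            }
        }
        return "ISO-8859-1";
    }

    public static void main(String[] args) {
        String form = "application/x-www-form-urlencoded";
        System.out.println(resolveCharset(null, form + "; charset=utf-8")); // utf-8
        System.out.println(resolveCharset("UTF-8", form + "; charset=GBK")); // UTF-8
        System.out.println(resolveCharset(null, form));                      // ISO-8859-1
    }
}
```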

References

About URL Encoding (關于URL編碼)
Chinese garbled text in GET request URL parameters: a roundup (get請求中url傳參中文亂碼問題--集錦)
