
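Analysis of an ImmutableList hessian2 Serialization Failure

Contents: Problem Description, Environment, Reproducing the Problem, The Hessian 2 Protocol, Problem Analysis, Source Code Analysis, Summary, Extension: Comparing Java Serialization Frameworks, References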

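Problem Description

Service A exposes an RPC interface to service B. One of the request parameters is of type List, and service B passed in a Guava ImmutableList, which caused an error.

On service B (the consumer), the exception was: "com.alibaba.dubbo.remoting.RemotingException: Fail to decode request due to: RpcInvocation [methodName=…".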


On service A (the provider), the exception was: "com.alibaba.com.caucho.hessian.io.HessianFieldException: com.pdd.service.smart.contract.request.agreement.QueryCarrierAgreementEffectiveWarehouseInfoRequest.agreementTypeList: java.util.List cannot be assigned from null".


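However, after replacing the ImmutableList with an ArrayList, the error disappeared and the interface worked normally again.

Environment

Our RPC framework is dubbo, using dubbo's default serialization protocol, hessian2. The library version is hessian-lite:3.2.1-fixed-2.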


< Figure source: https://cn.dubbo.apache.org/zh/docs3-v2/java-sdk/reference-manual/protocol/dubbo/ >

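Reproducing the Problem

The exception stack traces in the Problem Description show that the failure occurs in hessian2 serialization. More precisely, the provider fails to parse the request while deserializing it, and the consumer only sees the resulting "Fail to decode request" error. The problem is therefore located in dubbo's Serialize (data serialization) layer, and the rest of this article reproduces and analyzes it using hessian2 serialization and deserialization alone.

< dubbo code architecture: https://cn.dubbo.apache.org/zh/docs3-v2/java-sdk/concepts-and-architecture/code-architecture/ >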

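import com.alibaba.com.caucho.hessian.io.Hessian2Input;
import com.alibaba.com.caucho.hessian.io.Hessian2Output;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.Lists;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class HessianLiteUtils {
    /**
     * Serializes an object to Hessian 2 bytes.
     */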
    public static byte[] serialize(Object object) {
        Hessian2Output h2o = null;
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

        try {
            h2o = new Hessian2Output(outputStream);
            h2o.writeObject(object);
            h2o.flush();
            return outputStream.toByteArray();
        } catch (Exception e) {
            System.out.println("HessianUtils.serialize exception " + e.getMessage());
            throw new RuntimeException("HessianUtils.serialize 异常", e);
        } finally {
            if (h2o != null) {
                try {
                    h2o.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    /**
     * Deserializes Hessian 2 bytes back into an object.
     */
    public static Object deserialize(byte[] bytes) {
        Hessian2Input h2i = null;
        try {
            ByteArrayInputStream inputStream = new ByteArrayInputStream(bytes);
            h2i = new Hessian2Input(inputStream);
            return h2i.readObject();
        } catch (Exception e) {
            System.out.println("HessianUtils.deserialize exception " + e.getMessage());
            throw new RuntimeException("HessianUtils.deserialize 异常", e);
        } finally {
            if (h2i != null) {
                try {
                    h2i.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    public static String hex(byte[] bytes) {
        StringBuilder result = new StringBuilder();
        for (byte b : bytes) {
            result.append(String.format("%02x", b));
            // upper case
            // result.append(String.format("%02X", aByte));
        }
        return result.toString();
    }
    
    public static void main(String[] args) {
        Person person = new Person();
        person.setName("胖嘟嘟");
        person.setMale(true);
        person.setAge(10);
        person.setList(Lists.newArrayList(1, 2));

        // Person serializes and deserializes correctly
        byte[] bytes = serialize(person);
        System.out.println("serialized hex bytes: " + hex(bytes));
        Object object = deserialize(bytes);
        System.out.println("deserialized object: " + object.toString());

        // Request serializes successfully but fails to deserialize
        Request request = new Request();
        request.setList(ImmutableList.of(1, 2));
        byte[] bytesRequest = serialize(request);
        System.out.println("serialized hex bytes: " + hex(bytesRequest));
        Object objectReq = deserialize(bytesRequest);
        System.out.println("deserialized object: " + objectReq.toString());
    }
}

// Person and Request are defined as follows
public class Person implements Serializable {
    private static final long serialVersionUID = 7242674494415301661L;

    private String name;
    private boolean isMale;
    private int age;
    private List<Integer> list;
    // getters, setters and toString omitted
}

public class Request implements Serializable {
    private static final long serialVersionUID = 408723925489378046L;

    private List<Integer> list;

    public List<Integer> getList() {
        return list;
    }

    public void setList(List<Integer> list) {
        this.list = list;
    }
}
           

In the code above, Person round-trips correctly. The output is:

serialized hex bytes: 4306506572736f6e94046c697374036167650669734d616c65046e616d65607a91929a5403e88396e5989fe5989f
deserialized object: Person{name='胖嘟嘟', isMale=true, age=10, list=[1, 2]}
           

When the List field of Request holds an ImmutableList, deserialization fails in the same way as the RPC interface did, so the problem is reproduced.

serialized hex bytes: 43075265717565737491046c69737460433036636f6d2e676f6f676c652e636f6d6d6f6e2e636f6c6c6563742e496d6d757461626c654c6973742453657269616c697a6564466f726d9108656c656d656e74736172075b6f626a6563749192
HessianUtils.deserialize exception Request.list: com.google.common.collect.ImmutableList cannot be assigned from null
Exception in thread "main" java.lang.RuntimeException: HessianUtils.deserialize 异常
	at HessianLiteUtils.deserialize(HessianLiteUtils.java:59)
	at HessianLiteUtils.main(HessianLiteUtils.java:99)
Caused by: com.alibaba.com.caucho.hessian.io.HessianFieldException: Request.list: com.google.common.collect.ImmutableList cannot be assigned from null
	at com.alibaba.com.caucho.hessian.io.JavaDeserializer.logDeserializeError(JavaDeserializer.java:171)
	at com.alibaba.com.caucho.hessian.io.JavaDeserializer$ObjectFieldDeserializer.deserialize(JavaDeserializer.java:414)
	at com.alibaba.com.caucho.hessian.io.JavaDeserializer.readObject(JavaDeserializer.java:275)
	at com.alibaba.com.caucho.hessian.io.JavaDeserializer.readObject(JavaDeserializer.java:199)
	at com.alibaba.com.caucho.hessian.io.SerializerFactory.readObject(SerializerFactory.java:529)
	at com.alibaba.com.caucho.hessian.io.Hessian2Input.readObjectInstance(Hessian2Input.java:2803)
	at com.alibaba.com.caucho.hessian.io.Hessian2Input.readObject(Hessian2Input.java:2743)
	at com.alibaba.com.caucho.hessian.io.Hessian2Input.readObject(Hessian2Input.java:2272)
	at com.alibaba.com.caucho.hessian.io.Hessian2Input.readObject(Hessian2Input.java:2717)
	at com.alibaba.com.caucho.hessian.io.Hessian2Input.readObject(Hessian2Input.java:2272)
	at HessianLiteUtils.deserialize(HessianLiteUtils.java:56)
	... 1 more
Caused by: java.lang.UnsupportedOperationException: com.alibaba.com.caucho.hessian.io.CollectionDeserializer@7bb11784
	at com.alibaba.com.caucho.hessian.io.AbstractDeserializer.readObject(AbstractDeserializer.java:121)
	at com.alibaba.com.caucho.hessian.io.Hessian2Input.readObjectInstance(Hessian2Input.java:2801)
Caused by: com.alibaba.com.caucho.hessian.io.HessianFieldException: Request.list: com.google.common.collect.ImmutableList cannot be assigned from null

	at com.alibaba.com.caucho.hessian.io.Hessian2Input.readObject(Hessian2Input.java:2140)
	at com.alibaba.com.caucho.hessian.io.Hessian2Input.readObject(Hessian2Input.java:2069)
Caused by: java.lang.UnsupportedOperationException: com.alibaba.com.caucho.hessian.io.CollectionDeserializer@7bb11784

	at com.alibaba.com.caucho.hessian.io.Hessian2Input.readObject(Hessian2Input.java:2113)
	at com.alibaba.com.caucho.hessian.io.Hessian2Input.readObject(Hessian2Input.java:2069)
	at com.alibaba.com.caucho.hessian.io.JavaDeserializer$ObjectFieldDeserializer.deserialize(JavaDeserializer.java:410)
	... 10 more
           

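Why can an ArrayList be serialized and deserialized, while an ImmutableList can be serialized but not deserialized? Let us first look at the Hessian 2 protocol.

The Hessian 2 Protocol

  • Hessian object serialization has eight primitive types:
    1. raw binary data (binary)
    2. boolean
    3. 64-bit millisecond date (date)
    4. 64-bit double-precision floating point (double)
    5. 32-bit integer (int)
    6. 64-bit long integer (long)
    7. null
    8. UTF-8 encoded string (string)
  • It has three recursive types:
    1. list, for lists and arrays
    2. map, for maps and dictionaries
    3. object, for objects
  • It has one special construct:
    1. ref, for shared and circular object references
  • Hessian 2.0 has three internal reference maps:
    1. an object/list reference map
    2. a class definition reference map
    3. a type (class name) reference map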

Bytecode Map

Hessian is a bytecode protocol: a reader is essentially a switch statement on the leading 8-bit byte.

x00 - x1f    # utf-8 string length 0-32
x20 - x2f    # binary data length 0-16
x30 - x33    # utf-8 string length 0-1023
x34 - x37    # binary data length 0-1023
x38 - x3f    # three-octet compact long (-x40000 to x3ffff)
x40          # reserved (expansion/escape)
x41          # 8-bit binary data non-final chunk ('A')
x42          # 8-bit binary data final chunk ('B')
x43          # object type definition ('C')
x44          # 64-bit IEEE encoded double ('D')
x45          # reserved
x46          # boolean false ('F')
x47          # reserved
x48          # untyped map ('H')
x49          # 32-bit signed integer ('I')
x4a          # 64-bit UTC millisecond date
x4b          # 32-bit UTC minute date
x4c          # 64-bit signed long integer ('L')
x4d          # map with type ('M')
x4e          # null ('N')
x4f          # object instance ('O')
x50          # reserved
x51          # reference to map/list/object - integer ('Q')
x52          # utf-8 string non-final chunk ('R')
x53          # utf-8 string final chunk ('S')
x54          # boolean true ('T')
x55          # variable-length list/vector ('U')
x56          # fixed-length list/vector ('V')
x57          # variable-length untyped list/vector ('W')
x58          # fixed-length untyped list/vector ('X')
x59          # long encoded as 32-bit int ('Y')
x5a          # list/map terminator ('Z')
x5b          # double 0.0
x5c          # double 1.0
x5d          # double represented as byte (-128.0 to 127.0)
x5e          # double represented as short (-32768.0 to 327676.0)
x5f          # double represented as float
x60 - x6f    # object with direct type
x70 - x77    # fixed list with direct length
x78 - x7f    # fixed untyped list with direct length
x80 - xbf    # one-octet compact int (-x10 to x3f, x90 is 0)
xc0 - xcf    # two-octet compact int (-x800 to x7ff)
xd0 - xd7    # three-octet compact int (-x40000 to x3ffff)
xd8 - xef    # one-octet compact long (-x8 to xf, xe0 is 0)
xf0 - xff    # two-octet compact long (-x800 to x7ff, xf8 is 0)
           

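For now it is enough to have a rough picture of the protocol's basic structure and bytecode map. The exact grammar of each type can be looked up in the protocol document, Hessian 2.0 Serialization Protocol, when we need it.

Example

In this section we decode by hand the serialized bytes of the Person object from above, to get familiar with the Hessian 2 protocol.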

serialized hex bytes: 4306506572736f6e94046c697374036167650669734d616c65046e616d65607a91929a5403e88396e5989fe5989f
deserialized object: Person{name='胖嘟嘟', isMale=true, age=10, list=[1, 2]}
           
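  1. The first byte of the hex string is 43. In the bytecode map, x43 means "object type definition ('C')". The corresponding grammar in the protocol document is "class-def ::= 'C' string int string*", that is, 'C' followed by a string, an int, and zero or more strings. The spec explains: "The object definition includes a mandatory type string, the number of fields, and the field names." So 'C' is followed by the type name, then the field count, then the field names.
  2. The second byte, 06, is a UTF-8 string of length 6 (x00 - x1f # utf-8 string length 0-32).
  3. The next 6 bytes, 506572736f6e, decode from hex to the UTF-8 string "Person".
  4. Next is 94, which falls in "x80 - xbf # one-octet compact int (-x10 to x3f, x90 is 0)". x94 - x90 = 4, so Person has 4 fields.
  5. 04: a string of length 4, the name of the first field.
  6. 6c697374: "list", the first field.
  7. 03: the second field name has length 3.
  8. 616765: "age".
  9. 06: the third field name has length 6.
  10. 69734d616c65: "isMale".
  11. 04: the fourth field name has length 4.
  12. 6e616d65: "name".
  13. 60: "x60 - x6f # object with direct type". x60 - x60 = 0, so this is an instance of class definition 0 (Person). Its field values follow in the order list, age, isMale, name.
  14. 7a: "x78 - x7f # fixed untyped list with direct length". x7a - x78 = 2, so the first field is a list of length 2.
  15. 91 92: two compact ints (x80 - xbf), the list elements 1 and 2.
  16. 9a: the second field, age, is 10 (x9a - x90).
  17. 54: the third field, isMale, is boolean true ('T').
  18. 03: the fourth field is a string of 3 characters. The length counts characters, not bytes.
  19. e88396e5989fe5989f: the 9 UTF-8 bytes of the 3 characters "胖嘟嘟", the value of name.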
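
The manual decoding steps above can be sketched as a small program. This is a toy reader written only for illustration, not hessian-lite's implementation: the class name MiniHessianReader is hypothetical, and it understands only the tags that occur in the Person bytes (class definition, short string, one-octet compact int, boolean, compact untyped list, object instance). Object instances are returned as a map from field name to value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy reader for the few Hessian 2 tags that occur in the Person example. */
final class MiniHessianReader {
    private final byte[] buf;
    private int pos;
    /** Class definitions in order of appearance: [typeName, field1, field2, ...]. */
    private final List<String[]> classDefs = new ArrayList<>();

    MiniHessianReader(String hex) {
        buf = new byte[hex.length() / 2];
        for (int i = 0; i < buf.length; i++) {
            buf[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
    }

    private int next() {
        return buf[pos++] & 0xff;
    }

    Object read() {
        int tag = next();
        if (tag == 'C') {                      // class-def ::= 'C' string int string*
            String type = (String) read();
            int fieldCount = (Integer) read();
            String[] def = new String[fieldCount + 1];
            def[0] = type;
            for (int i = 1; i <= fieldCount; i++) {
                def[i] = (String) read();
            }
            classDefs.add(def);
            return read();                     // the value follows the definition
        }
        if (tag <= 0x1f) {                     // short string, length in characters
            return readChars(tag);
        }
        if (tag >= 0x80 && tag <= 0xbf) {      // one-octet compact int, 0x90 is 0
            return tag - 0x90;
        }
        if (tag == 'T' || tag == 'F') {        // boolean
            return tag == 'T';
        }
        if (tag >= 0x78 && tag <= 0x7f) {      // untyped list with direct length
            List<Object> list = new ArrayList<>();
            for (int n = tag - 0x78; n > 0; n--) {
                list.add(read());
            }
            return list;
        }
        if (tag >= 0x60 && tag <= 0x6f) {      // instance of class definition (tag - 0x60)
            String[] def = classDefs.get(tag - 0x60);
            Map<String, Object> fields = new LinkedHashMap<>();
            for (int i = 1; i < def.length; i++) {
                fields.put(def[i], read());
            }
            return fields;
        }
        throw new IllegalArgumentException("unsupported tag 0x" + Integer.toHexString(tag));
    }

    /** Decodes the given number of UTF-8 characters (1 to 3 bytes each). */
    private String readChars(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            int b = next();
            if (b < 0x80) {
                sb.append((char) b);
            } else if ((b & 0xe0) == 0xc0) {
                sb.append((char) (((b & 0x1f) << 6) | (next() & 0x3f)));
            } else if ((b & 0xf0) == 0xe0) {
                sb.append((char) (((b & 0x0f) << 12) | ((next() & 0x3f) << 6) | (next() & 0x3f)));
            } else {
                throw new IllegalArgumentException("unsupported UTF-8 lead byte");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String personHex = "4306506572736f6e94046c697374036167650669734d616c65046e616d65"
                + "607a91929a5403e88396e5989fe5989f";
        System.out.println(new MiniHessianReader(personHex).read());
    }
}
```

Running main on the Person bytes yields a map with the four fields in wire order (list, age, isMale, name). The reader rejects any tag outside this subset, so the ImmutableList cases discussed later would need the x30 - x33 string and x70 - x77 typed-list tags added first.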

Problem Analysis

Back to the original problem. We construct the following three test cases:

// # Case 1
Request request = new Request();
request.setList(ImmutableList.of(1, 2));
byte[] bytes = serialize(request);

// 43075265717565737491046c69737460433036636f6d2e676f6f676c652e636f6d6d6f6e2e636f6c6c6563742e496d6d757461626c654c6973742453657269616c697a6564466f726d9108656c656d656e74736172075b6f626a6563749192

// # Case 2
byte[] bytes = serialize(ImmutableList.of(1, 2));
// 433036636f6d2e676f6f676c652e636f6d6d6f6e2e636f6c6c6563742e496d6d757461626c654c6973742453657269616c697a6564466f726d9108656c656d656e74736072075b6f626a6563749192

// # Case 3
byte[] bytes = serialize(Lists.newArrayList(1, 2));
// 7a9192
           

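In practice, the bytes from Case 1 cannot be deserialized, while the bytes from Case 2 and Case 3 deserialize correctly.

(When I first tried to reproduce the problem I used Case 2 and could not trigger the production error no matter what, so I suspected the hessian protocol version or dubbo's implementation and lost most of a day there. It is indeed odd: an ImmutableList deserializes fine when it is serialized directly, but not when it is a field of another object.)

Next we decode the serialized output of the cases by hand according to the Hessian 2 protocol, and then dig into how the serialization and deserialization code paths differ.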

// # Case 1
43-07-52657175657374-91-04-6c697374-60-43-30-36-636f6d2e676f6f676c652e636f6d6d6f6e2e636f6c6c6563742e496d6d757461626c654c6973742453657269616c697a6564466f726d-91-08-656c656d656e7473-61-72-07-5b6f626a656374-91-92

43              # object type definition ('C')
07              # string of length 7
52657175657374  # Request
91              # 1 field
04              # string of length 4
6c697374        # list
60              # x60 - x6f object with direct type: instance of class definition 0 (Request), field values follow
43              # object type definition ('C')
30              # x30 - x33 utf-8 string length 0-1023, grammar: [x30-x33] b0 <utf8-data>
36              # length x36 = 54
636f6d2e676f6f676c652e636f6d6d6f6e2e636f6c6c6563742e496d6d757461626c654c6973742453657269616c697a6564466f726d # com.google.common.collect.ImmutableList$SerializedForm
91              # 1 field
08              # the field name is a string of length 8
656c656d656e7473  # elements
61              # x60 - x6f object with direct type: instance of class definition 1 (SerializedForm)
72              # x70 - x77 fixed list with direct length: typed list of length 2
07              # x00 - x1f utf-8 string length 0-32: the list type is a string of length 7
5b6f626a656374  # [object
91              # value 1
92              # value 2


// # Case 2
43-30-36-636f6d2e676f6f676c652e636f6d6d6f6e2e636f6c6c6563742e496d6d757461626c654c6973742453657269616c697a6564466f726d-91-08-656c656d656e7473-60-72-07-5b6f626a656374-91-92

43              # object type definition ('C')
30              # x30 - x33 utf-8 string length 0-1023 (read by readString)
36              # length x36 = 54
636f6d2e676f6f676c652e636f6d6d6f6e2e636f6c6c6563742e496d6d757461626c654c6973742453657269616c697a6564466f726d  # com.google.common.collect.ImmutableList$SerializedForm
91              # 1 field
08              # the field name is a string of length 8
656c656d656e7473  # elements
60              # x60 - x6f object with direct type: instance of class definition 0 (SerializedForm)
72              # x70 - x77 fixed list with direct length: typed list of length 2
07              # x00 - x1f utf-8 string length 0-32: the list type is a string of length 7
5b6f626a656374  # [object
91              # value 1
92              # value 2

// # Case 3
7a-91-92

7a              # x78 - x7f fixed untyped list with direct length: list of length x7a - x78 = 2
91              # value 1
92              # value 2
           

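At first glance, Case 1 and Case 2 both contain a class definition named com.google.common.collect.ImmutableList$SerializedForm with a single field, elements, whose value is an Object array ([object). The difference is that Case 1 has two class definitions: compared with Case 2 it also contains Request.

Nothing looks wrong so far, so we need to go deeper into the source code.

Source Code Analysis

Serialization

The entry point for serialization is Hessian2Output. The code is easiest to follow with the protocol at hand.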


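The three reference maps store value, class, and type mappings respectively. Value references make graph, recursive, and cyclic data structures possible; class and type references reduce repeated data and save space.

The _buffer byte array buffers the serialized output. When it runs out of room, its contents are flushed to the OutputStream, which improves performance.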

/**
   * Writes any object to the output stream.
   */
  public void writeObject(Object object) throws IOException
  {
    if (object == null) {
      writeNull();
      return;
    }

    Serializer serializer;
    serializer = findSerializerFactory().getSerializer(object.getClass());

    serializer.writeObject(object, this);
  }
  
    public void writeNull() throws IOException
  {
    int offset = _offset;
    byte []buffer = _buffer;

    if (SIZE <= offset + 16) {
      flush();
      offset = _offset;
    }

    buffer[offset++] = 'N';

    _offset = offset;
  }
           

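Hessian2Output.writeObject: if the object is null, it writes a null and returns. Otherwise it looks up the serializer for the object's class, and each type is then serialized according to its own part of the protocol.

Serializer is an interface; the protocol for each type is implemented in the corresponding implementation class. In our example, the writeObject that does the work is the one in JavaSerializer. The other implementations are left for the reader to explore.

The logic that picks a serializer by class is in SerializerFactory. It is worth mentioning because the [object that we decoded by hand above is defined there: it is the typeName of the Object array class.

writeNull is simple: it writes the character N. The only extra step is a buffer check: if no more than 16 bytes remain in the buffer (SIZE <= offset + 16), the buffer is flushed to the output stream first. The N itself occupies a single byte of the byte[] buffer.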


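Debugging Case 2, the object to serialize is a com.google.common.collect.RegularImmutableList of size 2. A comment on that class states that it is serialized through writeReplace rather than by default serialization.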


ImmutableList overrides writeReplace as shown below. Comparing this code with what we decoded by hand makes things click: this is where com.google.common.collect.ImmutableList$SerializedForm, elements, and the [object Object array come from.

@Override
  Object writeReplace() {
    return new SerializedForm(toArray());
  }
  
  /*
   * Serializes ImmutableLists as their logical contents. This ensures that
   * implementation types do not leak into the serialized representation.
   */
  static class SerializedForm implements Serializable {
    final Object[] elements;

    SerializedForm(Object[] elements) {
      this.elements = elements;
    }

    Object readResolve() {
      return copyOf(elements);
    }

    private static final long serialVersionUID = 0;
  }
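
To see the writeReplace/readResolve mechanism in isolation, here is a minimal sketch that uses plain JDK serialization (ObjectOutputStream and ObjectInputStream) rather than hessian-lite. FrozenList and its nested SerializedForm are hypothetical stand-ins for ImmutableList and ImmutableList$SerializedForm; the sketch only illustrates the proxy pattern and says nothing about how Hessian dispatches on it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical immutable list that is serialized through a proxy object. */
final class FrozenList implements Serializable {
    private static final long serialVersionUID = 1L;
    private final Object[] items;

    FrozenList(Object... items) {
        this.items = items.clone();
    }

    List<Object> asList() {
        return Collections.unmodifiableList(Arrays.asList(items));
    }

    /** Invoked during serialization: the proxy is written instead of this object. */
    private Object writeReplace() {
        return new SerializedForm(items);
    }

    /** The only class whose name and fields appear in the serialized bytes. */
    private static final class SerializedForm implements Serializable {
        private static final long serialVersionUID = 1L;
        final Object[] elements;

        SerializedForm(Object[] elements) {
            this.elements = elements;
        }

        /** Invoked during deserialization: turns the proxy back into a FrozenList. */
        private Object readResolve() {
            return new FrozenList(elements);
        }
    }

    /** Serializes and deserializes a value with JDK serialization. */
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object copy = roundTrip(new FrozenList(1, 2));
        System.out.println(copy.getClass().getSimpleName() + " " + ((FrozenList) copy).asList());
    }
}
```

The reader of the stream first materializes a SerializedForm and only then swaps it for the real list. This matches the article's observation that the serialized bytes name SerializedForm rather than the list class itself.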
           

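The rest of the logic is straightforward:

Update the reference (remove the old one and replace it with the new one; this is mainly a performance optimization and is not relevant here).

Serialize recursively. For Case 2, the debugger shows two calls to JavaSerializer.writeObject (first for RegularImmutableList, then for ImmutableList.SerializedForm) and one call to BasicSerializer with OBJECT_ARRAY.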


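That completes the serialization process. The serializer contains some further logic that is not discussed here and is left for the reader.

Deserialization

The deserialization code is in Hessian2Input.readObject. Its structure is: 1) read the byte at offset as the tag; 2) deserialize according to the tag.

[Note: the code below is abridged. Where a run of consecutive case labels has been dropped for brevity, a comment marks the omission.]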

public Object readObject(List<Class<?>> expectedTypes) throws IOException {
        int tag = _offset < _length ? (_buffer[_offset++] & 0xff) : read();

        switch (tag) {
            case 'N':
                return null;

            case 'T':
                return Boolean.valueOf(true);

            case 'F':
                return Boolean.valueOf(false);

            // direct integer
            case 0x80:    // NOTE: intermediate case labels omitted for brevity
            case 0xbf:
                return Integer.valueOf(tag - BC_INT_ZERO);

            /* byte int */
            case 0xc0:    // NOTE: intermediate case labels omitted for brevity
            case 0xcf:
                return Integer.valueOf(((tag - BC_INT_BYTE_ZERO) << 8) + read());

            /* short int */
            case 0xd0:    // NOTE: intermediate case labels omitted for brevity
            case 0xd7:
                return Integer.valueOf(((tag - BC_INT_SHORT_ZERO) << 16)
                        + 256 * read() + read());

            case 'I':
                return Integer.valueOf(parseInt());

            // direct long
            case 0xd8:    // NOTE: intermediate case labels omitted for brevity
            case 0xef:
                return Long.valueOf(tag - BC_LONG_ZERO);

            /* byte long */
            case 0xf0:    // NOTE: intermediate case labels omitted for brevity
            case 0xff:
                return Long.valueOf(((tag - BC_LONG_BYTE_ZERO) << 8) + read());

            /* short long */
            case 0x38:    // NOTE: intermediate case labels omitted for brevity
            case 0x3f:
                return Long.valueOf(((tag - BC_LONG_SHORT_ZERO) << 16) + 256 * read() + read());

            case BC_LONG_INT:
                return Long.valueOf(parseInt());

            case 'L':
                return Long.valueOf(parseLong());

            case BC_DOUBLE_ZERO:
                return Double.valueOf(0);

            case BC_DOUBLE_ONE:
                return Double.valueOf(1);

            case BC_DOUBLE_BYTE:
                return Double.valueOf((byte) read());

            case BC_DOUBLE_SHORT:
                return Double.valueOf((short) (256 * read() + read()));

            case BC_DOUBLE_MILL: {
                int mills = parseInt();

                return Double.valueOf(0.001 * mills);
            }

            case 'D':
                return Double.valueOf(parseDouble());

            case BC_DATE:
                return new Date(parseLong());

            case BC_DATE_MINUTE:
                return new Date(parseInt() * 60000L);

            case BC_STRING_CHUNK:
            case 'S': {
                _isLastChunk = tag == 'S';
                _chunkLength = (read() << 8) + read();

                int data;
                _sbuf.setLength(0);

                while ((data = parseChar()) >= 0)
                    _sbuf.append((char) data);

                return _sbuf.toString();
            }

            case 0x00:    // NOTE: intermediate case labels omitted for brevity
            case 0x1f: {
                _isLastChunk = true;
                _chunkLength = tag - 0x00;

                int data;
                _sbuf.setLength(0);

                while ((data = parseChar()) >= 0)
                    _sbuf.append((char) data);

                return _sbuf.toString();
            }

            case 0x30:
            case 0x31:
            case 0x32:
            case 0x33: {
                _isLastChunk = true;
                _chunkLength = (tag - 0x30) * 256 + read();

                _sbuf.setLength(0);

                int ch;
                while ((ch = parseChar()) >= 0)
                    _sbuf.append((char) ch);

                return _sbuf.toString();
            }

            case BC_BINARY_CHUNK:
            case 'B': {
                _isLastChunk = tag == 'B';
                _chunkLength = (read() << 8) + read();

                int data;
                ByteArrayOutputStream bos = new ByteArrayOutputStream();

                while ((data = parseByte()) >= 0)
                    bos.write(data);

                return bos.toByteArray();
            }

            case 0x20:    // NOTE: intermediate case labels omitted for brevity
            case 0x2f: {
                _isLastChunk = true;
                int len = tag - 0x20;
                _chunkLength = 0;

                byte[] data = new byte[len];

                for (int i = 0; i < len; i++)
                    data[i] = (byte) read();

                return data;
            }

            case 0x34:
            case 0x35:
            case 0x36:
            case 0x37: {
                _isLastChunk = true;
                int len = (tag - 0x34) * 256 + read();
                _chunkLength = 0;

                byte[] buffer = new byte[len];

                for (int i = 0; i < len; i++) {
                    buffer[i] = (byte) read();
                }

                return buffer;
            }

            case BC_LIST_VARIABLE: {
                // variable length list
                String type = readType();

                return findSerializerFactory().readList(this, -1, type);
            }

            case BC_LIST_VARIABLE_UNTYPED: {
                return findSerializerFactory().readList(this, -1, null);
            }

            case BC_LIST_FIXED: {
                // fixed length lists
                String type = readType();
                int length = readInt();

                Deserializer reader;
                reader = findSerializerFactory().getListDeserializer(type, null);

                boolean valueType = expectedTypes != null && expectedTypes.size() == 1;

                return reader.readLengthList(this, length, valueType ? expectedTypes.get(0) : null);
            }

            case BC_LIST_FIXED_UNTYPED: {
                // fixed length lists
                int length = readInt();

                Deserializer reader;
                reader = findSerializerFactory().getListDeserializer(null, null);

                boolean valueType = expectedTypes != null && expectedTypes.size() == 1;

                return reader.readLengthList(this, length, valueType ? expectedTypes.get(0) : null);
            }

            // compact fixed list
            case 0x70:
            case 0x71:
            case 0x72:
            case 0x73:
            case 0x74:
            case 0x75:
            case 0x76:
            case 0x77: {
                // fixed length lists
                String type = readType();
                int length = tag - 0x70;

                Deserializer reader;
                reader = findSerializerFactory().getListDeserializer(type, null);

                boolean valueType = expectedTypes != null && expectedTypes.size() == 1;

                return reader.readLengthList(this, length, valueType ? expectedTypes.get(0) : null);
            }

            // compact fixed untyped list
            case 0x78:
            case 0x79:
            case 0x7a:
            case 0x7b:
            case 0x7c:
            case 0x7d:
            case 0x7e:
            case 0x7f: {
                // fixed length lists
                int length = tag - 0x78;

                Deserializer reader;
                reader = findSerializerFactory().getListDeserializer(null, null);

                boolean valueType = expectedTypes != null && expectedTypes.size() == 1;

                return reader.readLengthList(this, length, valueType ? expectedTypes.get(0) : null);
            }

            case 'H': {

                boolean keyValuePair = expectedTypes != null && expectedTypes.size() == 2;

                // fix deserialize of short type
                Deserializer reader;
                reader = findSerializerFactory().getDeserializer(Map.class);

                return reader.readMap(this
                        , keyValuePair ? expectedTypes.get(0) : null
                        , keyValuePair ? expectedTypes.get(1) : null);
            }

            case 'M': {
                String type = readType();

                return findSerializerFactory().readMap(this, type);
            }

            case 'C': {
                readObjectDefinition(null);

                return readObject();
            }

            case 0x60:    // NOTE: intermediate case labels omitted for brevity
            case 0x6f: {
                int ref = tag - 0x60;

                if (_classDefs == null)
                    throw error("No classes defined at reference '{0}'" + tag);

                ObjectDefinition def = (ObjectDefinition) _classDefs.get(ref);

                return readObjectInstance(null, def);
            }

            case 'O': {
                int ref = readInt();

                ObjectDefinition def = (ObjectDefinition) _classDefs.get(ref);

                return readObjectInstance(null, def);
            }

            case BC_REF: {
                int ref = readInt();

                return _refs.get(ref);
            }

            default:
                if (tag < 0)
                    throw new EOFException("readObject: unexpected end of file");
                else
                    throw error("readObject: unknown code " + codeName(tag));
        }
    }


    private void readObjectDefinition(Class cl)
            throws IOException {
        String type = readString();    // com.google.common.collect.ImmutableList$SerializedForm
        int len = readInt();            // length=1

        String[] fieldNames = new String[len];    // String[1]
        for (int i = 0; i < len; i++)
            fieldNames[i] = readString();        // elements

        ObjectDefinition def = new ObjectDefinition(type, fieldNames);
        if (_classDefs == null)
            _classDefs = new ArrayList();
        _classDefs.add(def);                    // record the class definition: type and field list
    }
    
    
    private Object readObjectInstance(Class cl, ObjectDefinition def)
            throws IOException {
        String type = def.getType();
        String[] fieldNames = def.getFieldNames();

        if (cl != null) {
            Deserializer reader;
            reader = findSerializerFactory().getObjectDeserializer(type, cl);

            return reader.readObject(this, fieldNames);
        } else {
            return findSerializerFactory().readObject(this, type, fieldNames);
        }
    }
           

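For Case 1 and Case 2, the first byte is hex 43 (shown as decimal 67 in the debugger; IDEA's "View as" can convert it). Per the hessian2 protocol, "x43 # object type definition ('C')", this is 'C', so execution enters the case 'C' branch of readObject above. The walkthrough below follows Case 2.

readObjectDefinition then reads the type com.google.common.collect.ImmutableList$SerializedForm and a length of 1: there is one field, named elements.

readObject then continues. The tag is 0x60, which enters the case 0x60 ... 0x6f branch. ref is 0, so the matching class definition is fetched and readObjectInstance reads the instance. Because cl is null, the else branch of readObjectInstance is taken and SerializerFactory.readObject reads the object.

It first obtains the Deserializer by type. JavaDeserializer.readObject creates the ImmutableList$SerializedForm instance (through its constructor) and gets a FieldDeserializer for each field. The value of the elements field is an Object[], which is read by the OBJECT_ARRAY case of BasicDeserializer.readLengthList.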

// # SerializerFactory
    public Object readObject(AbstractHessianInput in,
                             String type,
                             String[] fieldNames)
            throws HessianProtocolException, IOException {
        Deserializer deserializer = getDeserializer(type);

        if (deserializer != null)
            return deserializer.readObject(in, fieldNames);
        else if (_hashMapDeserializer != null)
            return _hashMapDeserializer.readObject(in, fieldNames);
        else {
            _hashMapDeserializer = new MapDeserializer(HashMap.class);

            return _hashMapDeserializer.readObject(in, fieldNames);
        }
    }
    
    
    public Deserializer getDeserializer(String type)
            throws HessianProtocolException {
        if (type == null || type.equals("") || UNKNOWN_CLASSES.containsKey(type)) {
            return null;
        }

        Deserializer deserializer;

        if (_cachedTypeDeserializerMap != null) {
            deserializer = (Deserializer) _cachedTypeDeserializerMap.get(type);

            if (deserializer != null)
                return deserializer;
        }


        deserializer = (Deserializer) _staticTypeMap.get(type);
        if (deserializer != null)
            return deserializer;

        if (type.startsWith("[")) {
            Deserializer subDeserializer = getDeserializer(type.substring(1));

            if (subDeserializer != null)
                deserializer = new ArrayDeserializer(subDeserializer.getType());
            else
                deserializer = new ArrayDeserializer(Object.class);
        } else {
            try {
                Class cl = Class.forName(type, false, _loader);    // look up the deserializer by class
                deserializer = getDeserializer(cl);                // cl=com.google.common.collect.ImmutableList$SerializedForm -> JavaDeserializer
            } catch (Exception e) {
                UNKNOWN_CLASSES.computeIfAbsent(type, s -> {
                    log.error("[EX-DUBBO-0020] Hessian/Burlap: '" + type + "' 是未知的类在"
                            + _loader + ":\n" + e.getMessage(), e);
                    return 1;
                });
            }
        }

        if (deserializer != null) {
            if (_cachedTypeDeserializerMap == null)
                _cachedTypeDeserializerMap = new ConcurrentHashMap(8);

            _cachedTypeDeserializerMap.put(type, deserializer);
        }

        return deserializer;
    }
           

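Object deserialization thus has three steps: read the class definition (readObjectDefinition), obtain the object instance (readObjectInstance), and deserialize each field (FieldDeserializer).

Exception Analysis

We now analyze why deserialization fails in Case 1, where the ImmutableList is serialized as a field of Request.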


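The field is declared as the interface java.util.List, so List is passed down as the expected class and readObjectInstance takes its cl != null branch. getObjectDeserializer then returns a CollectionDeserializer, which does not implement readObject, so the call falls through to AbstractDeserializer.readObject, which throws UnsupportedOperationException. Parsing of Request.list fails, the exception is caught in ObjectListFieldDeserializer.deserialize, and the error is reported as: HessianFieldException: Request.list: java.util.List cannot be assigned from null.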
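
Since the error disappeared once the caller passed an ArrayList, one possible workaround is to normalize the field in the setter, so that whatever List implementation the caller supplies, a plain ArrayList is what gets serialized. The sketch below is an assumption about how that could look, not code from the original services: SafeRequest is a hypothetical variant of the Request class, and Collections.unmodifiableList stands in for Guava's ImmutableList so that the example needs only the JDK.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical variant of the article's Request that normalizes its list field. */
final class SafeRequest implements Serializable {
    private static final long serialVersionUID = 1L;

    private List<Integer> list;

    public List<Integer> getList() {
        return list;
    }

    /** Copies any List implementation into a plain ArrayList before storing it. */
    public void setList(List<Integer> list) {
        this.list = (list == null) ? null : new ArrayList<>(list);
    }

    public static void main(String[] args) {
        SafeRequest request = new SafeRequest();
        // Stand-in for ImmutableList.of(1, 2); any List implementation is accepted.
        request.setList(Collections.unmodifiableList(Arrays.asList(1, 2)));
        System.out.println(request.getList().getClass().getName());
    }
}
```

The trade-off is an extra copy per call, and the field is no longer immutable on the sending side. Whether that is acceptable depends on the interface.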


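Summary

  1. Immutable collections in Java (Guava's ImmutableList, ImmutableSet and so on, and the ImmutableCollections types introduced in Java 9) cannot be deserialized when they are the value of a POJO field, although they round-trip when serialized directly as the top-level object;
  2. float values lose precision after deserialization, while double is unaffected (note that the Hessian type list above has a 64-bit double but no 32-bit float);
  3. the Set returned by keySet() does not implement Serializable, and Hessian deserializes it as an ArrayList;
  4. two-dimensional long arrays and LocalDate round-trip correctly (I tested the pitfalls mentioned in reference 3 and found that the version we use does not have them, so two-dimensional long arrays and the Java 8 LocalDate can be used normally).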
// # 1 ImmutableList and ImmutableSet fail as a field value, but round-trip as the top-level object
//        request.setList(ImmutableList.of(1, 2));
//        byte[] bytes1 = serialize(request);
//        object = deserialize(bytes1);        // [x] HessianFieldException

//        byte[] bytes2 = serialize(ImmutableList.of(1, 2));
//        object = deserialize(bytes2);        // [OK]

//        request.setSet(ImmutableSet.of(1, 1, 2));
//        byte[] bytes3 = serialize(request);
//        object = deserialize(bytes3);       // [x] HessianFieldException

//        byte[] bytes4 = serialize(ImmutableSet.of(1, 1, 2));
//        object = deserialize(bytes4);        // [OK]


// # 2 float loses precision, double is fine
//        float f = 3.1415926f;               // [x] precision lost, becomes 3.1415925
//        double d = 3.1415926d;              // [OK]


// # 3 the Set returned by keySet() does not implement Serializable; Hessian deserializes it as an ArrayList
//        Map<String, Integer> map = new HashMap<>();
//        map.put("A", 1);
//        map.put("B", 2);
//        byte[] bytes = serialize(map.keySet());
//        object = deserialize(bytes);          // [x] deserialized as an ArrayList


// # 4.1 LocalDate works normally
//        LocalDate now = LocalDate.now();
//        byte[] bytes = serialize(now);
//        object = deserialize(bytes);      // [OK]


// # 4.2 two-dimensional long arrays serialize and deserialize correctly
//        long[][] array = new long[][]{new long[]{1L, 2L, 3L}, new long[]{4L, 5L, 6L}};
//        byte[] bytes = serialize(array);
//        object = deserialize(bytes);        // OK

//        request.setArray(array);
//        byte[] bytes = serialize(request);
//        object = deserialize(bytes);        // OK
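
Findings 2 and 3 can be partly checked in plain Java, without Hessian. The sketch below is only a sanity check under that assumption: it shows that the float literal 3.1415926f already differs from the double 3.1415926d once widened, and that a HashMap key-set view is not Serializable while a HashSet copy is. SummaryChecks and its method names are hypothetical, and the code does not exercise Hessian's own encoding path.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** JDK-only sanity checks related to findings 2 and 3. */
final class SummaryChecks {
    /** Finding 2: widening a float to double exposes the float's rounding error. */
    static boolean floatDiffersFromDouble() {
        float f = 3.1415926f;
        double widened = f;           // exact value of the float, held as a double
        return widened != 3.1415926d;
    }

    /** Finding 3: copies a key-set view into a HashSet, which is Serializable. */
    static <K> Set<K> serializableKeys(Map<K, ?> map) {
        return new HashSet<>(map.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("A", 1);
        map.put("B", 2);
        System.out.println(floatDiffersFromDouble());
        System.out.println(map.keySet() instanceof Serializable);
        System.out.println(serializableKeys(map) instanceof Serializable);
    }
}
```

Copying the key set before sending keeps the declared Set type on the sending side; what type arrives on the other side still depends on the serialization framework.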
           

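Extension: Comparing Java Serialization Frameworks

If hessian2 has the problems described above, why does dubbo still use it as the default serialization protocol? To get a fuller picture of its strengths and weaknesses, we compare hessian2 with other common Java serialization frameworks. (Reference: 几种Java常用序列化框架的选型与对比, "Selecting and comparing several common Java serialization frameworks".)

Java serialization converts Java objects into a byte sequence so that they can be sent over the network or stored in a file. Deserialization restores Java objects from a byte sequence.

Two factors mainly drive the choice of a serialization protocol: 1) the size of the serialized byte sequence, since a large payload hurts network transfer performance; 2) the performance of the serialization and deserialization process itself.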
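
As a rough illustration of the size factor, the following sketch measures how many bytes JDK serialization needs for the same two-element list that hessian2 encoded as the three bytes 7a9192 in Case 3. JdkSizeProbe is a hypothetical helper; it measures only the JDK side and takes the hessian2 figure from the bytes quoted in this article.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;

/** Measures the size of JDK-serialized objects. */
final class JdkSizeProbe {
    /** Returns the number of bytes JDK serialization produces for the given object. */
    static int jdkSerializedSize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int jdkSize = jdkSerializedSize(new ArrayList<>(Arrays.asList(1, 2)));
        int hessianSize = "7a9192".length() / 2; // Case 3 bytes quoted in the article
        System.out.println("JDK: " + jdkSize + " bytes, hessian2: " + hessianSize + " bytes");
    }
}
```

The JDK stream carries a stream header and full class descriptors, so its output is considerably larger than three bytes for this list. A single small object says little about throughput, though; that needs a proper benchmark.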

  • Size and performance
    The JDK and hessian1 serializers come out worst in both size and performance; the others (kryo, hessian2, protobuf, fst and so on) differ little.
  • Extensibility
    • kryo does not support adding fields, which is a real trap: an RPC interface that has been published cannot change its fields, and adding a field to the request or response requires a new interface;
    • FST can stay compatible with old data streams when fields are added by using the @Version annotation, but it is tedious to use.
  • Data type support

    With Hessian2, user-defined classes must implement the Serializable interface, which is why RPC request and response classes must all implement Serializable.

    Note: the collection types tested include the following implementations:

    1. List: ArrayList, LinkedList, Stack, CopyOnWriteArrayList, Vector.
    2. Set: HashSet, LinkedHashSet, TreeSet, CopyOnWriteArraySet.
    3. Map: HashMap, LinkedHashMap, TreeMap, WeakHashMap, ConcurrentHashMap, Hashtable.
    4. Queue: PriorityQueue, ArrayBlockingQueue, LinkedBlockingQueue, ConcurrentLinkedQueue, SynchronousQueue, ArrayDeque, LinkedBlockingDeque, ConcurrentLinkedDeque.
  • Language construct support (chart not reproduced here)
  • Summary
    • Overall, hessian2 holds up well in the comparison of Java serialization frameworks: it is small, easy to use, and performs reasonably well.

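References

  1. 几种Java常用序列化框架的选型与对比 (Selecting and comparing several common Java serialization frameworks)
  2. Hessian 2.0 Serialization Protocol
  3. 论Hessian的各种坑爹骚操作 (On the various pitfalls of Hessian)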