
Welcome back to Xiaomi's technology sharing space. Today, we're going to dive into an important part of network communication that can't be ignored – serialization. Without further ado, let's unravel the mystery of serialization together!

Background

As an important concept in the field of computing, serialization is rooted in the needs of distributed systems and cross-language communication. With the development of information technology, the exchange of data between different systems has become an increasingly common need. In this context, serialization has emerged as a key technology to realize the transmission and storage of objects in the network.

First, we can trace the rise of distributed systems. With the popularity of computer networks, people want to be able to distribute data and computing across different computing nodes to improve the performance and scalability of the overall system. In a distributed system, nodes need to communicate with each other through the network, which requires data to be effectively transferred between nodes. To achieve this, serialization technology has been developed to enable data to be transmitted over the network by converting objects into byte sequences.

Secondly, the need for cross-language communication is also driving the development of serialization technology. In modern software development, different programming languages are widely used in different fields, and the exchange of data between different languages has become a common situation. However, different programming languages have differences in how data is represented, and a common way of representing data is needed in order to communicate across languages. Serialization technology provides the basis for communication between different languages by serializing objects into a common sequence of bytes.

Serialization and deserialization

Serialization is the process of transforming an object into a sequence of bytes that can be persisted, transferred, or stored. In order to efficiently pass an object in a computer system, we need to convert it into a stream of bytes so that it can be transferred over the network or stored on disk. Serialization is a concrete implementation of this process that converts the state of an object into a sequence of bytes, so that the structure and data of the object can be passed and stored in different contexts.

Deserialization is the inverse process of serialization, i.e., the reconversion of a sequence of bytes into an object. When the receiver receives the serialized byte stream, the original object structure and data can be restored through deserialization. In this way, data can be passed between different systems and objects can be rebuilt at the receiver for seamless transfer and restoration of data.

Java serialization

Next, we'll focus on Java serialization. Java provides its own serialization mechanism, so let's find out.

What is Java Serialization?

Java serialization is a built-in mechanism for converting objects into a sequence of bytes so that they can be sent over a network or written to disk. To take part in it, a class must implement the Serializable interface, which marks its instances as serializable.

With Java serialization, we are able to persist the state of an object in the form of a byte stream, so that the object can be passed and stored between different systems. This provides a convenient solution for distributed systems and data storage. For example, when a Java object needs to be transmitted over a network, we can use serialization to convert the object into a stream of bytes, which can then be deserialized on the receiving end, reverting to the original object.

In principle, Java serialization walks the object graph recursively and converts each non-static, non-transient field into bytes. The resulting stream contains a description of the class (its name, serialVersionUID, and field names and types) together with the field values. During deserialization, the runtime reconstructs the object from this information. Note that primitive types and many common JDK classes, such as String, the wrapper classes, and the standard collections, are serializable by default.
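As a minimal sketch of the round trip described above, the following example serializes an object to an in-memory byte array and reads it back. The Profile class and the helper names are hypothetical and exist only for illustration; the streams are the standard java.io classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical example class; any class implementing Serializable would do.
class Profile implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int age;

    Profile(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

class RoundTripDemo {
    // Serialize an object graph into an in-memory byte array.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object graph from the bytes produced by serialize().
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Profile("alice", 30));
        Profile copy = (Profile) deserialize(data);
        System.out.println(copy.name + " " + copy.age + " (" + data.length + " bytes)");
    }
}
```

The copy is a new object with the same field values; the exact byte count depends on the class and field names, so it is printed rather than assumed.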

How it works

The implementation of Java serialization involves a series of operations inside the JDK, so let's look at the source level to understand how it works.

In Java, the key to serialization is to have the class implement the java.io.Serializable interface. This is a marker interface: it defines no methods, and its presence simply tells the Java runtime that instances of the class may be serialized.

During serialization, the ObjectOutputStream class is responsible for writing objects to the byte stream, and its writeObject method is the entry point. Internally, it first checks whether the object implements Serializable (throwing NotSerializableException if it does not), then writes the class description and each serializable field, recursing into any objects those fields reference. If the class declares its own private writeObject method, ObjectOutputStream invokes that method instead of the default field-writing logic. Static fields are not written, because they belong to the class rather than to the instance, and fields marked transient are skipped.
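The handling of static and transient fields can be observed with a small sketch. The Session class and its fields are hypothetical; the point is only that the transient field comes back with its default value and the static field is never restored from the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class used only for illustration.
class Session implements Serializable {
    private static final long serialVersionUID = 1L;
    static int activeCount = 0;   // static: class-level state, never written to the stream
    String user;                  // ordinary field: written to the stream
    transient String token;       // transient: skipped on write, default (null) after read

    Session(String user, String token) {
        this.user = user;
        this.token = token;
    }
}

class TransientDemo {
    // Writes the session, changes the static field, then reads the session back.
    static Session roundTrip(Session session) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(session);
            }
            Session.activeCount = 42; // changed after writing; reading must not undo this
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Session) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Session.activeCount = 1;
        Session copy = roundTrip(new Session("bob", "secret"));
        System.out.println("user=" + copy.user);                 // restored from the stream
        System.out.println("token=" + copy.token);               // null: transient was skipped
        System.out.println("activeCount=" + Session.activeCount); // 42: static was not in the stream
    }
}
```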

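Deserialization is handled by the ObjectInputStream class. Its readObject method reads the class description from the stream, allocates the object, and restores each field recursively; if the class declares a private readObject method, that method is invoked to read the instance's state. Note that the constructors of the Serializable class itself are not run. Instead, the no-argument constructor of the nearest non-serializable superclass is invoked, so any validation written in the serializable class's own constructors is bypassed. It is therefore critical that the code that does run during deserialization, such as that superclass constructor and hooks like readObject and readResolve, is safe and validates its input.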
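Which constructors actually run during deserialization can be checked with a small experiment. In this sketch, Base and Child are hypothetical classes with call counters: Base is not Serializable, while Child is. Under standard Java serialization, reading a Child runs Base's no-argument constructor but not Child's constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical non-serializable superclass; it needs an accessible no-arg constructor.
class Base {
    static int baseCalls = 0;

    Base() {
        baseCalls++;
    }
}

// Hypothetical serializable subclass.
class Child extends Base implements Serializable {
    private static final long serialVersionUID = 1L;
    static int childCalls = 0;
    final int value;

    Child(int value) {
        this.value = value;
        childCalls++;
    }
}

class ConstructorDemo {
    static Child roundTrip(Child child) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(child);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Child) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Child copy = roundTrip(new Child(7));
        // "new Child(7)" ran both constructors once; deserialization ran only Base().
        System.out.println("value=" + copy.value);
        System.out.println("Base() calls=" + Base.baseCalls);
        System.out.println("Child(int) calls=" + Child.childCalls);
    }
}
```

Because Child's constructor is skipped, any invariant it enforces has to be re-checked elsewhere, for example in a private readObject method.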

Defects in Java serialization

Although Java serialization provides a convenient way to transfer objects, it also has some problems in practical applications.

Not cross-language

One glaring drawback of Java serialization is that it does not easily cross the boundaries of programming languages, leading to some limitations when used in multilingual environments. This is mainly due to the fact that the byte stream format generated by Java serialization is Java-specific and difficult for other programming languages to parse directly.

This compatibility issue between different languages can become especially acute when building distributed systems or communicating across languages. Objects serialized in Java may contain Java-specific meta information and structures, which can be incorrectly understood by parsers in other languages. This limitation hinders the ability to seamlessly communicate data between different languages, as the receiving end needs to understand and implement the same serialization algorithms as Java.

To address this, developers often need to consider using a more generic, cross-language serialization framework, such as JSON or Protobuf. These formats can be easily parsed by multiple programming languages, allowing for wider language compatibility.

Vulnerable

Java serialization has security vulnerabilities, the most notable of which is the deserialization attack. By crafting malicious serialized data, an attacker can trigger remote code execution: during the deserialization phase, the application may inadvertently execute code that is under the attacker's control, which poses a serious security risk.

To mitigate this problem, Java 9 introduced the java.io.ObjectInputFilter interface (JEP 290), and equivalent filtering was backported to later Java 8 updates, where it can be configured through the jdk.serialFilter property. It provides a mechanism that allows developers to filter out unsafe classes during deserialization. By implementing a custom ObjectInputFilter, developers can control which classes are allowed and which are rejected. This way, even if an attacker constructs malicious serialized data, the filter can prevent dangerous classes from being loaded and instantiated during deserialization.
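A minimal sketch of an allow-list filter follows, assuming Java 9 or later, where java.io.ObjectInputFilter and ObjectInputStream.setObjectInputFilter are available. The Allowed and Forbidden classes and the helper names are hypothetical; a real filter would list the application's own trusted types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical classes: one on the allow-list, one not.
class Allowed implements Serializable {
    private static final long serialVersionUID = 1L;
    int id = 1;
}

class Forbidden implements Serializable {
    private static final long serialVersionUID = 1L;
    int id = 2;
}

class FilterDemo {
    // Allow only the Allowed class; reject every other class found in the stream.
    static final ObjectInputFilter ALLOW_LIST = info -> {
        Class<?> type = info.serialClass();
        if (type == null) {
            // Not a class check (for example a reference or depth check): no opinion.
            return ObjectInputFilter.Status.UNDECIDED;
        }
        return type == Allowed.class
                ? ObjectInputFilter.Status.ALLOWED
                : ObjectInputFilter.Status.REJECTED;
    };

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Returns true if the stream passes the filter, false if a class was rejected.
    static boolean accepts(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.setObjectInputFilter(ALLOW_LIST); // must be set before the first read
            in.readObject();
            return true;
        } catch (InvalidClassException e) {
            return false; // thrown when the filter returns REJECTED
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Allowed accepted: " + accepts(serialize(new Allowed())));
        System.out.println("Forbidden accepted: " + accepts(serialize(new Forbidden())));
    }
}
```

An allow-list is used here rather than a deny-list because new gadget classes keep being discovered; rejecting everything that is not explicitly trusted is the more conservative choice.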

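In addition, the attack surface can be reduced by avoiding unnecessary deserialization, especially of untrusted data, and by keeping unneeded libraries off the classpath, since the classes available for loading determine what an attacker can exploit. Classes can also define methods such as readObjectNoData to initialize their state safely when the stream contains no data for them. These measures reduce the loading and execution of unknown classes and improve the security of the system.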

The serialization stream is too large

The size of the serialized binary stream is one measure of serialization performance. The larger the serialized byte array, the more storage space it takes up and the higher the storage hardware cost. For network transmission, a larger stream also consumes more bandwidth, which reduces the throughput of the system.

Java serialization uses ObjectOutputStream to encode objects in binary form. How does the size of the byte array it produces compare with the size of a byte array encoded by hand with NIO's ByteBuffer?
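One way to explore this question is the sketch below. It is not the original listing: the Credentials class, its two string fields, and the length-prefixed ByteBuffer layout are all assumptions chosen for illustration. The sizes are printed rather than stated, since the Java serialization size depends on the class and field names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical class with two string fields.
class Credentials implements Serializable {
    private static final long serialVersionUID = 1L;
    final String userName;
    final String password;

    Credentials(String userName, String password) {
        this.userName = userName;
        this.password = password;
    }
}

class SizeDemo {
    // Encode with built-in Java serialization (stream header + class metadata + values).
    static byte[] javaSerialize(Credentials c) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(c);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Encode by hand: each string as a 4-byte length followed by its UTF-8 bytes.
    static byte[] byteBufferEncode(Credentials c) {
        byte[] name = c.userName.getBytes(StandardCharsets.UTF_8);
        byte[] pass = c.password.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(4 + name.length + 4 + pass.length);
        buffer.putInt(name.length);
        buffer.put(name);
        buffer.putInt(pass.length);
        buffer.put(pass);
        return buffer.array();
    }

    public static void main(String[] args) {
        Credentials c = new Credentials("test", "test");
        System.out.println("ObjectOutputStream length: " + javaSerialize(c).length);
        System.out.println("ByteBuffer length: " + byteBufferEncode(c).length);
    }
}
```

The hand-written encoding carries only the field data, so the reader must already know the layout; Java serialization is larger because the stream describes itself.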

Performance: Serialization of Network Communication Optimization

Here are the results:

Comparing the two encodings, the byte array produced with NIO's ByteBuffer is typically much shorter. ByteBuffer writes only the field data in a layout the program chooses, whereas Java serialization also writes a stream header and class metadata. This matters for distributed systems that transfer data at scale, because smaller byte streams reduce network transmission and storage overhead.

Poor performance

Serialization speed is another important indicator of serialization performance. If serialization is slow, it reduces the efficiency of network communication and increases the response time of the system. Let's reuse the previous example to compare the speed of Java serialization with ByteBuffer encoding in NIO:
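A rough way to make this comparison is sketched below. This is not the original listing and not a rigorous benchmark: the Login class, the iteration count, and the length-prefixed layout are assumptions, and a simple System.nanoTime loop is sensitive to JIT warm-up and garbage collection. A harness such as JMH would be more reliable for real measurements.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical class with two string fields.
class Login implements Serializable {
    private static final long serialVersionUID = 1L;
    final String userName;
    final String password;

    Login(String userName, String password) {
        this.userName = userName;
        this.password = password;
    }
}

class SpeedDemo {
    // Total nanoseconds spent encoding the object with Java serialization.
    static long timeJavaSerialization(Login login, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(login);
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
            bytes.toByteArray();
        }
        return System.nanoTime() - start;
    }

    // Total nanoseconds spent encoding the same fields by hand with ByteBuffer.
    static long timeByteBuffer(Login login, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            byte[] name = login.userName.getBytes(StandardCharsets.UTF_8);
            byte[] pass = login.password.getBytes(StandardCharsets.UTF_8);
            ByteBuffer buffer = ByteBuffer.allocate(4 + name.length + 4 + pass.length);
            buffer.putInt(name.length);
            buffer.put(name);
            buffer.putInt(pass.length);
            buffer.put(pass);
            buffer.array();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Login login = new Login("test", "test");
        int iterations = 100_000; // arbitrary count chosen for illustration
        // Warm-up pass so the JIT compiler has a chance to optimize both paths.
        timeJavaSerialization(login, iterations);
        timeByteBuffer(login, iterations);
        System.out.println("ObjectOutputStream ns: " + timeJavaSerialization(login, iterations));
        System.out.println("ByteBuffer ns: " + timeByteBuffer(login, iterations));
    }
}
```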

Here are the results:

Running both encodings shows that ByteBuffer encoding with NIO takes relatively little time, because it writes the field data directly without the reflection and class-metadata handling that Java serialization performs. This matters in scenarios with large-scale data transmission and frequent network communication, where shorter serialization times improve the performance of the system.

Replacing Java serialization with Protobuf serialization

In the face of some limitations and performance problems of Java serialization, we can consider using Protobuf (Protocol Buffers) as an alternative to Java serialization. Protobuf is an efficient, lightweight binary serialization framework developed by Google that offers significant advantages in network communication and data storage.

First, the serialization streams generated by Protobuf are typically more compact than those of Java serialization, which reduces the overhead of network transmission and storage. Both formats are binary, but Protobuf identifies fields by small numeric tags and encodes integers as variable-length values, whereas Java serialization embeds class names, field names, and type descriptors in the stream. The result is a smaller stream of bytes and better transmission efficiency.
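To make the compactness concrete, here is a hand-rolled sketch of the Protobuf wire format. It does not use the Protobuf library, and the message layout (a Person with a string name as field 1 and an integer id as field 2) is an assumption for illustration. Each field is written as a tag, computed as (field number << 3) | wire type, followed by its value.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class WireFormatSketch {
    // Write a value as a base-128 varint: 7 bits per byte, high bit set on all but the last byte.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Encode the hypothetical message: field 1 = name (string), field 2 = id (integer).
    // Unlike the real library, this sketch always writes both fields, even default values.
    static byte[] encodePerson(String name, int id) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);

        writeVarint(out, (1 << 3) | 2);      // tag: field 1, wire type 2 (length-delimited)
        writeVarint(out, nameBytes.length);  // length prefix
        out.write(nameBytes, 0, nameBytes.length);

        writeVarint(out, (2 << 3) | 0);      // tag: field 2, wire type 0 (varint)
        writeVarint(out, id);                // the value itself

        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] encoded = encodePerson("Al", 150);
        StringBuilder hex = new StringBuilder();
        for (byte b : encoded) {
            hex.append(String.format("%02X ", b));
        }
        // prints: 0A 02 41 6C 10 96 01 (7 bytes)
        System.out.println(hex.toString().trim() + " (" + encoded.length + " bytes)");
    }
}
```

No class or field names appear in the output; the receiver recovers them from the shared message definition, which is where most of the size saving comes from.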

Second, Protobuf's serialization and deserialization are generally fast. Protobuf uses a message-based approach: rather than discovering an object's fields reflectively at runtime, it reads and writes data according to a predefined message structure using generated code. This makes Protobuf a strong performer in scenarios with demanding performance requirements.

In addition, Protobuf is a cross-language serialization framework, which means that data serialized through Protobuf can be easily exchanged between different programming languages, which solves the problem that Java serialization cannot be cross-language.

Protobuf sample code

First, we need to define a Protobuf message type, which can be achieved by writing a .proto file. For example, let's define a simple message type:

Next, use Protobuf's compiler, protoc, to compile the .proto file into source code for the target language. For Java, this generates the Person class.

We can then use the generated class for serialization and deserialization:

With the above example, we not only achieve serialization and deserialization of data, but also avoid a series of problems with Java serialization.

END

In general, serialization is a part of network communication that deserves attention and optimization during development. Although Java serialization is simple and easy to use, it has drawbacks in terms of performance and data size. To improve the efficiency of network communication, we can consider replacing Java serialization with an efficient serialization framework such as Protobuf.

If you are interested in network communication optimization or other technical topics, please continue to pay attention to the Xiaomi Technology Sharing Official Account, and we will bring you more wonderful technology sharing from time to time!

If you have any questions or would like more technical content, please follow my WeChat official account "Know its nature and know why it is so"!
