The latest trends in optical communication in backbone networks

author:Imagine 008

In today's article, Xiao Zaojun will talk to you about some of the latest technology trends in backbone network optical communication.

400G, it's really coming

As you may have heard, since last year the backbone networks of China's domestic operators have begun full-scale 400G commercialization.

First, 2023 saw a large number of commercial verifications, followed by the full launch of centralized procurement. 2024 is the year large-scale commercial use officially arrives.

Not long ago, in March 2024, China Mobile opened the world's first 400G all-optical inter-provincial (Beijing-Inner Mongolia) trunk line, which is regarded as an important landmark event.

The reason for upgrading the backbone network to 400G is obvious.

On the one hand, consumer Internet traffic driven by residents' digital life (high-definition video, remote meetings, live streaming, online games, etc.) keeps growing.

On the other hand, the entire industry is promoting digital transformation, and the traffic from the industry's digital systems has surged, intensifying the pressure on the backbone network.

There is another key reason for the sharp increase in pressure on the backbone network - the explosion of AI.

The rise of AIGC large models has triggered a wave of AI. To meet the needs of AI services, a large number of intelligent computing centers need to be built. Models have grown from 100 billion parameters to 100 trillion parameters, and GPU computing clusters have moved from 1,000-card clusters to 10,000-card clusters or even 100,000-card clusters.

As mentioned in a previous article, a GPU computing power cluster is actually an array of massive GPU cards (GPU servers) connected together through high-performance networks (such as InfiniBand and RoCEv2). It has extremely high requirements for network performance and reliability, which directly affects training efficiency and cost.

Looking at the network port speed of GPU servers alone, single ports now start at 400G, and 800G or higher is already in use.

Previously, GPU computing clusters were purely a DCN (data center network) matter. Now, as clusters keep growing, the industry has begun to consider training models across distributed intelligent computing centers.

In other words, several intelligent computing centers in different places will be used together for training.

This puts forward higher requirements for DCI (data center interconnection network), and the optical communication backbone network must be able to meet this demand in terms of technical performance.

China's computing power strategy still follows the idea of "national overall planning and overall layout". In February 2022, China launched the "Eastern Data, Western Computing" project to build a nationally integrated computing power system.

To put it simply, on the one hand a large number of data centers must be built (the equivalent of power plants), and on the other hand a strong backbone transmission network must be built (the equivalent of the transmission grid) so that this computing power can "flow" to wherever it is needed.

400G, how is it done?

As the foundation of the entire digital society, today's optical backbone network must offer ultra-large bandwidth (400G, 800G, or even 1.6T in the future), ultra-low latency (tiered latency circles), ultra-large-scale networking (serving distributed computing and the AI clusters just mentioned), ultra-high stability, ultra-high reliability, ultra-high security, ultra-flexible deployment, and intelligent operation and maintenance.

Today, let's talk about the most important of these: rate, that is, bandwidth.

At this stage of optical communication technology, improving the rate comes down to working on the following aspects:

First, there is the baud rate.

The transmission rate, or bit rate, is the number of bits transmitted per unit of time, measured in bit/s.

Bit rate = baud rate × number of binary bits corresponding to a single modulation state.

Baud rate is the number of symbols transmitted per unit of time. The higher the baud rate, the more symbols are transmitted per second, the more information is carried, and the higher the bit rate.

The baud rate is determined by the capabilities of the optoelectronic devices. The more advanced the chip manufacturing process, the higher the baud rate, and therefore the higher the bit rate.
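The formula above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the numbers are for a single polarization, and error-correction overhead is ignored. The number of bits per symbol is taken as log2 of the number of modulation states (4 for QPSK, 16 for 16QAM).

```python
import math


def bits_per_symbol(num_states: int) -> int:
    """Binary bits carried by one symbol that has num_states possible states."""
    bits = math.log2(num_states)
    if not bits.is_integer():
        raise ValueError("num_states must be a power of two")
    return int(bits)


def bit_rate(baud_rate: float, num_states: int) -> float:
    """Bit rate = baud rate x bits per symbol (single polarization, no overhead)."""
    return baud_rate * bits_per_symbol(num_states)


# Same 128 Gbaud symbol rate, two different modulation formats.
qpsk_rate = bit_rate(128e9, 4)    # QPSK: 2 bits per symbol
qam16_rate = bit_rate(128e9, 16)  # 16QAM: 4 bits per symbol
print(qpsk_rate / 1e9, "Gbit/s with QPSK")
print(qam16_rate / 1e9, "Gbit/s with 16QAM")
```

At the same baud rate, doubling the bits per symbol doubles the bit rate, which is why baud rate and modulation are treated as two separate levers.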

At present, CMOS processes have advanced from 16nm to 7nm and 5nm, and baud rates have risen step by step from 30+Gbaud to 64+Gbaud, 90+Gbaud, and 128+Gbaud.

400G can now be commercialized because the baud rate can reach 128Gbaud.

Second, there is the modulation.

The "number of binary bits corresponding to a single modulation state" in the formula above is determined by the modulation method.

At present, there are three main modulation schemes for 400G: 16QAM, 16QAM-PCS (PCS is probabilistic constellation shaping, which will be introduced next time), and QPSK. Each suits different application scenarios.

Optical communication is not the same as wireless communication: it does not blindly pursue high-order modulation.

The lower the modulation order, the lower the demands on the line and the lower the cost of network construction. Therefore, in the early design stage of the long-haul backbone, attention focused mainly on 16QAM and QPSK. 16QAM-PCS joined the competition later.

Before "Eastern Data, Western Computing" was proposed, operators believed that 400G would not need very long transmission distances, so the mainstream industry view was to pair lower-baud-rate devices, which were more mature and cheaper, with the higher-order 16QAM.

Later, two things changed. The required transmission distance grew from a little over 1000 km to several thousand km, and 128GBaud devices matured rapidly (the fast rise of 800G in the DCN scenario stimulated the industrial chain). Together these created the conditions for QPSK to stand out.
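The trade-off between a low baud rate with 16QAM and a high baud rate with QPSK can be made concrete with a rough calculation. The sketch below rests on assumptions the article does not state: the signal uses two polarizations, and framing plus error correction add about 25% overhead on top of the 400G payload. Both figures are illustrative rather than vendor specifications, and 16QAM-PCS is left out because its effective bits per symbol are not a fixed integer.

```python
# Bits per symbol per polarization for the two fixed-order formats.
BITS_PER_SYMBOL = {"QPSK": 2, "16QAM": 4}


def required_baud(net_rate_bps: float, modulation: str,
                  polarizations: int = 2, overhead: float = 0.25) -> float:
    """Symbol rate needed to carry net_rate_bps.

    polarizations and overhead are assumed values for illustration only.
    """
    line_rate = net_rate_bps * (1 + overhead)
    return line_rate / (BITS_PER_SYMBOL[modulation] * polarizations)


for name in BITS_PER_SYMBOL:
    gbaud = required_baud(400e9, name) / 1e9
    print(f"400G with {name}: about {gbaud:.1f} Gbaud")
```

Under these assumptions 16QAM needs a symbol rate in the 64 Gbaud class while QPSK needs one in the 128 Gbaud class, which lines up with the device generations listed earlier and explains why QPSK only became practical once 128GBaud devices matured.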

First, QPSK tolerates nonlinearity better, so its launch power into the fiber can be raised somewhat compared with 16QAM-PCS. Second, QPSK has a better back-to-back OSNR threshold than 16QAM-PCS. In addition, the channel spacing of QPSK is set to 150 GHz, so there is almost no filtering penalty during transmission.

These advantages have gradually made QPSK the industry's unanimous first choice for the backbone network and DCI.

At present, the other two schemes (16QAM and 16QAM-PCS) are considered mainly for metro or intra-provincial applications.

Third, there is the extended band.

Baud rate and modulation mainly affect the single-wavelength rate. One optical fiber can carry many wavelengths, as long as the available spectrum is wide enough.

Single-wave bandwidth × number of waves per fiber = single-fiber bandwidth.

As noted above, the channel spacing of QPSK 400G is 150GHz. Neither the traditional C-band nor the extended C-band provides enough spectrum at that spacing.

As a result, the C6T+L6T approach is gradually being adopted, giving a total of 12THz of spectrum. That works out to 80 waves at 400G each, or 32T of capacity on a single fiber. If you give up some transmission distance in exchange for capacity and deploy 16QAM or 16QAM-PCS instead, the capacity can be larger, reaching 48T.
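The arithmetic behind the 32T figure is simple enough to check in code. This is a minimal sketch with hypothetical function names. The second function works backwards from the 48T figure to the channel spacing it implies; that spacing is derived here by arithmetic and is not a value stated in the article.

```python
def fiber_capacity_gbps(spectrum_ghz: int, spacing_ghz: int,
                        per_wave_gbps: int) -> int:
    """Single-fiber capacity = number of waves x rate per wave."""
    waves = spectrum_ghz // spacing_ghz  # whole channels that fit in the spectrum
    return waves * per_wave_gbps


def implied_spacing_ghz(spectrum_ghz: int, capacity_gbps: int,
                        per_wave_gbps: int) -> float:
    """Channel spacing implied by a given total capacity."""
    waves = capacity_gbps / per_wave_gbps
    return spectrum_ghz / waves


# C6T + L6T = 12 THz, QPSK 400G at 150 GHz spacing.
print(12_000 // 150, "waves")
print(fiber_capacity_gbps(12_000, 150, 400) / 1000, "Tbit/s")

# Spacing implied by the 48T figure at 400G per wave.
print(implied_spacing_ghz(12_000, 48_000, 400), "GHz")
```

Run as written, the first two lines reproduce the 80 waves and 32T from the text, and the last line shows that 48T at 400G per wave corresponds to 120 waves, that is, a channel spacing of 100 GHz over the same 12 THz.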

For a detailed introduction to the bands, you can see here: What are the bands of optical communication?

The biggest questions for band extension are whether the devices can support it and whether the cost is manageable. The devices in question include the ITLA, CDM, ICR, EDFA, and WSS, which cover optical transmission and reception as well as optical path switching and amplification.

Band extension also raises another issue: integration.

Today's band extension is really more like two systems (C and L) simply bundled together. The two systems operate independently. Their signals are multiplexed for transmission, demultiplexed at the far end, and then processed separately again.

Running two systems means more space, higher power consumption, and a more complex design. The industry therefore needs to work out how to integrate the devices so that one system truly supports the different extended bands at the same time. That is what true integration means.

Beyond optical modules and optical equipment, the optical fiber itself also deserves attention.

The current mainstream optical fiber is G.652D. With EDFA amplification, 400G QPSK can still reach 1500km on G.652D.

After years of validation, the industry has settled on G.654E fiber as the successor. With the better-performing G.654E, the transmission distance of 400G QPSK can be increased by more than 30% under the same conditions.

G.654E fiber is already capable of large-scale production and will be deployed on a large scale on long-haul trunk lines. Some low-loss optical fibers of the G.654 series have also become the first choice for ultra-long-distance transmission across oceans in submarine cable systems.

Beyond traditional fiber, the industry also believes that multi-core fiber and hollow-core fiber have broad application prospects.

Multi-core fiber is a form of space-division multiplexing: more cores are packed into a single fiber, optionally combined with few-mode transmission, which can greatly increase the capacity of the fiber.

Hollow-core fiber goes even further: the fiber is made hollow, and the glass core is replaced with air.

Hollow-core fiber has been shown to deliver greater capacity, lower latency, lower transmission loss, and ultra-low nonlinearity, and the industry regards it as one of the most promising technologies in optical communication.

After 400G: 800G or 1.6T?

Once 400G is in large-scale commercial use, the industry will turn its attention to the technical standards for rates beyond 400G.

Whether the next step should be 800G, 1.2T, or 1.6T is still being actively debated.

Achieving a higher rate again means working on "modulation + baud rate". 130GBd, or an even higher 260GBd, is the inevitable direction. A higher baud rate means that the related devices must keep up and a mature industrial chain must form.

Beyond 400G, QPSK can no longer be counted on. 16QAM modulation is currently a widely accepted option in the industry.

The spectrum also needs to be extended further. Beyond extending C and L, the S-band, U-band, E-band, and others will be considered. With C+L+S, that is 12THz + 5THz, for a total of 17THz of spectrum.

With these factors combined, a single fiber carrying more than 100Tbps in one direction is just around the corner.

Inside the data center, 800G (based on baud rates above 100GBd, 100G per lane) has already been commercialized. Single-lane 200G, 400G, and 800G will follow; it is only a matter of sooner or later. In this area, progress abroad is a little faster.

As capacity continues to grow, so do the technical challenges. The development of optical communication, to put it bluntly, depends on devices, chips, processes, and materials.

Meeting the requirements for power consumption, security, and O&M also depends on a series of innovations in process, architecture, packaging, artificial intelligence, and digital twins. Much work remains both upstream and downstream in the industrial chain. There is still a long way to go.

Final words

Optical communication is the digital artery of the entire society. Over the years, many technologies (including 5G) have been questioned, but no one has questioned optical communication, because it is an essential need for social development.

The trend of growing data traffic will not change for decades to come. The rapid rise of artificial intelligence will further amplify this trend.

Optical communication as it stands today still cannot meet the demand. This means that companies will have a stronger incentive to invest in R&D in pursuit of profit.

Hopefully the optical communication industry will keep booming and pave the way for a digital and intelligent society.
