Real-time audio and video: the industry's inflection point has arrived

Image source @ Visual China

Text | Industrialist, author | Buckets

A boom brings opportunities, and naturally new challenges as well.

From the carrier-pigeon letters of ancient times, to the telephone and telegraph, and on to today's voice and video calls, each upgrade in communication and network technology has raised people's expectations of how information is conveyed.

Under these higher expectations, real-time audio and video gradually came into view. Driven by the market, it has developed into a sector with great potential and has attracted the attention of countless entrepreneurs and investors.

Over the past decade, vendors backed by Internet giants, such as AWS, Alibaba Cloud, Tencent Cloud, and NetEase Yunxin, have invested in real-time audio and video, and many start-up "rookies" such as Agora, Ronglian Cloud, and Rongyun have also appeared.

Today, real-time audio and video technology can be seen everywhere: in battle-royale games ("eating chicken"), in e-commerce live streaming, in teaching and Q&A for education, and in video account opening at banks.

The rise of the real-time audio and video sector is inseparable from technology upgrades and the market environment. For example, 5G has accelerated the build-out of real-time communication infrastructure, and the epidemic spurred the rise of online education and the adoption of its application scenarios.

But in the fast-changing year of 2021, the sector is quietly shifting.

01 The new underlying infrastructure of the Internet

"People who sell microphones have begun to expand into the audio and video business," one netizen quipped.

Undoubtedly, the real-time audio and video market is riding the trend of the times. However, just as Rome was not built in a day, this boom did not spring up out of nowhere.

"Real-time audio and video" is not a term people pay much attention to in daily life, and few expected it to become as essential to the Internet as water, electricity, and coal.

After all, QQ and WeChat have long had real-time audio and video functions, and live streaming and online courses were once popular across the web, yet none of them showed momentum as strong as today's.

So how did such a niche vertical sector rise?

The rise of real-time audio and video can be traced back to around 2013, when players represented by Agora began to explore the technology. Before that, because network and communication technology were immature, real-time audio and video performed poorly and was not widely accepted.

After that, from 2015 to 2018, PaaS and SaaS companies mushroomed in an entrepreneurial boom. This business model led entrepreneurs to discover "niche markets" such as real-time audio and video. Around 2015 in particular, both the number of financing deals and the amount invested in the industry reached a local high. Nearly 40 companies, including Instant Purchase, NetEase Yunxin, and Polyway, entered the real-time audio and video sector during this period and completed financing rounds one after another.

However, the industry's real takeoff came from two forces working together: the market environment and technological innovation.

In terms of market environment, the epidemic forced enterprises to move offline activities online. A large number of companies replaced on-site work with remote work, and schools opened "cloud" classrooms. This led to a surge in demand for real-time interaction.

According to the data of the consulting company IDC, in 2020, the size of China's video conferencing market increased by 18.9% year-on-year to about 6.52 billion yuan, showing explosive growth.

The online education market has also grown sharply. Relevant data show that China's real-time audio and video market for education grew 46.9% year-on-year in 2020, reaching 4.7 billion yuan.

As important application scenarios, video conferencing and online education have undoubtedly had a milestone impact on the growth of the real-time audio and video market.

In terms of technological innovation, the Ministry of Industry and Information Technology issued the "5G Application 'Set Sail' Action Plan (2021-2023)" in 2021, which sets a clear goal: by 2023, the level of 5G application development in China should improve significantly. 5G's high-speed, low-latency, high-capacity characteristics help drive change in the real-time audio and video industry.

As the high-definition communication market expands rapidly, 5G can better meet the industry's demand for underlying capabilities such as high image quality and low latency.

It can be said that the right time, the right place, and the right people are all indispensable, and real-time audio and video service providers have caught a good moment.

According to the "China Internet Development Report (2021)" released by the Internet Society of China, the scale of real-time audio and video on China's Internet reached 241.2 billion yuan in 2020, covering comprehensive video, short video, online live streaming, online music, and other fields.

Overall, the real-time audio and video industry supports e-commerce, social, and pan-entertainment industries worth trillions of yuan, and has become one of the Internet industry's new underlying infrastructures.

02 Where there is a boom, there is also a crisis

The audio and video sector is still at an early stage and growing fast, and its development has not been free of difficulties.

The same is true of Agora, the first listed company in the audio and video sector. "In terms of technology, Agora is still at a relatively early stage. The future experience will be more immersive and feel more like being on site. By that standard, I think we may have only just reached a passing grade," founder and CEO Tony Zhao (Zhao Bin) once said.

In fact, not only Agora but every player in the audio and video sector faces technical bottlenecks.

The first is low latency. Reasonably smooth real-time interaction requires a one-way end-to-end delay of about 400 milliseconds or less. In practice, however, every stage of data processing and transmission adds delay, so this value is hard to achieve.

For example, at a live broadcast of the recent Yunqi Conference, two screens in the same viewing hall, only three or four meters apart, could not stay in sync, and the audience heard an "echo" between them.

In real deployments, edge-node placement, backbone congestion, weak network conditions, device performance, and system performance must also be taken into account, so the actual delay is greater. Under these network constraints, it is difficult to push latency to the minimum with current technology.
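To make the 400-millisecond figure concrete, a one-way latency budget can be sketched by summing the delay of each stage. The stage names and millisecond values below are illustrative assumptions, not measurements from this article or from any vendor.

```python
# Illustrative one-way latency budget; every stage value is assumed, not measured.
BUDGET_MS = 400  # one-way end-to-end target cited in the text


def total_latency_ms(stages):
    """Sum per-stage delays (in milliseconds) along the one-way path."""
    return sum(stages.values())


def within_budget(stages, budget_ms=BUDGET_MS):
    """Return True if the summed delay stays at or under the budget."""
    return total_latency_ms(stages) <= budget_ms


# Hypothetical pipeline on a healthy network.
good_network = {
    "capture": 20, "encode": 30, "uplink": 40, "backbone": 80,
    "downlink": 40, "jitter_buffer": 60, "decode": 20, "render": 20,
}
# Same pipeline on a weak network: transmission and buffering grow.
weak_network = dict(good_network, uplink=120, backbone=150, jitter_buffer=150)

print(total_latency_ms(good_network), within_budget(good_network))
print(total_latency_ms(weak_network), within_budget(weak_network))
```

Under these assumed numbers the healthy path fits inside the budget while the weak-network path does not, which is the pattern the paragraph above describes: no single stage is large, but the sum is.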

Then there is the problem of echo cancellation. Echo arises when the sound played by the speaker is reflected by the environment, picked up again by the microphone, and sent back to the other party. The other party then keeps hearing their own voice, and the whole interactive experience suffers badly.
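One common textbook approach to this problem is an adaptive filter that learns the speaker-to-microphone path and subtracts the predicted echo. The sketch below uses normalized least mean squares (NLMS) on a simulated signal. It is a minimal illustration, not the method of any provider named in this article, and the function name, tap count, step size, and simulated room (a 3-sample delay at 60% level) are all assumptions.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from mic (NLMS)."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        err = d - estimate  # echo-reduced signal sent back to the far end
        power = sum(h * h for h in history) + eps
        step = mu * err / power  # step size normalized by input power
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(err)
    return residual, weights


def energy(samples):
    """Sum of squared sample values."""
    return sum(v * v for v in samples)


random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]
# Assumed room: the microphone hears the speaker 3 samples late at 60% level.
mic = [0.0] * 3 + [0.6 * s for s in far_end[:-3]]

residual, weights = nlms_echo_cancel(far_end, mic)
print(energy(residual[:200]), energy(residual[-200:]))
```

In this idealized setup the filter converges quickly because the echo path is short, fixed, and free of noise. Real devices and rooms violate all three conditions.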

However, solving this problem is not simple. The lack of uniformity across devices alone makes it very difficult.

The device greatly affects echo cancellation. On the phones of one domestic manufacturer, for example, there is a delay of nearly one hundred milliseconds between the microphone capturing audio data and delivering it, so how the echo cancellation algorithm adapts to such a long echo delay is critical. Likewise, many users rely on external sound cards or even emulators when live streaming, which quietly adds further echo delay.

Beyond the device, the venue also matters a great deal. For an ordinary conference room, allowing for an echo delay equivalent to a 40-meter sound path may be enough, but in some conference venues the path approaches 100 meters, which is also a challenge. Although major service providers are actively offering specific solutions, most are superficial and have little visible effect.
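If the distances above are read as the length of the path the sound travels (an interpretation, since the article gives the delay in meters), they can be converted into milliseconds and into the number of filter taps an echo canceller would need to span that delay. The speed of sound (343 m/s, dry air at about 20 °C), the 16 kHz sample rate, and the function names are assumptions for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 °C


def echo_delay_ms(path_m, device_delay_ms=0.0):
    """Acoustic travel time over path_m meters, plus any device-side delay."""
    return path_m / SPEED_OF_SOUND_M_S * 1000.0 + device_delay_ms


def filter_taps(path_m, sample_rate_hz=16000, device_delay_ms=0.0):
    """Minimum number of samples a filter must cover to span the echo delay."""
    delay_s = echo_delay_ms(path_m, device_delay_ms) / 1000.0
    return math.ceil(delay_s * sample_rate_hz)


for path in (40, 100):
    print(path, round(echo_delay_ms(path), 1), filter_taps(path))

# Adding the roughly 100 ms device delay mentioned earlier to the 40 m case.
print(round(echo_delay_ms(40, device_delay_ms=100.0), 1))
```

Under these assumptions a 40-meter path is a little under 120 ms and a 100-meter path a little under 300 ms, so a large venue more than doubles the window the canceller has to model, before any device delay is added.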

In addition, the audio and video sector still has technical pain points in areas such as smoothness and massive concurrency.

Overall, the sector's technical ceiling has become visible, and a zero-to-one breakthrough is hard to achieve, so the technical level of service providers across the market does not differ much. This also means that service providers in this sector will find it hard to build a solid moat through technological innovation alone.

Although breakthrough innovation is hard to come by, communication and network technology continue to mature, and 5G, now being rolled out, will improve signal coverage and stability.

For example, the WE-CAN global intelligent routing network recently launched by NetEase Yunxin builds on today's relatively mature communication technology. By establishing a decentralized mesh interconnection, it constructs optimal transmission channels between regions around the world, so as to provide stutter-free real-time audio and video for 99.9% of calls.

Beyond technology, service providers have turned their attention to application scenarios. It is undeniable that both consumers and enterprises have become more accepting of real-time audio and video, as shown by online education, online conferencing, telemedicine, "live streaming+", and other scenarios that exploded in 2020.

However, changes in the market environment cannot be predicted, and some application scenarios will inevitably be affected by policy and other factors.

The most direct and far-reaching example is the "double reduction" policy in education, which directly affects online education, an important growth area for real-time audio and video.

Not long ago, the General Office of the CPC Central Committee and the General Office of the State Council issued the "Opinions on Further Reducing the Burden of Homework and Off-Campus Training for Students at the Compulsory Education Stage", making it a foregone conclusion that subject-based training institutions may not go public to raise financing.

The document directly shut down K12 education, once the largest and most heavily funded segment of online education, and closed off a promising avenue for real-time audio and video.

Not long ago, the relevant departments also published the "Notice on Preventing Minors from Becoming Addicted to Online Games", which strictly limits the play time of underage users and tightens controls on minors' in-game spending, curbing the unchecked growth of some pan-entertainment fields.

Pan-entertainment is a key area for real-time audio and video services. It is certain that these curbs will affect the real-time audio and video market in the short term.

As the saying goes, "Success is due to Xiao He, and so is failure." The market environment and technological innovation fueled the explosion of real-time audio and video, but they have also left the sector "full of dangers".

03 The inflection point has arrived: heading for the "metaverse"?

As we all know, problems rooted in technology take long periods of R&D to break through. If enterprises want to break the deadlock, they should therefore look first to application scenarios.

A consensus in the B2B industry is that switching systems is fairly costly for real-time audio and video customers, so most are reluctant to change providers. More and more customers use two providers as mutual backups, large platforms use three or more, and some also develop a system of their own.

Given this, a clearer path is to enter the main application scenarios early, use the first-mover advantage to accumulate customers, and thereby win more market share.

What cannot be ignored is that, despite the pressure from the market environment, audio and video technology is being applied in more and more scenarios, with more and more possibilities.

The hottest of these recently is the "metaverse" concept championed by Roblox and Facebook. It has become many people's ultimate fantasy of the virtual world.

The "metaverse" is an extremely broad concept. It contains many application scenarios, spanning social networking, shopping, finance, and other fields, and each scenario requires different services.

For example, in VR/AR/MR application scenarios, users often collaborate across regions and countries, so network conditions are more complex. It is therefore particularly important to provide high-speed network service to users in any country or region around the world.

For example, in a competitive game or virtual office scene, you need to perceive the distance and orientation of other users through the sound of footsteps, and the spatial sound effect can make the user more immersive.
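As a rough illustration of how distance and orientation can be conveyed through sound, the sketch below computes left and right channel gains for a mono source such as footsteps. It combines inverse-distance attenuation with a constant-power pan law, both common simplifications. The function name, the 2D coordinate convention, and the reference distance are assumptions, and real spatial audio engines typically use far richer models.

```python
import math


def spatialize(listener_xy, facing_rad, source_xy, ref_dist=1.0):
    """Return (distance, left_gain, right_gain) for a mono source in 2D."""
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    dist = math.hypot(dx, dy)
    # Inverse-distance attenuation, clamped so the gain never exceeds 1.
    gain = ref_dist / max(dist, ref_dist)
    # Angle of the source relative to the facing direction
    # (counterclockwise, so positive means the listener's left).
    rel = math.atan2(dy, dx) - facing_rad
    # Pan position in [-1, 1]: -1 is fully left, +1 is fully right.
    pan = -math.sin(rel)
    # Constant-power pan law keeps perceived loudness steady across the arc.
    theta = (pan + 1.0) * math.pi / 4.0
    return dist, gain * math.cos(theta), gain * math.sin(theta)


# Listener at the origin facing "north" (+y); footsteps 2 m to the right.
print(spatialize((0.0, 0.0), math.pi / 2, (2.0, 0.0)))
# Same listener; footsteps 3 m straight ahead.
print(spatialize((0.0, 0.0), math.pi / 2, (0.0, 3.0)))
```

One limitation of this simple model is that a source directly behind the listener pans the same as one directly ahead. Distinguishing front from back requires additional cues that this sketch does not attempt.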

Clearly, the application scenarios under the "metaverse" concept are inseparable from real-time audio and video technology. For those scenarios to be realized, real-time audio and video is one of the essential foundations.

But in the field of real-time audio and video that has not yet taken shape, there are still many unknown variables.

For one thing, the sector is crowded with cross-border players: Internet giants represented by Tencent Cloud, CPaaS vendors represented by Twilio, and video conferencing vendors represented by Zoom all compete here. With the "metaverse" concept running hot, the Internet giants will certainly not ignore such a large pie. It is not easy for real-time audio and video service providers to break through.

As we all know, Internet giants usually have mature technology, strong ecosystem-building capabilities, ample funds, and deep industry reach. Once they enter the field, they can easily overwhelm start-ups that cannot yet generate enough revenue to sustain themselves.

As a result, a "big fish eat small fish" pattern of competition may form, and start-ups may end up either surviving in the cracks or being merged and acquired.

But in the business world, weakness is the original sin. A boom brings opportunities, and naturally new challenges as well.

As more and more service providers position themselves on this "golden track", new real-time audio and video products will keep emerging, and the providers that entered early and built core competitiveness are the ones most likely to go the distance.
