
The Fengming AI engine has been released, and real-time audio really does seem to be everywhere

After ChatGPT caught fire, Sam Altman's remarks were dug up, pored over, and treated as guidance.

He has a very simple test for whether a product or technological innovation is a sure thing: if the small group of people first exposed to it spend a long time every day immersed in it, it will probably succeed; conversely, if even a small group of people cannot get hooked on it, the time for that new wave has not yet come.

The OpenAI founder praised the iPhone once more, took another swipe at VR, and then placed ChatGPT at its own "iPhone moment." But this seemingly idealistic, common-sense judgment is not without counterexamples, such as the briefly popular Clubhouse.

Perhaps a better test looks at the long tail. For example, the "iPhone moment" truly arrived when the most loyal Nokia users began asking about Jobs; likewise, when conservative Middle Eastern internet users in white robes and headscarves begin to socialize on their mobile phones, the world has surely moved further into a huge wave of real-time interaction.

Founded in 2016, Yalla Group is currently the largest company in the voice chat market, and in 2020 it became the first technology company from the UAE to be listed on the New York Stock Exchange. Three years after listing, it is already a huge platform with nearly 32 million monthly active users and more than 12 million paying users.

In Yalla's chat rooms, you can often see more than 1,000 people present at the same time. In this 2,000-person online audio space, many of the voices come from Saudi Arabia, Qatar, and even the United Arab Emirates.

This is happening.

A wave of real-time interaction

The wave of real-time interaction is surging in China, and the emergence of countless online concerts is evidence of the technological change behind it.

Luo Dayou, who has rarely been seen in public in recent years, gave his first online concert in May last year; he sang 21 songs, and 42 million people watched the performance through WeChat Channels. On the same day, Sun Yanzi held her first online "singing chat" on Douyin, and the number of views (the same account can be counted repeatedly) reached 240 million.

Outside observers attribute the worldwide rise of audio and video products to the pandemic, and describe the domestic wave of online concerts as a battle between platforms. From a technical perspective, however, the foundation for all of this is the breakthrough of real-time audio and video transmission networks (RTN) in reducing delay and coping with weak networks, and more broadly the maturing of real-time interaction that uses audio and video as its medium.

WebRTC (Web Real-Time Communications) was open-sourced in 2011. A full ten years later, in 2021, the standards bodies W3C and IETF announced WebRTC as an official standard, so that users can communicate by real-time audio and video on the web without downloading additional components or separate applications.

"This means that real-time audio and video will be brought anywhere on the Web, bringing the standardization of WebRTC's first-generation technology to a perfect end," Agora CEO Zhao Bin concluded in 2021. He also regarded the moment WebRTC became an official standard as a starting point, after which "the discussion of the next generation of WebRTC technology, industry, standard evolution and other aspects will also be officially put on the agenda."

RTE has completed its zero-to-one phase, from the technology itself to user mindshare, and its future direction will be set by end-user scenarios. From complex, comprehensive scenarios such as the metaverse to vertical ones such as online concerts, all have emerged in recent years. Among them, online karaoke is probably one of the most demanding scenarios in real-time interaction.

Its core experience involves no sense other than hearing, so it depends entirely on progress in real-time audio. According to a research report by iResearch, a delay of no more than 400ms can be regarded as a necessary condition for a strongly interactive experience, and when the actual delay falls to 200ms, the real-time interactive experience begins to approach real life. For a scenario as demanding as multi-person karaoke, however, a 200ms delay already means a misalignment that cannot be ignored when singing in chorus. The ideal delay threshold for real-time chorus needs to be as low as around 50ms.
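As a rough illustration of these thresholds, the following Python sketch sorts a measured end-to-end delay into the tiers described above. The function name and tier labels are hypothetical, and treating "around 50ms" as a hard cutoff is an assumption made only for the example.

```python
def classify_latency(delay_ms: float) -> str:
    """Map an end-to-end delay (in milliseconds) to an experience tier.

    The cutoffs follow the figures quoted in the text; the labels are
    illustrative names, not terms from the iResearch report.
    """
    if delay_ms <= 50:
        return "chorus-ready"      # low enough for real-time chorus
    if delay_ms <= 200:
        return "near-real"         # interaction begins to feel lifelike
    if delay_ms <= 400:
        return "interactive"       # minimum for strong interaction
    return "not interactive"       # too slow for strong interaction


for sample in (30, 180, 350, 600):
    print(sample, "ms ->", classify_latency(sample))
```

Under these assumptions, a karaoke room measuring 180ms would count as "near-real" for conversation but would still fall short of the chorus tier.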

George Lucas, the "father of Star Wars", once said: "Half of the effect presented by movies is composed of sound effects." Cinema was the earliest dreamlike invention of immersion. Now that a more intense blending of the virtual and the real is about to emerge, audio capability in real-time interaction is the first to be tested. In the process, real-time audio as a foundational capability is being broken down into ever finer components.

The Fengming AI engine rises as audio capability sinks into the foundation

In the traditional RTC concept, seen from the perspective of information transmission, the audio function only provided simple voice communication, serving a single scenario and call standard with no high demand for sound quality: the so-called "able to communicate" level. With the emergence of innovative real-time interactive scenarios, users' demands on the audio experience have shifted from quantitative to qualitative change.

Audio entertainment is no longer a niche "demand" but a standard feature of all pan-entertainment scenarios, which places higher requirements on providers of RTE technology, products, and solutions.

For example, in scenarios such as online karaoke and online meetings, users' needs have long since moved from simply being able to communicate to wanting to "detach from reality", shut out external interference, and communicate without distraction; in scenarios such as the metaverse, virtual events, and game competitions, users hope to move from simple communication to an "extremely realistic" immersive experience.

Real-time audio technology today must therefore be as close as possible to the real world in what users hear, yet detached from reality in what they experience. Agora's Fengming AI engine combines the two.

Source: Agora Network

On March 23, Agora released the "Fengming AI Engine", a new-generation intelligent audio engine that includes AI noise reduction, AI echo cancellation, spatial audio, and best sound effects. Developers and enterprises can call the corresponding components flexibly, like building blocks, and the engine can be widely used in scenarios such as social chat, online karaoke, online meetings, game competitions, and virtual events.

From Yalla to Oasis, Agora's real-time audio and video technology provides the underlying capabilities. The early audio practice accumulated in chat room scenarios gradually settled into sound effect configurations for different scenarios, and once productized these configurations became the best sound effects capability of the Fengming AI engine.

If sound quality and delay were the first problems real-time audio faced in reproducing reality, then simulating a sound's sense of space has become the new frontier. Spatial audio is also one of the most eye-catching capabilities of Agora's Fengming AI engine.

Fengming's spatial audio technology simulates a stereo sound field in the spherical region around the head, giving the user a sense of space in what they hear. When the user moves a virtual character through a virtual scene, the sound changes according to the character's facing direction and the direction, distance, and height of the sound source, closely simulating real hearing.
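To give a sense of how orientation and distance can shape what each ear hears, here is a minimal Python sketch. It is not Agora's algorithm: it assumes a flat 2D scene, inverse-distance attenuation, and constant-power stereo panning, and it ignores height and front/back cues. The function name and the ref_distance parameter are invented for illustration.

```python
import math


def stereo_gains(listener_xy, facing_rad, source_xy, ref_distance=1.0):
    """Return (left_gain, right_gain) for one sound source.

    listener_xy, source_xy: (x, y) positions in the virtual scene.
    facing_rad: direction the listener's face points, in radians,
                measured counter-clockwise from the +x axis.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)

    # Inverse-distance attenuation, clamped so that sources closer
    # than ref_distance are not amplified.
    gain = ref_distance / max(distance, ref_distance)

    # Angle of the source relative to the facing direction
    # (positive means the source is to the listener's left).
    azimuth = math.atan2(dy, dx) - facing_rad

    # Pan position in [-1, 1]: -1 is fully left, +1 is fully right.
    pan = -math.sin(azimuth)

    # Constant-power panning keeps perceived loudness steady as the
    # source moves between the two ears.
    theta = (pan + 1.0) * math.pi / 4.0
    return gain * math.cos(theta), gain * math.sin(theta)


# Listener at the origin facing +y; a guitarist one unit to the right.
left, right = stereo_gains((0.0, 0.0), math.pi / 2, (1.0, 0.0))
print(round(left, 3), round(right, 3))
```

In this simplified model, turning the virtual character changes facing_rad and so shifts the source between the ears, while walking away from the source lowers both gains.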

The spatial audio capability of the Fengming AI engine can be used to rebuild a large number of scenarios.


During the pandemic, a large number of new scenarios such as online exhibitions and online museums emerged, and excellent visual spatial effects can be built in them; along a separate line, interactive podcasts led by Clubhouse brought the spatial sense of sound to wider attention. If such spatial audio effects are layered onto existing online scenarios, they will further transform the experience.

For example, when Luo Dayou next appears in an online concert, the audience could hear the different positions of the guitarist and the bassist in the band behind him; or on an online tour of a museum, visitors could hear others "around" them talking about the exhibits as they walk.

At the same time, spatial audio is the best partner for 3D scenarios in the metaverse and in games, such as Werewolf, virtual concerts, and virtual events. It can effectively enhance users' online interaction and listening experience, and rebuild their sense of immersion and presence in the virtual world.

Because Agora's 3D spatial audio is a pure software algorithm, developers who call it do not need to account for hardware. Users can get the immersive experience on mobile phones and computers through any headset, and the feature supports iOS, Android, Mac, Windows, Unity, Unreal, and other platforms. Developers also do not have to worry about the impact of spatial audio on user devices. According to the data, after the spatial audio function is enabled, the CPU consumption of the corresponding device increases by an average of …, and memory consumption by an average of …

Immersion is left to AI

Best sound effects and spatial audio allow the Fengming AI engine to reproduce the realism of sound in real-time interactive scenarios as closely as possible. Its AI capabilities, on the other hand, turn that realism into an immersion detached from reality.

Keyboard tapping, home renovation noise, or traffic outside: these real-world sounds can break immersion. Agora's Fengming AI engine integrates AI noise reduction that uses algorithms to suppress both steady-state and non-steady-state noise. It can strongly suppress 100+ types of sudden noise without damaging the human voice, delivering a clean call experience in low signal-to-noise or voice-dense scenarios. Agora said that the noise reduction of the Fengming AI engine can cover almost all types of noise commonly found in real life.


The improvement in noise handling is essentially about guaranteeing a clean call experience in real-time audio communication. The flexible noise reduction of the Fengming AI engine delivers strong suppression while preserving high fidelity, which means it also works in places such as shopping malls where human voices are extremely dense. And when a speaker is temporarily away from the microphone and the sound becomes blurred, Agora's AI noise reduction algorithm can still make the user's voice clearly audible to the other side.

Another capability of the Fengming AI engine is its powerful echo cancellation. In scenarios such as online meetings, online karaoke, and multi-person voice sessions, echo is one of the biggest factors affecting call quality and the interactive experience. Agora's AI echo cancellation uses algorithms to suppress the echo and reverberation produced in the environment. It adapts intelligently to different environments, separates the sound sources accurately, removes the unwanted far-end signal from the mixed near-end signal, and keeps the near-end voice to send to the far end, achieving thorough echo cancellation and a high-fidelity audio experience.
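The idea of removing the far-end signal from the microphone signal can be sketched with a classic normalized least-mean-squares (NLMS) adaptive filter. This is a textbook technique swapped in for illustration, not the AI-based method described here; the function name and parameters are hypothetical.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the far-end echo from the mic signal.

    far_end: samples played through the loudspeaker (the remote party).
    mic:     samples captured by the microphone (near-end voice + echo).
    Returns the residual signal that would be sent to the far end.
    """
    weights = [0.0] * taps   # estimated echo path (loudspeaker to mic)
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate            # what remains after removal
        norm = sum(h * h for h in history) + eps
        # NLMS update: move the weights toward reducing the error,
        # scaled by the recent far-end signal power.
        weights = [w + step * error * h / norm
                   for w, h in zip(weights, history)]
        residual.append(error)
    return residual


# Toy check: the mic hears only an attenuated, two-sample-delayed echo.
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0, 0.0] + [0.6 * s for s in far[:-2]]
out = nlms_echo_cancel(far, mic)
print(sum(s * s for s in out[-200:]) < sum(s * s for s in mic[-200:]))
```

In this toy setup there is no near-end speech, so a well-adapted filter drives the residual toward silence; a production canceller also has to cope with both parties talking at once, nonlinear loudspeakers, and changing rooms, which is where learned models come in.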


Conclusion

In 2021 Agora released the RTE Vientiane Atlas, a vast map of the real-time interaction ecosystem in which more than 200 scenarios have appeared across more than 20 industry tracks such as education, pan-entertainment, IoT, enterprise collaboration, finance, and healthcare. As a provider of underlying real-time interaction technology, Agora is driving this wave, and the clear feedback is that a better audio experience improves the core experience of platform users.

For developers and industry users, this means in concrete terms that noise suppression can improve user activity and retention in chat rooms, as well as call duration in team voice chat for games; echo cancellation can significantly improve the online karaoke experience; and a three-dimensional audio experience with a sense of space can strengthen users' presence and immersion in scenarios such as metaverse social networking, game competitions, online meetings, and virtual events.

As business boundaries keep expanding and demand for real-time audio and video becomes increasingly vertical, earlier single-function products are hard to monetize and the trend is toward integrating more features. The Fengming AI engine is an integrated real-time audio solution of this kind.

Whereas Agora's Solo and Nova engines innovated in the audio codec dimension, the Fengming engine's improvements focus on 3A processing (echo cancellation, noise suppression, and gain control), spatial audio, and AI methods, the result of Agora's long-term investment in core RTC audio technology. Xu Ran, an algorithm expert at Agora, pointed out that in the future Agora will build a new generation of RTC audio solutions on the Fengming AI engine, such as exploring more personalized voice solutions, speech super-resolution, and shared-experience scenarios.

The Fengming AI engine itself will also continue to evolve. Yang Fan, head of audio entertainment products at Agora, said that a voice-changing feature is currently in development, and users will be able to try 20+ voice-changing styles and a variety of role-playing conversation scenarios.

With the release of the Fengming AI engine, Agora has further strengthened its role as a provider of underlying technology. Further growth in real-time interaction will also begin with the integration and modularization of RTE-related technologies. Real-time interaction is becoming a daily necessity for the public in the same way that humans need air and water, and it is everywhere.
