Clear picture, clear voice! The audio and video "black technology" inside cameras

As technology has advanced, today's cameras have improved greatly in both function and performance. Beyond use with PCs, they are now widely deployed in commercial settings such as video conferencing and live streaming. To improve the user experience, manufacturers have added many new technologies to their products, for example faster and more accurate autofocus and better audio. In this issue, we look at the audio and video "black technology" found in cameras.

▲ On the Rocware RC08, the speaker is on the left and the camera is on the right; the module at the edge of the camera is its TOF laser focus system.

Market pain point: Full-featured cameras are rare

Search an e-commerce site for "camera" and you will find that most products integrate only a microphone; cameras with built-in speakers are rare. Why is that? First, integrating a speaker raises the cost. Second, when a speaker and a microphone share one body, the microphone picks up the sound the speaker emits. That sound is captured, sent out, and played back again, and the repetition forms an echo that degrades the experience if it is not tuned well. Finally, if the microphone does not support noise suppression, ambient noise degrades the captured audio, and the voice quality heard at both ends drops sharply. Adding noise suppression and echo cancellation raises the cost further, which is why many cameras have no speaker.

A small number of cameras on the market do integrate both a microphone and a speaker, but most come from little-known small brands. Such products are usually built on surveillance camera designs that cost less than 100 yuan. The microphone pickup is poor, and the speaker makes no pretense of sound quality; it merely lets the user hear that something is playing. Because of cost constraints, they support neither professional noise suppression nor echo cancellation.

On the image side, although many cameras now offer 4K resolution, focusing remains a serious weakness. One high-end camera used for live streaming in the MC evaluation room, for example, often focused slowly, focused inaccurately, and hunted repeatedly for focus. In other words, most current cameras concentrate on resolution alone and fall short in audio, focusing, and other areas.

So are there cameras on the market that deliver excellent audio and solid capabilities across the board? Yes, but they are rare. We found one that is well balanced in every respect: the Rocware RC08. It is billed as a camera that integrates a full HD camera, an omnidirectional microphone, and a full-range speaker, with built-in 3A algorithms (AEC/AGC/ANS), TOF laser focus, and other black technologies, and it supports full-duplex conversation. It comes from Weiheide, a well-known domestic brand of audio and video communication equipment. Does the product really solve users' pain points, or is this just a seller praising its own goods (the proverbial "Wang Po selling melons")? Next we take it apart, examine its internal components, and analyze its 3A algorithms.

▲ Rocware RC08 disassembly diagram

▲Rocware RC08 frame structure diagram

SSC333 main control chip + SC2239 image sensor + HT8693 power amplifier chip

Disassembly shows that the Rocware RC08 uses the SSC333 main control chip from SigmaStar (Xiamen Xingchen Technology Co., Ltd.), a chip widely used in home surveillance and camera products. According to public information, the SSC333 is a single-core design based on the ARM Cortex-A7 architecture with a main frequency of 800MHz. Although it has only one core, it is highly integrated: it has a built-in ISP (image signal processor) along with H.264, H.265, and MJPEG video encoders.

It also supports audio output and provides peripheral interfaces such as an audio analog-to-digital converter (ADC) and digital-to-analog converter (DAC) for added flexibility. The SSC333 is compatible with multiple audio encoding formats such as G.711, G.726, and ADPCM, and it supports the 3A audio algorithms (AEC, ANS, AGC), which lays the foundation for the audio performance of the Rocware RC08. In addition, the SSC333 has built-in 512MB DDR2 memory and supports WDR, multi-stage noise reduction, and multiple image enhancement and correction algorithms to provide better image quality.

▲The structure diagram of the SigmaStar SSC333 main control chip used by Rocware RC08

Paired with the SigmaStar SSC333 main control chip is an SC2239 image sensor from SmartSens, a part mainly used in surveillance systems, network cameras, dashcams, action cameras, and video conferencing cameras. The SC2239 is a 2-megapixel sensor in a 1/2.8-inch format with a pixel size of 2.9μm × 2.9μm, and it outputs images at up to 1920×1080 at 30fps. It offers high light sensitivity and a signal-to-noise ratio of 38dB, and it also supports infrared lamps with wavelengths of 850nm and 940nm.

To give the camera better sound, the Rocware RC08 adds an HT8693 mono power amplifier chip produced by Jiaxing Herun Electronic Technology Co., Ltd. This audio power amplifier offers both Class AB and Class D operating modes and can continuously output 11W into a 4Ω load in Class D mode. The chip has an anti-clipping output control function that automatically detects the distortion caused when the input signal amplitude is too large and limits the output accordingly, which improves sound quality. In addition, it integrates filterless digital modulation technology, so it can drive the speaker directly while keeping distortion and noise in the output signal to a minimum.

▲The HT8693 mono power amplifier chip used by Rocware RC08 supports two working modes of Class AB and Class D.

The most important core chips in the body are all independently developed products from mainland Chinese companies, which shows that chips for video surveillance and cameras can now be sourced domestically. To ensure product quality, the chips used in the RC08 all come from leading domestic manufacturers, so this is a genuine case of domestic substitution.

3A algorithm black technology has greatly improved audio performance

If hardware is the body of a product and software is its soul, then the algorithm is its central nervous system. Good algorithms let the hardware perform to its full potential. To give the Rocware RC08 a better audio experience, its designers pair the microphone and speaker with algorithms that eliminate noise and echo interference. Many readers will wonder how this is achieved, so next we analyze these algorithms.

Rocware engineers gave the RC08 a set of 3A algorithms: AEC (Acoustic Echo Cancellation), AGC (Automatic Gain Control), and ANS (Automatic Noise Suppression). So how do these 3A algorithms work?

▲In essence, the AEC echo cancellation algorithm compares the speaker signal with the microphone signal and then removes the echo.

When the RC08's speaker plays a sound, that sound travels through the room, reflects, and is picked up again by the camera's own microphone, where it mixes with the voice of the person speaking locally. Without AEC, the person at the far end would hear their own voice come back as an echo, over and over. The job of AEC is to strip this unwanted echo out of the voice stream, and the most common approach is cancellation by subtraction. AEC builds a model of the echo path from the signal sent to the speaker and the echo that signal produces, uses the model to estimate the echo, and continuously adjusts its filter coefficients so that the estimate approaches the true echo. The estimate is then subtracted from the microphone input, which removes the echo. The more accurate the estimate, the better the cancellation. AEC can also compare the microphone input with the speaker output to filter out delayed echoes that have been reflected several times.
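The estimate-then-subtract loop described above can be sketched with a normalized least-mean-squares (NLMS) adaptive filter, a common textbook choice for echo cancellation. This is only an illustration under stated assumptions: the article does not say which adaptive algorithm the RC08 uses, and the function name `nlms_echo_cancel`, the tap count, the step size, and the simulated echo path `true_path` are all hypothetical.

```python
import random

def nlms_echo_cancel(far_end, mic, taps=16, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from mic (NLMS sketch)."""
    w = [0.0] * taps        # filter coefficients: the current echo path estimate
    x_buf = [0.0] * taps    # most recent speaker samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        echo_est = sum(wi * xi for wi, xi in zip(w, x_buf))
        e = d - echo_est    # microphone sample with the estimated echo removed
        # Normalized update: nudge the coefficients so the estimate tracks the echo.
        step = mu * e / (sum(xi * xi for xi in x_buf) + eps)
        w = [wi + step * xi for wi, xi in zip(w, x_buf)]
        residual.append(e)
    return residual, w

def power(sig):
    return sum(s * s for s in sig) / len(sig)

# Simulated scenario (assumption): the speaker plays noise and the room adds
# a short echo; nobody is talking locally, so the mic hears only the echo.
random.seed(0)
far = [random.uniform(-1.0, 1.0) for _ in range(4000)]
true_path = [0.0, 0.6, 0.0, -0.3, 0.1]   # hypothetical room echo path
mic = [sum(h * far[n - k] for k, h in enumerate(true_path) if n - k >= 0)
       for n in range(len(far))]

residual, w = nlms_echo_cancel(far, mic)
echo_power = power(mic[-500:])
residual_power = power(residual[-500:])
print(f"echo power {echo_power:.4f}, residual power {residual_power:.2e}")
```

Once the filter has converged, its coefficients approximate the simulated echo path and the residual is far weaker than the raw echo. A real canceller also has to cope with local speech at the same time (double talk) and with much longer echo paths, which this sketch leaves out.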

▲The role of AGC is to automatically amplify or attenuate the voice signal so that the output stays within a comfortable listening range.

So what happens when the incoming voice signal is weak? This is where the AGC algorithm comes in. Everyday face-to-face conversation is usually around 40 to 60dB. If two people stand a little farther apart and the level drops below 30dB, it becomes hard to hear; if the sound is too loud, for example above 100dB, it becomes uncomfortable. The role of AGC is to bring the sound into an appropriate range. It works in two ways: analog adjustment changes the microphone's capture gain, while digital adjustment scales the level of the digitized audio stream. When the input signal is very weak, AGC automatically amplifies it, and when the input signal is too strong, AGC attenuates it, so the output voice stays at a steady level.
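The digital form of this adjustment can be sketched as a frame-by-frame gain loop: measure the level of each short frame, compute the gain that would bring it to a target level, and move the applied gain smoothly toward it. This is a minimal sketch and not the RC08's implementation; the function `agc`, the target level, the frame size, the gain cap, and the smoothing factor are illustrative assumptions.

```python
import math

def agc(samples, frame=160, target_rms=0.1, max_gain=30.0, smooth=0.3):
    """Digital AGC sketch: scale each frame toward target_rms."""
    gain = 1.0
    out = []
    for start in range(0, len(samples), frame):
        block = samples[start:start + frame]
        level = math.sqrt(sum(s * s for s in block) / len(block))
        if level > 1e-6:    # hold the gain during silence instead of boosting it
            desired = min(target_rms / level, max_gain)
            gain += smooth * (desired - gain)   # smooth to avoid audible jumps
        out.extend(s * gain for s in block)
    return out

def rms(sig):
    return math.sqrt(sum(s * s for s in sig) / len(sig))

def tone(amplitude, n=8000, freq=400.0, rate=8000.0):
    """Test tone standing in for a voice signal (assumption for the demo)."""
    return [amplitude * math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

quiet_out = agc(tone(0.01))   # a very quiet talker is amplified
loud_out = agc(tone(0.9))     # a very loud talker is attenuated
print(round(rms(quiet_out[-160:]), 3), round(rms(loud_out[-160:]), 3))
```

Both test tones settle near the same target level, which is the "neither too loud nor too quiet" behavior described above. The gain cap keeps the loop from amplifying background noise without limit when the input is nearly silent.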

Even with the echo problem solved, noise from the environment remains, such as crowd chatter in public places or music playing nearby. If the device captures this noise as well, call quality suffers badly, and this is where the ANS noise suppression algorithm is needed. ANS suppresses and removes interfering sound while improving the signal-to-noise ratio and intelligibility of the voice signal, so that both people and machines can hear clearly. Noise falls into two types, stationary and transient: the spectrum of stationary noise is relatively stable, while transient noise is short-lived and has no harmonics. Using these characteristics of the noise, the algorithm adds an inverted waveform to the audio data, which cancels the noise.

The Rocware RC08 filters out ambient noise by collecting both voice and noise with its omnidirectional microphone and then comparing the microphone input against the digital signal. Capturing and filtering a noisy signal with a single microphone in this way is harder than with a microphone array and requires more complex algorithms.

▲Based on the characteristics and type of the noise, an inverted waveform is added to the audio data to cancel the noise.
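For stationary noise picked up by a single microphone, one widely taught approach is spectral subtraction: estimate the average noise spectrum from a stretch with no speech, then subtract that estimate from the magnitude spectrum of each frame. The sketch below uses that technique as a stand-in, since the article does not disclose the RC08's actual ANS method. It uses a plain DFT and non-overlapping frames to stay short, and every name and parameter in it is an illustrative assumption.

```python
import cmath
import math
import random

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]

def spectral_subtract(noisy, noise_only, frame=64, floor=0.05):
    """Subtract an average noise magnitude spectrum from each frame."""
    # 1) Learn the average noise magnitude per frequency bin.
    starts = range(0, len(noise_only) - frame + 1, frame)
    noise_mag = [0.0] * frame
    for i in starts:
        for k, c in enumerate(dft(noise_only[i:i + frame])):
            noise_mag[k] += abs(c) / len(starts)
    # 2) Reduce each bin of the noisy signal by that amount, keeping its phase.
    out = []
    for i in range(0, len(noisy) - frame + 1, frame):
        cleaned = []
        for k, c in enumerate(dft(noisy[i:i + frame])):
            mag = abs(c)
            new_mag = max(mag - noise_mag[k], floor * mag)  # never below a floor
            cleaned.append(c * (new_mag / mag) if mag > 0 else 0j)
        out.extend(idft(cleaned))
    return out

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Demo (assumption): a steady tone stands in for speech, white noise for the room.
random.seed(1)
N = 1024
clean = [math.sin(2 * math.pi * 4 * t / 64) for t in range(N)]
noise_only = [random.gauss(0.0, 0.2) for _ in range(N)]
noisy = [c + random.gauss(0.0, 0.2) for c in clean]

denoised = spectral_subtract(noisy, noise_only)
err_before = mse(noisy, clean)
err_after = mse(denoised, clean)
print(f"error before {err_before:.4f}, after {err_after:.4f}")
```

The spectral floor limits how much any bin can be reduced, which is a usual way to soften the "musical noise" artifacts of this method. Production suppressors add overlapping windowed frames and continuous noise tracking, and transient noise needs different handling altogether.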

TOF laser focus: fast, accurate

Whether in a phone, a camera, or even a projector, autofocus is one of the most critical capabilities. To obtain better imaging results, the Rocware RC08 adds a TOF laser focus module. There are many autofocus methods, such as Phase Detection Auto Focus (PDAF), Contrast Detection Auto Focus (CDAF), and Laser Detection Auto Focus (LDAF). Because phase and contrast detection both rely on sensing external light, they place higher demands on ambient lighting, and focusing slows down in dim environments. Laser focus does not have this problem: it emits infrared light, uses the reflected light to calculate the distance to the subject, and then drives the focus motor to the right position. It focuses quickly even in low light, but the range of the emitted infrared light is limited, so it is better suited to indoor use. Cameras like the Rocware RC08 combine laser focus with TOF (time of flight), which makes focusing both fast and more accurate. The module measures the time light takes to travel from the emitter to the subject and back, and from that it calculates the subject's depth.

▲TOF laser principle.
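The time-of-flight calculation itself is simple: the light travels to the subject and back, so the distance is half the round-trip time multiplied by the speed of light. A minimal sketch follows; the function names and the 1.5 m example distance are illustrative, and a real TOF module measures the timing in hardware.

```python
SPEED_OF_LIGHT = 299_792_458.0   # metres per second, in vacuum

def tof_distance(round_trip_seconds):
    """Distance to the subject from the measured round-trip time of the light."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

def round_trip_time(distance_m):
    """Round-trip time the sensor would measure for a subject at distance_m."""
    return 2.0 * distance_m / SPEED_OF_LIGHT

# A subject 1.5 m from the camera, a typical desk or meeting-room distance.
t = round_trip_time(1.5)
d = tof_distance(t)
print(f"round trip {t * 1e9:.2f} ns, distance {d:.3f} m")
```

The round trip at this distance lasts only about ten nanoseconds, which is why TOF ranging needs dedicated timing hardware and also why the measurement does not depend on how bright the scene is.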

TOF laser focus costs more than the other methods, so it is best suited to professional scenarios such as video conferencing and live streaming. How can you tell whether a camera really uses laser focus? Taking the Rocware RC08 as an example: power on the camera and point a mobile phone's camera at its laser focus emitter. A red glow will appear on the phone's screen, which shows that this is a genuine laser focus camera.

Final thoughts

As we have seen, a small camera can deliver excellent audio and video. To stand out among so many products, however, it needs solid components, good design, and strong technology behind it. The Rocware RC08 is such a product: the main control chip, CMOS sensor, and power amplifier chip are all present, and the 3A algorithms (AEC/AGC/ANS), TOF laser focus, and other black technologies, combined with audio tuning from a first-tier international brand, give it a clear picture and a clear voice. This analysis covers only the hardware and the technology. As for real-world performance, we will put the RC08 through a full hands-on test in the next issue, so stay tuned.
