From Air to Max: what's wrong with Rokid?

Image source: @VisualChina

Text | AR Research Yuan

Potential and time

In the second half of 2022, despite a bleak economy and a bleak investment and financing environment, several AR startups in China raised tens of millions of dollars in financing, as if the bubble of 2016 and 2017 had returned.

In the eyes of capital, smartphone shipments have fallen below even the level of 10 years ago. Looking around, the baton of consumer electronics is likely to pass only to VR or AR.

VR's ceiling is obvious. VR is not safe outdoors: using cameras for video see-through (VST) of the environment is destined to be inferior to seeing with both eyes directly, since it adds latency, cannot refocus quickly, and suffers from image noise and overexposure under strong ambient light. VR's fully enclosed design creates visual immersion, but even if the stuffy, heavy helmet form improves, use remains largely stationary, the range of body movement is constrained, and the application scenarios are destined to be limited. Motion-sensing game consoles are almost the entire extent of what can be imagined for VR commercialization.

In contrast, AR creates a truly digital three-dimensional space built on a highly transparent, visible view of real space. That space is closely tied to your daily life, and in the future you will be in it all the time. AR could also, in theory, include VR: once the lens transmittance drops to 0, it switches to a fully computer-generated, visually immersive virtual world.

For consumers (to C), AR can bring a novel experience of virtual-real interaction to work, life, learning, and entertainment. For businesses (to B), the uses include industrial maintenance, horticultural tours, remote guidance, medical training, and battlefield information overlay and enhancement. Space is digitally transformed and visually enhanced, assisted by voice, gesture, and ring/EMG interaction, to improve efficiency and productivity.

Take the rumored Apple MR headset as an example. Ideally, you could "open" a high-definition display in front of you anytime, anywhere, and project a virtual keyboard into the environment, with high-precision sensors recognizing fine movements such as finger taps and mouse drags. Sitting at Starbucks, you could tap the projected keyboard on the small table while drinking coffee, and browse the web, edit documents, or revise a PPT through spatial mapping. That would be a real AR office: a pair of lightweight AR glasses would be enough to replace a laptop for everyday high-frequency applications.

AR lets the whole world in front of you be visualized, abstracted, frozen, and simulated, and deeply enhanced and transformed. Its potential is enormous.

But AR is too hard.

What Microsoft Hololens and Magic Leap showed

Microsoft Hololens 1/2 and Magic Leap 1/2 are full-featured AR glasses from two companies with ample funding and talent, yet both were forced to shift from to C to to B. From this we can glimpse how difficult AR is.

On the imaging side, LBS (laser beam scanning) and LCoS (liquid crystal on silicon) light engines can modulate and project 1080P and 2K images with good color performance, but their power consumption and volume are hard to reduce and their brightness is hard to raise. The ultra-small, ultra-low-power, ultra-bright Micro LED light engine is the future, but the monolithically integrated Micro LED currently used for AR/VR has two problems. First, it cannot produce native RGB color at the level of the light-emitting material, and has to rely on engineering workarounds such as prism color combining, quantum-dot color conversion, and vertically stacked pixels. Second, when the LED chip size shrinks to about 5-10 μm, the luminous efficiency of red LEDs drops sharply, so the theoretical low power consumption and high luminous efficiency cannot be realized; new materials and chip structures for red LEDs still need to be explored.

On the optical display side, the diffractive optical waveguide is thin, light, and highly transparent, lets the wearer see the natural environment, and can be combined with other technologies, which greatly improves the practicality of AR glasses. Diffractive waveguides have also kept improving: higher light efficiency, wider horizontal and vertical FOV, larger eyeboxes, elimination of rainbow artifacts, better color saturation and color reproduction, and a cleaner, more realistic picture. However, simulating the natural view of the human eye in three-dimensional space, with virtual imagery superimposed on the real environment, runs into the vergence-accommodation conflict (VAC), the need for multiple depths of field, the inability to render black for occlusion, and other problems that the diffractive waveguide approach cannot currently solve. This means that waveguide glasses cause mild or severe dizziness and discomfort when worn for a long time, and the AR picture cannot blend 100% with the real environment. Magic Leap spent a great deal of energy, resources, and years of research and development on its own FSD (fiber scanning display) technology, hoping to achieve the multi-depth, light-field display that best matches how the human eye works.
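The VAC problem (vergence-accommodation conflict) is commonly quantified as the difference, in diopters, between the distance at which the eyes converge on a virtual object and the fixed distance at which the optics focus the image. A minimal sketch, assuming a single fixed focal plane. The function name and the example distances are illustrative assumptions, not figures from any product mentioned here:

```python
def vac_mismatch_diopters(virtual_object_m: float, focal_plane_m: float) -> float:
    """Return |1/d_vergence - 1/d_focal| in diopters.

    virtual_object_m: distance at which the virtual object is rendered (eyes converge here)
    focal_plane_m:    fixed focal distance of the display optics (eyes must focus here)
    """
    if virtual_object_m <= 0 or focal_plane_m <= 0:
        raise ValueError("distances must be positive, in meters")
    return abs(1.0 / virtual_object_m - 1.0 / focal_plane_m)


# Hypothetical case: optics focused at 2 m, virtual keyboard rendered at 0.5 m.
mismatch = vac_mismatch_diopters(0.5, 2.0)
print(f"mismatch: {mismatch:.2f} D")
```

The mismatch is zero only when the virtual object sits exactly on the focal plane, which is why a single-focal-plane waveguide cannot remove the conflict for content at varying depths.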

The most important function of AR glasses is the fusion of the virtual and the real. Microsoft Hololens and Magic Leap carry numerous sensors and custom chips: multiple RGB cameras for gesture recognition and for SLAM three-dimensional positioning and mapping, plus one or several infrared cameras for eye-tracking interaction and rendering. Among the technical upgrades of the past year or two, consumer-grade lidar based on SPAD sensors can raise SLAM accuracy, event cameras used for eye tracking can significantly reduce the power consumption and latency of rendering, and better RGB cameras and gesture recognition algorithms, or more direct EMG interaction, are all visible technical directions. However, progress has been very limited in making these sensors small, low-power, and high-precision enough to integrate into a pair of lightweight glasses.

Under existing technical conditions, full-featured all-in-one AR glasses come in a bulky helmet form with a battery life of 1-2 hours, and their application scenarios are limited to industry and enterprise.

VR, for its part, mostly takes a headset form because it computes an entire virtual world and must shut out the real physical world to create immersion. The optical design needed for a large FOV and for blocking outside light is the key reason VR devices cannot be made small. Full-featured AR is harder still. AR optics does not present a flat, two-dimensional, computer-generated image; it must place virtual imagery, interaction, and dynamic simulation in real three-dimensional space. The demands on sensing, computing, and optics are interlocked and extremely difficult across the board, and under existing technical conditions only trade-offs are possible.

AR depends heavily on advances in optics, displays, high-precision SLAM, new interaction sensors, dedicated computing chips, and algorithms, all under the hard constraints of extremely low power consumption and extremely small size. Moreover, consumer AR glasses should ideally match the weight and volume of ordinary glasses, so that the end market accepts them as widely as possible.

Where is the way out?

Domestic Rokid leads, from Air to Max

At this stage, most consumer-grade AR products are compromises: they strip out local computing power, battery, and storage, have no sensing, gesture recognition, or interaction, and offer only the most static fusion of virtual and real pictures.

Split-type AR glasses aimed at consumers have benefited from the maturing of high-brightness, high-saturation color Micro OLED microdisplays in recent years. Key manufacturers Sony and Seeya have achieved high-yield mass production of silicon-based Micro OLED and brought costs down, which has qualitatively changed BirdBath/freeform AR glasses. From the end of 2021 to the beginning of 2023, Rokid, Thunderbird, and Nreal, followed by Huawei, Meizu, ZTE, and Honor, have been building the AR viewing market little by little.

At the same time, AR glasses that combine monochrome Micro LED, optical waveguides, and lightweight SLAM, and that focus on information prompts, are also testing the waters in the consumer market, with products from INMO, Xiaomi, Li Weike, and OPPO.

At 8 p.m. on March 21, I watched Rokid's press conference at my computer. My personal impression is that from the launch of Rokid Air at the end of 2021 to Rokid Max in 2023, apart from the larger virtual projection screen, the progress has been thoroughly lackluster. Here are a few inferences based on what was presented at the launch:

1. The larger projection screen relies on purchasing Sony's higher-specification Micro OLED microdisplay.

2. The near-eye display optical design has shortcomings; it relies on supply chain manufacturers and lacks in-depth design capability.

3. Judging by the thickness of the top of the glasses, the Rokid Max optical engine likely inherits the small eyebox and short eye relief (exit pupil distance) of the previous-generation Air, which makes IPD design more difficult. IPD mismatch leads to binocular ghosting, and short eye relief makes the vergence-accommodation conflict (VAC) more pronounced, aggravating dizziness and discomfort in long-term wear.

4. Key parameters such as the optical module's MTF and distortion correction from the edge of the picture to the center were not disclosed, and no optical design improvements to reduce chromatic aberration or to filter glare and stray light were explained. Both strongly affect real display quality. The only optics-related feature introduced was the "sunglasses" design similar to Nreal Air, which reduces front light leakage and protects personal privacy.

5. For the interaction demonstrated at the conference, accuracy, user learning cost, practicality, and experience are all doubtful.

6. Rokid Max Pro renders the surrounding environment in black and white to make the colored visual-enhancement information stand out, which has no technical content and is even a step backwards.

What's wrong with Rokid

Rokid's CEO, who always appears in a black sweatshirt, has a maverick, rock-and-roll style, but that is only the surface.

Nreal and Rokid were both founded by young geeks who returned from overseas, but the two founders' professional backgrounds and product directions are different.

Xu Chi, the founder of Nreal, took part in developing a new-generation GPU computing platform at NVIDIA and then joined Magic Leap, where he was responsible for implementing head-tracking and positioning algorithms and for embedded optimization. Rokid founder Zhu Mingming graduated from UC Berkeley with a Ph.D. and founded Mammoth Technology, which was wholly acquired by Alibaba; at Alibaba's M Studio he was mainly responsible for research and development in deep learning, vision, and natural language processing.

In media portrayals, Xu Chi does not have the "wild technical ideals and enthusiasm" of Zhu Mingming. He comes across as more pragmatic and rational, and his understanding of products, technology, and business is more "integrated":

"I am very glad that after graduating from Zhejiang University, I did not start a business on impulse. Like our friends, most of us have entrepreneurial dreams. For me, when I realized my lack of work experience, I decided to learn and accumulate while working, and then look for a suitable development direction."

"My initial idea was also extremely simple, naïve, and even idealistic, but this is an unavoidable state, and every entrepreneur will go through such an ideal to the actual landing process, which is very normal." 

"In fact, I just connect technology and market, because this will attract more and more excellent talents to return to China, so that everyone can make their careers bigger and stronger together."

In contrast, the overall style of Rokid, the company Zhu Mingming runs, has more of a Silicon Valley flavor and a stronger elite taste, and it is able to raise a great deal of financing. From the product content on the official website and the marketing materials in circulation, the brand vision is prominent. The disadvantage is that it is not grounded enough, not "localized" enough, and not fully focused on users.

"The C-end product is not representative of the entire Rokid at this stage"

"Be sure to take a look at the exhibition equipment on the first floor of the company, which is also something that represents Rokid (AI and AR technology combined with museums)"

"We are a company that does human-computer interaction." 

Rokid's scope is somewhat broad, from AI and speakers to AR, and its AR glasses also entered through enterprise scenarios first. As is well known, AR glasses for enterprises and government institutions target specific scenarios and are not used at high frequency, so product functionality matters more than everyday experience. Rokid Air is such a product: strong industrial design and obsessive technical details, but little attention to the overall experience and a certain lack of consumer product thinking.

BirdBath glasses meant mainly for watching movies should pursue the most comfortable near-eye display design. That requires sufficient eye relief, meaning the optical design should make the exit pupil distance larger, and a large eyebox to maximize flexibility in how the glasses are worn. More attention should also go to correcting distortion and chromatic aberration at the edges of the picture for the best display quality, which means getting the MTF of the lens module right and digging into the details that matter, such as light modulation in the light engine and optical lens coatings.

Yet on the two metrics of eye relief and eyebox, the previous-generation Rokid Air is actually the smallest compared with Nreal Air and Thunderbird Air / Air 1S. Its lens MTF is also the worst.

The Rokid Max launch highlighted the adjustable myopia function and the privacy-oriented design that reduces front light leakage, without spending any time on MTF, eyebox, eye relief, or other optical design. As for the adjustable myopia function that Rokid markets so heavily, whether Air's 0-500 degree diopter adjustment or Max's 0-600 degrees with a lossless picture: why do Nreal and Thunderbird not adopt this solution from supply chain manufacturers, and instead simply offer myopia lenses and choose an eye relief in the optical design that lets users wear their own glasses?

BirdBath's current diopter adjustment scheme adjusts the picture; it is not optical correction at the level of myopia glasses, it cannot correct astigmatism, and its practicality is doubtful. Adjusting the picture focus also affects IPD, which may cause binocular ghosting as a side effect, something Rokid's product managers may not have really looked into.
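For readers unfamiliar with the "degrees" used for the adjustment ranges above: in common Chinese usage, 100 degrees of myopia corresponds to 1.00 diopter of spherical correction, and the far point of an uncorrected myopic eye is roughly the reciprocal of that power in meters. A minimal sketch of the conversion. The function names are illustrative, and the model covers spherical power only, which is exactly why astigmatism (cylinder) falls outside such an adjustment:

```python
def degrees_to_diopters(myopia_degrees: float) -> float:
    """Convert myopia 'degrees' (100 degrees = 1.00 D) to a signed spherical power."""
    if myopia_degrees < 0:
        raise ValueError("myopia degrees must be non-negative")
    return -myopia_degrees / 100.0


def far_point_m(sphere_diopters: float) -> float:
    """Approximate far point of an uncorrected myopic eye, in meters."""
    if sphere_diopters >= 0:
        raise ValueError("far point is only finite for negative (myopic) power")
    return 1.0 / abs(sphere_diopters)


# Upper limits of the adjustment ranges quoted for Air (500) and Max (600).
for degrees in (500, 600):
    power = degrees_to_diopters(degrees)
    print(f"{degrees} degrees -> {power:.2f} D, far point about {far_point_m(power):.3f} m")
```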

Image source: official website of Huiniu Technology

Another detail: Rokid Max's to-eye brightness has reached 400 nits, with a peak of 600 nits. Is that thanks to Sony's higher-spec Micro OLED microdisplay, or was it achieved through an optical design with a small eyebox? AR viewing glasses are mostly used in relatively static, private settings, and whether they really need high to-eye brightness deserves careful scrutiny. Thunderbird glasses seem to have only 200-400 nits of to-eye brightness, a notch lower, but a very large eyebox. Nreal's to-eye brightness is about 400 nits, with an eyebox size between those of Thunderbird and Rokid Air. And Nreal and Thunderbird are now competing for first place in sales.
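One way to reason about how much to-eye brightness a viewing scenario needs is the ambient contrast ratio: the brightness of a lit virtual pixel plus the see-through background, divided by the see-through background alone. The sketch below assumes an OLED-like black level of zero. The lens transmittance (0.2) and the background luminance (100 nits) are hypothetical values chosen only for illustration, not measurements of any of the glasses discussed:

```python
def ambient_contrast_ratio(to_eye_nits: float,
                           background_nits: float,
                           lens_transmittance: float) -> float:
    """(virtual image + see-through background) / see-through background.

    Assumes the display's black level is zero, so a dark pixel shows only
    the real-world background as dimmed by the lens.
    """
    if not 0.0 <= lens_transmittance <= 1.0:
        raise ValueError("transmittance must be between 0 and 1")
    see_through = lens_transmittance * background_nits
    if see_through == 0:
        return float("inf")  # fully blocked or fully dark background
    return (to_eye_nits + see_through) / see_through


# Hypothetical indoor scene: 100-nit background seen through a 20% transmittance lens.
for nits in (200, 400, 600):
    acr = ambient_contrast_ratio(nits, background_nits=100.0, lens_transmittance=0.2)
    print(f"{nits} nits to-eye -> contrast about {acr:.0f}:1")
```

Under these assumed conditions the contrast scales linearly with to-eye brightness, and a darker lens or dimmer room raises it without any change to the display, which is why the required brightness depends so strongly on where the glasses are actually used.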

Whether these product choices are intentional or not, I can't tell. To AI, to B, to C, or to VC: with Rokid I only sense a tug between geek ideals and reality. Its Jobs-style product Moonstone was crushed by Baidu's cheap smart speakers priced at a few dozen yuan, the company nearly collapsed and went through major layoffs, and it then moved into AR, first to B and then to C. Through all this, Rokid's CEO, who has never worked as a rank-and-file employee, seems to have stayed true to his "original aspiration".

Raise money, stay alive, tell a good story, and wait for spring to bloom.

As for the product, putting the concept ahead of shipments may be what Rokid is really thinking about at the moment.

Some thoughts

AR has the potential to become a time-killer device: you wear the glasses daily, see your surroundings, and get visual enhancement and virtual-real interaction anytime, anywhere, for information prompts, learning, work, and entertainment. Usage is high-frequency and not limited to particular scenarios.

The key to AR's popularization in the consumer market lies in virtual-real interaction and optical display, which depend on sensors, computing power, and smaller optical designs with high transmittance and high brightness, and especially on the sensors for eye tracking, gesture recognition, and SLAM. Can SLAM achieve high precision? Can eye tracking deliver low-power, low-latency rendering? Can gesture recognition reach the level of typing in mid-air? And all of this has to fit within the limits of overall volume, battery life, and energy consumption. Any single one of these can be done well; the difficulty of AR is that none of these key technologies can be a weak point.

AR still has to wait for hardware technology to mature. The content and applications that everyone complains about are nothing to worry about: once the hardware matures, it will be popularized as quickly as possible. As for timing, several key technologies have made rapid progress in recent years, and the first mature hardware may come out in 2025-2027.

Hope is always there.
