Leveraging the algorithm, Haitian AAC drives the "new engine" of autonomous driving

Original article from Heart of the Machine

Author: Wu Xin

March 2022 may well go down in the history of autonomous driving: the United States issued regulations for driverless vehicles, China may allow L3 autonomous passenger cars on the road at the policy level, and China's first expressway supporting autonomous driving is about to open to traffic. There is no doubt that autonomous driving is moving from rapid iteration toward large-scale deployment. At this critical "one foot in the door" moment, if a vehicle's own algorithms are to handle more and more complex scenes, the support of massive scene data is indispensable.

Through a conversation with Haitian AAC, the only A-share listed data service provider in China, we look at the opportunities and challenges that AI data faces on the road to making autonomous driving a reality.

First, autonomous driving data enters an explosive period

2022 will be a turning point in the commercialization of autonomous driving.

At the end of 2021, Beijing became the first city in China to explicitly approve a commercial pilot of "RoboTaxi" services, marking the entry of China's autonomous driving sector into the commercial operation stage.

Relevant data and pictures from the "2022 China Autonomous Driving Industry Research Report" released by 36Kr and Hanergy Investment

Gartner's Top 10 Business Trends in the Automotive Industry in 2022

The launch of regular self-driving car services, and the challenges that follow, is one of the business trends to watch

At present, the mainstream algorithm models for autonomous driving are based mainly on supervised learning, which requires large amounts of labeled data to train and tune the model. Only by iterating on data from a wide range of scenarios can autonomous driving truly be deployed.

Whether a company can obtain large amounts of labeled data efficiently directly determines whether it can gain a first-mover advantage in the autonomous driving market.

Although some industry leaders have set up internal data annotation teams, the training data service providers behind them remain a presence that cannot be ignored.

As a leader in China's training data industry, Haitian AAC has in recent years begun to cooperate with traditional car companies, new car makers and leading autonomous driving technology companies to explore how to help partners maximize the value of autonomous driving data.

IDC predicts that by 2025 the market for intelligent data collection services in China will reach 12.34 billion yuan. The market is driven on the one hand by the rapid development of the artificial intelligence market, and on the other by industry users stepping up their data collection efforts.

According to IDC data, autonomous driving is also the industry with the most growth potential in the AI infrastructure services market

Second, getting the "first pass" right: the ability to design data solutions

Compared with vertical fields such as smart home, new retail and security, autonomous driving is particularly demanding about data, which also poses new challenges to data service providers.

Take the richness of data samples: a dataset with comprehensive scene coverage is critical to the safety of an autonomous driving system. Imagine a herd of wild elephants suddenly appearing on the highway, or someone suddenly crossing the road. How should a self-driving car respond?

This kind of corner-case data is very difficult to collect. Nobody can actually arrange for a herd of wild elephants to walk down the highway in order to record it.

These situations are entirely possible in real life, and if the system fails to recognize them the consequences can be serious, even fatal. The corresponding data is therefore indispensable, and whether it is synthesized or simulated by technical means, the ability to design the data scheme is particularly important.

This is where Haitian AAC's advantages as a comprehensive data service provider stand out. In general, the industry at this stage has mostly customized requirements for training data collection and labeling. Haitian AAC's years of accumulated basic research allow it to better grasp current technical directions, to understand in depth the customer's application logic and pain points regarding training data, and to help the customer's algorithm achieve the best possible result in deployment.

For an autonomous driving project specifically, Haitian AAC assigns a professional team to design the structure of the training dataset before the project starts, so that a dataset of limited size covers as many phenomena as possible, with a reasonable ratio set for each kind of data.

For example, if the project involves trucks, highway scenes will take up a very high proportion of the coverage, and related scenes such as on-ramps and off-ramps should also be taken into account. If it involves passenger cars in the city, the design needs to cover all kinds of intersections, such as three-way junctions and turning lanes, whether or not U-turns are permitted, and even rare cases such as a left-turn lane placed on the far right.

To make the dataset more complete and rich, factors such as the scenery on both sides of the road, obstacles on the road, dense or sparse traffic, and the number of pedestrians should also be considered in advance. Contingencies such as a pedestrian suddenly crossing the road deserve particular attention, even though such scenes are harder to cover.
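
The idea of fixing a data ratio for a dataset of limited size can be illustrated with a small sketch. The scene names and weights below are hypothetical and are not taken from any Haitian AAC project; the function simply splits a sample budget in proportion to the weights, using largest-remainder rounding so that the quotas add up exactly to the budget.

```python
from math import floor


def allocate_quota(total, weights):
    """Split `total` samples across scenes in proportion to `weights`.

    Uses largest-remainder rounding so the quotas sum exactly to `total`.
    """
    weight_sum = sum(weights.values())
    exact = {scene: total * w / weight_sum for scene, w in weights.items()}
    quota = {scene: floor(share) for scene, share in exact.items()}
    leftover = total - sum(quota.values())
    # Give the remaining samples to the scenes with the largest fractional part.
    by_remainder = sorted(exact, key=lambda s: exact[s] - quota[s], reverse=True)
    for scene in by_remainder[:leftover]:
        quota[scene] += 1
    return quota


# Hypothetical weights for a truck project: mostly highway, plus ramps and
# a deliberate share of rare events such as a pedestrian suddenly crossing.
truck_weights = {
    "highway": 60,
    "ramp": 15,
    "urban_intersection": 15,
    "sudden_crossing": 10,
}
truck_quota = allocate_quota(10_000, truck_weights)
print(truck_quota)
```

In practice the weights would come out of the discussion with the customer; the sketch only shows that once they are agreed, the per-scene collection targets follow mechanically.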

The main content of Haitian AAC's autonomous driving data business

Sometimes the customer is not sure what kind of data scheme best suits the algorithm's needs. Compared with the relatively experienced internet companies, for example, traditional car companies need a service provider with rich solution-design experience to help guide, sort out and refine their specific requirements.

For example, in an unexpected situation, how many seconds of data before a sudden braking event are most valuable to the autonomous driving decision system? How much data is needed for severe weather with low visibility? How should it be collected? At what interval should frames be sampled?

Through repeated communication between its technical staff and the customer's algorithm team, Haitian AAC can help customers find data solutions better suited to their use scenarios, shorten the research and development cycle, speed up deployment, and avoid unnecessary cost.

Third, accuracy, efficiency and scale under "human-machine coupling"

Beyond the difficulty of sample richness, high-quality autonomous driving training data also faces the challenge of running the labeling process at high precision, high efficiency and large scale.

For example, 99% accuracy is good enough for most scenarios in speech synthesis, but in autonomous driving the same figure is very likely to leave a safety hazard.

Because of the strict safety requirements, intelligent driving data (mainly from outside the cabin) is developing in a multimodal direction. Multimodal here means the perception and fusion of multi-dimensional data on time, space and environment. A car may carry as few as four or five cameras or as many as a dozen, plus radar (lidar, millimeter-wave radar, ultrasonic radar and so on).

The lidar units used in the market generally have 64 beams or more. Constrained by the various hardware devices, the data they send back is difficult to synchronize fully. Since point clouds are annotated as continuous frames, inconsistent labels across multiple frames will affect the training of the algorithm model. Synchronizing the annotation of 3D lidar data with that of ordinary 2D camera data is a further difficulty. All of these become problems when high-precision labeling is required.
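
One common way to approach the synchronization problem is to pair each lidar sweep with the camera frame closest to it in time and to reject pairs whose gap is too large. The sketch below is a generic illustration under that assumption, not a description of Haitian AAC's tooling; the timestamps (in milliseconds) and the 20 ms tolerance are made up.

```python
from bisect import bisect_left


def match_nearest(lidar_ts, camera_ts, max_gap):
    """Pair each lidar timestamp with the index of the nearest camera timestamp.

    `camera_ts` must be sorted. A lidar sweep whose nearest camera frame is
    more than `max_gap` away is paired with None, so it can be flagged for
    review instead of being labeled against a mismatched image.
    """
    pairs = []
    for t in lidar_ts:
        i = bisect_left(camera_ts, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(camera_ts)]
        best = min(candidates, key=lambda j: abs(camera_ts[j] - t), default=None)
        if best is not None and abs(camera_ts[best] - t) <= max_gap:
            pairs.append((t, best))
        else:
            pairs.append((t, None))
    return pairs


# Hypothetical timestamps in milliseconds: lidar at 10 Hz, camera with a dropout.
lidar_ts = [0, 100, 200, 300]
camera_ts = [5, 38, 71, 104, 137, 170, 290]
pairs = match_nearest(lidar_ts, camera_ts, max_gap=20)
print(pairs)  # [(0, 0), (100, 3), (200, None), (300, 6)]
```

The sweep at 200 ms has no camera frame within tolerance, which is exactly the kind of case that makes joint 3D and 2D annotation difficult.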

Set against these demands, productivity in the labeling stage lags behind.

Schematic diagram of the training data production process

The "2019 Chinese Intelligent Basic Data Service Industry White Paper" pointed out that in the early period of 2010-2016, demand for data labeling surged and the entry threshold was low, so a large number of players of very uneven quality poured in. To this day, the vast majority of data service providers are still at the stage of solving the problem that "data annotation tools are not available".

Many teams rely on open source tools to complete the vast majority of their projects. They have no point cloud annotation tools and no basic process management either (for example, which data should be screened out, and what should be done when annotation quality falls short?). It is simply impossible for them to deliver the high-quality, high-precision datasets that autonomous driving needs.

As AI is deployed more deeply in the field of travel, the higher the level of intelligent driving, the more sensors are required and the higher the accuracy requirements, and the amount of data rises sharply in step. A single project now processes millions of items, which has long been beyond what a workshop-style operation can handle.

For example, the Waymo Open Dataset has 16.7 hours of video data, 3,000 driving scenes, 600,000 video frames, nearly 2 million 3D bounding box labels and 22 million 2D bounding box labels, and this is just a fraction of Waymo's massive private autonomous driving dataset.

The rapidly changing market also imposes more stringent data delivery schedules, and only data services that are more automated, intelligent and platform-based can better meet customer needs.

As a leading service provider that has been immersed in the industry for more than ten years, Haitian AAC has been exploring human-machine collaboration in every stage since its founding, aiming for the best balance between the quality, speed and scale of data annotation.

Schematic diagram of the integrated data processing platform

Haitian AAC applies more than ten core technologies to the design, collection, processing and quality inspection stages of training data production. It has independently developed an integrated data processing platform that incorporates project process management, quality control and data security management, and that embeds thousands of self-developed and accumulated tools suited to the training data processing needs of various business scenarios, improving both the production efficiency and the quality control of training data.

In the autonomous driving scenario specifically, a 3D point cloud looks to the average person like nothing more than a set of points, and it is hard to see intuitively what it represents. Senior annotators, however, look back and forth between the preceding and following frames of the same continuous sequence, and sometimes view the point cloud together with the 2D images, mentally filling in the parts that the point cloud data cannot show.

The Haitian AAC Autonomous Driving Annotation Platform has a tool called "Assisting the Construction of Object Brain Frames", which helps labelers do this mental filling-in more accurately. For example, after the labeler draws a box, the system automatically pre-judges the contents of the preceding and following frames in the same continuous sequence, and also offers references such as the size of the vehicle.

In addition, the labeling platform covers labeling tools for the different types of data found in autonomous driving scenarios, which can greatly improve labeling efficiency. The platform supports 3D point cloud annotation, 3D point cloud continuous-frame annotation, joint annotation of 3D continuous frames and 2D images, 3D semantic segmentation and more, and its tools can also be developed further to meet the individual needs of customers, a capability the company describes as industry-leading.

Haitian AAC 3D point cloud labeling platform

When a 3D point cloud is labeled across consecutive frames, the automation tool predicts the third-frame position of an object that was marked in the first two frames. Because the algorithm steps in first to make a judgment, the labeler's work becomes largely a process of correction, which does much to safeguard both efficiency and accuracy.
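
The article does not say how the platform makes this prediction. A simple way to illustrate the idea is constant-velocity extrapolation: assume the object moves between frames two and three by the same amount it moved between frames one and two. The `Box3D` type and its fields below are hypothetical names chosen for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class Box3D:
    """A labeled 3D box: center, size and heading (yaw, in radians)."""
    x: float
    y: float
    z: float
    length: float
    width: float
    height: float
    yaw: float


def predict_next(prev, curr):
    """Extrapolate the box for the next frame, assuming constant velocity."""
    # Wrap the heading change into [-pi, pi] so a turn across +/-pi is handled.
    d_yaw = math.atan2(math.sin(curr.yaw - prev.yaw), math.cos(curr.yaw - prev.yaw))
    return Box3D(
        x=2 * curr.x - prev.x,
        y=2 * curr.y - prev.y,
        z=2 * curr.z - prev.z,
        # Object size is assumed not to change between frames.
        length=curr.length,
        width=curr.width,
        height=curr.height,
        yaw=curr.yaw + d_yaw,
    )


frame1 = Box3D(0.0, 0.0, 0.0, 4.0, 2.0, 1.5, 0.0)
frame2 = Box3D(1.0, 0.5, 0.0, 4.0, 2.0, 1.5, 0.1)
proposal = predict_next(frame1, frame2)
print(proposal)
```

The labeler then only has to nudge the proposed box where the object accelerated, braked or turned more sharply than the extrapolation assumed.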

Finally, the integrated data processing platform provides a unified entry point and a unified style, which helps improve the efficiency of data collection and processing. It also distills the company's many years of deep industry experience into the platform. This simplifies and unifies the training data production process, and the modular way in which projects are generated and managed lets production staff combine and adjust modules flexibly according to actual project needs.

Fourth, a systematic platform: grounded in quality assurance and security

In addition to precision, efficiency and scale, the all-in-one platform also safeguards the quality of data annotation.

The concept of quality inspection and control is embedded in the tools of every stage of the platform. In the collection stage, the collection tool inspects the quality of the raw data in real time, and for raw data that does not meet the requirements the system reports which requirement was not met and refuses to accept it. In the mid-stage processing, automatic labeling tools combined with manual proofreading are used to check the annotations and improve data quality. In the back-end large-scale quality inspection stage, automatic verification technology is used to meet the requirement of inspecting 100% of a large-scale training dataset.

At present, the Haitian AAC integrated platform has accumulated hundreds of quality inspection points, which can meet the needs of all daily business scenarios, such as checking whether image and video file formats are correct, whether the number of objects is up to standard, and whether the accuracy of the annotation boxes meets the requirements.
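
A small set of such inspection points might be sketched as below. The record layout, the allowed file suffixes and the check names are all assumptions made for illustration; they are not Haitian AAC's actual rules. Note that true box accuracy can only be judged against a reference, so the last check verifies only that each box is well formed and lies inside the image.

```python
from pathlib import PurePath

ALLOWED_SUFFIXES = {".jpg", ".png", ".mp4"}  # hypothetical whitelist


def check_file_format(item):
    """The file extension must be one of the accepted formats."""
    return PurePath(item["file"]).suffix.lower() in ALLOWED_SUFFIXES


def check_object_count(item, min_objects=1):
    """The item must contain at least `min_objects` labeled objects."""
    return len(item["boxes"]) >= min_objects


def check_box_geometry(item):
    """Every (x0, y0, x1, y1) box must be non-empty and inside the image."""
    w, h = item["width"], item["height"]
    return all(
        0 <= x0 < x1 <= w and 0 <= y0 < y1 <= h
        for x0, y0, x1, y1 in item["boxes"]
    )


CHECKS = [
    ("file_format", check_file_format),
    ("object_count", check_object_count),
    ("box_geometry", check_box_geometry),
]


def run_checks(item):
    """Return the names of all inspection points the item fails."""
    return [name for name, check in CHECKS if not check(item)]


good = {"file": "frame_0001.jpg", "width": 1920, "height": 1080,
        "boxes": [(100, 200, 300, 400)]}
bad = {"file": "frame_0002.bmp", "width": 1920, "height": 1080,
       "boxes": [(300, 200, 100, 400)]}
print(run_checks(good))  # []
print(run_checks(bad))   # ['file_format', 'box_geometry']
```

Because each inspection point is an independent function in a list, new checks can be added without touching the existing ones, which is one plausible way to grow to hundreds of them.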

In fact, efficient, high-quality data annotation through human-machine collaboration is only one part of the overall data service process. Dataset management, project management and team personnel management are closely tied to data security and compliance and cannot be ignored.

Alongside the integrated data processing platform, Haitian AAC has also established a round-the-clock log database and a terminal and personnel management system. Together these make operations on the platform traceable and transparent and enforce strictly graded permissions for different roles, in order to ensure data security.

For different levels of security requirement, Haitian AAC can provide different levels of solution. Customers can have their data processed on Haitian AAC's platform, have the platform deployed on their own servers, or have annotators work on site at the customer's premises.

As the state continues to push forward the large-scale cultivation of the data element market and data circulation, and as higher-level laws such as the Cybersecurity Law, the Data Security Law and the Personal Information Protection Law are released one after another, data security and privacy protection are receiving more and more attention from all walks of life.

Haitian AAC has also taken the lead in obtaining ISO/IEC 27701 certification, which means that its ability to manage and protect personal privacy information during data production meets the "important global privacy protection standards" and that it has passed the "most stringent qualification audit".

Today, personal information is strictly controlled within the platform throughout design, collection, processing, quality inspection and delivery. Its security is guaranteed through measures such as standardized data desensitization, a strict terminal and personnel management system, classification by degree of privacy with permission isolation, and round-the-clock automatic monitoring.

In the long run, only by setting a benchmark for data services from the perspective of security and compliance can the industry weed out its low-quality players and truly make artificial intelligence the engine of a new round of technological revolution.

Fifth, facing the unknown: the ability to cross the river by feeling for the stones

Strict control over data production efficiency, data quality and data privacy and security helps Haitian AAC stand out from the competition.

Beyond these, there is another extremely important underlying ability: the strength and courage to cross the river together with customers and explore new business challenges with them.

20 years ago, the commercial deployment of artificial intelligence was still in its infancy, and generalization ability in real scenarios was limited.

Over thousands of projects, Haitian AAC has served more than 500 large technology companies, leading AI companies and scientific research institutes around the world. It has accumulated a large body of industry know-how and developed the technology and solution capabilities to help AI projects significantly shorten their deployment cycle and reduce costs, which is also what allows it to explore "unknown" territory.

In the autonomous driving data labeling market, one of the main pain points for most customers today is how to close the data loop in driving. What they require of data service providers already goes far beyond simple collection and labeling. They need a provider that combines technology, capital, experience and other strengths, and that will explore by trial and error together with the customer.

Like the steam engine in the age of steam, the generator in the age of electricity, and the computer and the Internet in the information age, artificial intelligence is becoming a decisive force in propelling humanity into the age of intelligence.

As artificial intelligence enters a new generation driven by the two wheels of "data + knowledge", the market position of data as a factor of production is widely recognized, and the market space is vast. According to research on the AI training data industry by third-party institutions such as iResearch and IDC, the Chinese market is expected to exceed 10 billion in 2025, and the global training data market to be about 50 billion.

McKinsey's digital consulting business in China recently predicted that the commercialization of autonomous driving in China will arrive sooner than expected in the next few years.

On the journey towards commercialization, Haitian AAC will work with enterprises to explore the best path to faster model training, product deployment and iterative updates, so as to better serve the society of the future.