
A 10,000-word interpretation of the challenges of autonomous driving testing and validation

Software testing is all too often just a hunt for bugs rather than a carefully designed exercise in assuring product quality. Deploying driverless cars at scale will require a more methodical approach than a simple system-level cycle of test, fail, patch, and test again.

The V model of ISO 26262 establishes a framework that ties each type of testing to a corresponding design or requirements document, but the model runs into trouble when confronted with the new testing problems posed by autonomous vehicles. Starting from the V model as applied to autonomous vehicles, this article identifies five major challenge areas for testing: the driver being out of the loop, complex requirements, non-deterministic algorithms, inductive learning algorithms, and fail-operational systems.

Promising general solutions across these challenge areas include phased deployment using progressively relaxed operational constraints, monitor/actuator architectures that separate complex autonomous driving functions from relatively simple safety functions, and fault injection to exercise more edge cases. While significant challenges remain in building a credible safety case for high-level autonomous driving algorithms, it appears feasible to adapt existing software safety approaches and to build systems and design processes around them.

Self-driving cars have become a hot topic, but the technology behind them has been evolving for decades, going back to early automated highway system projects. From those early demonstrations, the technology matured into advanced driver assistance systems (ADAS); features such as automatic lane keeping and intelligent cruise control are already standard on many vehicles. Beyond that, there are now many autonomous vehicle projects, at various levels of autonomy, in various stages of development.

If the self-appointed experts are to be believed, driverless cars are just around the corner. In reality, those who work in the traditional automotive industry know there is a huge gap between building a handful of cars that operate under reasonably favorable conditions with professional safety drivers on board, and deploying millions of vehicles in an unconstrained world without them.

One could argue that successful demonstrations of self-driving cars, and successful test drives covering thousands or even hundreds of thousands of kilometers, mean the technology is essentially ready for full deployment. However, it is difficult to conclude from such tests alone that the vehicles are safe enough. In practice, at least some developers are already doing far more than road testing, but the question is how much more work remains, and how we can know that a vehicle is now safe enough to be on the road.

In this article, we explore some of the challenges awaiting developers who aim to build fully autonomous, Level 4 vehicles and deploy them at scale. We therefore set aside semi-automated approaches and consider only the case in which the driver is not responsible for operating the vehicle at all. We further limit the scope to vehicles designed and validated within the ISO 26262 V framework, on the grounds that it is an accepted framework for assuring safety.

A computer-based system should be presumed unsafe unless there is a convincing, verifiable argument to the contrary. In short, self-driving cars cannot be considered safe unless they can be shown to conform to ISO 26262 or to map onto some other suitable, widely accepted software safety standard.


The infeasibility of complete testing

It has long been known that it is infeasible to test a system thoroughly enough to demonstrate ultra-dependable operation. For example, suppose a fleet of one million cars each operates one hour per day. If the safety goal is for this fleet to suffer a catastrophic computing failure no more than once every 1,000 days, then the mean time between catastrophic failures must be about one billion hours, which is comparable to the allowable failure rates for aircraft.

Verifying such a catastrophic failure rate would require at least a billion hours of testing, and in practice several times that before repeated trials become statistically meaningful. Even then, this assumes the test environment is highly representative of real-world deployment and that faults arise in a random, independent manner.
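As a rough illustration (not from the original text), the sketch below redoes this arithmetic in Python, assuming failures arrive randomly and independently: with zero failures observed, demonstrating a 10^9-hour mean time between failures at 95% confidence takes roughly three billion failure-free test hours.

```python
import math

# Fleet-hours accumulated by the hypothetical fleet in the text:
# 1 million vehicles, 1 hour/day, over 1,000 days.
fleet_hours = 1_000_000 * 1 * 1_000          # = 1e9 hours

# Safety goal: at most one catastrophic failure per 1e9 hours.
target_mtbf = 1e9                             # hours

# With zero failures observed in T test hours (and assuming failures
# arrive randomly and independently), the 1-alpha upper confidence
# bound on the failure rate is -ln(alpha) / T.
def test_hours_needed(target_mtbf: float, confidence: float) -> float:
    """Failure-free test hours needed to claim MTBF >= target_mtbf."""
    alpha = 1.0 - confidence
    return -math.log(alpha) * target_mtbf

print(f"Fleet exposure per 1,000 days: {fleet_hours:.1e} hours")
print(f"Failure-free testing for 95% confidence: "
      f"{test_hours_needed(target_mtbf, 0.95):.2e} hours")   # ~3e9 hours
```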

But building a fleet large enough to accumulate billions of hours of operation in a representative test environment, without endangering the public, is impractical. We therefore need alternative validation approaches, which may include simulation, fault injection, bootstrapping from progressively larger fleet deployments, field data on component behavior from less demanding vehicle applications, and human review. (Component-level testing also plays a role, but accumulating a billion hours of pre-deployment testing on physical hardware is equally impractical.)

Matters are made worse by the fact that autonomous systems are harder to test than everyday software systems.

For relatively non-critical computing systems, testing can serve as the primary basis for confirming an adequate safety level, because low-severity, low-exposure failures occur much more often than catastrophic ones. For example, if a particular type of failure is acceptable once every 1,000 hours (because it leads only to a low-cost incident or minor outage), then that failure rate can be credibly verified with a few thousand hours of testing. This is not to say that the other necessary software quality steps can be skipped for such systems, but rather that when the required mean time between failures is relatively loose, a suitable testing and field failure monitoring strategy may be able to confirm that a component of reasonable quality does in fact have an acceptably low failure rate.

The V model as a starting point

Because system-level testing alone cannot do the job, it makes sense to frame the discussion around a development framework for building robustly safe software. The V model of software development has long been used in the automotive domain; it was one of the development reference models incorporated into the MISRA guidelines more than 20 years ago, and more recently it forms the basis of ISO 26262.

In general, the V model represents an orderly process of creation and verification. The left side of the V runs from requirements through design to implementation. At each step the system is typically decomposed into subsystems that are worked on in parallel (for example, there is one set of system requirements, but each subsystem has its own design). The right side of the V iteratively verifies and validates larger and larger assemblies, moving from small components up to system-level evaluation. Although ISO 26262 elaborates this model in much more detail, we retain the generic framework here to keep the discussion and its extensions simple.

While ISO 26262 and its V framework codify accepted practice for ensuring automotive safety, mapping fully autonomous vehicle technology onto the V approach still raises many challenges.

The driver is out of the loop

In fully autonomous vehicles, perhaps the most obvious challenge is that drivers are no longer actually driving cars. This means that, by definition, the driver can no longer be expected to provide control inputs to the vehicle during operation.

Controllability challenges

For lower-integrity components, a typical automotive safety argument may rely on the driver's ability to maintain control. For example, in an advanced driver assistance system (ADAS), if a software fault creates a potentially dangerous situation, the driver is expected to override the function and return the vehicle to a safe state. Drivers are also sometimes expected to recover the vehicle from serious mechanical failures, such as a tire blowout. In other words, in a human-driven car the driver is responsible for taking the right corrective action. When the driver no longer has the ability to take corrective action, the vehicle lacks controllability, and the system must therefore be designed to a higher Automotive Safety Integrity Level (ASIL).

A fully autonomous car cannot expect the driver to handle exceptional situations. When a fault occurs, or when conditions exceed the specified operating envelope the system can handle, the computer system must act as the exception handler. Putting the computer in charge of exception handling significantly increases the complexity of automation compared with ADAS. A combination of ADAS technologies such as lane keeping and intelligent cruise control may look very close to fully automated operation, but a fully autonomous vehicle must deal with every possible problem, which is a qualitatively higher level of complexity, because there is no longer a driver to grab the steering wheel or hit the brakes.


Architectural approaches to autonomy

Within an ISO 26262 framework, putting the computer in charge requires one of two strategies for assessing risk.

One strategy is to set the controllability component of the risk assessment to C3 (difficult to control or uncontrollable). If severity and exposure are low, this may be workable. But for moderate or high severity, the system must then be designed to a higher ASIL. (One might argue that an even higher controllability class, C4, ought to exist, since an automated system can take actively dangerous actions rather than merely fail to provide a safety function, but let us assume the existing C3 suffices.)

Another way to handle autonomy functions that would otherwise demand a high ASIL is decomposition, typically realized as a monitor/actuator architecture combined with redundancy. In a monitor/actuator architecture, one module (the actuator) performs the function while another module (the monitor) performs acceptance tests or other behavioral checks. If the actuator misbehaves, the monitor shuts the entire function down (both modules), yielding a fail-silent building block.

If a monitor/actuator pair is designed correctly, the actuator can be developed to a lower ASIL so long as the monitor has a sufficiently high ASIL and detects all relevant failures. (Latent faults in the monitor must also be addressed, so that a degraded monitor does not silently miss an actuator failure.) If the monitor can be made much simpler than the actuator, most of the complex functionality can live in the lower-ASIL actuator, which makes this architectural pattern quite attractive.
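As a loose illustration of the pattern (the class and limit names here are hypothetical, not from any real system), the Python sketch below shows a simple monitor performing an acceptance test on a far more complex actuator and shutting the pair down, fail-silent, when the test fails.

```python
from dataclasses import dataclass

@dataclass
class SpeedCommand:
    accel_mps2: float          # commanded acceleration

class Actuator:
    """Complex, lower-ASIL function: computes the control command."""
    def compute_command(self, gap_m: float, speed_mps: float) -> SpeedCommand:
        # ... arbitrarily complex planning/control logic would go here ...
        return SpeedCommand(accel_mps2=1.0)

class Monitor:
    """Simple, higher-ASIL function: acceptance-tests the command."""
    MAX_ACCEL = 2.0            # hypothetical limits
    MIN_GAP_M = 10.0

    def command_is_safe(self, cmd: SpeedCommand, gap_m: float) -> bool:
        if abs(cmd.accel_mps2) > self.MAX_ACCEL:
            return False
        if gap_m < self.MIN_GAP_M and cmd.accel_mps2 > 0.0:
            return False       # do not accelerate into a closing gap
        return True

class MonitorActuatorPair:
    """Fail-silent building block: outputs a command or shuts down."""
    def __init__(self):
        self.actuator, self.monitor = Actuator(), Monitor()
        self.enabled = True

    def step(self, gap_m: float, speed_mps: float):
        if not self.enabled:
            return None                       # fail-silent: no output
        cmd = self.actuator.compute_command(gap_m, speed_mps)
        if not self.monitor.command_is_safe(cmd, gap_m):
            self.enabled = False              # monitor shuts the pair down
            return None
        return cmd
```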

The strength, and also the limitation, of a monitor/actuator pair is that it forms a fail-silent building block: it shuts itself off when something goes wrong. The heterogeneous redundancy, monitor plus actuator, exists to prevent dangerous commands from being issued by a failed actuator.

Providing fail-operational behavior requires, at a minimum, additional redundancy (for example, multiple monitor/actuator pairs) and diversity, so that a common software design defect does not bring the whole system down. This matters for avoiding a situation like the loss of Ariane 5 Flight 501, in which both the primary and backup systems failed due to the same unhandled exceptional operating condition.

Note that achieving diversity is not necessarily straightforward, because independently implemented components can share vulnerabilities, just as with non-autonomous software. Note also that the fail-silent property of a monitor/actuator pair rests on an assumption of failure independence.

A key point is that no matter what method is used, there always needs to be a way to detect when autonomous functions are not working properly (whether due to hardware failures, software failures, or requirements defects) and somehow move the system to a safe state.

Complex requirements

A basic feature of the V development model is that the right side of the V provides a traceable way of checking the work done on the left side (verification and validation). However, such checking presumes that the requirements are actually known, correct, complete, and explicitly specified. That presumption is a real challenge for self-driving cars.

As noted earlier, taking the driver out of the control loop means the software must handle exceptional situations, including weather, environmental hazards, and equipment failures. These come in many forms: severe weather (flooding, fog, snow, smoke, tornadoes), traffic rule violations (a car driving the wrong way, other drivers running red lights, stolen traffic signs), local driving conventions (driving on the left), and animal hazards (locusts, deer, armadillos). Anyone who has driven long enough has a story about something bizarre seen on the road, and a large deployed fleet can be expected to encounter all such events and more. Worse, combinations of exceptional events and driving conditions will arise, far too many to enumerate in a classical written requirements specification. If the outcomes of such combinations are benign, it may not be necessary to cover every extremely rare combination, but the requirements should make clear which faults are within the scope of the system design and which are not.

It therefore seems unlikely, at least for the near future, that a classic V process beginning with a document listing all system requirements will scale, in its narrow form, to the exception handling needed by autonomous vehicles. One way to manage requirements complexity is to constrain the operational concept and grow the requirements in phases. Developers already do this, for instance by focusing road testing on a specific geographic area (say, daytime highway driving in Silicon Valley, where precipitation is limited and the climate is mild). The idea of constraining the operational concept can be extended along multiple dimensions; typical axes that can be restricted include:

Roadway access: limited-access highways, carpool lanes, rural roads, suburban streets, closed campuses, urban streets, etc.;

Visibility: daytime, night, fog, haze, smoke, rain, snow;

Vehicle environment: no other vehicles (e.g., a closed garage), autonomous-only lanes, marker transponders on non-autonomous vehicles, etc.;

External environment: infrastructure support, pre-mapped roads, human-driven vehicles;

Speed: lower speeds generally mean smaller consequences of failure and more margin for recovery.

While many combinations of these degrees of freedom remain (and many more axes can be imagined), the point of choosing among possible operational concepts is to reduce complexity, not to add it. Requirements complexity is reduced by deploying the autonomous system only in situations whose requirements are thoroughly understood (and by making sure the system can correctly recognize when those operating conditions hold). The restricted operational concept thus becomes a guiding strategy for rolling out increasingly capable technology into increasingly complex operations: once the requirements of one operational concept are well understood, additional concepts can be added over time to widen the range of permitted automated scenarios. This does not eliminate the problem of complex requirements, but it helps contain the combinatorial explosion of requirements and exceptions.


Safety requirements and constraints

Even with a restricted operational concept, the traditional approach to safety-related requirements looks impractical. That approach works roughly as follows: first create the functional requirements; after running a hazard and risk assessment, annotate the requirements that are safety-related; allocate those safety-related requirements to safety-critical subsystems; design the safety-critical subsystems to meet their allocated requirements; and finally iterate to find and mitigate unexpected emergent subsystem behaviors.

Annotating safety-critical requirements is likely to be impractical for autonomous applications for at least two reasons. The first is that many requirements may be only partly safety-related, in ways that are inextricably tied to functional behavior. For example, the conditions under which a parking brake may be used while the car is moving might appear in an initial set of requirements. A parking brake is likely to be described by many functional requirements, but, simplifying considerably, the only safety-critical behavior in a decelerating mode of operation might be that wheel lock-up must be avoided, and that property emerges largely from interaction with other requirements.

The second reason annotation can fail is that it is simply not possible when machine learning techniques are used, because the requirements then take the form of a training set: a collection of input values paired with correct system outputs. These are not traditional requirements, so a different approach to requirements management and validation is needed.

Rather than trying to partition functional requirements between safety-related and non-safety-related subsystems, one can create a separate, parallel set of requirements that are purely safety-related. These requirements typically take the form of constraints on the system states required for safety. This decouples questions of performance and optimization (What is the shortest path? What speed minimizes fuel consumption?) from questions of safety (Will we hit something?).

Using this approach, the requirements feeding the V model are split in two. The first set consists of non-safety-related functional requirements, which may be in a traditional format or in a non-traditional format such as a machine learning training set. The second set consists of purely safety-related requirements that define exactly what safety means for the system, largely independent of the details of optimal system behavior. These can take the form of a safe operating envelope, possibly different for different operating modes, within which the system is free to optimize its performance. Such envelopes clearly work in at least some situations (for example, enforcing speed limits or a minimum following distance); the concept is quite general, though fleshing it out remains future work.

A compelling reason to adopt a set of safety requirements that cuts across the functional requirements is that it maps cleanly onto a monitor/actuator architecture.
Functional requirements can be allocated to a lower-ASIL actuator block, while the safety requirements are allocated to a higher-ASIL monitor. This idea has been used informally for years as part of the monitor/actuator design pattern. We suggest elevating it to a primary strategy for structuring the requirements, design, and safety case of autonomous vehicles, rather than relegating it to a detailed tactic for implementing redundancy.
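To make the idea concrete, here is a minimal, hypothetical sketch of such a parallel safety requirement set expressed as envelope constraints per operating mode; the constraint values and mode names are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SafetyEnvelope:
    """Purely safety-related requirements, independent of optimization."""
    max_speed_mps: float
    min_follow_time_s: float       # minimum time gap to the lead vehicle

# Hypothetical envelopes for different operating modes / concepts.
ENVELOPES = {
    "highway": SafetyEnvelope(max_speed_mps=31.0, min_follow_time_s=2.0),
    "campus":  SafetyEnvelope(max_speed_mps=8.0,  min_follow_time_s=3.0),
}

def within_envelope(mode: str, speed_mps: float, gap_m: float) -> bool:
    """True if the current state satisfies the safety constraints.
    The planner is free to optimize routes, fuel, comfort, etc.
    anywhere inside this envelope."""
    env = ENVELOPES[mode]
    if speed_mps > env.max_speed_mps:
        return False
    follow_time = gap_m / max(speed_mps, 0.1)
    return follow_time >= env.min_follow_time_s

print(within_envelope("highway", speed_mps=28.0, gap_m=70.0))  # True
print(within_envelope("highway", speed_mps=28.0, gap_m=30.0))  # False (~1.1 s gap)
```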

Non-deterministic and statistical algorithms

Some of the technologies used in autonomous vehicles are statistical in nature. They tend to be non-deterministic (their behavior is not repeatable) and may give answers that are only probably correct, in cases where a probability can be assigned at all. Validating such systems raises challenges that rarely arise in more deterministic, traditional vehicle control systems.

Challenges of stochastic systems

Stochastic systems pose the challenge of non-deterministic computation. Planning algorithms, for example, may work by ranking the outcomes of many randomly chosen candidate solutions. Because the algorithm's core operation is based on random generation, its behavior is hard to reproduce. Techniques such as using repeatable pseudo-random number streams in unit tests can help, but creating fully deterministic behavior in an integrated system may be impractical, especially when small changes in initial conditions push the system into divergent behaviors. This means that each vehicle-level test can produce different results even when nominally identical test cases are used.
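The unit-test technique mentioned above can be sketched as follows; the toy planner and cost function are purely illustrative, but they show how a fixed pseudo-random seed makes a randomized algorithm repeatable in isolation, while integrated runs remain non-deterministic.

```python
import random

def plan_route(candidates, cost_fn, rng: random.Random, n_samples: int = 100):
    """Toy randomized planner: sample candidates and keep the cheapest.
    Stands in for algorithms that rank many randomly chosen solutions."""
    sampled = [rng.choice(candidates) for _ in range(n_samples)]
    return min(sampled, key=cost_fn)

def test_planner_is_repeatable():
    candidates = list(range(1000))
    cost = lambda x: (x - 412) ** 2
    # A fixed seed makes this unit test deterministic...
    a = plan_route(candidates, cost, random.Random(42))
    b = plan_route(candidates, cost, random.Random(42))
    assert a == b
    # ...but an unseeded run (as in an integrated vehicle) need not
    # reproduce the same choice, which is the testing difficulty.

test_planner_is_repeatable()
```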

Successful perception algorithms are also often probabilistic. For example, the evidence grid framework accumulates diffuse readings from individual, uncertain sensor measurements, allowing a robot to build an increasingly detailed map of its surroundings. The approach yields a probability that an object is present, never a guarantee. Moreover, such algorithms rest on prior models of sensor physics (such as multipath returns) and of noise that are inherently probabilistic and sensitive to small changes in environmental conditions.
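A minimal sketch of the kind of probabilistic accumulation an evidence (occupancy) grid performs is shown below, using a standard log-odds update; the sensor probabilities are made up for illustration.

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def update_cell(log_odds: float, p_hit: float) -> float:
    """Accumulate one sensor reading into a grid cell's log-odds.
    p_hit is the (noise-model-dependent) probability that the cell is
    occupied given this reading."""
    return log_odds + logit(p_hit)

def probability(log_odds: float) -> float:
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))

cell = 0.0                      # prior: 0.5 occupancy
for reading in (0.7, 0.7, 0.4): # two "hits" and one "miss" from a noisy sensor
    cell = update_cell(cell, reading)
print(round(probability(cell), 3))   # ~0.78: likely occupied, never certain
```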

Beyond geometric modeling of the environment, other algorithms extract labels from perception data, pedestrian detection being a prominent example. Such systems can fail in ways that are hard to predict even without noisy data. For example, a vision system may struggle to discount color changes caused by shadows, or to localize objects near large reflective surfaces. (To be fair, these conditions challenge humans too.) In addition, any classification process trades off false negatives against false positives: reducing one increases the other. The implication for testing is that such algorithms will not be correct 100% of the time and, depending on how they are built, they may report that a particular condition holds when the probability that it actually holds is only moderate.


Non-determinism in testing

Dealing with non-determinism in tests is difficult for at least two reasons.

The first is that it is difficult to provoke specific edge cases, because the system exercises an edge case only when it receives a very particular sequence of inputs from the world. Given the factors discussed earlier, such as a planner whose response can change significantly with small changes in input, it is hard to set up an environment that reliably produces the conditions needed to run a particular desired test case. As a simple example, a vehicle may prefer a circuitous route on wide roads over a shortcut down a narrow alley. To evaluate navigation in narrow alleys, testers must construct a situation in which the wide road is unattractive to the planner. Doing so adds cost to the test plan and may require (manually) forcing the vehicle into situations it would not normally enter in order to elicit the desired response.

The second difficulty non-determinism creates for testing is that it is hard to judge whether a test result is correct, because there is no single correct system behavior for a given test case. The correctness criterion may therefore have to take a form similar to the safety envelope discussed earlier: the test passes if the final system state lies within an acceptable envelope. In general, multiple runs may be needed to establish confidence.

Probabilistic system behavior poses a similar challenge for validation, since passing a test once does not mean the test will pass every time. Indeed, with probabilistic behavior we should expect at least some kinds of tests to fail some fraction of the time. Testing may therefore aim not to determine whether the behavior is correct, but to verify that the statistical characteristics of the behavior are as specified (for example, that the false-negative detection rate is no greater than the rate assumed in the corresponding safety argument). This can require far more testing than simple functional validation, especially when the behavior in question is safety-critical and the expected failure rate is very low.
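As a hedged illustration of what "verifying a statistical characteristic" costs, the snippet below uses the exact binomial bound with zero observed misses (the "rule of three") to estimate how many miss-free detection trials are needed to support a claimed false-negative rate; the target rate is hypothetical.

```python
import math

def trials_needed(max_miss_rate: float, confidence: float = 0.95) -> int:
    """Number of miss-free detection trials needed to support the claim
    that the false-negative rate is <= max_miss_rate (exact binomial
    bound with zero observed misses)."""
    alpha = 1.0 - confidence
    return math.ceil(math.log(alpha) / math.log(1.0 - max_miss_rate))

# To support a claimed miss rate of 1e-4 at 95% confidence with no
# observed misses, roughly 30,000 representative trials are needed.
print(trials_needed(1e-4))        # ~29,956
# Any observed miss requires either more trials or a weaker claim.
```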

Obtaining extremely high dependability from a probabilistic system may require combining multiple subsystems whose failures are, ideally, completely independent, so that the joint failure rate is very low even though each subsystem individually fails more often. For example, radar and vision systems might be fused so that the probability of both missing an obstacle is extremely small. The approach applies not only to sensing modalities but also to other algorithmic functions in planning and execution. If it works, the resulting failure probability is so low that testing the composite performance directly becomes essentially impossible: if the combined system may miss at most one obstacle per billion detection opportunities, billions of representative trials would be needed to confirm it. Instead, one can try to verify the higher, more testable allowable failure rate of each algorithm separately. But that is not enough; the assumption that their failures are independent must also be validated, which will most likely rest on analysis as well as testing.
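The arithmetic behind this argument, and its sensitivity to the independence assumption, can be sketched as follows (all probabilities are hypothetical):

```python
import math

p_combined_target = 1e-9          # tolerable joint miss probability

# Under the independence assumption, two equally good channels each
# need only the square root of the combined target...
p_each = math.sqrt(p_combined_target)
print(f"per-channel miss rate: {p_each:.1e}")      # ~3.2e-5 (testable)

# ...but a small correlated failure mode dominates the combined rate.
p_correlated = 1e-6               # e.g., both channels blinded by glare
p_actual = p_each * p_each + p_correlated
print(f"joint miss rate with correlation: {p_actual:.1e}")  # ~1e-6, not 1e-9
```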

Machine learning systems

A self-driving car behaves correctly only if a complex web of perception and control decisions is made correctly. Achieving that usually requires tuning many parameters, from the calibration model of each camera lens to the relative weighting of swerving versus braking to avoid a highway obstacle. The challenge is to find calibration parameters or weights that minimize some error function. In recent years most robotics applications have turned to machine learning for this, because the complexity of the multidimensional optimization makes it unlikely that manual tuning will reach the desired level of performance.

Machine learning methods differ in many details, for example supervised versus reinforcement learning, but in short they all involve inductive learning, in which a model is derived from a training set.

Consider, for example, detecting pedestrians in a monocular image. Given a large training set of images, a classifier can learn a decision rule that minimizes the misclassification rate measured on a separate validation set of images. For our purposes, the essential point is that the training set is in effect the system's requirements, and the learned rule is the resulting system design. (The machine learning algorithm and the classifier engine themselves are amenable to traditional verification techniques; however, they are general-purpose software engines, and the final system behavior is determined by the training data used for learning.)

One could try to avoid having the training data serve as the de facto requirements by writing a set of requirements for the training data. In the end, however, this just pushes the same challenge up a level of abstraction: the requirements are no longer the usual functional requirements of a V process, but instead take the form of a training data set, or a plan for collecting one.

Challenges of validating inductive learning

The performance of an inductive learning approach can be tested by holding back some samples from the overall collected data set and using them for validation. If the training set is treated as the system requirements (the left side of the V), an independent validation set can be used to check that those requirements are met (the right side of the V). The training data must not contain unintended correlations with the desired behavior, or the system will overfit. Likewise, the validation data must be independent of the training data and differ from it in every respect except the characteristics of interest; otherwise overfitting will not be detected during validation. Of course, it is not obvious how to argue convincingly that a machine learning system has not overfit.
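A minimal sketch of this hold-out validation idea, using synthetic data as a stand-in for labeled sensor data, might look like the following; it also illustrates why a held-out split cannot reveal what the data set never contained.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labeled sensor data (hypothetical example).
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# The training set plays the role of the left side of the V (requirements);
# the held-out set plays the role of the right side (validation). The split
# must be independent, or overfitting goes undetected.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=1)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"validation accuracy: {model.score(X_val, y_val):.3f}")

# Caveat from the text: if the training data lacks a class of inputs
# (e.g., wheelchair users), no held-out split of the *same* data will
# reveal that gap. Validation data must also be representative.
```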

An important practical limitation of machine learning is that, when labeled data is used, each data point can be expensive: someone or something has to create the labels. Unsupervised learning techniques are possible too, but they require a clever mapping onto the specific problem. Moreover, if a problem is found in the training set or in the learned rules, additional validation data must be collected to validate the updated system, because even a small change to the training data can yield a substantially different set of learned rules.

Because the requirements for autonomous systems are so complex, there will very likely be rare edge cases for which learning is problematic. But precisely because they are rare, collecting data that captures these unusual situations can be expensive, and it is hard to know when enough has been collected. (Simulation and synthetic data can help, but they carry the risk of biased simulation data and of overfitting to simulation artifacts.)

Another problem with validating machine learning is that, in general, the learned models are not intuitively understandable by humans. The internal structure of a convolutional neural network, for instance, gives a human observer little insight into the decision rules it has learned. There may be special cases, but in general the legibility problem of machine learning, the ability to explain a system's behavior in human terms, remains unsolved. This makes it hard to see how review-based techniques can be applied to validating machine learning systems, leaving expensive brute-force testing. (Perhaps some organizations do have the resources for extensive brute-force testing, but even then the accuracy, validity, and representativeness of the training data must be demonstrated as part of the safety argument.)

Because machine learning systems are generally illegible and the danger of overfitting is real, such systems have failure modes that seriously affect safety. Of particular concern are accidental correlations present in the training data that humans fail to notice. Consider, for example, detecting pedestrians in images with a trained deformable part model, a method that has proven quite effective on real-world data sets. If the training set contains no (or few) images of pedestrians in wheelchairs, such a system could easily learn to associate the pedestrian label only with people walking on two legs, and thus fail to detect wheelchair users.

Solutions for inductive learning

Testing inductive learning systems is difficult for a further reason: the black swan problem. The story goes that, having only ever observed white swans, a European observer using inductive logic would conclude that all swans are white; that belief is upended on a visit to Australia, where black swans are plentiful. In other words, if the system has never seen a particular situation, it cannot have learned it. This is a fundamental limitation of inductive learning. Worse, because machine learning is so illegible, it is difficult or impossible for human reviewers to imagine the black-swan-shaped gaps in such a system.

Validating an inductive learning system can therefore look like an extremely challenging problem. Extensive testing can be used, but it relies on the assumption that black swan events arrive randomly and independently, and on test data sets of a corresponding size. With enough resources this might be feasible, yet there will always be new black swans, so a very large space of operational scenarios and input values would need to be evaluated probabilistically to establish a sufficiently low failure rate.

Another way to validate an inductive learning system to a high ASIL is to pair a lower-ASIL, inductively trained algorithm that commands the actuator with a higher-ASIL monitor based on deductive reasoning, such as a deductively derived safety envelope. This sidesteps most of the validation problems for the driving algorithm, because failures of the inductive algorithm controlling the actuator are caught by the non-inductive monitor. A failure of the actuator algorithm then becomes an availability problem (assuming sufficient failover capability or a safe shutdown), not a safety problem.


Fail-operational mission requirements

Let us return to the earlier point that the computer, not a person, is ultimately in control of the vehicle. This means that at least part of the vehicle must be fail-operational rather than simply fail-stop in the face of a fault.

The challenge of fail-operational system design

Fail-operational systems have been built successfully in aerospace and other domains for decades, but they remain difficult for several reasons. The first is obvious: redundancy must be provided so that when one component fails, another can take over. Achieving this requires at least two independent, redundant subsystems, each with fail-stop behavior.

If components can fail by producing incorrect outputs rather than failing silently, a fail-operational system instead requires at least three redundant components so that the source of a failure can be identified by comparison. For systems that must tolerate arbitrary (Byzantine) failures, depending on the fault model, a complex fault-tolerant design with four redundant components may be needed.

The structure of the redundancy varies with the design approach, and may include configurations such as a triplex system (whose members must themselves be free of single points of failure), dual two-out-of-two pairs, or four computers in total. Beyond the obvious cost these approaches add, there is the testing problem of showing that failure detection and recovery actually work, that failures are independent, and that all redundant components are fault-free at the start of each driving mission. Avoiding redundancy altogether seems unlikely if safety is to be ensured, but there may be ways to reduce the complexity and cost of providing sufficient redundancy.
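For illustration, a toy majority voter over three redundant channel outputs might look like the sketch below; real designs add cross-channel synchronization, latent-fault checks, and much more.

```python
from collections import Counter
from typing import Optional, Sequence

def vote(outputs: Sequence[float], tolerance: float = 0.01) -> Optional[float]:
    """Approximate majority vote over redundant channel outputs.
    Returns the agreed value, or None (fall back to fail-stop) if no
    majority exists, i.e., the source of the error cannot be determined."""
    agreement = Counter()
    for i, a in enumerate(outputs):
        agreement[i] = sum(1 for b in outputs if abs(a - b) <= tolerance)
    best = max(agreement, key=agreement.get)
    if agreement[best] * 2 > len(outputs):       # strict majority required
        return outputs[best]
    return None

print(vote([10.00, 10.01, 42.0]))   # 10.0  -> one bad channel is outvoted
print(vote([10.0, 25.0, 42.0]))     # None  -> no majority, fail-stop instead
```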

In a typical fail-operational system such as an aircraft, all the redundant components are essentially identical and each is capable of carrying out an extended mission. For example, a commercial airliner typically has two jet engines, each controlled by at least a dual-redundant computer. If both computers on one engine shut down because of a persistent cross-check failure, the second, independent engine keeps the aircraft flying. Even so, the reliability requirements on each engine are stringent, because after the first engine failure the aircraft may need to fly for hours to reach the nearest suitable airport. That places heavy reliability requirements on each engine and drives up component cost.

Cars are notoriously price-sensitive, but they have one advantage: the failover mission can be short, for example pulling over to the side of the road, or stopping in a travel lane if necessary, so its duration is measured in seconds rather than hours. Moreover, a failover mission whose goal is simply to stop the vehicle may need far less functionality than fully autonomous operation, which simplifies the requirements, the computing redundancy, the sensor suite, and the validation effort. (As a simple example, a failover controller might not support lane changes, which greatly simplifies both sensing and control.) Designing an autonomous vehicle as a fail-stop primary controller plus a much simpler failover controller may therefore be attractive in terms of both hardware cost and design/validation cost.

Rather than making the fully autonomous system itself fail-operational, the failover mission could be triggered by a fault detector that monitors the fully autonomous system. The fault detector itself would need a high integrity level, but this could allow the normal autonomy functions to be developed to a lower ASIL, an approach that maps well onto the monitor/actuator architecture for the primary autonomy system. The failover autonomy must also be designed to be safe, with an architecture appropriate to its complexity and its required computational reliability. If the probability of a failure during a failover mission lasting only a few seconds is low enough, even a single-channel failover system might suffice.

Non-technical factors

Some of the challenges of deploying autonomous driving are non-technical, such as the frequently cited question of liability (who pays when there is an accident?) and how the law treats ownership, operation, maintenance, and other aspects of the vehicle. A deep treatment of these questions is beyond the scope of this article, but how the non-technical challenges are resolved is likely to influence the technical solutions. For example, there may be legal requirements that autonomous systems record accident reconstruction data, which requires careful analysis of where that data comes from to ensure it is interpreted correctly. As a simple example, a radar with a 95% detection probability might still have its output logged as a flag saying whether an obstacle was detected, ostensibly implying certainty.

Any analysis method must account for the fact that the radar failing to detect a pedestrian does not mean the pedestrian was not there (with 95% detection, roughly one pedestrian in twenty goes undetected).

Given the inherent complexity of autonomous vehicles and the impossibility of demonstrating safety through testing alone, developers will have to construct safety arguments in the form of assurance cases. Such an assurance argument must be maintained over time and must explain how the system responds to unavoidable dangerous situations. Preserving the integrity of the supporting evidence is a particular concern, because it may be needed to determine whether an accident arising from an exceptional situation was truly unavoidable, or was instead caused by a requirements defect, a reasonably foreseeable and preventable design defect, an implementation defect, or something else.

Fault injection

It should be clear from the preceding discussion that traditional functional testing struggles to exercise a complete system under unusual operating conditions, and especially under unusual combinations of conditions. Testers can define some off-nominal test cases, but the combinatorial explosion of anomalies, operational scenarios, and other relevant factors makes such testing hard to scale.

In addition, research has shown that even very good designers often have blind spots and miss anomalies in relatively simple software systems.

Fault injection and robustness testing are comparatively mature techniques for evaluating how a system performs under abnormal conditions, and they help avoid the blind spots of designers and testers when probing exceptional responses. Traditional fault injection inserts bit flips into memory and communication networks; more recent techniques add layers of abstraction, including fault dictionaries based on data types, while ensuring the injected faults remain representative. These techniques have already been used successfully to find and characterize defects in self-driving cars.

A promising way to help validate autonomy is therefore to perform fault injection at the component level of abstraction, as part of a strategy of deliberately trying to falsify safety claims. This involves not only simulating objects at the main sensor inputs, but also injecting exceptional conditions to probe the system's robustness (for example, inserting invalid data into the map). The purpose of such fault injection is not to verify function but to expose weaknesses by exploring unanticipated situations. Fault injection of this kind can also be performed across the layers of abstraction addressed by ISO 26262.
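A hedged sketch of what component-level fault injection can look like is shown below, combining a classic bit-flip with a data-type-aware corruption of a hypothetical obstacle-range reading.

```python
import random
import struct

def flip_random_bit(value: float, rng: random.Random) -> float:
    """Classic low-level fault injection: flip one bit of a float's
    IEEE-754 representation."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    bits ^= 1 << rng.randrange(64)
    (corrupted,) = struct.unpack("<d", struct.pack("<Q", bits))
    return corrupted

def inject_range_fault(obstacle_range_m: float, rng: random.Random) -> float:
    """Higher-level, data-type-aware fault injection: replace a sensor
    reading with an exceptional but type-valid value."""
    return rng.choice([-1.0, 0.0, 1e9, float("nan"), obstacle_range_m * 100])

rng = random.Random(7)
clean = 42.5                                  # metres to nearest obstacle
print(flip_random_bit(clean, rng))            # bit-level corruption
print(inject_range_fault(clean, rng))         # abstraction-level corruption
# Feed these corrupted values into the perception/monitor stack and check
# that the system degrades safely rather than acting on bad data.
```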

Developing safe autonomous vehicles under a V process faces many significant challenges, yet assuring safety still requires following the ISO 26262 V process, or demonstrating an equally rigorous set of processes and technical practices. Assuming the V process is applied, three approaches seem most promising.

Phased deployment

Developing and deploying a self-driving car that handles every possible combination of scenarios, including the exceptional ones, in an unrestricted real-world environment seems impractical. A phased deployment approach, consistent with current developer practice and with how automotive systems are typically introduced, seems more reasonable.

Phased deployment can be tied to the V process by specifying well-defined operational concepts that limit the scope of operation and thereby the range of necessary requirements. This includes the environmental, system, and operational constraints that must hold for autonomous operation to be permitted. Verifying that these operational constraints are actually enforced is an important part of assuring safety, and within the V process it must appear as some combination of requirements, validation, and potentially runtime enforcement mechanisms.

For example, runtime monitoring needs to confirm not only that the system state permits autonomous operation to be engaged, but also that the parameters assumed by the safe operational scenario are actually being met, not merely that the system believes them to be met.
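As an illustration only, a runtime check of such operational-concept constraints might look like this sketch; the permitted road types and thresholds are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class OperatingConditions:
    """Snapshot of the conditions that define the permitted scenario."""
    road_type: str            # e.g. "limited_access_highway"
    daylight: bool
    visibility_m: float
    speed_limit_mps: float

# Hypothetical operational-concept constraints for a phased deployment.
PERMITTED_ROADS = {"limited_access_highway"}
MIN_VISIBILITY_M = 150.0
MAX_SPEED_LIMIT_MPS = 31.0

def autonomy_permitted(c: OperatingConditions) -> bool:
    """Runtime check that the vehicle is still inside the operational
    concept it was validated for; leaving it must trigger a failover
    mission or handback, not silent continued operation."""
    return (c.road_type in PERMITTED_ROADS
            and c.daylight
            and c.visibility_m >= MIN_VISIBILITY_M
            and c.speed_limit_mps <= MAX_SPEED_LIMIT_MPS)

print(autonomy_permitted(OperatingConditions(
    "limited_access_highway", daylight=True,
    visibility_m=400.0, speed_limit_mps=29.0)))   # True
```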

One aspect of the restricted operational concept that needs special attention is maintaining safety when the operational scenario collapses suddenly, for example because of an unexpected weather event or an infrastructure failure. When conditions drift outside the permitted autonomy scenario, the transition mechanism must reliably hand the system over to recovery or to a failover mission. It is not clear that phased deployment offers a complete path to full autonomy, but it at least provides a way to make progress, and it pays dividends as the system is exposed to more of the real world, improving understanding of difficult edge cases and unanticipated situations.

Monitor/actuator architecture

Using a monitor/actuator architecture is a second broadly useful approach that eases many of the challenges of autonomous vehicle safety. As discussed, this architectural style addresses highly complex requirements (only the monitor needs to be assured to a high level) and accommodates inductive algorithms (by confining them to the actuator and pairing them with deductively based monitors). In addition, adopting a failover mission strategy lets an autonomy monitor detect primary system failures without the primary system having to be fail-operational: a simpler, high-integrity failover autonomy system can bring the vehicle to a safe state. If the failover mission is short enough, the redundancy required for failover operation can be minimized, provided the whole system is confirmed fault-free when the failover mission begins.

Third, fault injection. Testing alone is not feasible for assuring system dependability, and self-driving cars compound the problem: they react autonomously to a highly complex environment and incorporate technologies such as machine learning that are difficult and expensive to test. Because there is no human driver providing supervision, the autonomous driving system must achieve a high ASIL, and ordinary system testing seems unlikely to deliver assurance at that level. Fault injection can therefore play a considerable role as part of a validation strategy that combines traditional testing with non-testing validation, especially when it is applied at multiple levels of abstraction rather than only at the electrical connector level.

Future work

This article has discussed how to approach the safety of autonomous vehicles within a V framework based on ISO 26262. However, relying on architectural patterns (such as the monitor/actuator approach) and on validation via fault injection will limit operational capability. In other words, we may have to accommodate the limits of today's validation technology by constraining what self-driving cars are allowed to do. Easing those constraints will require progress in characterizing how well machine learning training data covers the intended operational environment, in building confidence in safety requirements that account for exceptional driving conditions, and in verifying failure independence among redundant inductive systems.
