Over the past years, the automotive industry has gone through significant changes, enabling the development of advanced driver-assistance systems and autonomous driving. An automated driving system (ADS) must operate in an environment that is almost infinite, where it must handle every driving situation involving other actors while ensuring the safety of all traffic participants. This is a whole new level of complexity! Despite the remarkable progress and promising developments we see in autonomous vehicles (which we compare to the visible part of an iceberg), there is a lack of systematic approaches for testing autonomous driving systems for homologation purposes.
At BTC Embedded Systems, we have been working over the past years on the validation of SAE Level 4 driving functions for homologation purposes (one example is an ongoing public project). We leverage our 24 years of experience developing advanced verification techniques for safety-critical software, and our strong relationships with industrial and academic partners, to solve the challenges of testing autonomous driving software. On our journey, we have identified five challenges, which we compare to the submerged portion of the iceberg. We propose to solve these challenges with a scenario-based testing approach that deals with complexity, ensures safety, and provides relevant metrics for homologation.
1. Is there a development standard out there?
ADSs are composed of multiple E/E systems that in combination realize the autonomous function. At first glance, ISO 26262 sounds like a good supporting standard for developing an ADS. It is! However, it is not sufficient. ISO 26262 focuses on a systematic approach to identify and minimize the risks induced by hardware and software faults. But, assuming the E/E-related risks are mitigated, how do we make sure the ADS satisfies the intended functionality in all conditions, and how will it react to unforeseen exceptional situations? This is covered by the recent standard ISO 21448 (Safety of the Intended Functionality, aka SOTIF). SOTIF addresses the potential hazards and risks associated with insufficient functional specification and performance limitations of the system. It proposes a scenario-driven approach to detect and reduce as much as possible the functional insufficiencies that could lead to severe hazards, while showing that the system behaves as specified.
By applying SOTIF, we first define the Operational Design Domain (ODD) of the system. This sets the outer boundary of the operating environment and conditions of the system. Within the ODD, we specify the intended functionality for all possible scenarios. Then, we need to evaluate the conditions under which the system may not operate as planned and which could cause hazardous effects. The analysis can reveal an incomplete system specification or physical/technological limitations. Hence, we can update the system specification or restrict the operating conditions to minimize the risks. In case there are remaining uncontrollable or severe risks, we must define acceptance criteria. Once the hazardous scenarios are identified, we can develop a V&V strategy to assess that the system induces a sufficiently small risk in known scenarios and that all acceptance criteria are met. However, we must acknowledge that unknown scenarios may also contain risks. Therefore, we must assess the acceptance criteria across a broader range of scenarios, including those that are not explicitly identified.
Ultimately, the safety proof derived from the V&V strategy depends on the ability to identify and cover the known scenarios, as well as to reduce as much as possible the space of unknown scenarios. For this, we need new validation techniques and new tooling that enable us to enforce and evaluate the completeness of scenarios.
2. How to ensure completeness?
Ensuring completeness in the verification and validation (V&V) strategy requires a set of scenarios that accurately represent the Operational Design Domain (ODD). To achieve a comprehensive safety assessment, we need first to formally define the ODD attributes, such as road and environmental conditions, allowed traffic objects, etc. This can be done by referring to established frameworks like the PEGASUS project’s taxonomy or the ISO standard 34503. Once the ODD is defined, we can develop a strategy to create a set of test scenarios that adequately covers it.
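To make this concrete, a formally defined ODD can be treated as a machine-checkable data structure. The following is a minimal sketch; the `OddDefinition` class and its attribute names are invented for illustration and are not taken from ISO 34503 or the PEGASUS taxonomy:

```python
from dataclasses import dataclass

# Hypothetical, simplified ODD model: a few attributes with their
# allowed values and ranges. Real ODD definitions would follow the
# standardized attribute hierarchies mentioned above.
@dataclass
class OddDefinition:
    road_types: list       # e.g. ["highway", "urban"]
    weather: list          # e.g. ["clear", "rain"]
    speed_limit_kmh: tuple # allowed range, e.g. (60, 130)
    traffic_objects: list  # e.g. ["car", "truck", "pedestrian"]

    def contains(self, scenario: dict) -> bool:
        """Check whether a concrete scenario lies inside the ODD."""
        lo, hi = self.speed_limit_kmh
        return (scenario["road_type"] in self.road_types
                and scenario["weather"] in self.weather
                and lo <= scenario["speed_limit_kmh"] <= hi
                and all(o in self.traffic_objects
                        for o in scenario["objects"]))

odd = OddDefinition(["highway"], ["clear", "rain"], (60, 130),
                    ["car", "truck"])
print(odd.contains({"road_type": "highway", "weather": "rain",
                    "speed_limit_kmh": 120, "objects": ["car"]}))  # True
```

The value of such a formal definition is that checks like "does this recorded or generated scenario belong to the ODD?" become automatable, which is a precondition for measuring coverage later on.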
How to get the scenarios?
There are two approaches: recording real-world scenarios or generating synthetic (virtual) scenarios.
- Real-world scenarios offer 100% realism but are expensive, time-consuming to acquire, and challenging to reproduce.
- Virtual scenarios, on the other hand, can be used during early development and testing phases, are reproducible, and can explore a wide range of ODD conditions.
Note: Virtual testing is widely accepted as adequately fulfilling the validation needs of ADSs, including by regulatory organizations such as UNECE and NHTSA. It is even considered one of the fundamental V&V pillars.
Achieving scenario completeness is challenging with either approach alone. Millions of miles of recorded scenarios do not guarantee the observation of all possible scenarios, and generating synthetic scenarios with all parameter combinations is not realistic (e.g., combining 100 ODD parameters with 3 values each leads to 3^100, or ~10^48, combinations, many of which are physically impossible). To address this, we need to combine the strengths of both approaches as a practical way to ensure completeness. Regardless of whether scenarios are derived from real-world or synthetic data, their exhaustiveness within the ODD can only be evaluated with a certain level of statistical certainty. Therefore, completeness, referred to as "scenario coverage," is expressed as a probability and is compared to a reference distribution set.
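Both effects, the combinatorial explosion and the pruning by physical consistency, can be illustrated with a toy example. The parameter names and the consistency rule below are invented for illustration:

```python
from itertools import product

# Toy example: 3 ODD parameters with 3 values each.
params = {
    "weather": ["clear", "rain", "snow"],
    "road":    ["dry", "wet", "icy"],
    "daytime": ["day", "dusk", "night"],
}

full = list(product(*params.values()))
print(len(full))  # 3**3 = 27; with 100 such parameters it is 3**100

# Invented physical-consistency rule: an icy road requires snow.
def physically_possible(weather, road, daytime):
    return not (road == "icy" and weather != "snow")

feasible = [c for c in full if physically_possible(*c)]
print(len(feasible))  # 21: even one rule prunes a noticeable share
```

Already one simple constraint removes 6 of 27 combinations; real ODDs carry many such rules, which is one reason why naive full-factorial generation both explodes and wastes effort on impossible cases.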
Initially, we need a set of abstract scenarios that represent possible dynamic situations within the ODD. These scenarios are described in an abstract or logical manner (e.g., "cut-in" or "cut-out") and carry probabilities of occurrence within the ODD. We can create the scenarios using the graphical scenario language Graphical Traffic Scenario (GTS) developed by BTC (compatible with OpenScenario 2.0). By varying the ODD attributes, we obtain a set of "ODD-characteristic" scenarios. These scenarios ultimately serve as a reference basis that should be covered by real-world or virtual tests.
Using the complete set of scenarios, we can then test the ADS and evaluate its performance against safety objectives.
3. Manage complexity through smart parameter variation
The next step is to bring the ODD-characteristic scenarios into simulation. With their abstract and logical nature, they capture an infinite set of possible trajectories. To derive the concrete scenarios (actual trajectories), we propose to generate them automatically from the ODD-characteristic scenarios using a smart variation of ODD and scenario parameters.
“Less is more”: the aim is to test mainly the relevant scenarios while ensuring safety!
As mentioned previously, we avoid a brute-force approach that would result in an explosion of tests and include numerous irrelevant scenarios. In the ODD-characteristic scenarios, we assign statistical distributions to the ODD and scenario parameters. The distributions can be derived from real-world data or from domain-specific reliable sources. We use them to refine the parameter space with a statistical computation that reveals the likely and unlikely combinations. As part of the variation strategy, we also define criticality criteria to identify critical scenarios, so in the end we can focus the test efforts on the most probable and most critical scenarios, even when some critical scenarios turn out to be unlikely. This initial step helps reduce the number of test combinations. The resulting refined scenarios are called “Test scenarios” and are suitable to bring to simulation. Please note that Test scenarios are still logical scenarios containing value ranges. They are not yet concrete scenarios, and we explain below how to execute them automatically. With the reduced set of relevant test scenarios (compared to the brute-force approach), we can proceed with simulation to evaluate the SOTIF objectives, but we need to answer two more questions:
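The refinement step can be sketched as follows. The cut-in parameters, their distributions, and the likelihood and criticality rules are all invented for illustration; in a real project the distributions would come from real-world data, as described above:

```python
import random

random.seed(0)

# Hypothetical "cut-in" scenario parameters with assumed distributions.
def sample_scenario():
    return {
        "ego_speed":  random.gauss(27.0, 4.0),   # m/s
        "cut_in_gap": random.gauss(25.0, 10.0),  # m
        "rel_speed":  random.gauss(-2.0, 2.0),   # m/s, cut-in vs. ego
    }

def likely(s):
    # Crude likelihood filter: within ~2 sigma of each assumed mean.
    return (abs(s["ego_speed"] - 27) < 8
            and abs(s["cut_in_gap"] - 25) < 20
            and abs(s["rel_speed"] + 2) < 4)

def critical(s):
    # Simple criticality criterion: small gap while closing in.
    return s["cut_in_gap"] < 10 and s["rel_speed"] < 0

samples = [sample_scenario() for _ in range(10_000)]
# Keep the most probable combinations, plus critical ones even when
# they are unlikely -- matching the variation strategy above.
test_scenarios = [s for s in samples if likely(s) or critical(s)]
print(f"{len(test_scenarios)} of {len(samples)} samples retained")
```

The retained set is smaller than the raw sample space but, by construction, never drops a critical combination just because it is rare.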
1. Will the simulation behave as specified in the test scenario?
While the ADS performs autonomously, the simulation must control the other traffic participants according to the given test scenario specification. In conventional simulation methods, we would assign parameter values at predefined time stamps and thus generate predefined trajectories. However, predefined trajectories assume a certain behavior of the ADS. If the (autonomous) ADS behaves differently, the predefined traffic behavior would deviate from the specification, leading to irrelevant scenario executions. Therefore, we propose a different approach: calculate and adapt the trajectory of each traffic participant over time during simulation. For this, we use the Reactive Traffic Control (RTC) technology developed by BTC, which takes the parameter ranges specified in the test scenario and uses them as constraints to calculate the overall trajectory. The RTC controls all traffic participants simultaneously during simulation and ensures a reliable execution of the test scenarios.
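The principle of reactive, constraint-based traffic control can be sketched in a few lines. This is a toy illustration, not BTC's RTC implementation: the traffic vehicle recomputes its speed at every simulation step so that a gap constraint from the test scenario keeps holding, regardless of what the ADS-controlled ego vehicle decides to do:

```python
DT = 0.1                       # simulation step [s]
GAP_MIN, GAP_MAX = 8.0, 12.0   # gap range from the test scenario [m]

def traffic_speed(ego_pos, ego_speed, own_pos):
    """Pick the traffic vehicle's speed so the gap stays in range."""
    gap = own_pos - ego_pos
    if gap > GAP_MAX:          # too far ahead: close in on the ego
        return max(ego_speed - 2.0, 0.0)
    if gap < GAP_MIN:          # too close: open the gap again
        return ego_speed + 2.0
    return ego_speed           # in range: hold the relative position

ego_pos, traffic_pos = 0.0, 30.0
for step in range(200):
    # Ego behavior is NOT known in advance; here it brakes mid-run.
    ego_speed = 25.0 if step < 100 else 15.0
    v_traffic = traffic_speed(ego_pos, ego_speed, traffic_pos)
    ego_pos += ego_speed * DT
    traffic_pos += v_traffic * DT

gap = traffic_pos - ego_pos
print(f"final gap: {gap:.1f} m")  # the gap settles inside the range
```

Because the traffic vehicle's trajectory is computed reactively rather than predefined, the scenario constraint survives the ego's unexpected braking, which is exactly the property predefined trajectories lack.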
2. How to identify the potential hazardous situations among the test scenarios?
While the test scenarios include a subset of critical situations, it is uncertain whether their execution will immediately lead to a hazardous case. Additionally, although the parameter ranges in a test scenario are refined, they still span a continuous parameter space. One could think (again) of using a localized brute-force approach to find the hazardous combinations, but applied to all test scenarios this is not scalable. To automatically find hazardous cases, we have developed a technology called Weakness Detection. It automatically explores the parameter space to identify combinations of parameters where hazardous situations occur. It requires a formal description of the hazardous situation, known as a “weakness function” (e.g., Time-To-Collision < 1 s, or a complex safety rule that combines different scenario or ODD parameters). By utilizing this technology, the manual and random effort required to search for edge cases during simulation is replaced with a smart AI-powered optimization algorithm.
4. Automate test and verdict calculation
In order to efficiently manage, process, and analyze the vast amount of data, we need a significant degree of automation.
- Automatic derivation of test scenarios: From abstract and logical scenarios, we can automatically derive test scenarios focusing on likely and critical situations.
- Automatic generation of concrete scenarios: To avoid creating predefined trajectories of the surrounding traffic and save manual effort as well as avoid deviating executions, we can use the Reactive Traffic Control to execute the test scenarios.
- Automatic search for hazardous situations: By employing Weakness Detection, hazardous events can be automatically identified without resorting to brute-force methods. This technique uses evolutionary algorithms to navigate the parameter space, rewarding progress towards identified weaknesses until they are reached. It helps explore the space of hazardous scenarios effectively, and the absence of weaknesses is demonstrated in probabilistic terms.
- Automatic assessment of Safety of the Intended Functionality (SOTIF) objectives: Safety requirements, intended behavior, and acceptance criteria can be made machine-readable. Safety requirements can be expressed using the formal Universal-Patterns notation developed by BTC; intended behaviors can be specified using a conditional scenario language; and acceptance criteria can be specified either with a formula combining parameter values or by comparing the system under test with a reference driver model in specific situations. These machine-readable objectives enable an automatic and efficient assessment during simulation.
- Automatic assessment of regulatory rules: Compliance with regulatory requirements and traffic rules within the ODD is essential to get the ADS approved. Using the language Universal-Pattern, dedicated groups of rules can be created and observed during simulation, covering both ADS-specific rules (e.g., speed limits) as well as the ADS’s relationship with other entities in the driving environment (e.g., safety distance, lane keeping).
- Automatic measurement of scenario coverage: To make sure the scenarios are correctly executed, scenario observers are generated from the scenario language to assess whether the obtained trajectories of each traffic actor fulfill the scenario specifications and constraints during simulation. The observers measure and aggregate coverage from the leaf test scenarios, weighted with their probabilities, up to the abstract ODD-characteristic scenarios. This enables an automatic judgement of how well the overall simulation covers the ODD space, which is crucial for the SOTIF argumentation.
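The probability-weighted roll-up of coverage can be sketched as follows; the scenario names, probabilities, and pass/fail results are illustrative:

```python
# Hypothetical observer results: for each abstract ODD-characteristic
# scenario, a list of (occurrence probability, constraints fulfilled)
# pairs reported by its leaf test scenarios.
results = {
    "cut-in":  [(0.50, True), (0.30, True), (0.20, False)],
    "cut-out": [(0.70, True), (0.30, False)],
}

def coverage(leaves):
    """Probability-weighted share of leaves covered in simulation."""
    total = sum(p for p, _ in leaves)
    return sum(p for p, ok in leaves if ok) / total

for scenario, leaves in results.items():
    print(f"{scenario}: {coverage(leaves):.0%} covered")

overall = coverage([leaf for leaves in results.values() for leaf in leaves])
print(f"overall ODD coverage: {overall:.0%}")  # 75% in this toy data
```

Weighting by occurrence probability means a missed but rare leaf scenario lowers the coverage figure less than a missed common one, which matches the statistical notion of scenario coverage introduced earlier.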
Additionally, a test management tool serves to aggregate the various scenarios, rules, acceptance criteria, weakness functions, and orchestrate the different V&V steps. In the end, it provides relevant metrics required to demonstrate the absence of unreasonable risk, as demanded by SOTIF.
5. Enable homologation
With the introduction of autonomous vehicles, the traditional concept of driver responsibility becomes blurred, requiring legal frameworks to address issues related to liability and insurance coverage. OEMs will have to secure the liability and responsibility terms. To support this, they must apply a SOTIF-compliant process. The process must show that rigorous verification and validation tasks have been performed to demonstrate the absence of unreasonable risk while operating the ADS in a sufficiently representative environment and set of conditions. The absence of unreasonable risk shall be documented in quantitative and qualitative terms. For instance, the remaining risks can be compared with the general statistics of human drivers in the same ODD, augmented with a lower degree of severity for the accidents that may still occur. As part of the safety argument, we must also prove the trustworthiness of the simulation results. This can be supported by tool qualification, where on the one hand the simulation environment utilizes qualified driver and vehicle models that behave physically as in the real world, and on the other hand the ADS virtually behaves as a digital twin of the deployed vehicle.
In general, we see that many projects developing ADSs do not yet apply the systematic V&V approach required for homologation. While ISO 26262 provides a framework for addressing hardware and software faults, ISO 21448 (SOTIF) focuses on the safety of the intended functionality and addresses the potential risks associated with insufficient system specifications. The combination of both standards is the way forward to achieving the level of confidence in safety required to develop autonomous vehicles.
To ensure completeness in the validation and verification strategy, a combination of real-world and synthetic test scenarios is necessary, with scenario coverage measured in statistical terms. Managing complexity involves generating concrete scenarios from abstract scenarios by first focusing, during scenario variation, on the most probable and most critical ones, and then using automated techniques such as Weakness Detection to identify hazardous situations in the remaining parameter space. Automation plays a crucial role in various aspects, including test scenario generation, efficient traffic control, weakness detection, SOTIF objective assessment, and regulatory rule verification. The goal is to achieve a SOTIF-compliant release that shows the absence of unreasonable risk and secures the liability and responsibility terms for autonomous vehicles. Tool qualification and the trustworthiness of simulation results are other essential elements of the safety argument.