Fully functioning autonomous vehicles (AVs) will bring about a significant change in our society's transportation systems. Here, "fully functioning" means Level 4 (L4) of driving automation and above, according to the SAE J3016 taxonomy. The graphic below summarizes the automation levels. An app-based cab service built on fully functioning AVs would not need to pay a driver, which could drop prices significantly. Cruise, an AV company owned by GM, currently operates an app-based transport service in San Francisco. The company reports an example of a 1.3 mile trip costing $8.72 (compared to an Uber costing $10.41, as the article's author reports). In such apps, pricing typically depends on factors that change over time, such as the ratio of available cabs to riders, location, and tax rates, among others. Perhaps the most promising implication of widespread adoption of L4+ AVs, however, would be fewer traffic accidents.
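For a rough back-of-the-envelope view of the reported fares, the sketch below works out the per-mile cost and adds a toy surge-style fare function. The fare model is purely illustrative, with invented base rates and thresholds; it is not Cruise's or Uber's actual pricing logic.

```python
# Back-of-the-envelope comparison of the reported fares, plus a toy
# dynamic-pricing sketch. The fare model is purely illustrative.

TRIP_MILES = 1.3
CRUISE_FARE = 8.72
UBER_FARE = 10.41

print(f"Cruise: ${CRUISE_FARE / TRIP_MILES:.2f}/mile")          # ~$6.71/mile
print(f"Uber:   ${UBER_FARE / TRIP_MILES:.2f}/mile")            # ~$8.01/mile
print(f"Savings: {(1 - CRUISE_FARE / UBER_FARE) * 100:.0f}%")   # ~16%

def toy_fare(miles: float, base: float, per_mile: float,
             demand_supply_ratio: float, tax_rate: float) -> float:
    """Hypothetical surge-style fare: scales with demand vs. available cabs."""
    surge = max(1.0, demand_supply_ratio)   # no discount when supply exceeds demand
    return (base + per_mile * miles) * surge * (1 + tax_rate)

# Example: 1.3 miles, moderate demand, 8% tax (all numbers invented).
print(f"Toy fare: ${toy_fare(1.3, 2.50, 3.00, 1.2, 0.08):.2f}")
```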

SAE levels of automation. Image source: SAE International

A barrier to widespread adoption of these technologies (Automated Driving Systems, or ADSs) is proving their safety in the places where vehicles currently operate. To make a scientific argument for safety, we would need evidence of an AV not behaving dangerously on the road. The strength of the argument depends on the strength of the evidence, and a scientist would design an experiment to collect accurate evidence. Here, however, the evidence itself is difficult to define, and evidence of an AV not behaving dangerously is very difficult to acquire. "Road" here means any road, anywhere in the world. Vehicles are also driven in other places, such as parking lots and off-road settings like unpaved roads or farms, and an AV would need to behave safely there as well. "Behaving safely" or "not behaving dangerously" is hard to define too; counting only obvious collisions would not be sufficient. Would a collision with a plastic bag on the road count? Would a near miss count? What even is a near miss? Would breaking a traffic rule count? The notion of safe behavior is, in turn, highly subjective. Therefore, no single argument will be sufficient. The simple, obvious argument stated in this paragraph would need to be replaced by several smaller, more nuanced arguments, each with clear limits to its validity. This kind of research problem falls well within the definition of a wicked problem, or messy science. Here, Schikore writes, “Wicked problems cut across different disciplines, engage different stakeholders (including non-scientists), are fluid, and cannot even be clearly formulated. They are urgent and need to be addressed before sufficient evidence is in.”
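To make the ambiguity concrete, here is a minimal sketch of how one might start pinning down "dangerous behavior" as explicit, checkable event types. Every class name and threshold below is hypothetical; in particular, the 1.5-second near-miss cutoff is invented, and choosing such a value is exactly the subjective part the paragraph describes.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional

class SafetyEvent(Enum):
    """Hypothetical taxonomy of events a safety argument would have to address."""
    COLLISION = auto()             # contact with another object
    HARMLESS_CONTACT = auto()      # e.g. brushing a plastic bag
    NEAR_MISS = auto()             # came "too close" -- but how close is too close?
    TRAFFIC_RULE_VIOLATION = auto()

@dataclass
class Observation:
    contacted_object: Optional[str]   # None if no contact occurred
    min_gap_seconds: float            # smallest time-to-collision observed
    rules_broken: List[str]

def classify(obs: Observation, near_miss_ttc: float = 1.5) -> List[SafetyEvent]:
    """Classify one observation against the (invented) event taxonomy."""
    events = []
    if obs.contacted_object == "plastic bag":
        events.append(SafetyEvent.HARMLESS_CONTACT)
    elif obs.contacted_object is not None:
        events.append(SafetyEvent.COLLISION)
    if obs.min_gap_seconds < near_miss_ttc:
        events.append(SafetyEvent.NEAR_MISS)
    if obs.rules_broken:
        events.append(SafetyEvent.TRAFFIC_RULE_VIOLATION)
    return events

print(classify(Observation(None, 1.2, ["rolling stop"])))
# [<SafetyEvent.NEAR_MISS: 3>, <SafetyEvent.TRAFFIC_RULE_VIOLATION: 4>]
```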

A simpler argument would be one that falsifies a claim of safety. Instead of proving that the behavior is never dangerous, showing even one example of dangerous behavior falsifies the claim of safety. This resolves the difficulty of the approach above, which requires large quantities of evidence, by requiring very few samples. Combined with the need for greater nuance and clear limits to the validity of each smaller argument, we now have a better picture of what research experiments must do: an experiment would look for dangerous behavior from an AV in specific situations (the term "situation" is used loosely here). This kind of experimentation is called scenario-based testing of autonomous vehicles.
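A rough sketch of the falsification idea, under stated assumptions: run the AV (or its simulation) through many concrete scenarios and stop as soon as one dangerous behavior is found, since a single counterexample is enough. The `run_scenario` callable and the stand-in simulator below are hypothetical placeholders, not a real testing API.

```python
from typing import Callable, Iterable, Optional

# Hypothetical types: a Scenario is whatever concrete description the simulator
# consumes; run_scenario executes it and reports whether any dangerous event
# (collision, near miss, rule violation, ...) was observed.
Scenario = dict
RunScenario = Callable[[Scenario], bool]   # True => dangerous behavior observed

def falsify_safety_claim(scenarios: Iterable[Scenario],
                         run_scenario: RunScenario) -> Optional[Scenario]:
    """Return the first scenario that exhibits dangerous behavior, or None.

    One counterexample is enough to falsify the safety claim, so we stop early.
    Exhausting the list does NOT prove safety, only that these scenarios passed.
    """
    for scenario in scenarios:
        if run_scenario(scenario):
            return scenario
    return None

# Toy usage with a stand-in simulator (all values invented for illustration).
def fake_simulator(scenario: Scenario) -> bool:
    return scenario.get("lead_vehicle_brakes_hard", False) and scenario["speed_mph"] > 35

candidates = [
    {"speed_mph": 25, "lead_vehicle_brakes_hard": False},
    {"speed_mph": 40, "lead_vehicle_brakes_hard": True},
]
print(falsify_safety_claim(candidates, fake_simulator))
# {'speed_mph': 40, 'lead_vehicle_brakes_hard': True}
```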

A commonly accepted definition of scenario representing this general idea is currently in use in the AV research community. Loosely, a scenario includes the static and dynamic aspects of an environment that a vehicle may find itself in. ASAM (Association for Standardization of Automation and Measuring Systems) has created several data standards, collectively called the ASAM OpenX standards. Once data formats are clearly defined and commonly accepted, it becomes possible to write software that reads and/or writes this data. A related term, Operational Design Domain (ODD), is also used to represent the situation a vehicle finds itself in. The SAE J3016 taxonomy defines ODD as “Operating conditions under which a given driving automation system or feature thereof is specifically designed to function, including, but not limited to, environmental, geographical, and time-of-day restrictions, and/or the requisite presence or absence of certain traffic or roadway characteristics.” There is some overlap between ODD and scenario, and the corresponding OpenX standard, OpenODD, clarifies the distinction. Regarding dynamic elements, a scenario contains the behaviors of entities while an ODD does not: an ODD would state that the environment contains cars, while a scenario would define exactly how those cars move, in high resolution. Regarding static elements, an ODD describes them only at low resolution, while a scenario defines them at high resolution, down to measurements of width, road geometry function parameters, and much more.
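A conceptual sketch of that distinction is below. These dataclasses are not the ASAM OpenODD or OpenSCENARIO formats; they only mirror the idea that an ODD states coarse operating conditions while a scenario pins down concrete geometry and behavior over time. All field names and values are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ODD:
    """Coarse constraints: where and when the ADS is designed to operate."""
    road_types: List[str]         # e.g. ["urban street", "parking lot"]
    weather: List[str]            # e.g. ["dry", "light rain"]
    time_of_day: List[str]        # e.g. ["daytime"]
    other_traffic_present: bool   # cars may be present, but their behavior is unspecified

@dataclass
class Scenario:
    """Concrete instance: exact geometry and entity behavior over time."""
    lane_width_m: float           # e.g. 3.5
    road_geometry: str            # e.g. a clothoid parameterization
    # Per-vehicle trajectory: (time_s, x_m, y_m, speed_mps) samples.
    vehicle_trajectories: List[List[Tuple[float, float, float, float]]] = field(default_factory=list)

odd = ODD(["urban street"], ["dry"], ["daytime"], other_traffic_present=True)
scenario = Scenario(
    lane_width_m=3.5,
    road_geometry="straight, 200 m",
    vehicle_trajectories=[[(0.0, 0.0, 0.0, 10.0), (1.0, 10.0, 0.0, 10.0)]],
)
print(odd)
print(scenario)
```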

In addition to defining the situation, a better understanding of safe and unsafe behaviors is important as well. Two companies making significant progress towards L4+ automation are Waymo and Cruise. As of this writing, Waymo operates an app-based ride-hailing service in parts of Phoenix, Arizona, USA and San Francisco, California, USA, while Cruise operates an app-based ride-hailing service in San Francisco, Austin, and Phoenix. Both companies have released detailed safety reports, Waymo in 2021 and Cruise in 2022, and the two reports use similar language. Regarding safety, both describe adopting several safety standards, either in principle or in full.