A Flawed Safety Narrative That Begins with Test Grounds and Mileage Accumulation
Across China, the United States, and Europe, autonomous-driving regulation is unfolding in strikingly similar ways: closed “autonomous-driving test grounds” are constructed, limited-scope “intelligent connected vehicle pilot zones” are opened, autonomous vehicles are permitted to operate under controlled conditions, and accumulated driving mileage serves as the primary evidence of system safety and maturity.
Within this framework, safety is reduced to an apparently intuitive—but deeply misleading—metric: the number of kilometers driven without serious accidents. Test grounds, pilot zones, and accumulated mileage together form a set of safety credentials that are quantifiable, reportable, and easy for the public to understand.
Let me state a blunt conclusion upfront: this method fails when confronted with the real world.
Two recent incidents serve as direct empirical counterexamples to this validation logic.
First, following a large-scale power outage in California, every vehicle operated by a major autonomous-driving company shut down and could not resume operation; the stalled vehicles blocked urban streets at scale and became a source of traffic risk in their own right. This was not a “traffic accident” in the conventional sense, but it clearly exposed the system’s fragility under critical infrastructure failure.

Second, in Zhuzhou, China, an autonomous vehicle struck two pedestrians standing by the roadside and dragged them under the vehicle; both were severely injured and admitted to intensive care. This incident did not occur under “extreme test conditions,” but in what appeared to be an ordinary urban driving environment.

Prior to these incidents, the public narratives of both companies were remarkably similar: in press conferences, media interviews, and official materials, they repeatedly emphasized that their systems had completed tens or even hundreds of millions of kilometers of testing, presenting this as the core evidence that they were “already safe.”
Yet the facts demonstrate that large amounts of accumulated driving mileage do not prevent these systems from failing at critical moments.
To use a simple analogy: repeatedly practicing that “1 + 1 = 2” will never teach you calculus.
Increasing mileage merely repeats a world that has already been encountered; it does not equate to understanding complexity that has yet to emerge.
Have We Overestimated “Validation” and Underestimated the Complexity of the World?
From the perspective of the philosophy of technology, the current safety-validation framework for autonomous driving rests on a belief deeply rooted in traditional engineering practice: that sufficient validation of system behavior can establish confidence in its future performance.
This belief holds true for static systems, closed systems, or highly controllable systems. Bridges, aircraft, and industrial equipment can undergo rigorous testing and receive safety certification precisely because the physical laws they obey, their operating boundaries, and their environmental variables are highly stable.
Urban traffic systems, however, do not belong to this category.
The real world is not a finite set of enumerable states, but a continuously evolving complex system. The behavior of traffic participants changes, road conditions change, patterns of social activity change, and infrastructure itself degrades or fails over time. Many risks do not arise from “rare scenarios,” but from the superposition of multiple seemingly normal factors at specific moments.
In such systems, “validation” can only ever cover situations that have already occurred; it cannot provide guarantees for combinations that have yet to form.
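A toy calculation makes the scale of this problem concrete. The factors and state counts in the sketch below are my own illustrative assumptions, not measurements; the point is only the multiplication:

```python
import math

# Illustrative (assumed) factors, each reduced to a few discrete states.
# Real traffic involves far more factors, many of them continuous.
factors = {
    "weather": ["clear", "rain", "fog", "snow"],
    "lighting": ["day", "dusk", "night"],
    "road_work": ["none", "lane_closed", "detour"],
    "pedestrian_behavior": ["normal", "jaywalking", "stationary_in_lane"],
    "infrastructure": ["nominal", "signal_outage", "power_outage"],
    "other_vehicles": ["compliant", "aggressive", "stalled"],
}

combinations = math.prod(len(states) for states in factors.values())
print(f"{combinations} joint states from only {len(factors)} coarse factors")
# 4 * 3 * 3 * 3 * 3 * 3 = 972; each added factor multiplies the total.
```

Each additional factor multiplies the space, while mileage grows only linearly, and mileage samples that space in proportion to how often each combination happens to occur on the road; the rare joint states that matter most are precisely the ones accumulated kilometers are least likely to have covered.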
This is the most critical yet least openly discussed issue in today’s safety discourse: we are using a methodology suited to finite worlds to validate a world that is fundamentally infinite.
The Structural Limits of Segmented, Rule-Based AI
From a technical-architecture perspective, today’s modular, rule-driven autonomous-driving systems rely on the following operational logic:
when the system encounters a failure in real-world or test environments, engineering teams retrospectively analyze the incident, abstract it into a new “scenario” or “case,” and then patch the system by adding data, adjusting rules, or tuning model parameters.
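A minimal sketch of this loop, in Python, may make the structure clearer. Every name in it (SCENARIO_RULES, classify, decide) is hypothetical, invented for illustration rather than taken from any real system:

```python
# Hypothetical sketch of a patch-after-failure, rule-based architecture.

SCENARIO_RULES = {
    "pedestrian_crossing": lambda obs: "yield",
    "stopped_bus": lambda obs: "change_lane",
    # Every entry exists because a human observed a failure, abstracted
    # it into a named case, and authored a rule for it after the fact.
}

def classify(observation: dict) -> str:
    # Stand-in for a full perception stack that maps the world onto
    # the finite set of cases the team has already defined.
    return observation.get("scenario", "unknown")

def decide(observation: dict) -> str:
    rule = SCENARIO_RULES.get(classify(observation))
    if rule is None:
        # An unrecognized combination: the system has no behavior of its
        # own here; capability waits on the next human-authored patch.
        return "fallback_stop"
    return rule(observation)

print(decide({"scenario": "pedestrian_crossing"}))        # "yield"
print(decide({"scenario": "power_outage_intersection"}))  # "fallback_stop"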
In the early stages of engineering development, this approach is efficient and rational. It allows systems to improve incrementally within controlled boundaries and to fix localized defects quickly.
However, it also carries an unavoidable structural limitation: the system itself does not evolve; it is only corrected after the fact.
The system’s capability boundary depends on whether humans have already recognized a problem, defined it as a “case,” and prepared corresponding rules or data. Once an issue does not manifest in a clear or repeatable form, the system struggles to develop a response.
As a result, safety within this architecture is inherently a lagging attribute, trailing behind reality itself.
By contrast, what end-to-end large models represent is not merely “more complex algorithms,” but a system form that is structurally closer to the nature of a complex world.
End-to-end models do not rely on explicit decomposition of the world into rules and scenarios. Instead, they learn the overall statistical structure of traffic behavior from large-scale data, forming a continuous, high-dimensional understanding of the environment. This enables the system to retain a degree of generalization and self-adjustment even when facing previously unseen combinations.
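As a contrast with the dictionary-of-cases structure sketched above, here is a deliberately toy end-to-end policy. The architecture, dimensions, and framework choice (PyTorch) are assumptions for illustration; production systems consume raw sensor streams and train on vastly larger data:

```python
import torch
import torch.nn as nn

class EndToEndPolicy(nn.Module):
    """Toy end-to-end policy: observation features in, control out."""

    def __init__(self, obs_dim: int = 256, ctrl_dim: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, ctrl_dim),  # e.g. steering, acceleration
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # No case dispatch: every input, seen or unseen, receives a
        # response from the same learned function.
        return self.net(obs)

policy = EndToEndPolicy()
control = policy(torch.randn(1, 256))  # an input no rule was written for
```

The point of the sketch is structural: there is no case lookup to fall through, so the quality of the response to a novel combination is a question of generalization rather than of rule coverage.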
The key here is not whether the system will ever make mistakes, but whether it possesses evolvability—whether it can absorb change through its own structure rather than waiting for humans to manually complete missing rules.
From a safety standpoint, what truly matters is not how much testing a system has already passed, but whether it maintains behavioral stability and risk-containment capacity under unknown conditions.
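One concrete reading of “risk-containment capacity” is a runtime gate: estimate how far the current input lies outside what the system has learned, and degrade to a minimal-risk maneuver when that estimate is high. The score and threshold below are illustrative assumptions, not a calibrated design:

```python
import statistics

def ood_score(features: list[float], train_mean: float, train_std: float) -> float:
    # Crude out-of-distribution proxy: distance of the observation's mean
    # feature value from the training distribution, in standard deviations.
    return abs(statistics.fmean(features) - train_mean) / max(train_std, 1e-6)

def contain_risk(features: list[float],
                 train_mean: float = 0.0,
                 train_std: float = 1.0,
                 threshold: float = 3.0) -> str:
    # Hypothetical gate; in practice the score and threshold would be
    # calibrated empirically and combined with many other signals.
    if ood_score(features, train_mean, train_std) > threshold:
        return "minimal_risk_maneuver"  # e.g. slow down, pull over, hand off
    return "nominal_operation"

print(contain_risk([0.1, -0.2, 0.05]))  # nominal_operation
print(contain_risk([9.0, 8.5, 10.2]))   # minimal_risk_maneuver
```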
Safety Is Not Proven Once; It Must Be Continuously Maintained
In a highly complex and continuously changing real world, safety cannot be an attribute permanently confirmed by a single round of testing.
Test grounds and pilot zones are necessary in the early stages of technological development. They establish a minimum capability threshold before systems enter real environments and give regulators an operational starting point. But passing a test ground does not make a system “permanently safe” in the real world.
What truly matters is not whether a system has “already been validated,” but whether:
- the system has a technological path for continuous evolution,
- regulators possess long-term, dynamic mechanisms for evaluation and constraint, and
- safety is understood as a process that must be continuously observed and corrected.
In a complex world, safety is not a conclusion proven once and for all, but a capability state that must be continuously maintained.
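What “continuously maintained” could mean operationally is easiest to show with a sketch. The metric, window, and alert threshold below are illustrative assumptions, not a regulatory standard:

```python
from collections import deque

class RollingSafetyMonitor:
    """Tracks incidents per million km over a sliding window of reports.

    Illustrative sketch: real oversight would combine many signals
    (disengagements, near-misses, infrastructure dependence) and be
    audited continuously, not certified once.
    """

    def __init__(self, window: int = 90, alert_threshold: float = 0.5):
        self.reports = deque(maxlen=window)  # e.g. daily (incidents, km) pairs
        self.alert_threshold = alert_threshold

    def record(self, incidents: int, km: float) -> None:
        self.reports.append((incidents, km))

    def rate(self) -> float:
        total_incidents = sum(i for i, _ in self.reports)
        total_km = sum(k for _, k in self.reports)
        return 1e6 * total_incidents / total_km if total_km else 0.0

    def requires_intervention(self) -> bool:
        # The safety question is re-asked every window,
        # not answered once at a test ground.
        return self.rate() > self.alert_threshold

monitor = RollingSafetyMonitor()
monitor.record(incidents=1, km=2_000_000)
print(monitor.rate(), monitor.requires_intervention())
```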
If we continue to mistake “testing mileage” for “safety itself,” the risks faced by autonomous driving will not disappear as testing increases; they will merely be deferred—until they surface at moments that are far harder to control.
And this is precisely why the validation logic of autonomous driving must be fundamentally re-examined today.