Every six months or so, Nvidia’s head of automotive, Xinzhou Wu, invites CEO Jensen Huang to go for a ride in a vehicle equipped with the company’s hands-free autonomous driving system. But only when Wu has “good confidence” in the system’s driving capabilities.
Recently, the two went for a drive from Woodside, California, to downtown San Francisco in a Mercedes CLA sedan with MB.Drive Assist Pro, a hands-free driver-assist system partly designed by Nvidia that’s similar to Tesla’s Full Self-Driving. The mood was light, even if the traffic was pretty heavy.
“Let me know when you’re in autonomous mode,” Huang said to Wu, according to a video of the ride provided to The Verge, “then I can be less concerned about my safety.”
Over the course of the 22-minute video, the Mercedes navigates Huang and Wu through a series of everyday obstacles, like construction sites, double-parked cars, and lanes narrowly channeled through rows of orange cones. Nvidia’s system seems quite capable, though the video is edited and not presented in real time. (Nvidia spokesperson Jessica Soares later said there were no disengagements during the ride.)
Still, it seemed not dissimilar from my own experience last year riding shotgun with Nvidia executives in a Mercedes with the hands-free driving system activated. I was impressed by the system’s ability to handle traffic signals, four-way stops, double-parked cars, unprotected left turns, and all the pedestrians and cyclists and scooter-riders that San Francisco can throw at you. If Tesla can do it with a bit of silicon and a bunch of cameras, it stands to reason that the world’s most valuable company could figure it out too.
‘The ChatGPT moment for physical AI’
After years of operating behind the scenes, Nvidia is attempting to stake out a more prominent leadership position on autonomous driving. Not only is it supplying the chips to companies like Tesla, but it’s also offering its own AI-powered driving features to partners like Mercedes, Jaguar Land Rover, and Lucid. At CES earlier this year, Huang unveiled Alpamayo, a portfolio of AI models, simulation blueprints, and datasets that can give vehicles Level 4 autonomy, allowing them to fully drive themselves under specific conditions. Huang touted the announcement as “the ChatGPT moment for physical AI.”
In the car with Wu, Huang is less bombastic and more introspective — but no less bullish on the technology’s future. “I think the challenge, of course, is Alpamayo, as incredibly smart as it is — and it can reason about the circumstance — we don’t know what it can’t do,” he said. “And so that’s the challenge, and that’s the reason why our classical stack is so incredibly important.”
Huang boasts that Nvidia’s approach to autonomous driving is “unique” because it combines an end-to-end AI model with a traditional, human-engineered “classical” stack. Pure end-to-end models are difficult to verify for safety, he theorizes. In contrast, the classical stack follows well-established engineering protocols and processes that make it easier to verify that certain behaviors are safe. By combining both approaches, Nvidia’s system can benefit from a human-like driving style while still maintaining a safety framework grounded in traditional rules of the road.
Huang’s claim of a unique approach doesn’t completely hold up; other AV operators also use end-to-end neural networks in tandem with explicit safety rules governing how a vehicle should respond. But it is certainly true that end-to-end learning, which tends to produce driving that is more humanlike and less robotic, is increasingly in vogue. Waymo relies on a hybrid system, while Tesla relies exclusively on end-to-end neural networks.
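In the abstract, such a hybrid can be pictured as a learned planner whose proposals must clear a hand-written rulebook before they reach the wheels. The Python sketch below is purely illustrative: every class, threshold, and function name is invented for this article, and none of it reflects Nvidia’s actual software.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    speeds: list[float]   # planned speed at each future timestep (m/s)
    min_gap_m: float      # closest predicted distance to any obstacle (m)

# Invented stand-ins for a hand-engineered "classical" rule set.
SPEED_LIMIT_MS = 13.4     # roughly 30 mph
MIN_SAFE_GAP_M = 2.0

def classical_check(traj: Trajectory) -> bool:
    """Accept a trajectory only if it satisfies every explicit rule."""
    return (max(traj.speeds) <= SPEED_LIMIT_MS
            and traj.min_gap_m >= MIN_SAFE_GAP_M)

def plan(end_to_end_model, sensor_frame) -> Trajectory:
    # The learned model proposes a human-like trajectory...
    proposal = end_to_end_model(sensor_frame)
    if classical_check(proposal):
        return proposal
    # ...and the rule-based layer falls back to a conservative maneuver
    # (here, braking to a stop) whenever the proposal can't be verified.
    return Trajectory(speeds=[0.0], min_gap_m=float("inf"))
```

The point of the pattern is the division of labor: the learned model supplies the human-like behavior, while the classical layer supplies rules explicit enough to be audited and verified.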
In an interview, Wu said that end-to-end models are better able to respond to things like speed bumps or lane changes without feeling mechanical or overly robotic. “That’s why it’s really the ChatGPT moment,” he said. “It’s like only when your car really drives with confidence … then basically customers will feel more willing to use it.”
Tesla and the high cost of self-driving
I asked Wu how he thought Nvidia’s approach compared to Tesla’s Full Self-Driving, which has driven over 8.5 billion miles but has been implicated in a number of troubling safety incidents, including 23 injuries and at least two fatalities. Last December, an Nvidia executive told me that the company had tested the two systems against each other. The number of driver takeovers was comparable, he said, sometimes favoring one system, sometimes the other.
Wu declined to comment directly on Tesla’s safety record, but explained that Nvidia distinguishes itself through the use of multiple sensors, including cameras, radar, ultrasonic sensors, and, in higher-end configurations, lidar. Nvidia believes that redundancy and diversity in sensing technologies are critical for handling difficult edge cases and achieving higher levels of safety, Wu said.
Additional sensors mean additional costs. The inclusion of lidar, in particular, suggests that Nvidia’s safest system would be accessible only to wealthy Mercedes owners. But Wu believes that Nvidia’s vertically integrated approach allows it to deliver the required safety performance at the lowest feasible cost.
Nvidia’s DRIVE Hyperion platform is designed with multiple configurations in mind. The base version uses a simpler and more cost-effective sensor setup, primarily relying on cameras and radar. These sensors have become dramatically cheaper over the past decade due to mass production; ultrasonic sensors are already extremely inexpensive. For higher levels of autonomy, the platform can add lidar sensors, and given the declining cost of lidar, Wu said he believes that vehicles priced around $40,000 to $50,000 could realistically include the full sensor stack needed for advanced autonomy.
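As a back-of-the-envelope illustration of that tiering, here is a minimal sketch; the tier names, sensor counts, and unit prices below are all invented for the example, not Nvidia’s actual DRIVE Hyperion specification.

```python
# Invented sensor tiers and prices, loosely following the article's
# description; nothing here is Nvidia's actual DRIVE Hyperion spec.
SENSOR_TIERS = {
    "base": {"camera": 10, "radar": 5, "ultrasonic": 12, "lidar": 0},
    "full": {"camera": 12, "radar": 9, "ultrasonic": 12, "lidar": 1},
}

def sensor_cost(tier: str, unit_cost: dict[str, float]) -> float:
    """Sum per-unit prices over a tier's sensor counts."""
    return sum(n * unit_cost[s] for s, n in SENSOR_TIERS[tier].items())

# Hypothetical unit prices: the gap between tiers is dominated by lidar,
# which is why falling lidar prices matter for mainstream cars.
prices = {"camera": 40.0, "radar": 60.0, "ultrasonic": 3.0, "lidar": 500.0}
print(sensor_cost("base", prices), sensor_cost("full", prices))
```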
Data advantages and disadvantages
I asked Wu about recent safety incidents involving Waymo vehicles, such as the company’s robotaxis blocking intersections during a blackout in San Francisco. He said that Nvidia was already running similar edge cases through its simulators. In fact, the company relies heavily on synthetic driving data to account for its disadvantages in real-world testing. Tesla has billions of real-world driving miles, thanks to its vast fleet of customer cars. Waymo has logged nearly 200 million fully autonomous miles on public roads. How can Nvidia ever hope to catch up?
“The big infrastructure play is really simulation,” Wu said. Nvidia is taking two approaches to this. One is neural reconstruction, or NuRec, in which the company’s engineers re-create real-world driving scenarios using sensor data collected from vehicles in the field. The other is augmentation, which modifies elements within a reconstructed scene to explore different potential outcomes. This allows engineers to probe how the autonomous system behaves under slightly different circumstances and to surface rare edge cases that might be missing from the original dataset.
“We can make a pedestrian come out faster, slower, at a different place,” he said. “This is what we call blurring of the dataset.”
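A toy version of that blurring might look like the following: perturb when, how fast, and where a reconstructed pedestrian appears, then replay each variant in simulation. The code and all of its names are hypothetical and do not represent Nvidia’s NuRec tooling.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class PedestrianEvent:
    t_appear_s: float   # when the pedestrian steps out
    speed_ms: float     # walking speed
    offset_m: float     # lateral position relative to the recorded scene

def blur(base: PedestrianEvent, n: int, rng: random.Random) -> list[PedestrianEvent]:
    """Generate n perturbed variants of one reconstructed event."""
    return [
        replace(
            base,
            t_appear_s=base.t_appear_s + rng.uniform(-1.0, 1.0),
            speed_ms=max(0.5, base.speed_ms * rng.uniform(0.5, 2.0)),
            offset_m=base.offset_m + rng.uniform(-2.0, 2.0),
        )
        for _ in range(n)
    ]

# Each variant is replayed in simulation to probe the driving stack's response.
variants = blur(PedestrianEvent(3.0, 1.4, 0.0), n=100, rng=random.Random(0))
```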
Nvidia has acquired dashcam footage from its partners to supplement the data it uses in simulation. It also re-creates edge cases from those Waymo incidents, like the blackout, and trains its system to respond without blocking intersections.
But the ultimate goal is building a system that uses reasoning to avoid these edge-case traps — thus obviating the need for real-world driving data in the first place. Wu’s team is working on what it calls a Vision Language Action model, which will put this theory into practice. These models combine visual perception, language understanding, and physical action in a unified architecture, drawing on large foundation models already trained on internet-scale datasets. Wu likens it to driver’s ed.
“When we teach a kid how to drive, they read a rule book and then get 20 hours of practice behind the wheel,” Wu said. “Usually, they aren’t bad drivers to start with — though, obviously, it takes experience to improve. Ultimately, we want the model to function the same way: In the future, with just a rule book and 20 hours of training data, it will learn how to drive.”
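Schematically, a vision-language-action loop wires those three pieces together: cameras are encoded into a scene representation, a language model reasons over that scene plus the “rule book,” and an action head emits driving commands. The skeleton below is a hypothetical sketch of that general pattern, not Nvidia’s model.

```python
from dataclasses import dataclass

@dataclass
class Action:
    steer: float   # normalized steering command, -1 to 1
    accel: float   # normalized acceleration, -1 to 1

class VLADriver:
    """Skeleton of a vision-language-action loop; every component is a stand-in."""

    def __init__(self, vision_encoder, language_model, action_head):
        self.vision_encoder = vision_encoder   # camera frames -> scene embedding
        self.language_model = language_model   # embedding + rule text -> reasoning state
        self.action_head = action_head         # reasoning state -> Action

    def step(self, camera_frames, rule_book: str) -> Action:
        scene = self.vision_encoder(camera_frames)
        # Per Wu's analogy, the written "rule book" conditions the model's
        # reasoning before any behind-the-wheel fine-tuning.
        state = self.language_model(scene, rule_book)
        return self.action_head(state)
```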