Elon Musk has been talking about Dojo for years: the supercomputer at the core of Tesla’s AI plans. It’s so important to Musk that he recently said the company’s AI team would “double down” on Dojo as Tesla gets ready to show off its robotaxi in October.
But what is Dojo, exactly? And why is it so central to Tesla’s long-term strategy?
In short, Dojo is Tesla’s custom-built supercomputer for training its “Full Self-Driving” neural networks. Beefing up Dojo goes hand in hand with Tesla’s goal of reaching full self-driving and bringing a robotaxi to market. FSD, which is installed on around 2 million Teslas today, can perform some automated driving tasks, but it still requires an attentive human behind the wheel.
Tesla’s robotaxi unveiling was originally slated for August but has been pushed to October. Both Musk’s public statements and information from inside Tesla suggest that the goal of autonomy isn’t going away.
To pull that off, Tesla is likely to invest heavily in AI, and in Dojo specifically.
The History Of Tesla’s Dojo
Musk doesn’t want Tesla to be just an automaker, or even a seller of solar panels and energy storage systems. Instead, he wants Tesla to be an AI company, one that has cracked self-driving cars by mimicking how humans perceive the world.
Most other companies building autonomous vehicle technology rely on a combination of lidar, radar, cameras, and other sensors to understand the world around them, plus high-definition maps to localize the vehicle. Tesla believes cameras alone are enough to capture visual data, with advanced neural networks then processing that data to make quick decisions about how the car should behave.
Andrej Karpathy, Tesla’s former head of AI, said at the company’s first AI Day in 2021 that the goal is to build “a synthetic animal from the ground up.” (Musk had been teasing Dojo since 2019, but Tesla formally announced it at that AI Day.)
A more traditional approach, combining multiple sensor types with machine learning, has helped companies like Alphabet’s Waymo bring Level 4 autonomous vehicles to market. The SAE defines Level 4 as a system that can drive itself without human intervention under certain conditions. Tesla, meanwhile, has yet to produce an autonomous system that doesn’t require a human behind the wheel.
About 1.8 million people have paid for Tesla’s FSD, which currently costs $8,000 and has been priced as high as $15,000. The pitch is that Dojo-trained AI software will eventually be pushed to Tesla owners through over-the-air updates. FSD’s scale also means Tesla has been able to collect millions of miles of video footage to train FSD on. The idea is that the more data Tesla can gather, the closer the company can get to cars that genuinely drive themselves.
Some industry experts, though, say that simply throwing more data at a model won’t necessarily make it smarter.
Anand Raghunathan, Purdue University’s Silicon Valley professor of electrical and computer engineering, told TechCrunch, “First of all, there’s an economic constraint. Soon it will just get too expensive to do that.” He added, “Some people say we might run out of useful data to train the models on.” More data doesn’t always mean more information, he noted; it depends on whether the data contains information that can actually improve the model, and whether the training process can distill that information into a better model.
Despite those doubts, Raghunathan said, the trend toward more data seems here to stay, at least for now. And more data means more computing power to store and process it all when training Tesla’s AI models. That’s where Dojo, the supercomputer, comes in.
What Does A Supercomputer Do?
Dojo is Tesla’s supercomputer platform, built as a training ground for AI, specifically FSD. The name is a nod to the space where martial arts are practiced.
A supercomputer is made up of many smaller computers called nodes, each of which contains CPUs and GPUs. The CPU handles the node’s overall management, while the GPU does the complex work: splitting a task into many parts and processing them simultaneously. GPUs are essential for machine learning operations like those that power FSD training in simulation. They also power large language models, which is why the rise of generative AI has turned Nvidia into one of the most valuable companies on the planet.
Even Tesla buys Nvidia GPUs to train its AI (more on that later).
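The division of labor described above, where a coordinating CPU splits a job and many workers grind through the pieces at once, can be sketched in miniature. This is an illustrative toy in Python, nothing like Tesla’s actual stack; the workload and worker count are arbitrary.

```python
# Toy sketch of node-style parallelism: a coordinator splits a job into
# chunks and hands them to parallel workers, loosely analogous to a CPU
# dispatching work that a GPU executes in parallel.
from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    # Each worker handles one slice of the data independently.
    return sum(x * x for x in chunk)

def run_node(data, workers=4):
    # Split the work, run the parts in parallel, combine the results.
    chunks = [data[i::workers] for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    # Same answer as the serial sum, just computed in parallel pieces.
    print(run_node(list(range(1000))))
```

The point of the sketch is only the structure: identical work applied to many independent slices at once, which is exactly the shape of computation GPUs excel at.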
Why Does Tesla Need A Supercomputer?
Tesla’s vision-only approach is the main reason it needs a supercomputer. The neural networks behind FSD are trained on vast amounts of driving data to recognize and classify objects around the vehicle and then make driving decisions. That means that when FSD is engaged, the neural nets have to collect and process visual data continuously, at speeds that match the depth and velocity recognition capabilities of a human.
In other words, Tesla intends to build a digital duplicate of the human visual cortex and brain function.
To get there, Tesla needs to store and process all the video data gathered from its cars around the world and run millions of simulations to train its model on the data.
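To make the “train a model on labeled driving data” idea concrete, here is a deliberately tiny sketch, far removed from Tesla’s real networks: a perceptron-style classifier learning from synthetic “frames.” The two features and the brake/continue labels are invented purely for illustration.

```python
# Toy illustration (not Tesla's stack): train a tiny linear classifier on
# labeled synthetic "frames" so its driving decision improves with data.
import random

random.seed(0)

def make_frame():
    # Pretend each frame reduces to two invented features: obstacle
    # distance and closing speed. Label 1 means "brake", 0 means "continue".
    dist, speed = random.uniform(0, 1), random.uniform(0, 1)
    return (dist, speed), 1 if speed > dist else 0

def train(frames, lr=0.5, epochs=200):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in frames:
            # Perceptron-style update: nudge weights toward correct labels.
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = y - pred
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

frames = [make_frame() for _ in range(500)]
w, b = train(frames)
correct = sum((1 if w[0] * x1 + w[1] * x2 + b > 0 else 0) == y
              for (x1, x2), y in frames)
print(f"training accuracy: {correct / len(frames):.0%}")
```

The real systems differ in every dimension (deep networks, raw camera pixels, enormous datasets), but the loop is the same in spirit: show the model labeled examples, measure its error, and adjust its parameters, which is the work Dojo is built to do at massive scale.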
Tesla’s present Dojo training computer appears to be powered by Nvidia, but the company doesn’t want to put all of its chips in one basket, partly because Nvidia chips are expensive. Tesla also hopes to build something better, with more bandwidth and lower latency. That’s why the automaker’s AI division decided to develop its own custom hardware program, designed to train AI models faster than off-the-shelf systems can.
At that program’s core are Tesla’s proprietary D1 chips, which the company says are optimized for AI workloads.
Tell Me More About These Chips
Tesla shares a philosophy with Apple: hardware and software should be designed to work together. That’s why Tesla is working to move beyond standard GPU hardware and design its own chips to power Dojo.
Tesla showed off its D1 chip, a palm-sized square of silicon, at AI Day in 2021. The chip has been in production since at least May of this year, manufactured by the Taiwan Semiconductor Manufacturing Company (TSMC) on a 7 nm process node. Musk says the D1 has 50 billion transistors and a large die measuring 645 square millimeters. All of that means the D1 should be powerful, efficient, and quick to churn through complex workloads.
“We can compute and transfer data at the same time, and our custom ISA (instruction set architecture) is fully optimized for machine learning workloads,” Ganesh Venkataramanan, Tesla’s former senior director of Autopilot hardware, said at the 2021 AI Day. “This is a pure machine-learning machine.”
Still, the D1 is not as powerful as Nvidia’s A100 chip, which is also manufactured by TSMC on a 7 nanometer process. The A100 packs 54 billion transistors onto a larger 826-square-millimeter die, giving it a slight performance edge over Tesla’s D1.
To gain more bandwidth and computing power, Tesla’s AI team fused 25 D1 chips together into one tile that functions as a unified computer system. Each tile delivers 9 petaflops of compute and 36 terabytes per second of bandwidth, and contains all the hardware needed for power, cooling, and data transfer. Think of the tile as a self-sufficient computer made up of 25 smaller ones working together. Six of those tiles make up a rack, two racks make up a cabinet, and ten cabinets make up an ExaPOD. At AI Day 2022, Musk said Dojo would scale by deploying multiple ExaPODs. All of this together makes up the supercomputer.
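The tile-to-ExaPOD hierarchy implies some simple arithmetic worth checking against Tesla’s stated figures. The 362-teraflop per-chip number is Tesla’s own claim for the D1 at reduced precision; everything else comes from the counts above.

```python
# Back-of-the-envelope check of the Dojo hierarchy using Tesla's stated
# figures: 25 D1 chips per tile, 6 tiles per rack, 2 racks per cabinet,
# 10 cabinets per ExaPOD, and ~362 TFLOPS per D1 (Tesla's claim).
TFLOPS_PER_D1 = 362
CHIPS_PER_TILE = 25
TILES_PER_RACK = 6
RACKS_PER_CABINET = 2
CABINETS_PER_EXAPOD = 10

tile_pflops = CHIPS_PER_TILE * TFLOPS_PER_D1 / 1000
exapod_tiles = TILES_PER_RACK * RACKS_PER_CABINET * CABINETS_PER_EXAPOD
exapod_exaflops = exapod_tiles * tile_pflops / 1000

print(f"one tile:   {tile_pflops:.2f} PFLOPS")  # ~9 PFLOPS, as Tesla states
print(f"one ExaPOD: {exapod_tiles} tiles, {exapod_exaflops:.2f} exaflops")
```

The numbers line up: 25 chips at 362 teraflops each is about 9 petaflops per tile, and 120 tiles per ExaPOD lands at roughly 1.1 exaflops, consistent with Tesla’s AI Day claims.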
Tesla is also developing a next-generation D2 chip, which is meant to ease information-flow bottlenecks. Instead of connecting individual chips, the D2 would put the entire Dojo tile onto a single silicon wafer.
Tesla hasn’t confirmed how many D1 chips it has ordered or expects to receive, nor how long it thinks it will take to get Dojo supercomputers running on D1 chips.
In June, when a post on X claimed that “Elon is building a huge GPU cooler in Texas,” Musk replied that Tesla’s goal over the next 18 months or so was “half Tesla AI hardware, half Nvidia/other.” The “other,” Musk said in January, could be AMD chips.
What Does Tesla Stand To Gain From Dojo?
Because Tesla makes its own chips, it may be able to add vast amounts of compute capacity to its AI training programs quickly and cheaply, especially as Tesla and TSMC scale up chip production.
It also means Tesla may not have to rely on Nvidia’s chips in the future, which are becoming more expensive and harder to secure.
Demand for Nvidia hardware is “so high that it’s often difficult to get the GPUs,” Musk said during Tesla’s second-quarter earnings call. “I’m pretty worried about being able to get steady GPUs when we need them. Because of this, I think we need to put a lot more effort into Dojo to make sure we have the training capability that we need.”
Even so, Tesla is still getting Nvidia chips to train its AI. Musk wrote on X in June:
“About half of the roughly $10B I said Tesla would spend on AI this year is internal. This includes the Tesla-designed AI inference computer and sensors that are in all of our cars, as well as Dojo. For building the AI training superclusters, Nvidia hardware makes up about two thirds of the cost. My best guess right now is that Tesla will spend $3B to $4B on Nvidia hardware this year.”
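Musk’s figures fit together with some rough arithmetic. The numbers below are his approximations from the post, not confirmed Tesla accounting:

```python
# Rough reconstruction of the split Musk described: ~$10B total AI spend,
# about half internal, with Nvidia hardware roughly two-thirds of the
# cost of building AI training superclusters.
total_ai_spend = 10e9
internal = total_ai_spend / 2          # Tesla-designed inference hardware + Dojo
external = total_ai_spend - internal   # spend on outside training hardware
nvidia_share = external * 2 / 3        # lands inside Musk's $3B-$4B estimate

print(f"internal: ${internal / 1e9:.1f}B, "
      f"Nvidia (approx): ${nvidia_share / 1e9:.2f}B")
```

Two-thirds of the external $5 billion comes out to about $3.3 billion, squarely within the $3B-to-$4B range Musk cited.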
“Inference compute” refers to the AI calculations Tesla cars perform in real time; it is distinct from the “training compute” that Dojo is responsible for.
Musk has repeatedly hedged on Dojo, acknowledging more than once that Tesla might not succeed.
In the long run, Tesla’s AI division could open up an entirely new revenue stream for the company. Musk has said the first version of Dojo is tailored to labeling and training Tesla’s computer vision, which is great for FSD and for training Optimus, Tesla’s humanoid robot, but not very useful for much else.
Musk has said that later versions of Dojo will be better suited to general-purpose AI training. One potential hurdle is that almost all existing AI software is written to work with GPUs; training general-purpose AI models on Dojo would require rewriting that software.
That is, unless Tesla rents out its compute capacity, much as AWS and Azure rent out cloud computing. Musk also said during the Q2 earnings call that he sees “a way for Dojo to be competitive with Nvidia.”
Morgan Stanley predicted in a September 2023 report that Dojo could add $500 billion to Tesla’s market value by unlocking new revenue streams, such as robotaxis and software services.
Put simply, Dojo’s chips are an insurance policy for the automaker, one that might pay off handsomely.
How Far Along Is Dojo?
Reuters reported last year that Tesla began production of Dojo in July 2023, though Musk wrote in June 2023 that Dojo had been “online and running useful tasks for a few months.”
Around the same time, Tesla projected that Dojo would be among the five most powerful supercomputers by February 2024. The company has not publicly announced that milestone, so we doubt it has actually happened.
The company also said that Dojo should reach 100 exaflops of total compute power by October 2024. One exaflop is one quintillion floating-point operations per second. If each D1 can handle 362 teraflops, Tesla would need more than 276,000 D1 chips, or roughly 320,500 Nvidia A100 GPUs, to hit 100 exaflops.
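Those chip counts follow from straightforward division. The A100’s 312-teraflop figure is our assumption (Nvidia’s stated BF16 tensor throughput); the article’s numbers are consistent with it:

```python
# Checking the quoted chip counts: how many chips does 100 exaflops take
# if one D1 delivers 362 TFLOPS, versus an Nvidia A100 at 312 TFLOPS
# (Nvidia's BF16 tensor figure; an assumption on our part)?
import math

TARGET_EXAFLOPS = 100
D1_TFLOPS = 362
A100_TFLOPS = 312

target_tflops = TARGET_EXAFLOPS * 1_000_000   # 1 exaflop = 10^6 teraflops
d1_needed = math.ceil(target_tflops / D1_TFLOPS)
a100_needed = math.ceil(target_tflops / A100_TFLOPS)

print(f"D1 chips needed:  {d1_needed:,}")    # a bit over 276,000
print(f"A100 GPUs needed: {a100_needed:,}")  # about 320,500
```

Both results match the figures cited above: just over 276,000 D1 chips, or roughly 320,500 A100s.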
In January 2024, Tesla also said it would spend $500 million to build a Dojo supercomputer at its Buffalo, New York, gigafactory.
Musk said in May 2024 that the back part of Tesla’s gigafactory in Austin would be used for a “super dense, water-cooled supercomputer cluster.”
Right after Tesla’s second-quarter earnings call, Musk wrote on X that the company’s AI team is using the Tesla HW4 AI computer (renamed AI4), the hardware that lives in Tesla vehicles, in the training loop alongside Nvidia GPUs. He said roughly 90,000 Nvidia H100s are complemented by about 40,000 AI4 computers.
“And by the end of the year, Dojo 1 will have about 8k H100 worth of online training,” he said. “Not huge, but not small either.”