OpenAI CEO Sam Altman spoke during a keynote speech announcing the integration of ChatGPT for Bing at Microsoft in Redmond, Washington, on February 7, 2023.
Jason Redmond | AFP | Getty Images
Before OpenAI’s ChatGPT emerged and captured the world’s attention for its ability to create compelling sentences, a small startup called Latitude was wowing consumers with its AI Dungeon game, which let them create fantastical stories based on their prompts.
But as AI Dungeon became more popular, Latitude CEO Nick Walton noticed that the cost of maintaining the text-based role-playing game began to skyrocket. AI Dungeon’s text-generation software was powered by the GPT language technology offered by OpenAI, the Microsoft-backed AI research lab. The more people played AI Dungeon, the bigger the bill Latitude had to pay OpenAI.
Compounding the predicament, Walton also discovered that content marketers were using AI Dungeon to generate promotional copy, a use that his team never foresaw but that ended up adding to the company’s AI bill.
At its peak in 2021, Walton estimates that Latitude was spending nearly $200,000 a month on OpenAI’s so-called generative AI software and Amazon Web Services to keep up with the millions of user queries it needed to process each day.
“We joked that we had human employees and we had AI employees, and we spent about as much on each of them,” Walton said. “We spent hundreds of thousands of dollars a month on AI and we are not a big startup, so it was a very massive cost.”
By the end of 2021, Latitude had switched from OpenAI’s GPT software to a cheaper but still capable language software offered by startup AI21 Labs, Walton said, adding that the startup has also incorporated free, open-source language models into its service to lower costs. Latitude’s generative AI bills have dropped to $100,000 per month, Walton said, and the startup charges players a monthly subscription for more advanced AI features to help offset the cost.
Latitude’s expensive AI bills underscore an unpleasant truth behind the recent boom in generative AI technologies: The cost of developing and maintaining the software can be extraordinarily high, both for the companies that develop the underlying technologies, commonly called large language models or foundation models, and for those that use the AI to power their own software.
The high cost of machine learning is an uncomfortable reality in the industry as venture capitalists eye companies that could potentially be worth trillions, and as large companies such as Microsoft, Meta and Google use their considerable capital to build a lead in the technology that smaller challengers cannot match.
But if the margins for AI applications remain smaller than previous software-as-a-service margins because of the high cost of computing, it could put a damper on the current boom.
The high cost of training and “inference” (actually running) large language models is a structural cost that differs from previous computing booms. Even after the software is built, or trained, it still requires a huge amount of computing power to run, because a large language model performs billions of calculations every time it returns a response to a prompt. By comparison, serving a web application or page requires far less computation.
These calculations also require specialized hardware. While traditional computer processors can run machine learning models, they are slow. Most training and inference now takes place on graphics processors, or GPUs, which were originally intended for 3D gaming but have become the standard for AI applications because they can perform many simple calculations simultaneously.
Nvidia makes most of the GPUs for the AI industry, and its workhorse data center chip costs $10,000. Scientists who build these models often joke that they “melt GPUs.”
Training models
Nvidia A100 processor
Nvidia
Analysts and technologists estimate that the critical process of training a large language model such as GPT-3 could cost more than $4 million. More advanced language models could cost more than “the high single-digit millions” to train, said Rowan Curran, a Forrester analyst who focuses on AI and machine learning.
Meta’s largest LLaMA model, for example, used 2,048 Nvidia A100 GPUs to train on 1.4 trillion tokens (750 words is about 1,000 tokens), taking about 21 days, the company said when it released the model last month.
It took about 1 million GPU hours to train. With dedicated prices from AWS, that would cost over $2.4 million. And at 65 billion parameters, it is smaller than OpenAI’s current GPT models, such as GPT-3, which has 175 billion parameters.
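The GPU-hour figure follows directly from the numbers Meta reported. The sketch below redoes that arithmetic. Note that the hourly rate is not an AWS list price: it is back-derived from the $2.4 million estimate and assumes a flat rate per GPU, so treat it as illustrative only.

```python
# Back-of-the-envelope check of the LLaMA training figures quoted above.
NUM_GPUS = 2_048                 # Nvidia A100 GPUs, per Meta
TRAINING_DAYS = 21               # "about 21 days", per Meta
ESTIMATED_COST_USD = 2_400_000   # the article's "over $2.4 million" estimate

gpu_hours = NUM_GPUS * TRAINING_DAYS * 24

# Assumption: a flat hourly rate per GPU, implied by the estimate above
# rather than taken from any AWS price sheet.
implied_rate_per_gpu_hour = ESTIMATED_COST_USD / gpu_hours

print(f"GPU hours: {gpu_hours:,}")
print(f"Implied rate: ${implied_rate_per_gpu_hour:.2f} per GPU hour")
```

This gives 1,032,192 GPU hours, in line with the "about 1 million" figure, and an implied rate of roughly $2.33 per GPU hour.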
Clement Delangue, CEO of the AI startup Hugging Face, said that training the company’s large language model Bloom took more than two and a half months and required access to a supercomputer that was “like the equivalent of 500 GPUs.”
Organizations that build large language models must be cautious about how often they retrain the software, a process that improves its capabilities, because retraining is so expensive, he said.
“It is important to realize that these models are not trained all the time, like every day,” said Delangue, noting that some models, like ChatGPT, do not have knowledge of recent events. ChatGPT’s knowledge stops in 2021, he said.
“We are currently training for Bloom version two and it will cost no more than $10 million to retrain,” Delangue said. “That’s why it’s something you don’t want to do every week.”
Inference and who pays
Bing with Chat
Jordan Novet | CNBC
To use a trained machine learning model to make predictions or generate text, engineers run the model in a process called “inference,” which can be much more expensive than training because it may need to run millions of times for a popular product.
For a product as popular as ChatGPT, which the investment firm UBS estimates reached 100 million monthly active users in January, Curran believes that it could have cost OpenAI $40 million to process the millions of prompts people fed into the software that month.
Costs skyrocket when these tools are used billions of times a day. Financial analysts estimate that Microsoft’s Bing AI chatbot, which is powered by an OpenAI ChatGPT model, needs at least $4 billion of infrastructure to serve responses to all Bing users.
In the case of Latitude, for example, while the startup did not have to pay to train the underlying OpenAI language model it was accessing, it had to account for inference costs of about “half a cent per call” on “several million requests per day,” a Latitude spokesperson said.
“And I’m relatively conservative,” Curran said of the calculations.
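Latitude’s per-call figure can be turned into a rough monthly bill. The sketch below is only an illustration: the article does not pin down "several million requests per day," so the daily volumes are hypothetical, and the 30-day month and the helper name `monthly_inference_cost` are assumptions as well.

```python
from decimal import Decimal

# Figure quoted by Latitude: roughly half a cent per call.
COST_PER_CALL_USD = Decimal("0.005")

def monthly_inference_cost(requests_per_day: int, days: int = 30) -> Decimal:
    """Return the estimated inference bill in dollars for one month."""
    return COST_PER_CALL_USD * requests_per_day * days

# Assumption: the exact daily volume is not given in the article, so a
# range of hypothetical volumes is shown instead of a single number.
for requests_per_day in (1_000_000, 2_000_000, 3_000_000):
    cost = monthly_inference_cost(requests_per_day)
    print(f"{requests_per_day:>9,} requests/day -> ${cost:,.0f} per month")
```

These hypothetical volumes put the bill between $150,000 and $450,000 a month, the same order of magnitude as the spending Walton describes.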
To sow the seeds of the current AI boom, venture capitalists and tech giants have been investing billions of dollars in startups that specialize in generative AI technologies. Microsoft, for example, invested as much as $10 billion in OpenAI, the developer of GPT, according to media reports in January. Salesforce’s venture capital arm, Salesforce Ventures, recently debuted a $250 million fund that caters to generative AI startups.
As investor Semil Shah of the VC firms Haystack and Lightspeed Venture Partners put it on Twitter, “VC dollars are moving from subsidizing taxi rides and burrito deliveries to LLMs and generative AI computing.”
Many entrepreneurs see risks in relying on potentially subsidized AI models that they do not control and merely pay for on a per-use basis.
“When I talk to my AI friends at startup conferences, this is what I tell them: Don’t rely solely on OpenAI, ChatGPT or any other large language models,” said Suman Kanuganti, founder of personal.ai, a chatbot currently in beta mode. “Because businesses shift, they’re all owned by big tech companies, right? If they cut access, you’re gone.”
Companies such as the tech firm Conversica are exploring how they can use the technology through Microsoft’s Azure cloud service at its currently discounted price.
While Conversica CEO Jim Kaskade declined to comment on how much the startup is paying, he acknowledged that the subsidized cost is welcome as the company explores how language models can be used effectively.
“If they were truly trying to break even, they’d be charging a lot more,” Kaskade said.
How it could change

It is unclear whether AI computing will remain expensive as the industry develops. Companies that make foundation models, semiconductor makers and startups all see business opportunities in reducing the price of running AI software.
Nvidia, which has about 95% of the market for AI chips, continues to develop more powerful versions designed specifically for machine learning, but improvements in total chip power across the industry have slowed over time.
Still, Nvidia CEO Jensen Huang believes that in 10 years, AI will be a million times more efficient due to improvements not only in chips, but also in software and other computer parts.
“Moore’s Law, in its best days, would have delivered 100x in a decade,” Huang said last month on an earnings call. “By coming up with new processors, new systems, new interconnects, new frameworks and algorithms, and working with data scientists and AI researchers on new models, across that entire span, we’ve made large language model processing a million times faster.”
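To put Huang’s two figures on the same scale, each can be converted into an implied doubling time. This is a rough sketch that assumes smooth exponential improvement and that both gains are measured over the same ten-year span; the function name `doubling_time_years` is made up for illustration.

```python
import math

def doubling_time_years(total_speedup: float, years: float = 10.0) -> float:
    """Years per doubling needed to reach total_speedup within the period."""
    return years / math.log2(total_speedup)

# Assumption: steady exponential growth over a ten-year window.
moores_law = doubling_time_years(100)         # "100x in a decade"
full_stack = doubling_time_years(1_000_000)   # "a million times faster"

print(f"100x per decade: doubling every {moores_law:.2f} years")
print(f"1,000,000x per decade: doubling every {full_stack:.2f} years")
```

Under those assumptions, 100x in a decade corresponds to a doubling roughly every 18 months, while a million-fold gain corresponds to a doubling about every six months.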
Some startups have focused on the high cost of AI as a business opportunity.
“Nobody was saying you should build something purpose-built for inference. What would that look like?” said Sid Sheth, founder of D-Matrix, a startup building a system to save money on inference by doing more processing in the computer’s memory, as opposed to on a GPU.
“People are now using GPUs, NVIDIA GPUs, to do most of their inference. They’re buying DGX systems that NVIDIA sells that cost a ton of money. The problem with inference is that the workload can spike very rapidly, which is what happened to ChatGPT. It went to like a million users in five days. There’s no way your GPU capacity can keep up because it’s not built for that. It’s built for training, for graphics acceleration,” he said.
Delangue, the Hugging Face CEO, believes that more companies will focus on smaller, specific models that are cheaper to train and run, rather than on the large language models that get most of the attention.
Meanwhile, OpenAI announced last month that it is lowering the cost for companies to access its GPT models. It now charges one-fifth of one cent for about 750 words of output.
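The quoted price can be scaled to larger volumes with simple arithmetic. The sketch below assumes pro-rata billing by word count, which is a simplification (the article notes elsewhere that 750 words is about 1,000 tokens); the word volumes and the helper name `output_cost_usd` are hypothetical.

```python
from decimal import Decimal

# Price quoted above: one-fifth of one cent per ~750 words of output.
PRICE_PER_BLOCK_USD = Decimal("0.002")
WORDS_PER_BLOCK = 750

def output_cost_usd(words: int) -> Decimal:
    """Estimated cost of generating `words` words, billed pro rata."""
    return PRICE_PER_BLOCK_USD * words / WORDS_PER_BLOCK

# Hypothetical volumes, chosen only for illustration.
for words in (750, 75_000, 7_500_000):
    print(f"{words:>9,} words -> ${output_cost_usd(words):,.3f}")
```

At this rate, 7.5 million words of output would cost about $20.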
OpenAI’s lower prices have caught the attention of AI Dungeon maker Latitude.
“I really have to say that there are big changes that we’re excited to see happening in the industry and we’re constantly evaluating how we can provide the best experience for our users,” said a Latitude spokesperson. “Latitude will continue to evaluate all AI models to make sure we have the best games out there.”
