
IEEE: SambaNova's new chip can run models more than twice the size of OpenAI's GPT-4

Author: Zhongguancun Online

As companies rush to jump on the AI bandwagon, chips and talent are in short supply. Startup SambaNova claims that its new processor can help companies get their own large language model (LLM) up and running in a matter of days.


The Palo Alto-based company, which has raised more than $1 billion in venture capital, does not sell chips directly to customers. Instead, it sells access to its custom technology stack: proprietary hardware and software designed specifically to run the largest AI models.

With the introduction of the company's new SN40L processor, that technology stack now gets a significant upgrade. Each device is manufactured on the 5-nanometer process of Taiwanese chip giant Taiwan Semiconductor Manufacturing Co. (TSMC) and packs 102 billion transistors across 1,040 cores, for a peak speed of 638 teraflops. It also features a novel three-tier memory system designed to cope with the enormous data flows involved in AI workloads.

“A trillion parameters is actually not a big model if you can run it on eight [chips].” —Rodrigo Liang, SambaNova

SambaNova claims that a node made up of just eight chips can support models with as many as 5 trillion parameters, almost three times the reported size of OpenAI's GPT-4 LLM. Sequence length (a measure of how long an input the model can handle) is up to 256,000 tokens. CEO Rodrigo Liang said that achieving the same with industry-standard GPUs would require hundreds of chips, putting SambaNova's total cost of ownership at less than 1/25th that of the GPU-based approach.
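A quick back-of-the-envelope check shows why terabyte-scale memory per node matters here. This is simple arithmetic on the figures quoted in this article, not SambaNova data, and it assumes weights stored in 16-bit bfloat16 while ignoring activations and attention caches:

```python
# Rough footprint of a 5-trillion-parameter model's weights alone.
PARAMS = 5_000_000_000_000        # 5 trillion parameters (per the article)
BYTES_PER_PARAM = 2               # bfloat16 = 16 bits = 2 bytes (assumed)

total_tb = PARAMS * BYTES_PER_PARAM / 1e12   # decimal terabytes
sockets = 8
print(f"Weights alone: {total_tb:.0f} TB")                       # 10 TB
print(f"Per socket across {sockets} chips: {total_tb/sockets:.2f} TB")  # 1.25 TB
```

Ten terabytes of weights will not fit in any chip's on-package memory, which is why the external memory tier discussed below carries most of the load.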

"If you can run a trillion parameters on eight chip sockets, it's not actually a big model," Liang said. We're dismantling the cost structure and really reimagining how people think about it, rather than treating the trillion-parameter model as something inaccessible. ”

The new chip uses the same dataflow architecture as the company's previous processors. SambaNova's basic argument is that existing chip designs focus too much on streamlining the flow of instructions, when for most machine learning applications the efficient movement of data is the bigger bottleneck.

To solve this problem, the company's chip uses a tiled array of memory and computing units connected by a high-speed switching fabric, which makes it possible to dynamically reconfigure how the units are connected based on the problem at hand. This works in tandem with the company's SambaFlow software, which analyzes machine learning models and figures out the best way to connect units to ensure seamless data flow and maximize hardware use.
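To make the dataflow idea concrete, here is a minimal, purely illustrative Python sketch. The model graph, operator names, and placement logic are all hypothetical, not the SambaFlow API: a compiler assigns each operator in a model graph to a compute tile, then derives the tile-to-tile links the switching fabric would be configured to provide, so intermediate data streams between tiles rather than bouncing through main memory.

```python
# Toy dataflow mapping: op -> list of downstream ops (names are invented).
graph = {
    "embed": ["attn"],
    "attn":  ["mlp"],
    "mlp":   ["norm"],
    "norm":  [],
}

def place_on_tiles(graph, n_tiles):
    """Greedy placement: assign ops to tiles in graph order."""
    return {op: i % n_tiles for i, op in enumerate(graph)}

def route(graph, placement):
    """List the tile-to-tile links the fabric must be configured for."""
    return [(placement[op], placement[c])
            for op, consumers in graph.items() for c in consumers]

placement = place_on_tiles(graph, n_tiles=4)
print("placement:", placement)       # which tile runs which operator
print("fabric links:", route(placement and graph, placement))
```

A real compiler would weigh tile capacity, bandwidth, and pipelining; the point of the sketch is only that the connection pattern is a per-model compilation decision rather than a fixed instruction stream.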

Aside from the move from a 7-nanometer to a 5-nanometer process, the main difference between the company's latest chip and its predecessor, the SN30, is the addition of a third tier of memory. The earlier chip had 640 megabytes of on-chip SRAM and 1 terabyte of external DRAM; the SN40L has 520 megabytes of on-chip memory, 1.5 terabytes of external DRAM, and an additional 64 gigabytes of high-bandwidth memory (HBM).

Memory is increasingly a key differentiator for AI chips, because as models keep ballooning, moving data around tends to drag on performance more than raw compute power does. That has pushed companies to increase both the amount and the speed of memory on their chips. SambaNova isn't the first to turn to HBM to combat this so-called memory wall, and its new chip carries less of it than its competitors': NVIDIA's industry-leading H100 GPU has 80 gigabytes of HBM, while AMD's upcoming MI300X has 192 gigabytes. SambaNova wouldn't disclose the bandwidth of its memory, so it's hard to say how it compares with other chips.

Liang said that while SambaNova leans more heavily on slower external memory, the company's answer is its software compiler, which intelligently distributes the load across the three memory tiers. The proprietary interconnect between the company's chips also lets the compiler treat an eight-processor setup as a single system. "The performance in training will be fantastic," Liang said.
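The sketch below illustrates the kind of tiering decision such a compiler might make, using the SRAM and HBM capacities quoted above. The tensor names, sizes, access counts, and the greedy hottest-first logic are invented for illustration; SambaNova has not described its actual algorithm.

```python
# Hypothetical three-tier placement: hottest data in on-chip SRAM,
# warm data in HBM, bulk cold weights in external DDR.
SRAM_BYTES = 520 * 1024**2   # 520 MB on-chip SRAM (per the article)
HBM_BYTES  = 64  * 1024**3   # 64 GB HBM; DDR treated as unbounded here

def assign_tiers(tensors):
    """tensors: list of (name, size_bytes, accesses_per_step)."""
    plan, sram_left, hbm_left = {}, SRAM_BYTES, HBM_BYTES
    for name, size, _ in sorted(tensors, key=lambda t: t[2], reverse=True):
        if size <= sram_left:
            plan[name], sram_left = "SRAM", sram_left - size
        elif size <= hbm_left:
            plan[name], hbm_left = "HBM", hbm_left - size
        else:
            plan[name] = "DDR"
    return plan

# Invented working set for one decoding step
tensors = [
    ("kv_cache_hot",  256 * 1024**2, 1000),  # accessed constantly
    ("active_expert",   8 * 1024**3,  100),
    ("all_weights",     1 * 1024**4,    1),  # ~1 TB of cold weights
]
print(assign_tiers(tensors))
# {'kv_cache_hot': 'SRAM', 'active_expert': 'HBM', 'all_weights': 'DDR'}
```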

SambaNova is also cagey about how it handles another hot topic in AI chips: sparsity. Many of the weights in an LLM are set to zero, so performing operations on them is wasted computation, and finding ways to exploit this sparsity can deliver significant speedups. SambaNova claims in its promotional materials that the SN40L provides both dense and sparse computing. Liang said this is achieved partly through scheduling and the way data is brought onto the chip at the software layer, but he declined to discuss the hardware component. "Sparsity is a battleground," he said. "We're not yet ready to reveal how we do it."
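Why zeroed weights translate into skipped work can be seen in a minimal sketch. This illustrates the general idea of sparse computation, not SambaNova's undisclosed mechanism: in a matrix-vector product, every multiplication against a zero weight contributes nothing, so a compressed format simply omits it.

```python
def dense_matvec(rows, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]

def to_csr(rows):
    """Keep only (column, value) pairs for nonzero weights."""
    return [[(j, w) for j, w in enumerate(row) if w != 0.0] for row in rows]

def sparse_matvec(csr_rows, x):
    return [sum(w * x[j] for j, w in row) for row in csr_rows]

W = [[0.0, 1.5, 0.0, 0.0],
     [0.0, 0.0, 0.0, 2.0],
     [3.0, 0.0, 0.0, 0.0]]   # 75 percent of entries are zero
x = [1.0, 2.0, 3.0, 4.0]

assert dense_matvec(W, x) == sparse_matvec(to_csr(W), x)
nnz = sum(len(row) for row in to_csr(W))
print(f"multiplies: dense {len(W) * len(x)}, sparse {nnz}")  # 12 vs 3
```

The hard part in hardware is keeping compute units busy when the nonzero pattern is irregular, which is presumably why vendors guard their approaches.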

Another common trick for helping AI chips run big models faster and more cheaply is to reduce the precision with which parameters are represented. The SN40L uses the bfloat16 number format invented by Google engineers, and it also supports 8-bit precision. But Liang says low-precision computing isn't their focus, because their architecture already lets them run models on a smaller footprint.
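The arithmetic behind the precision trade-off is straightforward; the sketch below uses a hypothetical 70-billion-parameter model to show how each step down in precision shrinks the weight footprint. (bfloat16 keeps float32's 8 exponent bits and truncates the mantissa to 7 bits, halving storage per weight at the cost of precision.)

```python
# Weight storage at different precisions: plain arithmetic, not SN40L data.
BITS = {"float32": 32, "bfloat16": 16, "int8": 8}

def weight_gigabytes(n_params, dtype):
    return n_params * BITS[dtype] / 8 / 1e9

n_params = 70_000_000_000   # hypothetical 70B-parameter model
for dtype in BITS:
    print(f"{dtype:9s}: {weight_gigabytes(n_params, dtype):6.0f} GB")
# float32 :    280 GB
# bfloat16:    140 GB
# int8    :     70 GB
```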

Liang said the company's tech stack is squarely aimed at running the largest AI models; its target customers are the world's 2,000 largest companies. The pitch is that these companies sit on piles of data but don't know what most of it says. SambaNova says it can provide all the hardware and software needed to build AI models that unlock that data, without the companies having to fight over chips or AI talent. "You can be up and running in days, not months or quarters," Liang said. "Now every company can have its own GPT model."

Gartner analyst Chirag Dekate said one area where the SN40L could have a significant advantage over competing hardware is multimodal AI. The future of generative AI, he says, lies in large models that can handle a variety of data types, such as images, video, and text, but this leads to highly variable workloads. The fairly rigid architecture of GPUs isn't well suited to that kind of work, Dekate says, and that's exactly where SambaNova's focus on reconfigurability could pay off. "You can tune the hardware to meet the requirements of the workload," he said.

However, Dekate said, custom AI chips like those made by SambaNova do come with a trade-off between performance and flexibility. Although GPUs may not be as powerful, they can run almost any neural network out of the box and are supported by a strong software ecosystem. Dekate noted that SambaNova has been building a catalog of pre-baked models that customers can leverage, but NVIDIA's dominance in all aspects of AI development is a significant challenge.

Dekate said: "This architecture is actually superior to traditional GPU architectures. But unless you put these technologies in the hands of customers and enable mass consumerization, I think you're likely to get stuck. ”

Standing out will be even more challenging now that NVIDIA is also entering the full-stack AI-as-a-service market with its DGX Cloud offering, said Dylan Patel, principal analyst at the consulting firm SemiAnalysis. "The chip is an important step forward," he said.
