
Demystifying Meta's new weapons in the race to catch up on AI: two in-house chips and a supercomputer

Focus:

  • Meta has been slow to adopt AI-friendly hardware, which has weakened its ability to compete with the likes of Google and Microsoft; in response, it has developed two chips in-house and built an AI supercomputer.
  • Meta has unveiled a family of training and inference accelerator chips called MTIA, designed both to train AI models and to run them, with a chip scheduled to launch in 2025.
  • Meta is also developing a chip called MSVP to handle a specific type of computing workload. It is the first application-specific integrated circuit (ASIC) Meta has developed in-house, designed for the processing needs of video-on-demand and live streaming.
  • Meta is assembling an AI supercomputer internally and using it to train LLaMA, a large language model.

Tencent Technology News, May 19 — Over the past few years, Facebook's parent company Meta has invested heavily in the metaverse, pouring continuous effort into related hardware and software, even at the risk of neglecting the latest trends in artificial intelligence. But with the explosion of generative AI, Meta appears to have readjusted its direction and begun pushing into AI in earnest. On Thursday, local time in the United States, Meta unveiled two self-developed AI chips and revealed the latest progress on its AI supercomputer.

At Thursday's virtual event, Meta showcased the internal infrastructure it has built for AI workloads, including support for the generative AI the company is integrating into its new ad design and creation tools. The event was a show of strength: Meta had previously been slow to adopt AI-friendly hardware, which weakened its ability to keep pace with competitors such as Google and Microsoft.

Alexis Björlin, Meta's vice president of infrastructure, said: "Building our own hardware capabilities gives us control over every layer of the stack, from data center design to training frameworks. This level of vertical integration is necessary to drive AI research forward."

Over the past decade or so, Meta has spent billions of dollars recruiting top data scientists and building new kinds of AI, including the models that now power the discovery engines, moderation filters, and ad recommendations across its apps and services. But the company has struggled to turn many of its more ambitious AI research innovations into products, especially in generative AI.

Until 2022, Meta ran its AI workloads largely on a combination of CPUs and custom chips designed to accelerate AI algorithms. But it scrapped a custom chip that had been slated for a large-scale rollout in 2022, because deploying it would have required a major redesign of several of its data centers, and instead ordered billions of dollars' worth of Nvidia GPUs.

AI accelerator chip

To turn things around, Meta plans to develop a more ambitious in-house chip, scheduled to launch in 2025, that can both train AI models and run them.

Meta calls the new chip the Meta Training and Inference Accelerator, or MTIA, and describes it as part of a "family of chips" for accelerating AI training and inference workloads ("inference" means running an already-trained model). MTIA is an application-specific integrated circuit (ASIC), a kind of chip that combines different circuits on a single board and can be programmed to carry out one or many tasks in parallel.
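To make the training-versus-inference distinction concrete, here is a minimal PyTorch sketch. It is purely illustrative and not Meta's code: the toy model, the random data and the hyperparameters are all assumptions.

```python
import torch
import torch.nn as nn

# Toy recommendation-style scorer (illustrative only, not Meta's model).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.BCEWithLogitsLoss()

# --- Training: adjust the weights from (features, label) pairs ---
features = torch.randn(64, 16)                     # fake batch of 64 examples
labels = torch.randint(0, 2, (64, 1)).float()      # fake click/no-click labels
optimizer.zero_grad()
loss = loss_fn(model(features), labels)
loss.backward()                                    # gradients flow backwards
optimizer.step()                                   # weights change

# --- Inference: run the trained model without updating it ---
model.eval()
with torch.no_grad():                              # no gradients, weights frozen
    scores = torch.sigmoid(model(torch.randn(8, 16)))
print(scores.shape)                                # torch.Size([8, 1])
```

Training repeats both the forward and backward passes and keeps updating the weights; inference runs only the forward pass, which is the kind of workload MTIA is currently focused on.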


Figure 1: Meta's custom chips, tailored for AI workloads

Björlin continued: "To get better efficiency and performance on our important workloads, we needed a custom solution co-designed with the model, the software stack, and the system hardware. This delivers a better experience for our users across a range of services."

Custom AI chips are increasingly becoming a staple of the big tech companies. Google has developed its own processor, the TPU (Tensor Processing Unit), to train large generative AI systems such as PaLM-2 and Imagen. Amazon offers AWS customers proprietary chips for both training (Trainium) and inference (Inferentia). Microsoft is reportedly working with AMD on an in-house AI chip called "Athena."

Meta said it developed the first generation of MTIA (MTIA v1) in 2020 and produced it on a 7-nanometer process. The chip's memory can scale from 128 MB up to 128 GB, and in benchmarks designed by Meta, the company claims MTIA handles "low-complexity" and "medium-complexity" AI models more efficiently than a GPU.

Meta said there is still plenty of work to do on chip memory and networking, which remain bottlenecks as AI models grow in scale and workloads have to be split across multiple chips. (Not coincidentally, Meta recently acquired the Oslo-based AI networking team of British chip unicorn Graphcore.) For now, MTIA's focus is strictly on inference for "recommendation workloads" across the Meta family of apps, rather than training.
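A rough back-of-the-envelope illustration of that memory bottleneck: the parameter count and precision below are assumptions added for illustration, not Meta figures; only the 128 GB ceiling comes from this article.

```python
# Why large models force multi-chip sharding: a rough memory estimate.
params = 100e9                 # a hypothetical 100-billion-parameter model
bytes_per_param = 2            # assumed 16-bit (half-precision) weights
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")      # ~200 GB

# Even the top MTIA v1 configuration cited above (128 GB) could not hold
# this, before counting activations and optimizer state, so the workload
# must be sharded across several chips connected by a fast network.
accelerator_gb = 128
chips_needed = int(-(-weights_gb // accelerator_gb))   # ceiling division
print(f"chips needed just for the weights: {chips_needed}")  # 2
```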

But Meta stressed that MTIA, which it continues to refine, has "greatly" improved the company's efficiency at running recommendation workloads, allowing it to take on "more enhanced" and "cutting-edge" AI workloads.

Artificial intelligence supercomputer

Perhaps one day, Meta will hand over most of its AI workloads to MTIA. But for now, the social networking giant relies on its research-focused supercomputer, Research SuperCluster.

Research SuperCluster, which debuted in January 2022 and was assembled in collaboration with Penguin Computing, Nvidia and Pure Storage, has completed its second phase of construction. Meta says it now contains a total of 2,000 Nvidia DGX A100 systems, for 16,000 Nvidia A100 GPUs.

So why is Meta building a supercomputer in-house? First, there is pressure from the other tech giants. A few years ago, Microsoft loudly promoted the AI supercomputer it built in partnership with OpenAI, and it recently said it would work with Nvidia to build a new AI supercomputer on the Azure cloud. Meanwhile, Google has been touting its own AI supercomputer, which has 26,000 Nvidia H100 GPUs, far more than Meta's machine.


Figure 2: Meta's supercomputer for AI research

But Meta says that, in addition to keeping pace with its peers, Research SuperCluster lets its researchers train models using real-world examples from Meta's own production systems. That differs from the company's previous AI infrastructure, which could only draw on open-source and publicly available datasets.

A Meta spokesperson said: "The Research SuperCluster AI supercomputer is used to advance AI research in several areas, including generative AI. This is actually closely related to the efficiency of AI research. We want to provide AI researchers with state-of-the-art infrastructure that will enable them to develop models and provide them with a training platform that facilitates the development of AI. ”

At its peak, the Research SuperCluster can reach 5 exaflops of computing power, which Meta claims makes it one of the fastest computers in the world. Meta says it uses Research SuperCluster to train LLaMA, its large language model, which it made available to researchers earlier this year in a limited, gated release. Meta said the largest LLaMA model was trained on 2,048 A100 GPUs over 21 days.
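As a rough sense of what a run like that consumes, the arithmetic below only restates the numbers in this article (2,048 GPUs, 21 days); the per-GPU throughput figure is an assumption added for illustration, not a measured value.

```python
# Back-of-the-envelope arithmetic for the LLaMA training run described above.
gpus = 2048
days = 21
gpu_hours = gpus * days * 24
print(f"{gpu_hours:,} GPU-hours")        # 1,032,192 GPU-hours

# If each A100 sustained ~150 teraFLOPS of useful mixed-precision throughput
# (an assumed round number, not a Meta figure), the total compute would be:
assumed_flops_per_gpu = 150e12
total_flops = assumed_flops_per_gpu * gpus * days * 24 * 3600
print(f"~{total_flops:.1e} FLOPs")       # ~5.6e+23 FLOPs
```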

A Meta spokesperson said: "The Research SuperCluster will help Meta's AI researchers build new and better AI models that can learn from trillions of examples, work across hundreds of different languages, seamlessly analyze text, images and videos, and develop new augmented reality tools." ”

Video transcoder

In addition to MTIA, Meta is developing another chip to handle a specific type of computing workload. Dubbed the Meta Scalable Video Processor, or MSVP, it is Meta's first in-house application-specific integrated circuit (ASIC), designed to handle the processing needs of video-on-demand and live streaming.

Some may recall that Meta started thinking about custom server-side video chips years ago, announcing an ASIC for video transcoding and inference back in 2019. MSVP is the outgrowth of those efforts, and of a renewed push to compete in the streaming space.

Meta technical leads Harikrishna Reddy and Yunqing Chen wrote in a co-authored blog post: "On Facebook alone, people spend 50% of their time watching video. To serve the wide variety of devices around the world (mobile phones, laptops, TVs and so on), videos uploaded to Facebook or Instagram, for example, are transcoded into multiple bitstreams with different encoding formats, resolutions and quality levels. MSVP is programmable and scalable, and can be configured to efficiently support both the high-quality transcoding needed for VOD and the low latency and faster processing times required for live streaming."
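To make the idea of transcoding one upload into multiple bitstreams concrete, here is a minimal sketch that drives the open-source ffmpeg tool from Python. It is not Meta's pipeline or MSVP's interface; the resolutions, bitrates and file names are arbitrary illustrative values.

```python
import subprocess

# An illustrative "ladder" of output renditions: (height, video bitrate).
# Real services tune these per title and per device; the values here are arbitrary.
RENDITIONS = [(1080, "5000k"), (720, "2800k"), (480, "1200k"), (360, "700k")]

def transcode_ladder(src: str) -> None:
    """Transcode one source file into several H.264 bitstreams with ffmpeg."""
    for height, bitrate in RENDITIONS:
        out = f"output_{height}p.mp4"
        subprocess.run(
            [
                "ffmpeg", "-y", "-i", src,
                "-vf", f"scale=-2:{height}",   # keep aspect ratio, set height
                "-c:v", "libx264", "-b:v", bitrate,
                "-c:a", "aac", "-b:a", "128k",
                out,
            ],
            check=True,
        )

if __name__ == "__main__":
    transcode_ladder("input.mp4")  # hypothetical source file
```

As the blog post describes it, MSVP moves this kind of work off software encoders (like the libx264 used here) and onto dedicated hardware, so many renditions can be produced far more efficiently per stream.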


Figure 3: Meta's custom chips are designed to accelerate video workloads such as streaming and transcoding

Meta said its plan is to eventually shift the majority of its "stable and mature" video-processing workloads to MSVP, falling back on software video encoding only for workloads that need bespoke customization or "significantly" higher quality. Meta also said it continues to improve video quality with MSVP through pre-processing methods such as smart denoising and image enhancement, as well as post-processing methods such as artifact removal and super-resolution.

"In the future, MSVP will enable us to support more of Meta's most important use cases and needs, including short videos, enabling the efficient delivery of generative AI, AR/VR and other virtual reality content," said Reddy and Yunqing Chen. ”

AI focus

If there is one common thread running through these hardware announcements, it is that Meta is desperately trying to pick up the pace on AI, especially generative AI.

In February, Meta CEO Mark Zuckerberg reportedly made boosting Meta's AI computing capacity a top priority, announcing a new top-level generative AI team that, in his words, would "turbocharge" the company's research and development. Andrew Bosworth, Meta's chief technology officer, also said recently that generative AI is the area where he and Zuckerberg are spending the most time. And according to Yann LeCun, Meta's chief AI scientist, the company plans to deploy generative AI tools for creating items in virtual reality.

In April, Zuckerberg said on Meta's first-quarter earnings call: "We are exploring chat experiences in WhatsApp and Messenger, visual creation tools for posts and ads on Facebook and Instagram, and, over time, video and multimodal experiences. I want these tools to be valuable to everyone, from ordinary people to creators to businesses. For example, I expect that once we get that experience right, there will be a lot of interest in AI agents for business messaging and customer support. Over time, this will also extend to our work on virtual worlds, where it will be much easier for people to create avatars, objects, worlds, and the code that ties them all together."

To some extent, Meta is feeling mounting pressure from investors who worry the company isn't moving fast enough to capture a piece of the potentially huge market for generative AI. So far it has no product to rival chatbots such as Bard, Bing Chat or ChatGPT, and it has made little visible progress in image generation, another key area of explosive growth.

If some predictions prove correct, the total addressable market for generative AI software could reach $150 billion. Goldman Sachs, the US investment bank, forecasts that the technology could lift GDP by 7%.

Capturing even a slice of that market could offset the billions of dollars Meta has poured into metaverse technologies such as augmented-reality headsets, conferencing software and Horizon Worlds, investments that have yet to pay off. Reality Labs, Meta's augmented- and virtual-reality division, posted a net loss of $4 billion last quarter, and the company expects operating losses to keep growing through 2023. (Golden Deer)
