
The cloud computing leader once again predicts the industry's direction

Author: 虎嗅APP (Huxiu)

Image credit: Visual China

In the cloud computing industry, where "service" is the core, the vendor with the most customers is usually the most sensitive to the market, and also the most likely to be targeted by rivals. That was Amazon Web Services' position, time and again, until generative AI exploded.

People suddenly felt that the undisputed number one in the cloud computing market was, for once, not leading in technology. Would the intelligent era reshuffle the deck? Would Amazon Web Services become yet another giant "folded away" by a new technology? Speculation ran wild. It turned out everyone had worried too much: even as hundreds of large models worldwide fought it out, Amazon Web Services kept its focus on services. Its investment in technology paid off as well; after the release of Claude 3, no one questions Amazon Web Services' underlying AI capabilities.

Unlike Microsoft Azure's Copilot strategy, Amazon Web Services' ToB updates are more varied and better suited to public cloud scenarios. As Amazon Web Services defines it, the concept of "service" in today's cloud computing has changed in the generative AI era, evolving into three dimensions: making full use of the latest hardware performance; keeping up with large model technology while insulating customers from industry turmoil; and fully adapting to the new application thinking of the generative AI era.

To some extent, this gives Amazon Web Services a differentiated competitive advantage over Microsoft Azure.

Encapsulating the hardware industry with the concept of "service"

In the current hardware market, businesses are anxious about hardware replacement cycles.

Much of this can be traced to NVIDIA's shrewdness: it keenly grasped the cost problem created by large models, and therefore refuses to let "the new card is faster" be the only reason customers buy.

In fact, apart from a handful of leading foundation model developers worldwide, plus research institutions and giants focused on fields such as meteorology and pharmaceuticals, most customers have not seen demand for computing power skyrocket, and their revenues do not yet justify replacing GPUs at scale every year. So a key theme NVIDIA pushes at every GTC is that the energy consumption of new cards, and with it the cost of training, is falling rapidly. The GB200 NVL72 unveiled at GTC 2024 delivers a 30-fold increase in inference performance, which is impressive; but in the author's view, the claimed 25-fold reduction in cost and energy consumption is even more tempting.

Moreover, NVIDIA's market share of more than 90% has not triggered many antitrust investigations, and one core reason is that competition in the chip market still objectively exists. Google, Intel, Qualcomm, and other companies jointly established the UXL Foundation, aiming first to break through the software moat NVIDIA has built around CUDA. Since last year, AMD has been using the MI300 as its main weapon in a head-on challenge to NVIDIA. Taking a longer view, Cathie Wood of ARK Invest believes NVIDIA is risky, much like Cisco twenty years ago. On April 19, NVIDIA's shares duly fell 10%.

In a sense, this deepens the unease of anyone incubating an innovative business on GPU computing power: the ROI of older cards is falling fast, while the risk is rising fast. And once a company enters "buy cards to start a business" mode, its AI project immediately becomes asset-heavy, which further amplifies the risk.

Enterprise anxiety is often an opportunity for cloud computing, which is accustomed to treating everything as a "service". Accordingly, Amazon Web Services has built out a complete layout around underlying computing power.

First, Amazon Web Services is one of NVIDIA's top customers, and the two brands are tightly bound. Jensen Huang appeared at last year's Amazon Web Services re:Invent, while Amazon Web Services CEO Adam Selipsky appeared at GTC 2024, where the two highlighted a partnership of more than 13 years. Amazon Web Services also announced Amazon EC2 instances based on NVIDIA Grace Blackwell GPUs, the first cloud AI supercomputer powered by the NVIDIA Grace Hopper Superchip, and the first NVIDIA DGX Cloud running on the NVIDIA GH200 NVL32.

On the other hand, Amazon Web Services is not putting all its eggs in one basket. It has released its latest self-developed chips: Trainium2 for AI training and Inferentia2 for AI inference. The former can support training models at the scale of hundreds of billions or even trillions of parameters, and UltraClusters, used to orchestrate large-scale parallel computing, is also quite mature.

Chen Xiaojian, general manager of the product department at Amazon Web Services Greater China, revealed that UltraClusters can support up to 100,000 of the latest Trainium2 chips training in parallel. This is reminiscent of NVLink and NVSwitch: as single-chip performance approaches the physical limits of the wafer, connectivity and orchestration across a cluster become critically important. A single chip is not enough to make Amazon Web Services' "service", the one that shields customers from a turbulent hardware industry, reassuring; with UltraClusters added, it may be.

At the hardware level, Amazon Web Services targets the core anxiety of startups, replacing the asset-heavy model with the cloud model and shielding the messy side of generative AI infrastructure. Cloud computing companies are always selling services, not technology; that is the key point to keep in mind when trying to understand Amazon Web Services.

"Shelf" is the best form of entry into the GenAI B-end market

This service-first approach becomes even more obvious once you move up from the IaaS layer into the PaaS layer.

Since early 2023, the industry's model-release press template has been quite fixed: we have released the xxx LLM, with a parameter scale of xxx, and performance approaching or exceeding GPT-3.5/GPT-4.

After Amazon Web Services invested $4 billion in Anthropic, Anthropic broke from that template: it released Claude 3 as a single family with three versions, Opus, Sonnet, and Haiku, whose performance differs significantly, ranked from high to low. According to Chen, customers can choose the combination of intelligence, speed, and price best suited to their business needs: Opus is the most intelligent model, for complex open-ended scenarios; Sonnet balances intelligence and speed for most scenarios; and Haiku is the most cost-effective model for high-volume use cases.

To put it simply, Haiku has the lowest performance of the three and lags behind GPT-4 in many benchmarks, but it is cheap.

This is truly a textbook presentation of the concept of "service".
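As a concrete illustration, here is a minimal sketch of that tiered choice on Amazon Bedrock, assuming boto3 with configured AWS credentials and Claude 3 access enabled in the region. The model IDs follow Bedrock's public naming at the time of writing but should be verified against the current catalog, and the ask helper is purely illustrative.

```python
# Minimal sketch: picking a Claude 3 tier on Amazon Bedrock by workload.
# Assumes AWS credentials are configured and the Anthropic models are
# enabled in the account/region; verify model IDs against the catalog.
import json
import boto3

# Tier -> Bedrock model ID (intelligence and price descending: Opus > Sonnet > Haiku)
CLAUDE3 = {
    "opus": "anthropic.claude-3-opus-20240229-v1:0",      # complex, open-ended tasks
    "sonnet": "anthropic.claude-3-sonnet-20240229-v1:0",  # balanced default
    "haiku": "anthropic.claude-3-haiku-20240307-v1:0",    # high-volume, cost-sensitive
}

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def ask(tier: str, prompt: str) -> str:
    """Invoke the chosen Claude 3 tier using the Anthropic Messages format."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
    }
    resp = bedrock.invoke_model(modelId=CLAUDE3[tier], body=json.dumps(body))
    return json.loads(resp["body"].read())["content"][0]["text"]

# High-volume summarization goes to the cheap tier; hard reasoning to Opus.
print(ask("haiku", "Summarize this support ticket in one sentence: ..."))
```

The design point is that switching tiers is a one-line change, which is exactly the intelligence-versus-price trade-off Chen describes.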

More than that, at the PaaS layer Amazon Web Services hits a remarkable number of pain points precisely. For example, no general-purpose model ranks first in both performance and ROI across all scenarios: Claude 3 Sonnet outscored GPT-4 on the "Multilingual Math" dimension but scored below GPT-4 on "Math Problem Solving". In specific use cases, the picture gets even more complicated.

This provides a great angle for Amazon Web Services to build Amazon Bedrock.

Focused on the intermediate tooling layer, Amazon Bedrock gives users easy access to 27 leading foundation models from AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon itself. Customers can also import their own custom models into Bedrock, which adds flexibility on the one hand and accommodates the industry's current stage of model adoption on the other.

At the white-hot stage of large model competition, Amazon Web Services has creatively built the concept of a "large model shelf": enterprises pick a large model a bit like picking a flavor of Lay's potato chips. Once again, this creates a differentiated competitive advantage while keeping the pressure on Azure.
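For a sense of how the "shelf" looks from code, here is a minimal sketch using Bedrock's control-plane API, again assuming boto3 and credentials; list_foundation_models is part of the public API, while the formatting is illustrative.

```python
# Minimal sketch: browsing the Bedrock "model shelf" programmatically.
# Assumes AWS credentials with Bedrock access in the chosen region.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# One catalog, many vendors: swapping providers is a modelId change,
# not a re-platforming project.
models = bedrock.list_foundation_models()["modelSummaries"]
for m in models:
    print(f'{m["providerName"]:>12}  {m["modelId"]}')
```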

On reflection, this logic works especially well for two kinds of companies: those that want to build new businesses on the generative AI wave, and those that want to use AI capabilities to complete the intelligent transformation of an existing business.

For the former, real barriers to understanding generative AI remain. For example, Stable Diffusion is not an LLM, which contradicts many people's loose notion of "large model" (never mind that the industry itself defines these concepts vaguely). Such cognitive barriers eventually translate into labor costs, which show up in the project schedule, unless they are encapsulated and shielded in the form of a "service".

For the latter, different business units may all need generative AI capabilities, yet no single team is large. In the Chinese market, a company with more than 300 employees already counts as a large company; in such a "large" company, the R&D team may have only 50 people, the design team 10, and the marketing team fewer still. Introducing a separate model for each of these teams, with their completely different needs, is hardly cost-effective.

Finally, no large model vendor can pound its chest and claim an absolute competitive advantage. OpenAI went through a fierce boardroom "palace fight"; Stability AI was reported to be seeking a sale; Robin Li declared that open-source models would ultimately lose to closed-source ones, only to be hit by the boomerang of Llama 3. This string of upheavals shows that the market is far from stable, the competition is fierce and hard to read, and any prediction risks being wrong. A company that adopts a large model outside the cloud and staffs a dedicated team for training and fine-tuning stands to lose heavily the moment the landscape shifts.

All in all, the "large model shelf" is a necessity here: not a marketing concept, but a product concept.

Beyond the "shelf" concept, Amazon Web Services is rapidly abstracting and productizing customer needs. It launched Model Evaluation, which supports both automatic and human evaluation of model performance, so that the current absence of unified testing standards does not mislead enterprise decision-makers, and so that vendors who over-optimize for a particular test set cannot distort the market. It also launched Guardrails to keep the output of large models compliant.
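A minimal sketch of what Guardrails looks like at the call site, assuming a guardrail has already been created in the account: the guardrail ID and version below are placeholders, while the guardrailIdentifier and guardrailVersion parameters are part of the public InvokeModel API.

```python
# Minimal sketch: attaching a Bedrock guardrail to a model invocation.
# Assumes a guardrail already exists in the account; its ID and version
# below are placeholders for your own.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Draft a reply to this customer complaint: ..."}],
}

resp = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(body),
    guardrailIdentifier="gr-example-id",  # placeholder: your guardrail's ID
    guardrailVersion="1",                 # placeholder: its version
)
# If the guardrail intervenes, the response carries the configured
# blocked-output message instead of the raw model text.
print(json.loads(resp["body"].read()))
```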

They even covered the hottest scenario of the moment, image generation: Amazon Web Services launched the Titan Image Generator, which generates images from prompts and embeds an invisible watermark to support watermark detection.
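A minimal sketch of the Titan flow, assuming the model is enabled in the account and region; the request shape follows Titan's published TEXT_IMAGE schema, and the prompt and output file name are illustrative.

```python
# Minimal sketch: generating an image with Amazon Titan Image Generator.
# Generated images carry an invisible watermark by default.
import base64
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {"text": "a watercolor sketch of a data center at dusk"},
    "imageGenerationConfig": {
        "numberOfImages": 1,
        "height": 1024,
        "width": 1024,
        "cfgScale": 8.0,
    },
}

resp = bedrock.invoke_model(
    modelId="amazon.titan-image-generator-v1", body=json.dumps(body)
)
# The response returns base64-encoded image data.
image_b64 = json.loads(resp["body"].read())["images"][0]
with open("titan_image.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```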

This judgment largely rests on Amazon Web Services' intense attention to, and understanding of, its customers. According to industry insiders, Amazon Web Services' product releases over the years have been shaped less by strategic planning than by customer feedback. In this, it is almost identical to the younger software company Datadog (which solves observability problems in microservices architectures): Datadog's product iteration grew entirely out of a Slack community of tens of thousands of customers. To some extent, this has become a shared "internal strength" of established and emerging leaders alike.

Beyond the direct battlefield of cloud computing, the growth of PaaS-layer services has also indirectly killed off a number of competitors who aspired to become large model integrators, further extending Amazon Web Services' competitive depth. Today, OpenAI is "folding away" small players along the technology dimension, while Amazon Web Services folds along the service dimension. The direct result: one year after the large model market opened, the de facto "easy money" is gone, and innovation has become the competitiveness that matters most.

Burying the SaaS services of the old era

So, who will have the crucial ability to innovate in the next wave of startups?

No one can say for certain. The current consensus among industry experts is that today's perceptions of generative AI SaaS applications are all shaped by the fixed mindset of the mobile Internet era, and none of them is native AI SaaS.

The only certainty is that when truly native SaaS emerges, a batch of old SaaS will be buried.

In its external messaging, Amazon Web Services defines Amazon Q as an enterprise-grade generative AI assistant that can connect to a company's data, information, and systems, and be customized to the customer's business; marketers, project managers, and sales representatives can use Q for tailored conversations, problem-solving, content generation, taking actions, and more. Many have called it the weightiest launch of re:Invent 2023.

That is because Amazon Q blends so many different products. It is a Copilot that generates and explains application code and updates drafts and documentation for code packages, repositories, and frameworks. It is also the "digital employee" that has been talked about for years; a company can, for example, ask Q to analyze which features users are struggling with and how to improve them. In Amazon Web Services' internal definition, the concept of Amazon Q is more advanced and broader than that of an AI Agent.

To some extent, since the appearance of Amazon Q, the presentation model of cloud services has begun to fork into completely different routes. Amazon Q has shoveled what may be the last spadeful of dirt onto the grave of old-era low-code and zero-code, and at the same time has put the business of building custom digital employees for enterprises in jeopardy.

The core question is whether you treat the interactive capability of generative AI as a product and simply sell it to users, or treat it as a service and deliver it inside a package. The former is more mainstream, but its profitability is worrying, with traffic costs more than ten times those of the mobile era; the latter looks more reasonable, but it requires enough scenarios to close the business loop through other products.

In any case, the future of Amazon Q may soon reach a phased conclusion. As the chase for model parameters and the control of hardware costs gradually reach equilibrium, and the technical stack stabilizes, the concept of service will increasingly shape the competitive landscape of the B-end market.
