Specially Curated丨Full-Stack Innovation: How Cloud Computing Platforms Accelerate Enterprise Deployment of Generative AI

Generative AI will be an important source of competitive advantage for businesses in the future. The key question facing enterprises now is not whether to use large models, but how to put them into production and create real value. As products of big data and massive computing power, large models "naturally grow on the cloud," and their successful implementation depends on the quality of cloud services.

As one of the leaders in cloud computing and generative AI, Amazon Web Services unveiled its latest full-stack AI innovations and supporting toolkit at its 2023 re:Invent conference, with the aim of helping more enterprises innovate with generative AI.

The U.S. market is roughly 6-12 months ahead of China in generative AI: the competitive landscape of the foundation-model track has largely taken shape, innovation is converging on the application and AI-native layers, and commercial organizations' awareness and acceptance of AI are correspondingly higher. The basic consensus in the U.S. business community, especially among management, is that generative AI will have a disruptive impact on corporate competitiveness in the coming years. As a result, more and more enterprises, particularly large organizations with strong financial and technical resources, are experimenting with generative AI.

The conference highlighted several recent customer examples. Blackstone, the world's largest alternative asset manager, is partnering with Amazon Web Services and vector database provider Pinecone to launch a generative AI-based solution that empowers its investment teams.

Hotel group Marriott International is partnering with Amazon Web Services, consulting firm Deloitte, and cybersecurity vendor Palo Alto Networks to leverage AI more securely in delivering digital housekeeping, according to Marriott International's Chief Information Security Officer, Arno Van Der Walt.

However, for most businesses, the transition to generative AI can involve high time and technology costs, as well as security and privacy challenges. To help more customers overcome these challenges and minimize the cost of applying large models, Amazon Web Services released its full-stack innovations for generative AI at this year's re:Invent conference: the underlying infrastructure layer responsible for training and inference, the intermediate model and tooling service layer, and the upper generative AI application layer, helping enterprises implement large models at every level of the stack.

Infrastructure Layer:

Partnering with Nvidia to build a GH200 supercomputer

Training and inference for large models require enormous computing power, and the capabilities of the underlying compute layer largely determine the capabilities of the models themselves.

One of the highlights of the conference was the joint on-stage appearance of Amazon Web Services CEO Adam Selipsky and Nvidia CEO Jensen Huang. Thirteen years ago, Amazon Web Services became the first global cloud service provider to offer GPU computing power; today it has deployed more than 2 million Hopper-architecture GPUs (H100), equivalent to the computing power of 3,000 supercomputers. The two companies then announced that they would expand their collaboration to bring Nvidia's latest Grace Hopper superchip, the GH200, to Amazon Web Services.

In addition to state-of-the-art GPUs, a strong distributed computing architecture is needed to turn chips into usable computing power. Amazon Web Services and Nvidia announced that they will jointly build the world's first GH200-based cloud supercomputer, linking up to 16,384 GH200 superchips through Amazon EC2 UltraClusters. This will comfortably support pre-training of large models at the trillion-parameter level.

In addition, Amazon Web Services released its latest machine learning training chip, Trainium2, which is reported to deliver four times the training performance of the first-generation Trainium. It can be deployed in EC2 UltraClusters of up to 100,000 chips, allowing large language models and foundation models to be trained in less time while roughly doubling energy efficiency. The very high energy consumption of large models is a problem that enterprises pursuing carbon neutrality must confront, and Trainium2 will help them reduce the carbon footprint of large-model workloads.

Model Tool Service Layer:

Announced support for nearly all mainstream models

With sufficient computing power in place, models still need to be trained, fine-tuned, and continually updated. Amazon Web Services' generative AI service, Amazon Bedrock, announced new capabilities. One is broader model choice for building and scaling generative AI applications, including new models from Anthropic, Cohere, Meta, and Stability AI. For example, Anthropic's Claude 2.1 offers an industry-leading 200K-token context window while improving inference accuracy. According to Anthropic's CEO at re:Invent 2023, Claude 2.1 cuts hallucinations in open-ended conversations by 50% and halves false statements, both core barriers to enterprise AI adoption. Amazon Web Services also entered into a strategic partnership with Anthropic this year, and Amazon Bedrock customers gain exclusive early access to Claude customization and fine-tuning capabilities not available elsewhere.
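For developers, invoking Claude 2.1 through Bedrock is a single runtime API call. Below is a minimal sketch using the boto3 SDK; the region, prompt, and generation parameters are illustrative assumptions, not values from the announcement.

```python
import json

import boto3

# Bedrock runtime client; the region is an assumption -- use one where
# the model is enabled for your account.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Claude 2.x on Bedrock uses the Human/Assistant prompt format.
body = json.dumps({
    "prompt": "\n\nHuman: Summarize our refund policy in three bullet points.\n\nAssistant:",
    "max_tokens_to_sample": 500,
    "temperature": 0.2,
})

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-v2:1",  # model ID for Claude 2.1 on Bedrock
    body=body,
)
print(json.loads(response["body"].read())["completion"])
```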


A problem faced by many enterprises is how to choose the base model that fits them, and Amazon Web Services launched the Bedrock model evaluation service in response to this pain point.

There are two types of model evaluation: automatic and manual. In automated evaluation, developers use the Amazon Bedrock console to select the models they want to evaluate, such as Meta's Llama 2, Anthropic's Claude 2, or Stability AI's Stable Diffusion. Amazon Bedrock then scores each model on metrics such as robustness, accuracy, and safety across tasks such as summarization, text classification, question answering, and text generation.

During the evaluation process, Amazon Web Services provides test datasets, but enterprises can also import their own data into the benchmarking platform to better match their business scenarios and select the most suitable model. For manual evaluation, customers can work with the Amazon Web Services assessment team or with their own team to ensure a thorough review of model performance.
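The managed evaluation workflow runs from the Amazon Bedrock console, but a team can also spot-check candidates itself by sending the same task to several models through the runtime API. A rough sketch, with illustrative model IDs and a toy prompt:

```python
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Each candidate model expects a slightly different request body.
CANDIDATES = {
    "amazon.titan-text-express-v1": lambda p: {"inputText": p},
    "anthropic.claude-v2:1": lambda p: {
        "prompt": f"\n\nHuman: {p}\n\nAssistant:",
        "max_tokens_to_sample": 300,
    },
}

def output_text(model_id: str, payload: dict) -> str:
    """Extract the generated text from each provider's response shape."""
    if model_id.startswith("amazon.titan"):
        return payload["results"][0]["outputText"]
    return payload["completion"]  # Anthropic-style response

prompt = "Classify this support ticket as billing, shipping, or other: 'My card was charged twice.'"
for model_id, make_body in CANDIDATES.items():
    resp = bedrock_runtime.invoke_model(modelId=model_id, body=json.dumps(make_body(prompt)))
    print(model_id, "->", output_text(model_id, json.loads(resp["body"].read())).strip())
```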

The conference also introduced Amazon Web Services' Amazon Titan model family, which lets enterprises choose the appropriate model for each business scenario. Taking e-commerce as an example, a customer can first use the text embedding model to convert service terms and after-sales policies into vectors, enabling customer service staff to run semantic (fuzzy) searches over them.
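As a sketch of that embedding step, the snippet below converts a policy snippet into a vector with Amazon Titan Text Embeddings via boto3; the resulting vector would then be written to whatever vector database the team uses. The text and region are illustrative.

```python
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> list[float]:
    """Return the Titan embedding vector for a piece of text."""
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

policy_vector = embed("Returns are accepted within 30 days with the original receipt.")
print(len(policy_vector))  # Titan text embeddings are 1536-dimensional
```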

Customers can also turn product details, such as materials and features, into prompts for the Amazon Titan Text Express model to generate rich product descriptions. The model can also generate a variety of usage scenarios, making descriptions feel closer to everyday life and easier for customers to relate to.
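A minimal sketch of that generation step with Amazon Titan Text Express; the product details in the prompt are invented, and the helper is reused in the keyword example below.

```python
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def generate_text(model_id: str, prompt: str, max_tokens: int = 512) -> str:
    """Call a Titan text model and return the generated text."""
    response = bedrock_runtime.invoke_model(
        modelId=model_id,
        body=json.dumps({
            "inputText": prompt,
            "textGenerationConfig": {"maxTokenCount": max_tokens, "temperature": 0.7},
        }),
    )
    return json.loads(response["body"].read())["results"][0]["outputText"]

description = generate_text(
    "amazon.titan-text-express-v1",
    "Write a lively product description for running shoes made of "
    "recycled mesh with a cushioned foam sole, including everyday usage scenarios.",
)
print(description)
```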

The marketing department may need to generate a series of search keywords for the shoe and bid on them in search engines to drive more traffic to the website. This simple task can be handled by the smaller Amazon Titan Text Lite model, which costs less to call than a larger model.
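Using the same generate_text helper sketched above, routing this cheaper task to Titan Text Lite is just a different model ID (the prompt is again illustrative):

```python
keywords = generate_text(
    "amazon.titan-text-lite-v1",
    "Generate 10 search-engine keywords for lightweight recycled-mesh running shoes.",
    max_tokens=128,
)
print(keywords)
```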

With the Amazon Titan Multimodal Embeddings model, companies can convert images and text descriptions of the new shoes into vectors and store them in a vector database. If a customer then sees someone wearing the shoes on the street, a single photo is enough to retrieve the product, with no need to know the brand.
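A sketch of that photo-to-product lookup with Amazon Titan Multimodal Embeddings: index each catalog image as a vector, then embed the customer's photo the same way and search by vector similarity. The file names are illustrative, and the nearest-neighbor search itself belongs to the vector database.

```python
import base64
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed_image(image_path: str, caption: str | None = None) -> list[float]:
    """Embed an image (optionally with text) using Titan Multimodal Embeddings."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    body = {"inputImage": image_b64}
    if caption:
        body["inputText"] = caption
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-image-v1",
        body=json.dumps(body),
    )
    return json.loads(response["body"].read())["embedding"]

# Index a catalog image with its description, then embed the street photo;
# the similarity search happens in your vector database.
catalog_vec = embed_image("catalog/shoe-123.jpg", "recycled-mesh running shoe, white")
query_vec = embed_image("uploads/street-photo.jpg")
```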

Finally, the website designer may need custom promotional images to market the shoe, such as placing the product against a new background. Amazon Titan Image Generator can produce such images from natural-language prompts.
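A sketch of a text-to-image call to Amazon Titan Image Generator; the prompt, image size, and output handling are illustrative assumptions.

```python
import base64
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-image-generator-v1",
    body=json.dumps({
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {
            "text": "white running shoes on a sunlit mountain trail, product photo",
        },
        "imageGenerationConfig": {
            "numberOfImages": 1,
            "width": 1024,
            "height": 1024,
            "cfgScale": 8.0,
        },
    }),
)

# Images come back base64-encoded (and carry the invisible watermark
# described later in this article).
image_b64 = json.loads(response["body"].read())["images"][0]
with open("promo-shoe.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```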


Generative AI Application Layer:

AI Assistant Amazon Q Debuts

As the highlight of this year's re:Invent, Amazon Web Services unveiled its own enterprise generative AI application: Amazon Q, an AI assistant.

According to Amazon Web Services CEO Adam Selipsky, employees can use Amazon Q to have conversations, solve problems, generate content, gain insights, and make decisions by seamlessly drawing on enterprise repositories, code, data, and systems.

Amazon Q quickly connects to enterprise business data, information, and systems so employees can have tailored conversations, solve problems, generate content, and take actions relevant to their business. Amazon Q generates answers and insights grounded in the material and knowledge provided, along with references and citations to the source documents.

For example, a new employee can ask Amazon Q, "Where can I find the latest brand logo guidelines?" and Amazon Q will find the answer without the employee having to switch between multiple systems. Because Amazon Q understands follow-up questions, the employee can continue with, "Where can I find the different color combinations of our logo?", and Amazon Q will use the context of the previous question to surface where the relevant information lives.
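At announcement time Amazon Q was demonstrated through its web and in-console experiences; the later generally available Amazon Q Business service also exposes a ChatSync API. A rough sketch against that API, with hypothetical application and user IDs (exact parameters may differ by SDK version):

```python
import boto3

# Amazon Q Business client; the application ID below is hypothetical.
qbusiness = boto3.client("qbusiness", region_name="us-east-1")

response = qbusiness.chat_sync(
    applicationId="app-1234567890",      # hypothetical Q application
    userId="new-employee@example.com",   # hypothetical user identity
    userMessage="Where can I find the latest brand logo guidelines?",
)

print(response["systemMessage"])
# Source attributions back the answer with references to the documents used.
for source in response.get("sourceAttributions", []):
    print("-", source.get("title"), source.get("url"))
```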

For enterprises, Amazon Q matters for both management and workflow restructuring. Many businesses face a similar challenge: vast amounts of information are scattered across documents, systems, and applications, and employees spend much of their working day hunting for internal information, collating data, writing reports, building presentations, and adapting content for different audiences.

Amazon Q currently offers more than 40 built-in connectors to popular enterprise applications and document repositories, including Amazon Simple Storage Service (Amazon S3), Salesforce, Google Drive, Microsoft 365, Gmail, Slack, and Zendesk.

Build responsible AI

For enterprise-grade applications, the security and privacy protections of generative AI have always been a top priority, and the conference also announced Amazon Web Services' new initiatives in these areas.

For example, Guardrails for Amazon Bedrock, currently in preview, allows enterprises to define custom security policies for AI use, ensuring safe interactions between users and large-model applications.

Enterprises can apply these safeguards to all large language models in Amazon Bedrock, including fine-tuned models and agents, so that customers can innovate securely.

The Guardrails service includes several types of security policy (a configuration sketch follows the list below):

Denied topics: Businesses can use short natural-language descriptions to define a set of topics that should not appear in the context of an AI application. For example, a bank can configure its large model not to provide investment advice to customers.

Content filters: Businesses can configure filters to block harmful content such as hate, insults, sexual content, and violence. While many large language models already include built-in protections against undesirable responses, Guardrails gives enterprises additional control to filter AI-user interactions at generation time to the degree their use cases and responsible AI policies require. A higher filter strength corresponds to stricter content control. For example, an e-commerce site can configure its AI assistant to avoid hate speech or abusive language.

Personally identifiable information (PII) masking: Businesses can select a set of PII, such as names, email addresses, and phone numbers, to be masked in generated responses, or block user inputs that contain PII. For example, a utility company can redact customers' personal information from call logs.
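Because the service was announced in preview, the API surface was still settling at the time; the later generally available boto3 call looks roughly like the sketch below, which combines the three policy types above for the bank example. Names, messages, and filter strengths are illustrative.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

guardrail = bedrock.create_guardrail(
    name="bank-assistant-guardrail",
    description="Blocks investment advice, harmful content, and leaks of PII.",
    # Denied topics: a natural-language definition of what to refuse.
    topicPolicyConfig={"topicsConfig": [{
        "name": "InvestmentAdvice",
        "definition": "Recommendations about specific investments, stocks, or portfolio strategies.",
        "type": "DENY",
    }]},
    # Content filters: per-category strength for inputs and outputs.
    contentPolicyConfig={"filtersConfig": [
        {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "INSULTS", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
    ]},
    # PII masking: anonymize names, emails, and phone numbers in responses.
    sensitiveInformationPolicyConfig={"piiEntitiesConfig": [
        {"type": "NAME", "action": "ANONYMIZE"},
        {"type": "EMAIL", "action": "ANONYMIZE"},
        {"type": "PHONE", "action": "ANONYMIZE"},
    ]},
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that response.",
)
print(guardrail["guardrailId"], guardrail["version"])
```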

The use of AI to create false information is another risk for businesses, so all images generated by Amazon Titan Image Generator automatically contain an invisible watermark. Amazon Web Services sought a way to mark AI-created images, especially those from its own models, that does not affect the visuals, adds no latency, and cannot be removed by cropping or compression. Invisible watermarks help enterprises address risks around information authenticity and the traceability of AI-generated content.

From full-stack generative AI technology to responsible AI, it is clear that Amazon Web Services is building one-stop, end-to-end generative AI services for enterprises, greatly lowering the threshold for AI adoption.

The era of accessible generative AI has arrived. Reviewing these latest technological developments, we can also distill the innovation and implementation challenges that deserve attention as large models move into production. For companies that aim to use AI to improve their competitiveness, whatever their industry or stage of development, these factors deserve management's attention.

An Jian丨Author

An Jian is a contributing writer for the Chinese edition of Harvard Business Review
