
How can you run the Llama2 and Tongyi Qianwen open-source large language models quickly on Function Compute?

Author: Serverless community
This article is the first in the series "Building AIGC Applications on Serverless".

Preface

With the rise of a new generation of AIGC applications such as ChatGPT, Stable Diffusion, and Midjourney, development around AIGC has become increasingly widespread and is showing explosive growth. In the long run, this wave of applications is not just a matter of form; it is already generating real productivity value in many fields: Microsoft 365 Copilot and DingTalk's AI features in the office space, GitHub Copilot and the Cursor IDE in programming, and Miaoya ("Magic Duck") Camera in entertainment. It is safe to say that AIGC applications will become more numerous and more varied in the future, and enterprises will integrate AI into their internal software and SOPs wherever possible. This will inevitably create a large amount of demand for AIGC application development, which in turn represents a huge market opportunity.

The challenge of developing AIGC applications

The application prospects of AIGC are so attractive that they may determine the future direction of an enterprise. However, for many small and medium-sized enterprises and individual developers, the barrier to getting started with AIGC application development is still high:

  • Access to base model services: ChatGPT provides a very complete API ecosystem, but it is not open to customers in mainland China, and deploying a service yourself from an open-source model is quite difficult.
  • High costs: GPU shortages have driven GPU prices sharply upward, and buying high-spec graphics cards locally requires a large one-time outlay while providing no online service by itself.
  • End-to-end integration: a bare model-service API cannot be turned directly into productivity; you still need to complete the full chain of [enterprise data & enterprise SOPs] -> LLM service -> various clients.

Function Compute solution for AIGC applications

Function Compute provides a complete solution around building and using AIGC applications, from infrastructure to the application ecosystem, and from development to everyday use.


It mainly consists of three parts:

  • First, the model service base. Function Compute can deploy AI models from open-source communities such as ModelScope and Hugging Face. We have made dedicated customizations for intelligent knowledge-base/assistant scenarios such as LLM and Bert services, exposed OpenAI-compatible API specifications, and provided one-click deployment templates with a visual web client, helping developers and enterprises quickly deploy models such as Llama2, ChatGLM2, and Tongyi Qianwen.
  • Second, the business connection layer, which links business requirements to underlying resources such as model services, security services, and database services. AIGC applications share a great deal of common logic in this layer, such as account systems, dataset management, prompt templates, tool management, and model-service management. From each business team's perspective, only the prompts, knowledge base, and tool set differ; the underlying model, security, and database services are shared. This layer simplifies building intelligent scenarios, so AIGC applications for different businesses can be assembled quickly and at low cost.
  • Third, the client side. Clients are where AI applications are actually used and are the part closest to the business. The main concern here is how to integrate AI services into existing endpoints, such as office IM systems like DingTalk and WeCom (enterprise WeChat), as well as web browser plug-ins. Function Compute combined with EventBridge can quickly connect AI services to these clients.

In this tutorial we cover the first part: how to quickly deploy AIGC model services on Function Compute, including an LLM model and an Embedding (Bert) model.

LLM model and Embedding service deployment tutorial

Preliminary preparation

To use this project, you need to activate the following services:

  • Function Compute (FC): provides the CPU/GPU inference computing power for AIGC. Activate it here: https://free.aliyun.com/?pipCode=fc
  • File Storage NAS: stores the large language model and the models required by the Embedding service. New users should first claim the free trial resource package: https://free.aliyun.com/?product=9657388&crowd=personal

Application introduction

Application details

Use Alibaba Cloud Function Compute to deploy open-source large-model applications, providing an OpenAI-compatible API and a ChatGPT-Next-Web client.


Operation guide

LLM application template

Log in to the Alibaba Cloud Function Compute console -> Applications -> Create Application -> Artificial Intelligence, select the "AI Large Language Model API Service" application template, and click "Create Now".


Application settings


After completing the settings, click "Create and Deploy Default Environment".

Wait for the deployment

The deployment process runs automatically.

Service Access


After the service is deployed successfully, two links are returned:

1. llm-server is the API service of the large language model, with Swagger-based documentation.
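Since llm-server exposes an OpenAI-compatible API, you can call it from any OpenAI-style client. Below is a minimal Python sketch using only the standard library; the base URL, model name, and API key are placeholders for your own deployment values, and the exact request path should be confirmed against the Swagger page of your llm-server.

```python
import json
import urllib.request

# Placeholder: replace with the llm-server URL returned after deployment.
BASE_URL = "https://llm-server.example.com"


def build_chat_request(model, prompt):
    """Build an OpenAI-compatible /v1/chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }


def chat(model, prompt, api_key):
    """Send one chat turn to the deployed service and return the reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible servers put the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

An OpenAI-compatible server is expected to return the reply under `choices[0].message.content`, mirroring the official API shape, which is what lets existing clients such as ChatGPT-Next-Web work against the deployed endpoint unchanged.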


2. llm-client is the web client. To access it, enter the client password you set earlier; once entered, you can start testing.


Embedding template

Log in to the Alibaba Cloud Function Compute console -> Applications -> Create Application -> Artificial Intelligence, select the "Open Source Bert Model Service" application template, and click "Create Now".


Application settings

You only need to select a region and create the application.


Wait for the deployment

The deployment process runs automatically.

Service Access


Test the embedding interface
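Besides testing from the Swagger page, you can exercise the embedding service from code. The sketch below is a hypothetical example: the endpoint URL and the request/response shape (`{"input": [...]}` in, `{"data": [...]}` out) are assumptions to adapt to your actual deployment, while the cosine-similarity helper shows a typical way to compare the returned vectors.

```python
import json
import math
import urllib.request

# Placeholder: replace with the URL returned by your Bert service deployment.
EMBED_URL = "https://bert-server.example.com/embeddings"


def embed(texts, url=EMBED_URL):
    """POST a list of texts to the embedding service and return the vectors.

    The request/response shape here is an assumption -- verify it against
    the Swagger documentation of your deployed service.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps({"input": texts}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Comparing the cosine similarity of two sentence embeddings is the core operation behind knowledge-base retrieval: the query is embedded and matched against stored document vectors, with the highest-scoring passages fed to the LLM.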


Summary

Deploying an LLM service marks the start of your AIGC application development journey. In follow-up articles we will share more AIGC-related content, including how to build a knowledge-base Q&A application, how to call tools to extend the capabilities of large language models, and how to connect to your own IM system and build web integration plug-ins.

Space Odyssey, Alibaba Cloud x Semir AIGC T-Shirt Design Contest

One

[Semir x Alibaba Cloud AIGC T-shirt Design Contest] Programmers: design your own T-shirt with AIGC and win AirPods and custom-pattern T-shirts!

Quickly deploy Stable Diffusion with Function Compute FC: built-in model library + common plug-ins + ControlNet, with SDXL 1.0 support.

Participate now: https://developer.aliyun.com/adc/series/activity/aigc_design

Prizes include third-generation AirPods, customized co-branded T-shirts, Semir suitcases, and other merchandise!

Two

You can also take part in the topic discussion on the future trends of AIGC. Participants can share from any angle and win eye-protection desk lamps, data cables, silent purifiers, and other prizes!

Topic: "Compared with a good fashion designer, how can AIGC push the boundaries of design inspiration? Is it purely mechanical language, or is there a spark of inspiration?"

https://developer.aliyun.com/ask/548537?spm=a2c6h.13148508.setting.14.4a894f0esFcznR

