
Decoding AI: Demystifying the "brain" of chatbots - large language models

Author: NVIDIA China

If AI is at the "iPhone moment" that will change history, then chatbots are among its first breakout applications.

Chatbots are made possible by large language models, which are deep learning algorithms pre-trained on large-scale datasets that recognize, summarize, translate, predict, and generate text and other forms of content. These models can run natively on PCs and workstations powered by NVIDIA GeForce and RTX GPUs.

Large language models excel at summarizing large amounts of text, deriving insights through data classification and mining, and generating new text in a user-specified style, tone, or form. They can facilitate communication in a variety of languages, even unconventional "languages" outside of human language, such as computer code or protein and gene sequences.

The first generation of large language models could only process text, but later iterations were trained on other types of data. These multimodal large language models can recognize and generate images, audio, video, and other forms of content.

Chatbots like ChatGPT are among the first technology applications to bring large language models to consumers, providing a familiar interface for conversing through natural-language prompts. Since then, large language models have helped developers write code and helped scientists advance drug discovery and vaccine development.

However, the computing power that many AI models require should not be underestimated. By combining advanced optimization techniques and algorithms, such as quantization, with RTX GPUs built for AI, large language models can be "slimmed down" enough to run locally on a PC without an internet connection. The emergence of new lightweight large language models, such as Mistral (one of the models that power Chat with RTX), has further reduced the need for computing power and storage space.
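To give a feel for why quantization shrinks a model, here is a minimal sketch of symmetric int8 weight quantization. The function names are illustrative only; production toolchains such as TensorRT-LLM use far more sophisticated schemes (per-channel scales, calibration, activation quantization).

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map float weights into [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.5, 0.33, 1.27], dtype=np.float32)
q, scale = quantize_int8(w)
w_approx = dequantize(q, scale)
# int8 storage uses 4x less memory than float32; recovered values are
# close to the originals but not bit-exact
```

The trade-off is a small loss of precision in exchange for a model that is roughly a quarter of its float32 size, which is what makes local, offline inference on a consumer GPU practical.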

Why are large language models important?

Large language models have a wide range of applicability and can be used across many industries and workflows. Thanks to this versatility, combined with their high-speed performance, large language models can bring performance and efficiency gains to almost any language-based task.

DeepL, running on NVIDIA GPUs in the cloud, delivers accurate AI-powered translations.

Because AI and machine learning ensure the accuracy of the output, services like DeepL, built on large language models, are widely used for language translation.

Medical researchers are training large language models on textbooks and other medical data in hopes of improving patient care. Retailers are leveraging chatbots powered by large language models to give users a better customer-support experience. Financial analysts are using large language models to transcribe earnings calls and other important meetings, and to summarize their content. And that is just the tip of the iceberg.

Chatbots like Chat with RTX and writing assistants built on large language models are making their mark on all aspects of knowledge work, whether content marketing, copywriting, or legal tasks. Coding assistants were among the first applications powered by large language models, heralding a future of AI-assisted software development. Projects such as ChatDev now combine large language models with AI agents (intelligent bots that can autonomously answer questions or perform tasks) to build an AI-driven virtual software company that provides services on demand. Users simply tell the system what application they need and watch it go to work.

To learn more about large language model agents, read the NVIDIA developer blog.

It's as easy as a daily conversation

Many people are first introduced to generative AI through chatbots like ChatGPT, which simplify the use of large language models through natural language, where users simply tell the model what to do.

Chatbots powered by large language models can help draft marketing copy, provide vacation advice, write customer service emails, and even create original poems.

Advances in image generation and multimodality have expanded chatbots' range of applications, adding the ability to analyze and generate images while retaining the same simple, easy-to-use experience. Users can describe an image or upload a photo and ask the system to analyze it. Beyond chat, images can also serve as visual aids.

Future technological advancements will help large language models expand their capabilities in logic, reasoning, mathematics, and more, giving them the ability to decompose complex requests into smaller subtasks.

Advances have also been made in AI agents: applications that take a complex prompt, break it down into smaller ones, and work autonomously with large language models and other AI systems to accomplish the tasks the prompt describes. ChatDev is a typical example of an AI agent, but agents are by no means limited to technical tasks.

For example, a user can ask a personal AI travel agent to book a vacation abroad for the whole family. The agent can break the task into subtasks, including trip planning, booking tours and accommodations, creating packing lists, and finding dog walkers, and then perform them independently, one by one.
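The plan-then-execute loop described above can be sketched in a few lines. This is a hypothetical illustration: the `plan` and `execute` functions stand in for what would be LLM calls and tool invocations (booking APIs, search, and so on) in a real agent.

```python
# Minimal agent loop: decompose a request into subtasks, then run each one.

def plan(request: str) -> list[str]:
    # In a real agent an LLM would generate this plan; hard-coded for illustration.
    return [
        "plan the itinerary",
        "book tours and accommodations",
        "create a packing list",
        "find a dog walker",
    ]

def execute(subtask: str) -> str:
    # Stand-in for calling an LLM or an external tool to carry out one step.
    return f"done: {subtask}"

def run_agent(request: str) -> list[str]:
    # Decompose, then execute each subtask independently, one by one.
    return [execute(step) for step in plan(request)]

results = run_agent("Book a vacation abroad for the whole family")
```

Real agent frameworks add error handling, re-planning when a step fails, and memory of earlier results, but the decompose-and-execute skeleton is the same.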

Unlock personal data with RAG

While large language models and chatbots are already powerful in general-purpose scenarios, they become even more useful when combined with an individual user's data. They can then help analyze emails to spot trends, comb through dense user manuals to find the answer to a technical question, or synthesize and analyze years of accumulated bank and credit card statements.

Retrieval-Augmented Generation (RAG) is one of the simplest and most effective ways to connect a specific dataset to a large language model.

Example of RAG on a PC.

RAG leverages factual data from external sources to improve the accuracy and reliability of generative AI models. By connecting a large language model to virtually any external source, RAG lets users "talk" to their data repositories, and the model can cite its sources directly. The user experience is as simple as pointing the chatbot at a file or directory.
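The RAG pattern itself is simple: retrieve the most relevant passage from a local document store, then prepend it to the prompt sent to the model. The sketch below is illustrative only; the keyword-overlap retriever and the `ask_llm` stub stand in for the embedding-based retrieval and local LLM inference a real app would use.

```python
# Minimal sketch of the RAG pattern: retrieve, augment the prompt, generate.

def retrieve(query: str, documents: list[str]) -> str:
    """Pick the document sharing the most words with the query (toy retriever)."""
    q_words = set(query.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

def ask_llm(prompt: str) -> str:
    # Stand-in for a local LLM call (e.g., a TensorRT-LLM backed model).
    return f"[answer grounded in prompt: {prompt[:40]}...]"

def rag_answer(query: str, documents: list[str]) -> str:
    context = retrieve(query, documents)
    prompt = f"Context: {context}\nQuestion: {query}\nAnswer using only the context."
    return ask_llm(prompt)

docs = [
    "The Q3 launch plan targets enterprise customers in Europe.",
    "Packing list: passport, chargers, sunscreen.",
]
answer = rag_answer("What customers does the launch plan target?", docs)
```

Because the retrieved context is injected into the prompt rather than trained into the model's weights, the underlying model never needs to be retrained when the user's documents change.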

For example, a standard large language model already has knowledge of content-strategy best practices, marketing techniques, and fundamental insights into specific industries or customer segments. Connect it via RAG to the marketing materials for a product launch, however, and it can analyze that content and help plan a tailored strategy.

RAG works with any large language model, as long as the application supports it. NVIDIA Chat with RTX is a demo app that connects large language models to personal datasets through RAG. It runs natively on systems equipped with GeForce RTX GPUs or NVIDIA RTX professional GPUs.

To learn more about RAG and how it differs from fine-tuning a large language model, read the technical blog post "RAG Basics: Questions and Answers on Retrieval-Augmented Generation."

Experience the speed and privacy of Chat with RTX

Chat with RTX is a personalized chatbot demo app that runs natively, is easy to use, and is free to download. Built on RAG, TensorRT-LLM, and RTX acceleration, it supports several open-source large language models, including Llama 2 and Mistral. Support for Google's Gemma model will arrive in a future update.

Chat with RTX connects users with their personal data through RAG.

Users can easily connect local files on their PC to a supported large language model by simply placing the files in a folder and pointing Chat with RTX at that folder's location. Chat with RTX can then quickly answer all kinds of queries with relevant responses.

Chat with RTX runs on Windows systems with GeForce RTX PCs and NVIDIA RTX workstations, so results are fast and the user's data stays on the device. Because Chat with RTX doesn't rely on cloud-based services, users can work with sensitive data on a local PC without needing to share it with a third party or connect to the internet.
