
Run Llama3, the most powerful open-source AI model on your computer, with the most complete method in history

Author: The mountain monster Atu

Meta has once again taken the lead in the open-source model world with the launch of its latest-generation open-source large language model, Llama 3. I covered Llama 3 itself in a previous article; today I'll show how to deploy Llama 3 on a home computer, with no internet connection required once it's set up, so you can run your own private AI model.

Meta's next-generation open-source model, Llama3, is making a stunning debut

Llama 3 is currently available in 8B and 70B parameter sizes, each in a pre-trained and a fine-tuned (instruct) configuration. The pre-trained version is a base model, intended for developers who want to build their own fine-tuned models. For local deployment we mainly download the 8B fine-tuned model, which runs on any graphics card with 8 GB of VRAM or more; the 70B model generally requires professional GPUs such as a 40 GB A100.
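To see where those VRAM figures come from, here is a rough back-of-envelope estimate (illustrative only, not an official requirement): weight memory is approximately the parameter count times the bytes per parameter, and quantized builds shrink it proportionally. Overhead for activations and the KV cache comes on top.

```python
def approx_model_gib(n_params_billion: float, bits_per_param: float) -> float:
    """Rough weight-memory estimate in GiB: params * bits / 8, ignoring overhead."""
    bytes_total = n_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 2**30

# 8B weights at 16-bit precision are ~15 GiB, which is why 4-5 bit
# quantized builds are what actually fit on an 8 GB consumer GPU.
print(f"8B fp16:  {approx_model_gib(8, 16):.1f} GiB")   # ~14.9 GiB
print(f"8B 4-bit: {approx_model_gib(8, 4):.1f} GiB")    # ~3.7 GiB
print(f"70B fp16: {approx_model_gib(70, 16):.1f} GiB")  # ~130.4 GiB
```

This is why the 70B model is out of reach for consumer cards even before accounting for activation memory.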

Here's a detailed guide on how to run the model locally.

Option 1: Python deployment

This option is suitable if you already have a Conda environment with Python set up and want to install Llama 3 into it directly.

Download the Llama 3 archive at github.com/meta-llama/llama3


Unzip to the C drive

Then go to llama.meta.com/llama-downloads to apply for a license and a model download link. Fill in your name, date of birth, country, company, and email address (a real name is not required). Check Meta Llama 3, click Continue, and confirm the license agreement; you will receive a download link. The link cannot be opened directly in a browser — it only works from the terminal command line.


Go to the \llama3-main folder you just decompressed, type "cmd" in the folder's address bar, and on the command line run "pip install -e ." to install the environment. If an error is reported, you may need to pass the full path of your llama3 folder instead, for example "pip install -e C:\llama3-main".

After the environment downloads and installs, right-click in the folder and select "Bash here" to open a bash command line, then run "bash download.sh". When the prompt "Enter the URL from email:" appears, paste the download link you just obtained. The script will then ask which model you want to download. There are four: 8B (pre-trained), 8B-instruct (fine-tuned), 70B, and 70B-instruct. We choose the second: type "8B-instruct" and press Enter, and the model will download automatically.

(If you see an error that the wget command is missing, download wget.exe from eternallybored.org/misc/wget/ and put it in C:\Windows\System32.)


Once the model is downloaded, test it with the example script from the repository by running the following command:

torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 6

To hold an interactive conversation, create the following chat.py script in the repository root:

# Copyright (c) Meta Platforms, Inc. and affiliates.
# This software may be used and distributed in accordance with the terms of the Llama 3 Community License Agreement.

from typing import List, Optional

import fire

from llama import Dialog, Llama


def main(
    ckpt_dir: str,
    tokenizer_path: str,
    temperature: float = 0.6,
    top_p: float = 0.9,
    max_seq_len: int = 512,
    max_batch_size: int = 4,
    max_gen_len: Optional[int] = None,
):
    """
    Examples to run with the models finetuned for chat. Prompts correspond of chat
    turns between the user and assistant with the final one always being the user.

    An optional system prompt at the beginning to control how the model should respond
    is also supported.

    The context window of llama3 models is 8192 tokens, so `max_seq_len` needs to be <= 8192.

    `max_gen_len` is optional because finetuned models are able to stop generations naturally.
    """
    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )

    # The dialogs list holds a single user turn; each loop iteration replaces
    # its content, so the model does not see earlier turns as history
    dialogs: List[Dialog] = [
        [{"role": "user", "content": ""}],  # initialize with an empty user input
    ]

    # Start the conversation loop
    while True:
        # Get user input
        user_input = input("You: ")

        # Exit loop if user inputs 'exit'
        if user_input.lower() == 'exit':
            break

        # Put the user input into the dialogs list
        dialogs[0][0]["content"] = user_input

        # Use the generator to get the model response
        result = generator.chat_completion(
            dialogs,
            max_gen_len=max_gen_len,
            temperature=temperature,
            top_p=top_p,
        )[0]

        # Print the model response
        print(f"Model: {result['generation']['content']}")


if __name__ == "__main__":
    fire.Fire(main)

Run the following command to start a conversation:

torchrun --nproc_per_node 1 chat.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
    --max_seq_len 512 \
    --max_batch_size 6

Option 2: Use Ollama + Open WebUI

Supported platforms: macOS, Linux, Windows (preview)


Ollama is currently one of the most commonly used tools for deploying large language models locally. Just download the app from the Ollama website, double-click the file to install it automatically, and run the following command from the command line:

ollama run llama3

This will download the Llama 3 8B instruct (fine-tuned) model.

You can optionally append a tag to the model name; if you don't, it defaults to latest. Tags identify specific versions or variants. For example, you can run:

ollama run llama3:70b-text

ollama run llama3:70b-instruct

After the model has downloaded, run ollama run llama3 on the command line again and you can talk to Llama 3 directly. Of course, this interface is not very intuitive, and there is no way to save your conversation history.
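Besides the interactive terminal, Ollama also exposes a local REST API (by default on port 11434), so you can script conversations yourself. The sketch below targets the documented /api/chat endpoint; the model name and prompt are examples, and it assumes an Ollama server is already running on your machine:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint


def build_chat_request(model: str, user_message: str) -> dict:
    """Build a non-streaming chat request body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,  # ask for one JSON response instead of a token stream
    }


def chat(model: str, user_message: str) -> str:
    """Send one chat turn to a locally running Ollama server and return the reply."""
    body = json.dumps(build_chat_request(model, user_message)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


# Example usage (requires a running Ollama server with llama3 pulled):
#   print(chat("llama3", "Say hello in one sentence."))
```

This is the same API that front-ends like Open WebUI talk to under the hood, which is why they only need to know the server's address.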


If you want a chat window similar to OpenAI's ChatGPT, you'll need to install the Open WebUI project, which in turn requires Docker.

Download Docker Desktop from www.docker.com/products/docker-desktop/ (the Open WebUI project itself lives at github.com/open-webui/open-webui). After installing Docker, run the following command on the command line to install Open WebUI:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

When it's done, open your browser and access Open WebUI at http://localhost:3000.


Option 3: Use LM Studio

I've already covered the installation and use of LM Studio in a previous article:

Use LM Studio to deploy local AI large language models with one click

You need to update to version 0.2.20 or later.


Then search for "lmstudio llama 3" and download a model, and you're ready to go. Note that 8B-instruct has 14 quantized variants to choose from; Meta-Llama-3-8B-Instruct.Q5_K_M.gguf is recommended, and if your computer is more capable you can choose one of the larger variants above 8 GB.

Option 4: Jan.ai

This is the simplest and most convenient open-source software I have used so far to deploy large AI models locally.

Just download the corresponding version of the software from the Jan.ai website and double-click the file; it installs automatically.


Then go to Hugging Face and download the Llama 3 model. If you cannot access Hugging Face directly, you can download from the mirror site hf-mirror.com/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/tree/main.

The larger the model, the higher the computer configuration required, so download according to your computer's capability. Put the downloaded model in the C:\Users\xxx\jan\models directory (xxx is your computer's username). Then open Jan to chat directly with the AI.
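Multi-gigabyte GGUF downloads occasionally arrive corrupted or truncated, which shows up later as cryptic load errors. A quick sanity check (a generic sketch, not specific to Jan; the file path below is a hypothetical example) is to compute the file's SHA-256 hash and compare it with the checksum shown on the model's Hugging Face page:

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB models don't fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Example usage (hypothetical path - substitute your own download):
#   print(sha256_of_file(r"C:\Users\xxx\jan\models\Meta-Llama-3-8B-Instruct.Q5_K_M.gguf"))
```

If the digest doesn't match the one published for the file, re-download it before blaming the software.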

Option 5: GPT4All

GPT4All is also a very convenient tool for local deployment of AI models: just run the installer, then download a model, and it works. Its only drawback is that the interface is entirely in English.

Just download a model inside the app and you're ready to talk. Alternatively, select Settings in the upper right corner, change the model directory, and put a previously downloaded model into that folder.


Llama is especially fond of using emojis.


The above are the most commonly used methods for deploying large language models locally; novices are advised to choose the last two, which are simple and fast. I've put all the files mentioned above on a network disk; readers who need them can reply "llama3" in the background to receive them for free.
