
Run Llama3, the most powerful open-source AI model on your computer, with the most complete method in history

Author: The mountain monster Atu

Meta has once again taken the lead in the open-source model world with the launch of its latest-generation open-source large language model, Llama 3. I covered Llama 3 itself in a previous article; today I'll show how to deploy Llama 3 on a home computer, with no internet connection required once it's set up, so you can run your own private AI model.

Meta's next-generation open-source model, Llama3, is making a stunning debut

Llama 3 is currently available in 8B and 70B parameter sizes, each in a pre-trained and a fine-tuned (instruct) configuration. The pre-trained version is a base model, intended for developers who want to build their own fine-tuned models. For local deployment we mainly download the 8B fine-tuned model, which runs on any graphics card with 8 GB of VRAM or more; the 70B model generally requires professional GPUs such as a 40 GB A100.
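To see where those VRAM figures come from, here is a rough back-of-envelope estimate (illustrative only, not an official requirement): weight memory is approximately the parameter count times the bytes per parameter, and quantized builds shrink it proportionally. Overhead for activations and the KV cache comes on top.

```python
def approx_model_gib(n_params_billion: float, bits_per_param: float) -> float:
    """Rough weight-memory estimate in GiB: params * bits / 8, ignoring overhead."""
    bytes_total = n_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 2**30

# 8B weights at 16-bit precision are ~15 GiB, which is why 4-5 bit
# quantized builds are what actually fit on an 8 GB consumer GPU.
print(f"8B fp16:  {approx_model_gib(8, 16):.1f} GiB")   # ~14.9 GiB
print(f"8B 4-bit: {approx_model_gib(8, 4):.1f} GiB")    # ~3.7 GiB
print(f"70B fp16: {approx_model_gib(70, 16):.1f} GiB")  # ~130.4 GiB
```

This is why the 70B model is out of reach for consumer cards even before accounting for activation memory.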

Here's a detailed guide on how to run the model locally.

Option 1: Python deployment

This option is suitable if you already have a Conda environment with Python set up and want to install Llama 3 into it directly.

Download the Llama 3 archive at github.com/meta-llama/llama3


Unzip to the C drive

Then go to llama.meta.com/llama-downloads to apply for a license and a model download link. Fill in your name, date of birth, country, company, and email address (a real name is not required). Check Meta Llama 3, click Continue, and confirm the license agreement; you will receive a download link. The link cannot be opened directly in a browser — it only works from the terminal command line.


Go to the \llama3-main folder you just decompressed, type "cmd" in the folder's address bar, and on the command line run "pip install -e ." to install the environment. If an error is reported, you may need to pass the full path of your llama3 folder instead, for example "pip install -e C:\llama3-main".

After the environment downloads and installs, right-click in the folder and select "Bash here" to open a bash command line, then run "bash download.sh". When the prompt "Enter the URL from email:" appears, paste the download link you just obtained. The script will then ask which model you want to download. There are four: 8B (pre-trained), 8B-instruct (fine-tuned), 70B, and 70B-instruct. We choose the second: type "8B-instruct" and press Enter, and the model will download automatically.

(If you see an error that the wget command is missing, download wget.exe from eternallybored.org/misc/wget/ and put it in C:\Windows\System32.)


Once the model is downloaded, test it with the example script from the repository by running the following command:

torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 6

To hold an interactive conversation, create the following chat.py script in the repository root:

# Copyright (c) Meta Platforms, Inc. and affiliates.
# This software may be used and distributed in accordance with the terms of the Llama 3 Community License Agreement.

from typing import List, Optional

import fire

from llama import Dialog, Llama


def main(
    ckpt_dir: str,
    tokenizer_path: str,
    temperature: float = 0.6,
    top_p: float = 0.9,
    max_seq_len: int = 512,
    max_batch_size: int = 4,
    max_gen_len: Optional[int] = None,
):
    """
    Examples to run with the models finetuned for chat. Prompts correspond of chat
    turns between the user and assistant with the final one always being the user.

    An optional system prompt at the beginning to control how the model should respond
    is also supported.

    The context window of llama3 models is 8192 tokens, so `max_seq_len` needs to be <= 8192.

    `max_gen_len` is optional because finetuned models are able to stop generations naturally.
    """
    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )

    # The dialogs list holds a single user turn; each loop iteration replaces
    # its content, so the model does not see earlier turns as history
    dialogs: List[Dialog] = [
        [{"role": "user", "content": ""}],  # initialize with an empty user input
    ]

    # Start the conversation loop
    while True:
        # Get user input
        user_input = input("You: ")

        # Exit loop if user inputs 'exit'
        if user_input.lower() == 'exit':
            break

        # Put the user input into the dialogs list
        dialogs[0][0]["content"] = user_input

        # Use the generator to get the model response
        result = generator.chat_completion(
            dialogs,
            max_gen_len=max_gen_len,
            temperature=temperature,
            top_p=top_p,
        )[0]

        # Print the model response
        print(f"Model: {result['generation']['content']}")


if __name__ == "__main__":
    fire.Fire(main)

Run the following command to start a conversation:

torchrun --nproc_per_node 1 chat.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
    --max_seq_len 512 \
    --max_batch_size 6

Option 2: Use Ollama + Open WebUI

Supported platforms: macOS, Linux, Windows (preview)


Ollama is currently one of the most commonly used tools for deploying large language models locally. Just download the app from the Ollama website, double-click the file to install it automatically, and run the following command from the command line:

ollama run llama3

This will download the Llama 3 8B instruct (fine-tuned) model.

You can optionally append a tag to the model name; if you don't, it defaults to latest. Tags identify specific versions or variants. For example, you can run:

ollama run llama3:70b-text

ollama run llama3:70b-instruct

After the model has downloaded, run ollama run llama3 on the command line again and you can talk to Llama 3 directly. Of course, this interface is not very intuitive, and there is no way to save your conversation history.
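Besides the interactive terminal, Ollama also exposes a local REST API (by default on port 11434), so you can script conversations yourself. The sketch below targets the documented /api/chat endpoint; the model name and prompt are examples, and it assumes an Ollama server is already running on your machine:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint


def build_chat_request(model: str, user_message: str) -> dict:
    """Build a non-streaming chat request body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,  # ask for one JSON response instead of a token stream
    }


def chat(model: str, user_message: str) -> str:
    """Send one chat turn to a locally running Ollama server and return the reply."""
    body = json.dumps(build_chat_request(model, user_message)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


# Example usage (requires a running Ollama server with llama3 pulled):
#   print(chat("llama3", "Say hello in one sentence."))
```

This is the same API that front-ends like Open WebUI talk to under the hood, which is why they only need to know the server's address.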


If you want a chat window similar to OpenAI's ChatGPT, you'll need to install the Open WebUI project, which in turn requires Docker.

Download Docker Desktop from www.docker.com/products/docker-desktop/ (the Open WebUI project itself lives at github.com/open-webui/open-webui). After installing Docker, run the following command on the command line to install Open WebUI:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

When it's done, open your browser and access Open WebUI at http://localhost:3000.


Option 3: Use LM Studio

I've already covered the installation and use of LM Studio in a previous article:

Use LM Studio to deploy local AI large language models with one click

You need to update to version 0.2.20 or later.


Then search for "lmstudio llama 3" and download a model, and you're ready to go. Note that 8B-instruct has 14 quantized variants to choose from; Meta-Llama-3-8B-Instruct.Q5_K_M.gguf is recommended, and if your computer is more capable you can choose one of the larger variants above 8 GB.

Option 4: Jan.ai

This is the simplest and most convenient open-source software I have used so far to deploy large AI models locally.

Just download the corresponding version of the software from the Jan.ai website and double-click the file; it installs automatically.


Then go to Hugging Face and download the Llama 3 model. If you cannot access Hugging Face directly, you can download from the mirror site hf-mirror.com/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/tree/main.

The larger the model, the higher the computer configuration required, so download according to your computer's capability. Put the downloaded model in the C:\Users\xxx\jan\models directory (xxx is your computer's username). Then open Jan to chat directly with the AI.
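Multi-gigabyte GGUF downloads occasionally arrive corrupted or truncated, which shows up later as cryptic load errors. A quick sanity check (a generic sketch, not specific to Jan; the file path below is a hypothetical example) is to compute the file's SHA-256 hash and compare it with the checksum shown on the model's Hugging Face page:

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB models don't fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Example usage (hypothetical path - substitute your own download):
#   print(sha256_of_file(r"C:\Users\xxx\jan\models\Meta-Llama-3-8B-Instruct.Q5_K_M.gguf"))
```

If the digest doesn't match the one published for the file, re-download it before blaming the software.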

Option 5: GPT4All

GPT4All is also a very convenient tool for local deployment of AI models: just run the installer, then download a model, and it works. Its only drawback is that the interface is entirely in English.

Just download a model inside the app and you're ready to talk. Alternatively, select Settings in the upper right corner, change the model directory, and put a previously downloaded model into that folder.


Llama is especially fond of using emojis.


The above are the most commonly used methods for deploying large language models locally; novices are advised to choose the last two, which are simple and fast. I've put all the files mentioned above on a network disk; readers who need them can reply "llama3" in the background to receive them for free.
