
Intel Arc GPUs deliver excellent performance when running Llama 3

Author: Zhongguancun Online

Shortly after Meta released the Llama 3 large language model, Intel optimized for and verified that the 8-billion- and 70-billion-parameter Llama 3 models run on Intel's AI product portfolio. On the client side, the capabilities of Intel Arc graphics make it easy for developers to run Llama 3 models locally and accelerate generative AI workloads.


In initial testing of the Llama 3 model, Intel Core Ultra H-series processors generated output faster than a typical person's reading speed, thanks to the built-in Intel Arc GPU with 8 Xe cores, DP4a AI acceleration, and up to 120 GB/s of system memory bandwidth.

Intel Core Ultra processors and Intel Arc graphics performed well at the initial release of the Llama 3 model, reflecting the joint effort by Intel and Meta to develop local AI and deploy it across millions of devices. Intel's client hardware performance has also improved substantially thanks to a broad range of software frameworks and tools, including PyTorch and the Intel Extension for PyTorch for local development, as well as the OpenVINO toolkit for model deployment and inference.


Running Meta-Llama-3-8B-Instruct on an Intel Core Ultra 7 processor with built-in Intel Arc graphics


Next-token latency when running Llama 3 on the Intel Arc A770

The graph above shows that the Intel Arc A770 graphics card delivers excellent performance when running the Llama 3 model with the PyTorch framework and its optimizations for Intel GPUs. Intel Arc graphics cards also let developers run other large language models locally, including Mistral-7B-Instruct, Phi-2, and Llama 2.

Developers can run multiple models locally from the same base installation largely thanks to IPEX-LLM, a large language model library for PyTorch. Built on the Intel Extension for PyTorch, it incorporates the latest large language model optimizations and low-bit data compression (INT4/FP4/INT8/FP8), along with most of the latest performance optimizations for Intel hardware. IPEX-LLM gains substantial performance from XMX AI acceleration in the Xe cores of Intel discrete graphics cards such as the Arc A-series, and it supports Arc A-series graphics on the Windows Subsystem for Linux, native Windows, and native Linux.
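As a rough illustration of the low-bit compression mentioned above, the sketch below shows symmetric per-tensor INT4 quantization in NumPy. The function names and the per-tensor scheme are illustrative assumptions, not IPEX-LLM's actual implementation, which uses more sophisticated grouped schemes.

```python
import numpy as np

def quantize_int4(weights):
    # Symmetric per-tensor INT4 quantization: map floats onto the
    # integer range [-8, 7] using a single scale factor.
    scale = np.abs(weights).max() / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    # Recover approximate float weights from the INT4 codes.
    return q.astype(np.float32) * scale

w = np.array([0.9, -0.35, 0.07, -0.7], dtype=np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
print(np.abs(w - w_hat).max())  # small reconstruction error, bounded by scale/2
```

Storing 4-bit codes plus one scale in place of 32-bit floats is what cuts the memory footprint and bandwidth enough for a 7B–8B model to fit comfortably on a client GPU.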

Since all operations and models are based on the native PyTorch framework, developers can easily swap in different PyTorch models and input data. These models and data run on Intel Arc graphics, and developers also benefit from the performance gains of Intel Arc graphics acceleration.
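The model-swapping pattern described above can be sketched as follows. The `pick_device` and `run_model` helpers are hypothetical, and the `torch.xpu` check assumes a PyTorch build that includes the XPU (Intel GPU) backend; on machines without it, the sketch falls back to CPU.

```python
import torch

def pick_device():
    # Prefer an Intel GPU ("xpu") when the XPU backend is available,
    # otherwise fall back to CPU. (Hypothetical helper for illustration.)
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")

def run_model(model, inputs):
    # Move any PyTorch module and its inputs to the chosen device
    # and run inference without tracking gradients.
    device = pick_device()
    model = model.to(device).eval()
    with torch.no_grad():
        return model(inputs.to(device))

# Any standard PyTorch module can be swapped in unchanged:
model = torch.nn.Linear(4, 2)
out = run_model(model, torch.randn(1, 4))
print(out.shape)  # torch.Size([1, 2])
```

Because device placement is the only Arc-specific step, replacing `torch.nn.Linear` with any Hugging Face or custom PyTorch model follows the same pattern.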
