Decode AI performance on RTX AI PCs and workstations

The era of AI PCs powered by NVIDIA RTX and GeForce RTX technology has arrived. With it comes a new way to evaluate AI-accelerated performance, along with new terminology that serves as a reference for users choosing desktops and laptops.

While PC gamers are aware of frames per second (FPS) and similar statistics, measuring AI performance requires new metrics.

TOPS stands out

TOPS, or trillions of operations per second, is the primary benchmark. "Trillions" is the key word here: the amount of processing behind generative AI tasks is enormous. You can think of TOPS as a raw performance indicator, much like a horsepower rating for an engine. Naturally, the higher the number, the better.

Consider, for example, Microsoft's recently announced Windows 11 AI PCs, which include a neural processing unit (NPU) capable of performing at least 40 trillion operations per second. 40 TOPS is more than enough compute for some lightweight AI-assisted tasks, such as asking a local chatbot where yesterday's notes are.

But many generative AI tasks demand far more computing power than that. NVIDIA RTX and GeForce RTX GPUs deliver exceptional performance across all generative tasks, with the GeForce RTX 4090 GPU delivering up to 1,177 TOPS. That's the kind of computing power needed for tasks like AI-assisted digital content creation (DCC), AI super-resolution for PC games, generating images from text or video, interacting with local large language models (LLMs), and more.
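As a rough back-of-envelope illustration of why those TOPS figures matter, the sketch below estimates how long a fixed generative workload would take at different TOPS ratings. All of the workload and utilization numbers are purely hypothetical assumptions, not measurements from this article.

```python
# Back-of-envelope sketch relating raw TOPS to generative AI throughput.
# Every number below is an illustrative assumption, not a measured figure.

npu_tops = 40      # entry-level NPU, trillions of ops per second
gpu_tops = 1177    # high-end GPU, trillions of ops per second

# Hypothetical image-generation workload: ~10 trillion operations per
# denoising step, 30 steps per image.
ops_per_image = 10e12 * 30

for name, tops in [("NPU", npu_tops), ("RTX GPU", gpu_tops)]:
    # Assume only a fraction of peak TOPS is sustained in practice.
    sustained_ops_per_sec = tops * 1e12 * 0.4
    seconds_per_image = ops_per_image / sustained_ops_per_sec
    print(f"{name}: ~{seconds_per_image:.1f} s per image (hypothetical workload)")
```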

Performance is measured in tokens

TOPS is only a baseline metric. The performance of an LLM is measured by the number of tokens the model generates.

A token is the output of an LLM: it can be a word in a sentence, or an even smaller fragment such as a punctuation mark or a space. The performance of AI-accelerated tasks can be measured in tokens per second.
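A quick way to see what tokens look like is to run a tokenizer over a sentence. The sketch below uses the Hugging Face transformers library and the GPT-2 tokenizer purely as an illustration; the article doesn't prescribe a particular toolchain.

```python
# Minimal sketch of what "tokens" are, using the Hugging Face transformers
# library (an assumption; any LLM tokenizer behaves similarly).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Where were yesterday's notes?"
token_ids = tokenizer.encode(text)
tokens = tokenizer.convert_ids_to_tokens(token_ids)

print(tokens)       # words and sub-word fragments the model actually emits
print(len(tokens))  # the count that a tokens-per-second figure is based on
```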

Another important factor is batch size, which is the number of inputs that can be processed at the same time in a single inference pass. As large language models (LLMs) sit at the heart of many modern AI systems, the ability to process multiple inputs, such as from a single application or across multiple applications, will be a key differentiator. While a larger batch size improves performance for concurrent inputs, it also requires more memory, especially when running larger models.
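To see how batch size and tokens per second interact in practice, here is a minimal, hedged sketch using the transformers library; the model, prompts, and batch sizes are placeholders, and production LLM serving stacks measure this far more carefully.

```python
# Hedged sketch: measuring tokens per second at several batch sizes with the
# Hugging Face transformers library. Model, prompts, and sizes are placeholders.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "gpt2"  # stand-in; any causal LM follows the same pattern
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

for batch_size in (1, 4, 8):
    prompts = ["Summarize yesterday's meeting notes:"] * batch_size
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(device)

    start = time.time()
    outputs = model.generate(
        **inputs, max_new_tokens=64, do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    elapsed = time.time() - start

    # New tokens produced across the whole batch divided by wall-clock time.
    new_tokens = (outputs.shape[-1] - inputs["input_ids"].shape[-1]) * batch_size
    print(f"batch={batch_size}: {new_tokens / elapsed:.1f} tokens/s")
```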

RTX GPUs are well suited to LLMs because they have large amounts of dedicated video memory (VRAM), Tensor Cores, and TensorRT-LLM software.

GeForce RTX GPUs offer up to 24GB of high-speed VRAM, while NVIDIA RTX GPUs offer up to 48GB, supporting larger models and larger batch sizes. RTX GPUs also feature Tensor Cores, purpose-built AI accelerators that dramatically speed up the compute-intensive operations in deep learning and generative AI models. Applications can easily tap into that performance with the NVIDIA TensorRT software development kit (SDK), which unlocks high-performance generative AI on more than 100 million Windows PCs and workstations powered by RTX GPUs.

The combination of large video memory, dedicated AI accelerators, and optimized software gives RTX GPUs a huge boost in throughput, especially as batch sizes increase.
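As a rough way to see why VRAM matters for model size and batch size, the sketch below checks available GPU memory with PyTorch and applies a simplified sizing estimate. The model size, precision, and per-sequence cache allowance are illustrative assumptions, not figures from the article; real memory use depends on the runtime.

```python
# Rough sketch: checking GPU VRAM with PyTorch and estimating whether a model
# and batch size fit. The sizing formula is a simplification (weights plus a
# flat per-sequence KV-cache allowance); actual memory use varies by runtime.
import torch

props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"{props.name}: {vram_gb:.1f} GB VRAM")

params_billions = 7          # e.g. a 7B-parameter LLM (illustrative)
bytes_per_param = 2          # FP16/BF16 weights
weights_gb = params_billions * 1e9 * bytes_per_param / 1024**3

kv_cache_gb_per_seq = 0.5    # hypothetical per-sequence allowance
for batch_size in (1, 4, 8):
    needed_gb = weights_gb + batch_size * kv_cache_gb_per_seq
    verdict = "fits" if needed_gb < vram_gb else "does not fit"
    print(f"batch={batch_size}: ~{needed_gb:.1f} GB needed -> {verdict}")
```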

Text-to-image production is faster than ever

Measuring image generation speed is another way to evaluate performance. One of the most straightforward ways to do this is with Stable Diffusion, a popular image generation AI model that lets users easily turn text descriptions into complex visuals.


With Stable Diffusion, users can quickly create the image they want by entering a text prompt, and when the model runs on an RTX GPU, it generates results faster than on a CPU or NPU.
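For readers who want to try this themselves, below is a minimal sketch of running Stable Diffusion from Python with the Hugging Face diffusers library. The checkpoint name and prompt are just examples, and this is only one of several ways to run the model alongside the Automatic1111 and ComfyUI interfaces mentioned next.

```python
# Minimal sketch: text-to-image with Stable Diffusion via the Hugging Face
# diffusers library (one common toolchain; not mandated by the article).
# Timing the call gives a simple seconds-per-image figure.
import time
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # example checkpoint
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a watercolor painting of a mountain lake at sunrise"

start = time.time()
image = pipe(prompt, num_inference_steps=30).images[0]
print(f"generated in {time.time() - start:.1f} s")
image.save("lake.png")
```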

Performance is even better when using the TensorRT extension in the popular Automatic1111 interface. With SDXL models, RTX users can generate images from prompts up to 2x faster, greatly streamlining Stable Diffusion workflows.

ComfyUI, another popular Stable Diffusion user interface, also added TensorRT acceleration last week. RTX users can now generate images from text prompts up to 60% faster, and can use Stable Video Diffusion to convert those images into video up to 70% faster with TensorRT.


The new UL Procyon AI Image Generation benchmark now supports TensorRT acceleration. Compared with the fastest non-TensorRT configuration, TensorRT acceleration delivers a 50% speedup on the GeForce RTX 4080 SUPER GPU.

TensorRT acceleration for Stable Diffusion 3, Stability AI's highly anticipated new text-to-image model, was also recently announced. In addition, the new TensorRT-Model Optimizer pushes performance even further, delivering significant speed gains while reducing memory consumption compared with non-TensorRT implementations.

Of course, seeing is believing. The real test is the real-world scenario of iterating on a prompt. On RTX GPUs, users can refine images significantly faster by tweaking their prompts, with each iteration taking only a few seconds; on a MacBook Pro with the M3 Max, the same iteration takes minutes. And by running locally on an RTX-powered PC or workstation, users get both speed and security, with everything kept private.

The results are in, and the technology is open source

But don't just take our word for it. The team of AI researchers and engineers behind the open-source Jan.ai recently integrated TensorRT-LLM into their local chatbot application and then tested these optimizations for themselves.

The researchers tested the real-world performance of TensorRT-LLM on a variety of GPUs and CPUs used by the community, using the open-source llama.cpp inference engine as a baseline. They found that TensorRT-LLM was "30-70% faster than llama.cpp on the same hardware" and more efficient on consecutive processing runs. The team has also published its testing methodology, inviting others to measure generative AI performance for themselves.
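For a simple at-home version of that kind of measurement, the sketch below uses llama-cpp-python, the Python bindings for the llama.cpp engine used as the baseline above, to compute a basic tokens-per-second figure. The model path is a placeholder, and Jan.ai's published methodology is far more rigorous than this.

```python
# Hedged sketch: a basic tokens-per-second check with llama-cpp-python, the
# Python bindings for llama.cpp. The model path is a placeholder.
import time
from llama_cpp import Llama

llm = Llama(model_path="models/llama-2-7b.Q4_K_M.gguf", n_gpu_layers=-1)

start = time.time()
out = llm("Explain what a token is in one sentence.", max_tokens=128)
elapsed = time.time() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f} s -> {generated / elapsed:.1f} tokens/s")
```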

Whether for gaming or generative AI, speed is the key to success. TOPS, tokens per second, and batch size all factor into determining a performance champion.
