
Nearly 900 million yuan raised within 2 years of its founding! AI NPCs are setting off major changes in the game industry, and big players such as Microsoft have entered the field

Author: New Zhiyuan

Editors: Run, Haokun

The game industry is rapidly embracing AI technologies such as large language models, and both major studios and independent developers have begun to rely on LLMs to create a new kind of AI NPC experience.

Beyond chatbots, turning large language models into finished products has remained a hard problem worldwide.

Because large language models have low interpretability and their output is prone to hallucinations, many highly specialized industries may still have a long way to go before they can truly put large models to use.

The game industry has become one of the first industries to fully embrace large models!

Recently, Microsoft announced a partnership with AI startup Inworld to develop Xbox tools that will enable game developers to create AI-powered characters, stories, and quests.


Microsoft has signed a multi-year partnership with Inworld that will include the co-development of an "AI design copilot" system that Xbox developers can use to create detailed scripts, dialogue trees, quest lines, and more.

According to Microsoft's official statement, the partnership combines Inworld's expertise in character development using generative AI models, Microsoft's cutting-edge cloud-based AI solutions (including Azure OpenAI Service), Microsoft Research's technical insights into the future of games, and Team Xbox's experience in game production and publishing to provide responsible creator tools for all developers.

Together, the two companies aim to provide an easy-to-use, multi-platform AI toolset that assists and empowers creators in dialogue, story, and quest design. The toolset will include:

- An AI game design copilot that helps game designers explore more ideas, turning prompts into detailed scripts, dialogue trees, quests, and more.

- An AI character engine, integrated into the game client, that lets players experience entirely new narratives through dynamically generated stories, quests, and dialogue.

Inworld, a startup founded by former Google employees, has provided AI-generated NPC and storyline solutions to many large gaming companies.


Founded just over two years ago, the company has raised more than 120 million US dollars and has partnered with NetEase, Disney, and other major names in the game and animation industries.

Cygnus Enterprises, released by a NetEase studio, uses Inworld's AI NPCs to create AI companions. These companions give players an interesting character to talk to while gathering resources, and they will also gather resources themselves when the player asks them to by voice command.

Serial entrepreneurs, once acquired by Google, start up again

Inworld's founders, Ilya Gelfenbeyn and Michael Ermolenko, previously founded API.AI.


API.AI was acquired by Google, renamed Dialogflow, and integrated into Google Cloud, becoming the most popular conversational AI platform on the market.


Then in 2021, they started their own business again and founded Inworld.


At the end of 2022, Inworld launched a tool that uses GPT-3 to generate dialogue for game NPCs, which OpenAI went on to feature as an official showcase.


Subsequently, Inworld launched a product called "Character Engine", which specifically helps game developers create personalized AI NPCs.

AI NPCs can learn and adapt, use emotional intelligence to handle relationships, remember and recall information, and autonomously set goals, take actions, and follow their own motivations.


Developers use defined triggers, intent recognition, and motivations to shape how a character responds to player behavior and to drive interactions in the game. The Character Engine's goals and actions feature lets developers drive NPC behavior so that it responds to player input in a dynamic, customized way.

Players believe that, like a game's animation engine, the character engine could change how future triple-A titles are experienced.


Characters operate with human-like memory functions by retrieving information from flash memory and long-term memory, creating an engaging experience that players return to.


Inworld supports multimodal character expression by orchestrating more than 30 machine learning models designed to mimic the full spectrum of human communication, including vocal variation and non-verbal cues such as intonation, facial expressions, and body language.


Inworld characters can express emotions based on their interactions with the user. Emotions can be mapped onto animations, goals, and triggers, giving NPCs rich and realistic personalities.


The Character Engine offers built-in voices that minimize latency, with settings for the character's gender, age, pitch, and speaking speed. Alternatively, a third-party service such as ElevenLabs can be used to create custom and cloned voices.


A framework for running AI NPCs locally on your own computer

Most of what tools like the Character Engine do, using generative AI to provide AI NPC functionality, can be replicated with nearly free or open-source technologies.

This gives many independent game makers hope of catching up with the big names in this space.

We have previously covered an AI NPC mod for The Elder Scrolls.

Recently, developer Joe Gibbs shared his local framework for building intelligent NPCs through open-source large models on his own computer:


Specifically, Joe Gibbs used llama.cpp and Mistral 7B to create dialogue content, StyleTTS2 to generate speech, and Unreal Engine 5 to render the scene.


Project address: https://github.com/joe-gibbs/local-llms-ue5

At first, he tried to integrate llama.cpp into Unreal as a dynamic link library (DLL), but the process did not go smoothly, so he built a solution based on a Node.js script instead.

For the voice part, he uses the StyleTTS2 demo Docker image provided by mrfakename and interacts with it through Gradio's API. Ideally he would not have to rely on a Docker container, but he was unable to run the StyleTTS2 model directly on his own computer.

The implementation configuration is as follows:

System: Windows 11

Storage: Samsung 980 PRO M.2 1TB

CPU: Intel Core i5-12600KF

RAM: 64GB DDR5

GPU: NVIDIA GeForce RTX 4070 12GB

How it works

In Unreal, the Node script is executed through the FInteractiveProcess class. The script receives the conversation history so far as a command-line argument and then outputs the NPC's dialogue sentence by sentence. To improve performance, the next sentence can be generated and voiced while the current one is playing, without waiting for the entire reply to be generated.

For each sentence, the script outputs a JSON object containing the subtitle text and the location of the corresponding audio file, which Unreal parses into a struct and plays.
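The sentence-by-sentence output described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the author's Node.js script: the `text` and `audio` field names, the audio path pattern, and the `synthesize` stand-in are assumptions made for the example, and the sentence splitter is a simple punctuation rule.

```python
import json
import re
from typing import Iterable, Iterator

# A sentence ends at ., ! or ? followed by whitespace (a deliberately simple rule).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()


def synthesize(sentence: str, index: int) -> str:
    """Hypothetical TTS stand-in: returns the path where the audio would be written."""
    return f"audio/line_{index:03d}.wav"


def npc_lines(chunks: Iterable[str]) -> Iterator[str]:
    """Emit one JSON object per sentence: subtitle text plus audio file location."""
    for index, sentence in enumerate(split_sentences(chunks)):
        yield json.dumps({"text": sentence, "audio": synthesize(sentence, index)})


if __name__ == "__main__":
    # Simulated chunked output from the language model.
    stream = ["Welcome, tra", "veller. The inn ", "is open! Rest ", "here tonight."]
    for line in npc_lines(stream):
        print(line)
```

Because `npc_lines` is a generator, the consumer can start playing the first sentence while later sentences are still being produced, which is the overlap the article describes.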

Performance

Performance is surprisingly good. There is a slight stutter when a new sentence is generated, but it has little impact. StyleTTS2 requires 14GB of RAM and the llama.cpp server another 3GB, so running the whole setup takes a lot of memory.


As for the frame rate, as the video shows, the system easily maintains a smooth 60 FPS.

It takes about 2-3 seconds to generate a new sentence. Maybe some animations could be played during sentence generation to reduce the discomfort of waiting, but the status quo is pretty good.

When paired with Whisper, the response could be generated during the short delay in which the player's speech transcript appears on screen, an experience users already know from services like Siri.

Limitations

Despite its faster speed, the main drawback of the Mistral model compared to GPT-3.5 is its poor coherence.


It drifts off-topic easily and still suffers from hallucinations. For example, in one test it directed the player to a village a few miles away, even though the player was already in that village.

Similarly, the video shows it addressing the player as John, a name never mentioned in the dialogue, and referring to Angers, a place name that would not appear until hundreds of years later.

In addition, it has no realistic sense of what can and cannot happen in the game world. For example, the quest to train villagers shown in the demo video cannot actually be completed, because the game has no mechanics to support it.

StyleTTS2's speech synthesis is not fully natural either and still sounds a little mechanical. It also mispronounces unfamiliar words, as well as words whose pronunciation depends on context.

Future Directions for Improvement

At the moment, this is just a preliminary framework for conceptual experimentation, and there are many areas that can be optimized if the framework is to run well locally.

- First, try integrating llama.cpp into Unreal Engine as a dynamic link library (DLL). This would remove the need for the Node.js script and greatly simplify distributing the game.

- Mistral can drift off-topic, so it would be best to fine-tune it on the game's setting. For example, there currently seems to be no way to rule out anachronisms entirely, and a 12th-century character suddenly explaining how to use AWS breaks immersion.

- The model could also be tuned to annotate emotions in its text output. StyleTTS2 accepts a voice sample for cloning, so a voice actor could read a single sentence in several different emotions, and speech could then be generated from the sample that matches the annotated emotion.

- There are other directions to explore, such as integrating StyleTTS2 more closely with Unreal Engine. It would help to remove the dependency on a specific Python version and specific Python packages.


- Integrating Whisper may allow players to naturally talk to NPCs and get answers.
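The emotion-annotation idea from the list above might look something like the sketch below. The `[emotion] text` tag format, the emotion names, and the clip paths are all assumptions made for illustration; the source does not specify how annotations would be written or how StyleTTS2 would be called, so the sketch only selects which reference clip to pass along.

```python
import re

# Hypothetical reference clips recorded by a voice actor, one per emotion.
REFERENCE_CLIPS = {
    "neutral": "voices/guard_neutral.wav",
    "angry": "voices/guard_angry.wav",
    "happy": "voices/guard_happy.wav",
}

# Assumed annotation format: an optional leading "[emotion]" tag.
_TAG = re.compile(r"^\s*\[(\w+)\]\s*(.*)$", re.DOTALL)


def split_emotion(line: str) -> tuple[str, str]:
    """Split "[emotion] text" into (emotion, text); fall back to neutral."""
    match = _TAG.match(line)
    if match and match.group(1).lower() in REFERENCE_CLIPS:
        return match.group(1).lower(), match.group(2).strip()
    # Unknown or missing tag: speak the line unchanged in the neutral voice.
    return "neutral", line.strip()


def reference_clip(line: str) -> tuple[str, str]:
    """Return (reference clip path, text to speak) for one annotated line."""
    emotion, text = split_emotion(line)
    return REFERENCE_CLIPS[emotion], text
```

Falling back to the neutral clip keeps the pipeline running when the model invents a tag that no recording exists for.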

Other aspects of the tech demo could also be improved, although they are only indirectly related to the language model. For example, the NPCs' lips do not move when they talk. This could be solved with a technique similar to Audio2Face, without much impact on game performance.

In addition, large language models (LLMs) could control the NPCs' body movements by returning JSON data that contains animation instructions, making the dialogue more dynamic.
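As a rough sketch of that idea, the game side could parse the model's reply and accept only animations it actually has. The reply shape (`text` and `animation` keys) and the animation names here are assumptions for illustration, not something the source defines.

```python
import json

# Hypothetical animation names; a real project would list its own assets.
KNOWN_ANIMATIONS = {"idle", "wave", "nod", "point", "shrug"}


def parse_npc_reply(raw: str) -> dict:
    """Parse a model reply assumed to look like {"text": ..., "animation": ...}.

    Falls back to plain speech with the idle animation when the reply is not
    a JSON object or names an animation the game does not have.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"text": raw.strip(), "animation": "idle"}
    if not isinstance(data, dict):
        return {"text": raw.strip(), "animation": "idle"}
    text = str(data.get("text", "")).strip()
    animation = data.get("animation")
    if not isinstance(animation, str) or animation not in KNOWN_ANIMATIONS:
        animation = "idle"
    return {"text": text, "animation": animation}
```

Validating against a fixed set matters because, as the limitations section notes, the model will readily refer to things the game cannot do.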

Some thoughts on the future

The author envisions a database that is updated whenever the player starts a conversation.

The database would store information about the player, the NPCs (including their backstories and goals), the world, and more, to serve as the basis for conversations.

From it, a detailed file would be built that records the player's quest history (as far as the NPC knows it), the player's equipment, the current weather, the NPC's interactions with other NPCs, and the chat history with the player (including dates and times, so that the NPC can infer when the player is referring to a past event). A natural language query would then be run against this file before the result is submitted to the large language model (LLM).
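One way to picture this is the sketch below, which keeps a small list of facts per NPC and selects the ones relevant to the player's line before building the prompt. Note that it swaps in plain word overlap for the natural language query step, and that the class names, fields, and prompt layout are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Fact:
    """One piece of knowledge the NPC has, with an optional in-game timestamp."""
    text: str
    when: str = ""  # e.g. "Day 2"


@dataclass
class NpcContext:
    name: str
    backstory: str
    facts: list[Fact] = field(default_factory=list)


def _words(text: str) -> set[str]:
    # Keep words of four or more letters to skip most filler words.
    return set(re.findall(r"[a-z']{4,}", text.lower()))


def relevant_facts(context: NpcContext, player_line: str, limit: int = 3) -> list[Fact]:
    """Rank facts by how many words they share with the player's line."""
    query = _words(player_line)
    scored = [(len(query & _words(f.text)), f) for f in context.facts]
    scored = [(score, f) for score, f in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [f for _, f in scored[:limit]]


def build_prompt(context: NpcContext, player_line: str) -> str:
    """Assemble the text that would be sent to the language model."""
    lines = [f"You are {context.name}. {context.backstory}", "Known facts:"]
    for fact in relevant_facts(context, player_line):
        prefix = f"[{fact.when}] " if fact.when else ""
        lines.append(f"- {prefix}{fact.text}")
    lines.append(f"Player: {player_line}")
    return "\n".join(lines)
```

Sending only the matching facts keeps the prompt short, which matters for a small local model with a limited context window.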

Ideally, of course, quests would still be carefully designed by humans.

At present, large language models cannot design complete quests on their own.

Letting them try can lead to repetitive quests like those in The Elder Scrolls, and an endless loop of "go somewhere and do something" is hard for players to accept.

The industry also needs carefully trained large language models that focus on what they are good at and refuse tasks beyond their capabilities. Such a model could be given a list of abilities (e.g. giving items, taking items, starting quests) and made to refuse any action outside that list.
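Until models reliably refuse on their own, the same rule can be enforced on the game side. The sketch below is a minimal illustration under that assumption: the ability names mirror the examples in the text, while the refusal line and function name are made up for the example.

```python
from enum import Enum


class Ability(Enum):
    """The only actions this NPC is allowed to perform."""
    GIVE_ITEM = "give_item"
    TAKE_ITEM = "take_item"
    START_QUEST = "start_quest"


# Hypothetical in-character refusal line.
REFUSAL = "I'm afraid that is beyond what I can do."


def resolve_action(requested: str) -> tuple[bool, str]:
    """Return (allowed, message) for an action name proposed by the model."""
    try:
        ability = Ability(requested.strip().lower())
    except ValueError:
        # Anything outside the list is refused rather than attempted.
        return False, REFUSAL
    return True, f"Performing {ability.value}"
```

With this check in place, a proposal such as `train_villagers`, like the impossible quest from the demo video, would be refused instead of promised to the player.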

However, the focus of this article is to show how to run an AI NPC character framework in its entirety locally, and it doesn't seem complicated to do that right now.

Resources:

https://www.cnbc.com/2023/12/23/the-first-minds-controlled-by-gen-ai-will-live-inside-video-games.html

https://jgibbs.dev/blogs/local-llm-npcs-in-unreal-engine
