
Zeng Guangjun, NetEase Interactive Entertainment AI Lab: Using reinforcement learning to assist game design and make games smarter


Reinforcement learning has been a hot topic in artificial intelligence research in recent years. Deep reinforcement learning, which uses neural networks for modeling, has shown powerful results and is applied in many fields; a typical application in games is combat AI.

How do you train game combat AI? It requires a large amount of interaction with the environment: the AI learns through constant trial and error, guided by the rewards the environment feeds back. Through self-play, combat AI can be trained from scratch. At first the opponent is a weak AI equivalent to itself; as training raises its strength, the opponent's strategy is updated as well, so the AI is always playing against ever-stronger opponents and thereby improving its own level.

At today's 2022 N.GAME NetEase Game Developer Summit, Zeng Guangjun, technical director of the NetEase Interactive Entertainment AI Lab, not only introduced how combat AI has landed in Interactive Entertainment games, but also shared how reinforcement learning can be applied to assist game design and improve the efficiency of design testing and verification.

Zeng Guangjun pointed out that manual verification suffers from biases caused by operating habits, biases caused by differences in player skill, and efficiency problems, whereas AI can assist game design verification more efficiently and with lower bias. For example, you can train a strong AI to evaluate whether the game's current design meets requirements, or use AI to heuristically search combinations of matchups in the game to find balance problems caused by improperly configured attributes.

In addition, reinforcement learning and adversarial learning can also be used to generate game levels, and the technology extends even to fields such as autonomous driving and nuclear fusion.


The following is the full text of the speech by Zeng Guangjun, technical director of the NetEase Interactive Entertainment AI Lab, lightly edited:

Founded at the end of 2017, the Interactive Entertainment AI Lab mainly works on bringing 3D, CV, RL, NLP, and speech technologies into games to solve their pain points. Today we will talk about what reinforcement learning is, what reinforcement learning can do in games, and finally what reinforcement learning can be applied to in other fields besides games.

What is reinforcement learning

If we wanted to train a puppy to sit down, we might give a "sit" command. If the puppy does it right, we reward it with food; if it doesn't, we give no reward. Through this feedback, over many iterations, the puppy eventually learns that sitting down earns the reward.

Similarly, once a game is hooked up, it sends its current state information to the AI, and the AI takes an action based on that state. The game feeds back whether the action earns a reward or a penalty, and the AI adjusts its strategy accordingly after receiving the feedback.

Through many iterations, the AI learns what action to take at each moment to obtain the maximum return. Because this requires a large amount of interaction with the environment, reinforcement learning usually consumes a lot of machine time on exploration, so reducing the exploration space and improving sample efficiency is an important direction for improving training results.

Connecting a game to reinforcement learning is very simple: only two interfaces need to be implemented, a reset interface and a step interface. The reset interface returns the initial state from inside the game; the step interface takes the corresponding action from the AI, after which the game environment returns the next state and the reward information. Finally, the game is packaged into a dynamic library or a Docker image and handed over to reinforcement learning for AI training.
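To make the two interfaces concrete, here is a minimal gym-style sketch using a toy number-drawing game in place of a real title. Everything about the toy game (the target sum, the actions, the reward) is illustrative, not the lab's actual wrapper:

```python
import numpy as np

class ToyCardEnv:
    """Minimal sketch of the reset/step interface described above.
    The real game would be packaged as a dynamic library or Docker image."""

    def __init__(self, target=21, max_steps=10, seed=0):
        self.target = target
        self.max_steps = max_steps
        self.rng = np.random.default_rng(seed)

    def reset(self):
        # Return the initial state from inside the game.
        self.total, self.steps = 0, 0
        return np.array([self.total], dtype=np.float32)

    def step(self, action):
        # Take the AI's action (0 = stop, 1 = draw a card) and return
        # the next state, the reward the game feeds back, and a done flag.
        self.steps += 1
        if action == 1:
            self.total += int(self.rng.integers(1, 11))
        done = action == 0 or self.total >= self.target or self.steps >= self.max_steps
        reward = float(self.total) if (done and self.total <= self.target) else 0.0
        return np.array([self.total], dtype=np.float32), reward, done
```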

For combat AI, the key is to reduce the exploration space and improve sample efficiency


In fact, the most important application of reinforcement learning is combat AI, and we have shipped NPC combat AI in Interactive Entertainment games. Take Tianxia 3's Mountains and Seas Painting as an example: it is a human-versus-machine card game in which players can choose among multiple difficulty levels and play against bots.

The designers needed the human-versus-machine combat AI completed in a short time; it had to support a variety of difficulties, and the difficulty had to adjust dynamically to match the player's level.

If we used a behavior tree to build the Mountains and Seas Painting AI, the designers would have to spend a great deal of time enumerating every state and specifying the corresponding action at each node. To support graded difficulty, the states would need to be divided even more finely, costing still more time, and every card update after launch would require the designers to spend a lot of time modifying the behavior tree to adapt to the new cards.

This wastes a lot of manpower and time. More critically, the strength of a behavior-tree AI usually cannot reach the level of an ordinary player. If we use reinforcement learning instead, we can generate the AI quickly; in particular, when new cards are introduced into the game environment, reinforcement learning adapts very quickly, needing only a round of retraining.

Reinforcement learning training is itself a process of bot self-play. During this process, a large number of AIs of different difficulties are naturally generated in batches, and these AIs can be migrated seamlessly to meet players' demand for difficulty choices. Most critically, an AI built with reinforcement learning can ultimately reach a strength far beyond player level.

Our reinforcement learning setup is similar to general reinforcement learning and consists mainly of samplers and trainers. The samplers execute the AI's decisions on CPUs, generating a large number of samples by interacting with the game environment. These samples are sent to the trainer on the GPU for optimization, and the optimized model is then placed into the model pool.

The model pool lets the AI choose opponents to fight; by iterating against the model pool, the AI currently being trained gradually becomes stronger. The models in the pool can also serve as AIs of different difficulties for players to choose from. The main difficulty of this AI lies in the action space. As just mentioned, reinforcement learning training is a trial-and-error process; if there are too many actions to choose from, finding the right one takes a long time.
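The shape of this sampler/trainer/model-pool loop might look like the sketch below. The names and the uniform opponent-sampling rule are illustrative assumptions, not the lab's actual implementation; `agent.act` and `agent.update` stand in for whatever policy and optimizer are used:

```python
import copy
import random

def play_match(env, agent, opponent, max_steps=200):
    """Roll out one game between agent and opponent, collecting
    (state, action, reward) samples for the learning agent."""
    samples, state, done = [], env.reset(), False
    players = [agent, opponent]
    for t in range(max_steps):
        actor = players[t % 2]                 # alternate turns
        action = actor.act(state)
        state, reward, done = env.step(action)
        if actor is agent:
            samples.append((state, action, reward))
        if done:
            break
    return samples

def self_play_training(env, agent, iterations=1000, snapshot_every=50):
    """Sketch of the sampler / trainer / model-pool loop described above."""
    model_pool = [copy.deepcopy(agent)]        # seed the pool with the initial weak agent
    for it in range(iterations):
        opponent = random.choice(model_pool)   # pick an opponent from the pool
        samples = play_match(env, agent, opponent)  # sampler: CPU rollouts
        agent.update(samples)                  # trainer: GPU optimization step
        if (it + 1) % snapshot_every == 0:
            # frozen snapshots double as difficulty levels for players
            model_pool.append(copy.deepcopy(agent))
    return model_pool
```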

For example, suppose I want to play a hand such as AACCCD, which consists of several cards. There may be dozens of choices for the first card, and dozens more for the second. Combining so many card options in a tree structure makes the action space grow exponentially. Our solution is to turn the one-step decision into a sequential decision.

That is, given the state we get from the game environment, the AI first decides what the first card should be; then the first card, together with the environment state, is fed back into the AI to make the next decision, outputting the second card, and so on into the next round of decision-making.

Finally, we output the complete decision, and the whole sequence AACCCD is returned to the game environment in one go. In this way, a one-step decision becomes a multi-step decision, and the per-step action space shrinks from exponential to a constant level.
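As a sketch of this sequential decomposition: each step conditions on the environment state plus the cards already chosen, so the branching factor stays at the hand size instead of the exponential number of full combinations. The `policy_net` interface (mapping state and chosen-so-far to per-card logits) and the greedy argmax are assumptions for illustration:

```python
import torch

def pick_hand(policy_net, state, hand, n_cards):
    """Sequentially choose n_cards from hand, one per step, as described above.
    policy_net is a hypothetical model returning a logit per card in hand."""
    chosen = []
    for _ in range(n_cards):
        logits = policy_net(state, chosen)         # scores for each card in hand
        mask = torch.full_like(logits, float("-inf"))
        for i, card in enumerate(hand):
            if chosen.count(card) < hand.count(card):
                mask[i] = 0.0                      # only cards still available
        card_idx = torch.argmax(logits + mask).item()
        chosen.append(hand[card_idx])
    return chosen    # e.g. ['A','A','C','C','C','D'], sent back to the game at once
```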

Let's compare the reinforcement learning AI with the behavior-tree AI. After new cards are added, reinforcement learning clearly takes much less time than the behavior tree. The behavior tree offers only 3 to 5 difficulty levels, with relatively large jumps between them, while reinforcement learning offers more than 100. Moreover, reinforcement learning can reach a level far above the players', which a behavior tree cannot do.


We also challenged harder games, participating in the MineRL Diamond Competition held at NeurIPS. The competition is now in its third edition, and each time it has attracted a large number of strong teams from industry and academia. The task is to start with an axe in the Minecraft environment, collect wood, use the wood for the next step of crafting, and ultimately dig up a diamond. Since the competition began, essentially no team had managed to dig a diamond in this environment. The game has an enormous number of scenarios, and most teams chose to train on player data, for example imitation learning on player data, or reinforcement learning on top of an imitation-learning base.

However, the official data is actually limited, player skill is uneven, and there are many invalid operations. We also tried training with the official dataset, and it did not work well. So can we train from scratch with reinforcement learning alone? Yes, but several difficulties must be addressed. The environment's output is mainly image information, a 3x64x64 picture, so the information dimension is very large. It is very difficult for an AI to traverse such a large space, so we used a CNN to compress the complexity as much as possible and extract the key features.

In addition, the competition requires the AI to have long-horizon planning ability. For example, it needs to start by producing wood, produce enough wood to craft a pickaxe, use the pickaxe to mine stone, use the stone to craft a stone pickaxe to mine iron, and only by chaining all these steps together does it have a chance to dig a diamond. This requires the AI to know at every moment what its strategy is doing and what it needs to do next. Over such a long horizon, letting the AI explore blindly makes direct training with reinforcement learning a huge challenge. Our main work was to reduce the exploration space.

The first technique is action encoding: we reduced the action set to only 20 actions and, depending on the current situation, masked out unwanted actions. This way, the AI has very few actions to choose from at each step, which compresses the exploration space. Frame skipping is another key point.

Through frame skipping, we compress an originally long game into a relatively short process, greatly reducing the number of decisions the AI must make; together, these strategies further shrink the exploration space, letting us train better results in a relatively short time. Equally important is a reasonable reward. Think of treasure hunting: we need a treasure map to tell us where the next goal is, and reaching each goal yields the next clue, making the target easier to find.
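A minimal sketch of the two exploration-reducing tricks just described, action masking and frame skipping. The `skip=4` value and the wrapper interface are illustrative assumptions; the env follows the reset/step convention from earlier:

```python
class FrameSkip:
    """Sketch of frame skipping: repeat each decision for `skip`
    environment frames, summing the rewards, so the episode needs
    far fewer decisions."""

    def __init__(self, env, skip=4):
        self.env, self.skip = env, skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total, done = 0.0, False
        for _ in range(self.skip):
            state, reward, done = self.env.step(action)
            total += reward
            if done:
                break
        return state, total, done


def masked_action(q_values, valid_actions):
    """Sketch of action masking: pick the best action only among the
    ~20 encoded actions that make sense in the current situation."""
    return max(valid_actions, key=lambda a: q_values[a])
```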

With the competition's original reward, once a resource has been rewarded the first time, collecting it again yields nothing. In that case the AI may never learn that it should repeatedly collect enough resources to craft tools. But if every collection were rewarded, the AI might learn to farm points instead. So we carefully tuned the actual rewards to better guide the AI's training.

Take wood, for example: a lot of it is needed at the beginning, but it is not useful later. So we give repeated rewards for it at the start and stop rewarding it beyond a certain point. Digging diamonds, on the other hand, is similar to mining stone and iron ore, continuous digging that we want to encourage, so for digging stone and iron ore we give unlimited repeated rewards. With this strategy we trained the AI very effectively: as the AI iterated, its cumulative return and the probability of digging a diamond rose rapidly. In the end we won the championship with the highest score in the competition's history, and ours was the first team since the competition began to dig up a diamond.
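The shaping rule just described (early resources rewarded only up to a cap, stone and iron rewarded without limit) might be sketched like this. The item names, caps, and reward values are assumptions for illustration, not the team's tuned numbers or the MineRL spec:

```python
from collections import defaultdict

# illustrative caps: how many times each resource still earns a reward
# (None means the reward repeats without limit)
REWARD_CAPS = {"log": 6, "planks": 8, "cobblestone": None, "iron_ore": None}
REWARD_VALUE = {"log": 1.0, "planks": 2.0, "cobblestone": 4.0, "iron_ore": 16.0}

class ShapedReward:
    """Sketch of the reward shaping described above: wood stops paying
    after a cap, while stone and iron keep paying to encourage digging."""

    def __init__(self):
        self.counts = defaultdict(int)

    def __call__(self, item):
        self.counts[item] += 1
        cap = REWARD_CAPS.get(item)
        if cap is not None and self.counts[item] > cap:
            return 0.0                      # past the cap: no more reward
        return REWARD_VALUE.get(item, 0.0)
```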

Accelerate AI training and power game design

We also explored using reinforcement learning to assist game design. For example, a racing game needs to test each car's lap time, the car's drift performance, the track's passability, and the difficulty of the corners; verifying these manually takes a lot of time. A tester may need several days to get familiar with the game, master its techniques, and raise their skill to a fairly high level before the tests become reasonably accurate. Every combination of car and track must also be validated by actually driving it, which again takes a lot of time.


Whenever the designers rework a car or a track, the testers must manually re-adapt to the new car's and track's characteristics, which takes a lot of time. Manual verification also introduces biases: a human cannot guarantee that every test is performed at the highest human level, so tests may need to be repeated; and a person's ingrained operating habits affect their evaluation of a new car. On a track they already know well, they may drive a new car the way they drove the old one, so the new car's characteristics may never be brought out.

One of the focuses of our reinforcement learning work here is accelerating AI training, because only by training the AI quickly can it adapt to the designers' new configurations and complete a test run in a shorter time. So our main work was to mask out unreasonable actions, reduce the exploration space, and dynamically end episodes early.

Training on similar tracks at the same time also helps the AI learn the connections between them, accelerating convergence. The AI can also output results quickly on CPU machines; even training only on CPUs, we can cut testing time by 90%. With AI, you can run multiple cars on the same track simultaneously and observe each car's position, speed, gear, and engine information at every moment, which is convenient for the designers to debug.
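One way "dynamically ending episodes early" might look, as a sketch: terminate a rollout once the car stops making track progress, so stuck episodes stop wasting CPU time. The `progress` field and the `patience` threshold are assumptions, not the actual telemetry interface:

```python
class EarlyStop:
    """Sketch of dynamic early termination for the racing tests."""

    def __init__(self, env, patience=100):
        self.env, self.patience = env, patience

    def reset(self):
        self.best, self.stall = 0.0, 0
        return self.env.reset()

    def step(self, action):
        state, reward, done = self.env.step(action)
        progress = state["progress"]        # assumed: distance along the track
        if progress > self.best:
            self.best, self.stall = progress, 0
        else:
            self.stall += 1
        if self.stall >= self.patience:     # no progress for too long: cut the episode
            done = True
        return state, reward, done
```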

How reinforcement learning effectively verifies balance

Beyond verifying racing games, we can also do game balance analysis. In a strategy game, for example, when a new hero goes live, players may complain that the hero is too strong and that no matchup can counter it. The designers may then weaken it a little in the next version, and players may find the hero has been cut too hard, so the money they previously spent on it feels wasted. This badly hurts the game's reputation and greatly damages the player experience.

We try to solve this with prior analysis: we can evaluate manually, we can use a program to simulate the battle results of all combinations, or we can explore with reinforcement learning. With manual evaluation, human omissions appear, just as described above: some cases are not considered, and after launch players discover particularly strong combinations, so its accuracy is relatively low.

Programmatically simulating all combinations would be very accurate, but there are so many combinations that it usually takes months, an unacceptable amount of time. Reinforcement learning is a compromise between the two: it searches heuristically without enumerating all combinations, finding potentially strong combinations without traversing everything. And because the AI does not carry the prior-knowledge biases people have, its results are more accurate than human experience.


Reinforcement learning training is inseparable from an environment, and in this balance-analysis scenario, building a suitable environment to express the problem is essential. We compose the environment from a model pool and a battle simulator. Each time, the AI gets from the environment the current lineup to fight against, and its role is to produce a lineup that defeats that combination. The lineup it outputs is sent to the battle simulator for a simulated fight, and the result is returned to the AI.

From this feedback, the AI learns whether its lineup is reasonable, and after many rounds of iteration it learns how to compose a lineup that defeats the opposing combination. Such strong combinations gradually join the lineup pool for elimination-style selection: weak lineups are eliminated and strong ones remain. Through iteration, the lineup pool retains a large number of potentially overpowered hero combinations. We packaged this process into a self-service testing platform: the designers only need to upload the updated game attribute files and click run to get the desired results directly, including each lineup's battle results, each hero's actual strength relative to other heroes, and each hero's appearance rate in the lineups, which can verify whether a hero's strength meets the designers' pre-design expectations.
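A sketch of this lineup-pool loop, under stated assumptions: `agent.propose`, `agent.update`, and `simulate` are hypothetical stand-ins for the counter-picking AI and the battle simulator, and the win-count elimination rule is illustrative:

```python
import random

def balance_search(agent, simulate, hero_ids, rounds=10000, pool_size=50):
    """Sketch of the model-pool + simulator loop for balance analysis:
    the agent repeatedly proposes a lineup to beat one drawn from the
    pool; winners join the pool and the weakest lineup is culled, so
    potentially overpowered combinations accumulate."""
    pool = [random.sample(hero_ids, 5) for _ in range(pool_size)]  # random seed lineups
    wins = {tuple(l): 0 for l in pool}
    for _ in range(rounds):
        enemy = random.choice(pool)
        lineup = agent.propose(enemy)          # AI counter-pick for this matchup
        result = simulate(lineup, enemy)       # battle simulator: +1 win / -1 loss
        agent.update(lineup, enemy, result)    # feedback drives the next proposal
        if result > 0:
            key = tuple(lineup)
            wins[key] = wins.get(key, 0) + 1
            pool.append(list(lineup))
            if len(pool) > pool_size:          # eliminate the weakest lineup
                weakest = min(pool, key=lambda l: wins.get(tuple(l), 0))
                pool.remove(weakest)
    # lineups that survive many rounds are candidates for being too strong
    return sorted(pool, key=lambda l: wins.get(tuple(l), 0), reverse=True)
```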

Reinforcement learning has a lot to offer in many fields

As mentioned earlier, reinforcement learning has landed widely in Interactive Entertainment games, and it is not limited to games: abroad, some companies use reinforcement learning and adversarial learning for game level generation, and both autonomous driving and robot control rely on reinforcement learning techniques.

More recently, DeepMind has also used reinforcement learning to control a nuclear fusion reactor. I believe such technology could help bring nuclear fusion to eventual application.

"Moved a small bench to class"

N.GAME is an annual industry exchange event organized by NetEase Interactive Entertainment Learning and Development, and has been held successfully seven times. This year's theme is "The Future Is Now", inviting 20 heavyweight guests and university scholars from home and abroad to gather and share industry R&D experience, cutting-edge research results, and future development trends.
