AI curiosities: NVIDIA puts GPT-4 into Minecraft and unlocks the tech tree 15 times faster

Is the gaming industry about to change?

The general-purpose large model GPT-4 has entered a game, an open-world one at that, and it plays at a high level.

Yesterday, NVIDIA's VOYAGER sent a small shock through the AI community.

VOYAGER is the first lifelong learning game agent driven by a large model. Andrej Karpathy, a well-known AI scholar who recently returned to OpenAI, said after reading the paper: Remember how hopeless it felt around 2016 to develop AI agents in an environment like Minecraft?

Now the picture has changed. The right thing to do is to forget all of that, first train a large language model (LLM) on data from the whole internet so that it learns world knowledge, reasoning, and tool use (coding), and then point it at the problem the way NVIDIA has done here.

Karpathy concluded: If I had read about this "gradient-free" approach to agents in 2016, I would have been shocked.

Once the experts had weighed in, everyone else's reaction was more direct: this looks like one step closer to artificial general intelligence (AGI).

Others are already imagining future games in which NPCs are driven by large models, producing worlds full of life and competition.

ChatGPT, which set off the current technology wave, is a text-based chatbot. Because GPT-4 adds multimodal capabilities, many people predict that the next step toward general AI is to put such a large model into a robot and let it interact with the real world.

For agents that interact with real or virtual worlds, advanced large models like GPT-4 unlock a new paradigm: "training" is code execution rather than gradient descent. The "trained model" is a code base of skills that VOYAGER iteratively composes, not matrices of floating-point numbers. In the researchers' words, they are pushing gradient-free architectures to their limits.

In Minecraft, VOYAGER quickly became an experienced explorer, obtaining 3.3x more unique items, traveling 2.3x longer distances, and unlocking key tech tree milestones up to 15.3x faster than previous methods.

NVIDIA has fully open-sourced the VOYAGER research:

Paper: https://arxiv.org/pdf/2305.16291.pdf
Project page: https://voyager.minedojo.org/
GitHub: https://github.com/MineDojo/Voyager

Research background

Building generally capable embodied agents that continually explore, plan, and develop new skills in open-ended worlds is a major challenge in artificial intelligence. Traditional methods rely on reinforcement learning and imitation learning, which operate on primitive actions and therefore struggle with systematic exploration, interpretability, and generalization.

Recently, agents based on large language models (LLMs) have made breakthroughs in these areas by using the world knowledge encapsulated in pre-trained LLMs to generate consistent action plans or executable policies. They have been applied to embodied tasks such as games and robotics, as well as to non-embodied NLP tasks. However, these agents are not lifelong learners: they cannot progressively acquire, update, accumulate, and transfer knowledge over long time spans.

Unlike most other games studied in AI, Minecraft does not impose a predetermined end goal or fixed storyline, but instead offers a unique playground with endless possibilities. An efficient lifelong learning agent should have similar capabilities to human players:

(1) propose suitable tasks based on its current skill level and the state of the world; for example, if it finds itself in a desert rather than a forest, it should learn to obtain sand and cactus before iron;

(2) refine skills based on environmental feedback, and store mastered skills in memory for future reuse in similar situations (e.g., fighting zombies is similar to fighting spiders);

(3) continually explore the world and find new tasks in a self-driven way.

VOYAGER is the first LLM-powered agent embodying lifelong learning that can drive exploration in Minecraft, master a wide range of skills, and continually make new discoveries without human intervention.

The researchers use code as the action space rather than low-level motor commands, because programs can naturally represent temporally extended and compositional actions, which are critical for many long-horizon tasks in Minecraft.

VOYAGER interacts with a black-box LLM (GPT-4) through prompting and in-context learning. Notably, this approach requires neither access to model parameters nor explicit gradient-based training or fine-tuning.

Specifically, VOYAGER attempts to solve progressively harder tasks proposed by an automatic curriculum. The curriculum is generated by GPT-4 from the overarching goal of "discovering as many diverse things as possible", an approach that can be seen as an in-context form of novelty search. By storing the action programs that successfully solve a task, VOYAGER gradually builds a skill library. Each program is indexed by the embedding of its description, so it can be retrieved in similar situations in the future. Complex skills can be synthesized by composing simpler programs, which lets VOYAGER's capabilities compound rapidly over time and alleviates the "catastrophic forgetting" seen in other continual learning methods.

Method

VOYAGER consists of three new components: (1) an automatic curriculum that proposes goals for open-ended exploration; (2) a skill library for developing increasingly complex behaviors; (3) an iterative prompting mechanism that generates executable code for embodied control.

Automatic curriculum

In open-ended settings, embodied agents encounter objectives of varying complexity. An automatic curriculum offers many benefits for open-ended exploration: it keeps the learning process challenging but manageable, fosters curiosity-driven intrinsic motivation for the agent to learn and explore, and encourages the development of general and flexible problem-solving strategies.

The automatic curriculum exploits GPT-4's internet-scale knowledge by having it propose a steady stream of new tasks or challenges, which makes the curriculum highly adaptive and responsive. It maximizes exploration by taking into account both the exploration progress so far and the agent's current state. The curriculum is generated by GPT-4 from the overarching goal of "discovering as many diverse things as possible."
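The task-proposal step described above can be sketched as a prompt built from the agent's state plus a call to a language model. This is only an illustration, not VOYAGER's actual prompt or code: `AgentState`, `build_curriculum_prompt`, `propose_next_task`, and the offline `fake_llm` stand-in are all hypothetical names, and a real system would call GPT-4 where `fake_llm` is used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Overall goal given to the curriculum, paraphrased from the article.
DIRECTIVE = "Discover as many diverse things as possible."


@dataclass
class AgentState:
    """Hypothetical snapshot of what the curriculum gets to see."""
    biome: str
    inventory: Dict[str, int]
    completed: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def build_curriculum_prompt(state: AgentState) -> str:
    """Assemble the text a language model would read before proposing a task."""
    lines = [
        f"Goal: {DIRECTIVE}",
        f"Biome: {state.biome}",
        f"Inventory: {state.inventory}",
        f"Completed tasks: {', '.join(state.completed) or 'none'}",
        f"Failed tasks: {', '.join(state.failed) or 'none'}",
        "Propose one next task that is new and not too hard.",
    ]
    return "\n".join(lines)


def propose_next_task(state: AgentState, llm: Callable[[str], str]) -> str:
    """Ask the model for the next task, given exploration progress and state."""
    return llm(build_curriculum_prompt(state)).strip()


def fake_llm(prompt: str) -> str:
    # Stand-in for a GPT-4 call so that the sketch runs offline.
    return "Obtain 3 cactus" if "Biome: desert" in prompt else "Mine 1 wood log"


desert = AgentState(biome="desert", inventory={"sand": 2})
print(propose_next_task(desert, fake_llm))
```

Passing the model in as a plain callable keeps the sketch independent of any particular API client; swapping `fake_llm` for a real client is the only change a live setup would need.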

Skill library

As the automatic curriculum keeps proposing increasingly complex tasks, VOYAGER needs a skill library that serves as a foundation for learning and evolution. Inspired by the generality, interpretability, and universality of programs, the research team represents each skill as executable code that scaffolds temporally extended actions for completing the specific tasks proposed by the automatic curriculum.

Adding a new skill works as follows: each skill is indexed by the embedding of its description, so that it can be retrieved in similar situations in the future.

Skill retrieval works as follows: when the automatic curriculum proposes a new task, the skill library is queried for the 5 most relevant skills. Complex skills can be synthesized by composing simpler programs. This approach lets VOYAGER's capabilities grow rapidly over time and alleviates the problem of "catastrophic forgetting."
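The indexing and top-5 retrieval described above can be sketched in a few lines. This is a minimal illustration under loud assumptions: `embed` is a toy bag-of-words function standing in for a real text-embedding model, and `SkillLibrary`, its methods, and the skill names are hypothetical rather than VOYAGER's actual code.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SkillLibrary:
    """Stores skill programs keyed by the embedding of their description."""

    def __init__(self) -> None:
        self._skills: Dict[str, Tuple[Counter, str]] = {}

    def add(self, name: str, description: str, code: str) -> None:
        self._skills[name] = (embed(description), code)

    def retrieve(self, task: str, k: int = 5) -> List[str]:
        """Return the names of the k skills most similar to the task text."""
        query = embed(task)
        ranked = sorted(
            self._skills,
            key=lambda name: cosine(query, self._skills[name][0]),
            reverse=True,
        )
        return ranked[:k]


library = SkillLibrary()
# The code strings are placeholders; real skills would be full programs.
library.add("craftWoodenPickaxe", "craft a wooden pickaxe from planks and sticks", "...")
library.add("killZombie", "fight and kill a zombie with a sword", "...")
library.add("mineIronOre", "mine iron ore with a stone pickaxe", "...")
print(library.retrieve("fight a spider with a sword", k=1))
```

With these three toy skills, a spider-fighting task retrieves the zombie-fighting skill first, which mirrors the article's point that similar situations can reuse stored skills.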

Iterative prompting mechanism

The research team introduced an iterative prompting mechanism that improves programs using three types of feedback: environment feedback, execution errors, and self-verification that checks whether the task succeeded.

In the paper's example of environment feedback, GPT-4 realizes that it needs 2 more planks before crafting sticks. In the example of an execution error, GPT-4 realizes that it should craft a wooden axe instead of an acacia axe, since there is no acacia axe in Minecraft.

In the self-verification example, GPT-4 is given the agent's current state and the task and acts as a "critic", reporting whether the program has completed the task. If the task failed, it also critiques the agent and suggests how to complete the task.
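The feedback loop described above can be sketched as follows. This is an illustration only, not VOYAGER's implementation: `Feedback`, `iterative_prompting`, the three `fake_*` stand-ins, and the `max_rounds` retry budget are all assumptions introduced here, and in a real system `generate` and `verify` would be GPT-4 calls while `execute` would run the program in the game.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Feedback:
    env_log: str = ""   # environment feedback: what happened in the world
    error: str = ""     # execution error raised while running the program
    critique: str = ""  # self-verification advice on how to succeed


def iterative_prompting(
    task: str,
    generate: Callable[[str, Feedback], str],
    execute: Callable[[str], Tuple[str, str]],
    verify: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 4,  # assumed retry budget, not taken from the article
) -> Optional[str]:
    """Regenerate a program until self-verification passes or rounds run out."""
    feedback = Feedback()
    for _ in range(max_rounds):
        program = generate(task, feedback)
        env_log, error = execute(program)
        # Only consult the critic when the program ran without an error.
        success, critique = (False, "") if error else verify(task, env_log)
        if success:
            return program
        feedback = Feedback(env_log, error, critique)
    return None


# Offline stand-ins so that the sketch runs without a game or an LLM.
def fake_generate(task: str, feedback: Feedback) -> str:
    return "craft('wooden_axe')" if feedback.error else "craft('no_such_axe')"


def fake_execute(program: str) -> Tuple[str, str]:
    if "no_such_axe" in program:
        return "", "unknown item: no_such_axe"
    return "inventory: wooden_axe x1", ""


def fake_verify(task: str, env_log: str) -> Tuple[bool, str]:
    ok = "wooden_axe" in env_log
    return ok, "" if ok else "no axe in inventory"


print(iterative_prompting("Craft an axe", fake_generate, fake_execute, fake_verify))
```

In this toy run the first program fails with an execution error, the error is fed back into the next generation step, and the second program passes verification.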

Experiments

In the experiments, the researchers systematically compared VOYAGER with the baselines on exploration performance, mastery of the tech tree, map coverage, and zero-shot generalization to new tasks in a new world.

They use OpenAI's gpt-4-0314 and gpt-3.5-turbo-0301 APIs for text completion, and the text-embedding-ada-002 API for text embedding. All temperatures are set to 0 except for the automatic curriculum, which uses temperature = 0.1 to encourage task diversity. The simulation environment is built on top of MineDojo and uses Mineflayer's JavaScript APIs for motor control.

The results of the assessment are as follows:

Significantly stronger ability to explore

VOYAGER's strength lies in its ability to keep making new discoveries (Figure 1): it finds 63 unique items within 160 prompt iterations, 3.3 times more than the baselines. AutoGPT, by contrast, lags considerably in discovering new items, while ReAct and Reflexion struggle to make significant progress.

Mastery of the tech tree

The tech tree in Minecraft tests an agent's ability to craft and use a hierarchy of tools. Progressing through this tree (wooden tools → stone tools → iron tools → diamond tools) requires the agent to master systematic and compositional skills.

In Table 1, the fractions give the number of successful trials out of three total runs. The numbers are prompt iterations averaged over three trials; the fewer the iterations, the more efficient the method. Compared with the baselines, VOYAGER unlocks the wooden level 15.3x faster (in terms of prompt iterations), the stone level 8.5x faster, and the iron level 6.4x faster, and VOYAGER is the only method that unlocks the diamond level of the tech tree.
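Because efficiency here is measured in prompt iterations, an "N times faster" figure is simply the ratio of the baseline's average iteration count to the method's. The helper below is a hypothetical illustration of that arithmetic; the numbers passed to it are made-up placeholders, not values from Table 1.

```python
def speedup(baseline_iterations: float, method_iterations: float) -> float:
    """Ratio of average prompt iterations: baseline divided by method."""
    if method_iterations <= 0:
        raise ValueError("method_iterations must be positive")
    return baseline_iterations / method_iterations


# Placeholder averages for illustration only (NOT the paper's data).
print(f"{speedup(120, 10):.1f}x")
```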

Extensive map traversal

VOYAGER covers 2.3 times the distance of the baselines and traverses a variety of terrains, whereas the baseline agents often stay confined to local areas, which greatly hinders their ability to discover new knowledge (Figure 7).

Zero-shot generalization to unseen tasks

To assess zero-shot generalization, the researchers cleared the agent's inventory, reset it to a newly instantiated world, and tested it on unseen tasks. For VOYAGER and AutoGPT, they used GPT-4 to break the task down into a series of sub-goals.

As shown in Table 2 and Figure 8, VOYAGER consistently solves all the tasks, while the baselines cannot solve any task within 50 prompt iterations. Notably, the skill library built through lifelong learning not only improves VOYAGER's performance but also gives AutoGPT a boost. This shows that the skill library is a versatile tool that other methods can readily adopt, effectively serving as a plug-and-play asset for improving performance.

Ablation studies

The researchers ablated 6 design choices in VOYAGER (automatic curriculum, skill library, environment feedback, execution errors, self-verification, and GPT-4 for code generation) and studied their impact on exploration performance, as shown in Figure 9.

VOYAGER outperforms all the ablated alternatives, demonstrating the critical role of each component. In addition, GPT-4 is significantly better than GPT-3.5 at code generation.

Finally, NVIDIA's researchers also pointed out some limitations and future work directions.

The first is cost. The GPT-4 API incurs significant costs: it is 15 times more expensive than GPT-3.5. However, VOYAGER needs the leap in code generation quality that GPT-4 provides and that neither GPT-3.5 nor open-source LLMs can deliver.

Second, despite the iterative prompting mechanism, there are still cases where the agent gets stuck and fails to generate the right skill. The automatic curriculum has the flexibility to retry such a task later. The self-verification module can also fail occasionally, for example by not recognizing spider string as a sign that the spider was successfully killed.

Then there is the problem of large-model "hallucinations". The automatic curriculum occasionally proposes tasks that cannot be completed; for example, the agent may be asked to craft a "copper sword" or "copper chestplate", items that do not exist in the game. Hallucinations also occur during code generation: GPT-4 tends to use cobblestone as a fuel input, although it is not a valid fuel source in the game. It may also call functions that are not in the provided control primitive APIs, which leads to code execution errors. The researchers believe that improvements to the GPT API models and new techniques for fine-tuning open-source LLMs will overcome these limitations in the future.
