Tsinghua and SenseTime's latest AI conquers Minecraft

Author: QbitAI

Xifeng, reporting from Aofei Temple

QbitAI | Official account: QbitAI

Following GPT-4's impressive showing in Minecraft, a domestically developed AI agent has arrived:

It survives, explores, and creates like a human, and can beat the entire game!

Starting from scratch in survival mode, it can not only obtain every item in the Overworld and mine diamonds, but also craft enchanted books!

It holds its own across different terrains, environments, and day and night conditions, and can even deal with monsters.

All of this is done by a single agent called Ghost in the Minecraft (GITM).

It was jointly developed by SenseTime, Tsinghua University, Shanghai Artificial Intelligence Laboratory and other institutions.

Compared with previous agents, GITM can be summed up in one word: stronger.

  • More tasks to accomplish:

It achieves 100% task coverage on all technical challenges in the Overworld.

  • Higher task success rate:

On the "Obtain Diamond" task, it reaches a success rate of 67.5%.

At this point you are surely wondering: something this strong must take a long time to train, right?

No! Training takes only two days on a single CPU node!

Breaking the limits of AI development

In the history of AI development, there is an interesting but counterintuitive phenomenon:

Some tasks that are relatively difficult for humans, such as playing chess, are relatively easy for AI. Yet open-world tasks that are relatively simple for humans, such as interacting with the environment, planning, and making decisions, pose great challenges for AI.

This is Moravec's paradox.

However, the generalist AI agent GITM is said to have broken through the limitation this paradox describes:

It can make headway in complex, real-world-like environments, surviving, exploring, and creating as humans do.

Let's take a look at how it performs:

In Minecraft, GITM achieves 100% task coverage on all technical challenges in the Overworld, i.e. it successfully unlocks all 262 items in the full tech tree.

Previously, all agents combined covered only about 30%. (All previous agent methods, including those from OpenAI and DeepMind, have unlocked a total of only 78 items.)

△ Red marks items that other agents have also unlocked; green marks items unlocked only by GITM

On the most talked-about "Obtain Diamond" task, GITM achieved a success rate of 67.5%, an improvement of 47.5 percentage points over the previous best result (OpenAI VPT).

However, here is the key point.

In terms of training efficiency, GITM has also reached new heights. It needs only one ten-thousandth of the environment interaction steps required by existing methods, and training completes in two days on a single CPU node.

That is far less than the 6,480 GPU-days required by OpenAI's VPT or the 17 GPU-days required by DeepMind's DreamerV3.
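As a quick sanity check, the headline figures can be recomputed from the numbers quoted in this article. The short Python sketch below is illustrative only. It assumes the 47.5% improvement is measured in absolute percentage points, and it compares GPU-days with CPU-node-days as if they were the same unit, which is only a rough comparison. All variable names are made up for this example.

```python
# Figures quoted in the article.
TOTAL_ITEMS = 262           # items in the full Overworld tech tree
PRIOR_ITEMS = 78            # items unlocked by all previous agents combined
GITM_DIAMOND_RATE = 67.5    # GITM success rate on the diamond task, in %
IMPROVEMENT_POINTS = 47.5   # assumption: absolute percentage points
VPT_GPU_DAYS = 6480         # OpenAI VPT training cost
DREAMERV3_GPU_DAYS = 17     # DeepMind DreamerV3 training cost
GITM_NODE_DAYS = 2          # one CPU node for two days

# Share of the tech tree covered by earlier agents.
prior_coverage = PRIOR_ITEMS / TOTAL_ITEMS * 100

# Success rate of the previous best agent implied by the improvement.
implied_baseline = GITM_DIAMOND_RATE - IMPROVEMENT_POINTS

# Rough training-cost ratios (GPU-days vs. CPU-node-days, not like for like).
vpt_ratio = VPT_GPU_DAYS / GITM_NODE_DAYS
dreamer_ratio = DREAMERV3_GPU_DAYS / GITM_NODE_DAYS

print(f"Prior coverage: {prior_coverage:.1f}%")
print(f"Implied previous best on diamonds: {implied_baseline:.1f}%")
print(f"VPT / GITM device-days: {vpt_ratio:.0f}x")
print(f"DreamerV3 / GITM device-days: {dreamer_ratio:.1f}x")
```

The first result, roughly 29.8%, matches the "about 30%" coverage figure for earlier agents.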

Not only that, GITM can also be applied to more complex tasks in Minecraft, such as building the shelters, farmland, and iron golems needed for survival, the redstone circuits needed for automated equipment, and the Nether portal needed to enter the Nether.

Behind GITM's strong capabilities and scalability is the power of large language models (LLMs).

The heart of GITM: large language models

The biggest difficulty previously faced by reinforcement learning-based agents was:

How to map an extremely long-horizon, complex goal onto a sequence of keyboard and mouse operations.

To solve this problem, the GITM developers adopted an agent based on large language models (LLMs).

Unlike reinforcement learning agents, which map goals directly to low-level operations, their LLM-based agent takes a hierarchical approach:

It first decomposes the goal into sub-goals, then breaks each sub-goal into structured actions, and finally turns those actions into keyboard and mouse operations.

Specifically, the LLM-based agent consists of an LLM decomposer, an LLM planner, and an LLM interface, which are responsible for producing sub-goals, structured actions, and keyboard/mouse operations, respectively:

1) The LLM decomposer first breaks down the goal into a series of well-defined sub-goals based on text-based knowledge gathered from the internet.

2) The LLM planner then plans a series of structured actions for each sub-goal. The planner also records and summarizes lists of successful actions in a text-based memory to improve its planning.

3) The LLM interface executes the structured actions by interacting with the environment: it issues raw keyboard/mouse inputs and receives raw observations.
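The three-level pipeline described above can be sketched in a few lines of Python. This is a toy illustration, not the GITM implementation. Every name in it (RECIPES, Action, decompose, Planner, to_low_level, run) is hypothetical, the LLM calls are replaced by a hard-coded lookup table and simple rules, and every action is assumed to succeed.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical stand-in for text knowledge gathered from the internet.
RECIPES = {
    "wooden_pickaxe": ["log", "planks", "stick", "crafting_table"],
}


@dataclass
class Action:
    """A structured action such as mine(log) or craft(planks)."""
    name: str
    target: str


def decompose(goal: str) -> list[str]:
    """Decomposer: turn a goal into ordered sub-goals.

    A real agent would query an LLM here; this sketch uses a lookup table.
    """
    return RECIPES.get(goal, []) + [goal]


@dataclass
class Planner:
    """Planner: turn a sub-goal into structured actions, with a text memory."""
    memory: dict[str, list[Action]] = field(default_factory=dict)

    def plan(self, sub_goal: str) -> list[Action]:
        # Reuse a plan that worked before, if one is remembered.
        if sub_goal in self.memory:
            return self.memory[sub_goal]
        verb = "mine" if sub_goal == "log" else "craft"
        return [Action(verb, sub_goal)]

    def remember(self, sub_goal: str, actions: list[Action]) -> None:
        self.memory[sub_goal] = actions


def to_low_level(action: Action) -> list[str]:
    """Interface: turn a structured action into keyboard/mouse operations."""
    if action.name == "mine":
        return [f"move_mouse_to:{action.target}", "hold_left_click"]
    return ["press_key:e", f"click_recipe:{action.target}"]


def run(goal: str) -> list[str]:
    """Run the full pipeline and return the low-level operations issued."""
    planner = Planner()
    ops: list[str] = []
    for sub_goal in decompose(goal):
        actions = planner.plan(sub_goal)
        for action in actions:
            ops.extend(to_low_level(action))
        # Assumption: every action succeeds, so the plan is stored as-is.
        planner.remember(sub_goal, actions)
    return ops


if __name__ == "__main__":
    for op in run("wooden_pickaxe"):
        print(op)
```

The point of the layering is that each level only has to solve a short problem. The decomposer never deals with mouse movements, and the interface never has to reason about the tech tree.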

Previously, SenseTime used supervised learning and reinforcement learning to train DI-star, an agent that can take on top players in StarCraft II.

Training DI-star used "160,000 replays" and "100 million games".

This time, with the help of large language models, things have become interesting again.

Project address: https://github.com/OpenGVLab/GITM
