DeepMind's "robot cat" is self-taught, can operate multiple robots, and does not rely on human supervision

Zhidongxi (WeChat public account: zhidxcom)

Author | Yunpeng

Editor | Heart

Zhidongxi, June 21: Google DeepMind has just launched RoboCat, a self-improving AI agent for robots.

DeepMind claims it is the world's first robotic AI agent that can solve and adapt to a variety of tasks, and that it can do so across a variety of real robots.

RoboCat manipulates robotic arms to complete a variety of tasks, source: Google DeepMind

Overall, RoboCat's main breakthroughs fall into three areas:

1. A single neural network can work across multiple different robots, quickly learn to operate new robotic arms, and solve new, complex tasks.

2. The more new tasks RoboCat learns, the better it becomes at learning and solving additional new tasks.

3. RoboCat is an important step forward for general-purpose robotics research, because it reduces the need for human-supervised training.

RoboCat solves more types of tasks, source: Google DeepMind

The AI agent can control a robotic arm on its own and learn to place hoops, stack blocks, and pick up fruit. It learns efficiently and requires little human effort.

With as few as 100 demonstrations, RoboCat can learn to operate a robotic arm to complete a variety of tasks, and it can then keep improving on self-generated data.

Most importantly, this works even when RoboCat has never before seen the robotic arm it controls or the task it is asked to complete.

RoboCat can solve various tasks, source: Google DeepMind

This general learning ability is RoboCat's core strength. Its other defining feature is that it learns fast. That matters for accelerating robotics research, because it greatly reduces the need for human-supervised training, which is an important step toward creating general-purpose robots.

In the DeepMind demo video, RoboCat completes tasks such as placing hoops, stacking blocks, and picking up fruit through self-learning. Its success rate on new tasks has risen from 36% to 74%.

▲ Before-and-after comparison of RoboCat's task success rate, source: Google DeepMind

According to the DeepMind paper, RoboCat's success rate on real-world training tasks is also well above that of conventional vision-based models, by a clear margin, which is another significant contribution of the research.

Comparison of RoboCat and vision-based models in terms of success rates in completing real-world training tasks, source: Google DeepMind

It is worth noting that one of the key technologies behind RoboCat is DeepMind's multimodal model Gato. "Gato" means "cat" in Spanish, which is one origin of the name "RoboCat".

Researchers have previously explored how robots can learn multiple tasks at scale and how to combine the understanding of language models with real-world robotic capabilities. What sets RoboCat apart, according to DeepMind, is that it is the first robotic AI agent that can solve and adapt to a variety of tasks.

DeepMind believes that RoboCat's ability to learn skills independently, improve itself quickly, and adapt quickly to different hardware will play an important role in the development of a new generation of general-purpose robotic AI agents.

Paper Address:

https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/robocat-a-self-improving-robotic-agent/robocat-a-self-improving-foundation-agent-for-robotic-manipulation.pdf

First, skilled at hoops and building blocks: how many steps does it take to get fruit out of a bowl?

First, let's take a look at what exactly this RoboCat can do.

In DeepMind's demonstration video, researchers arrange objects under the robot's camera, and the robot records that arrangement as the "target image". The researchers then move the objects back to their starting positions and let the robot manipulate them until the arrangement in the target image is reproduced.
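
The idea of specifying a task with a target image can be illustrated with a toy sketch. This is not how RoboCat judges success; the tiny grid "images", the helper names `mean_abs_diff` and `goal_reached`, and the tolerance value are all assumptions made for illustration only.

```python
def mean_abs_diff(image_a, image_b):
    """Average absolute per-pixel difference between two same-shaped grids."""
    if len(image_a) != len(image_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(image_a, image_b)
    ):
        raise ValueError("images must have the same shape")
    total, count = 0.0, 0
    for row_a, row_b in zip(image_a, image_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


def goal_reached(current, goal, tolerance=0.05):
    """Hypothetical success check: the scene matches the target image closely."""
    return mean_abs_diff(current, goal) <= tolerance


# 1 marks the cell that holds the object, 0 marks an empty cell.
target_image = [[0, 1], [0, 0]]   # arrangement recorded as the target
start_image = [[1, 0], [0, 0]]    # objects moved back to the start

print(goal_reached(start_image, target_image))   # False
print(goal_reached(target_image, target_image))  # True
```

In this toy setup the robot's job is to act until `goal_reached` becomes true. A real system would compare camera images in a far more robust way than a raw pixel difference.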

▲ RoboCat completes the "hoop" task, source: Google DeepMind

In the "hoop" task, RoboCat manipulates the robotic arm well enough to put the orange-red ring back in its target position.

In the same type of task, RoboCat can also handle more complex cases, such as telling large and small rings apart and fitting each one accurately onto the matching metal peg.

RoboCat completes more complex "hoop" tasks, source: Google DeepMind

DeepMind also demonstrated a fruit-grasping task. RoboCat had seen this task in earlier training, but no human hand had ever appeared in its training data. This time, the target image the researchers set for RoboCat included a human hand, and RoboCat still completed the task successfully.

▲ RoboCat completes the fruit-grasping task even though the target image contains a human hand, source: Google DeepMind

The researchers then raised the difficulty further by having RoboCat control a robotic arm it had never seen before, different from the one previously used to grasp fruit. RoboCat was still able to complete the task with the new arm.

RoboCat controls a robotic arm it has never seen before to complete a previously learned task, source: Google DeepMind

In another test, with building blocks, the researchers demonstrated a further skill: once a target image is set, RoboCat can reproduce the block arrangement shown in it, regardless of where the blocks start.

▲ When the target image is set, regardless of the initial block position, RoboCat can well restore the block state in the target image, source: Google DeepMind

In addition to building blocks, RoboCat can also complete tasks such as taking fruit out of a bowl.

Second, trained on a very large dataset, it also upgrades itself iteratively and can master a new skill in five steps

As for the technology behind RoboCat, DeepMind says it builds on the multimodal model Gato, which can process language, images, and actions in both simulated and physical environments. The researchers combined Gato's architecture with a large training dataset of image sequences and actions from various robotic arms solving hundreds of different tasks.

After this first round of training, the researchers put RoboCat into a "self-improvement" training cycle, in which it learns to solve many tasks it has never seen before.

Learning for each new task is divided into five steps:

1. Collect 100-1000 demonstrations of the new task, performed on a robotic arm controlled by the researchers.

2. Fine-tune RoboCat on the new task and the robotic arm it uses, creating a dedicated derivative agent.

3. The derivative agent practices on the robotic arm 10,000 times, generating more training data.

4. Combine the demonstration data and the self-generated data with RoboCat's existing training dataset.

5. Train a new version of RoboCat on the new, combined training dataset.
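
The five steps above can be sketched as a simple loop. This is a minimal illustration and not DeepMind's implementation: the `Agent` class, every function name, and the coin-flip "practice" are hypothetical stand-ins, and "training" is reduced to bookkeeping on the dataset.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical stand-in for the agent: it only tracks its training data."""
    dataset: list = field(default_factory=list)
    version: int = 0


def collect_demonstrations(task, n_demos):
    # Step 1: 100-1000 human-controlled demonstrations of the new task.
    if not 100 <= n_demos <= 1000:
        raise ValueError("expected between 100 and 1000 demonstrations")
    return [{"task": task, "source": "human"} for _ in range(n_demos)]


def fine_tune(agent, demos):
    # Step 2: derive a specialised agent; "fine-tuning" here just copies data.
    return Agent(dataset=agent.dataset + demos, version=agent.version)


def practice(derived_agent, task, n_episodes, rng):
    # Step 3: the derived agent practises and records its own trajectories.
    # Success is simulated with a coin flip; a real robot would run here.
    return [
        {"task": task, "source": "self", "success": rng.random() < 0.5}
        for _ in range(n_episodes)
    ]


def self_improvement_cycle(agent, task, n_demos=100, n_practice=10_000, seed=0):
    rng = random.Random(seed)
    demos = collect_demonstrations(task, n_demos)
    derived_agent = fine_tune(agent, demos)
    self_data = practice(derived_agent, task, n_practice, rng)
    combined = agent.dataset + demos + self_data  # Step 4: merge all data
    return Agent(dataset=combined, version=agent.version + 1)  # Step 5


agent = Agent()
for task in ["hoop", "blocks"]:
    agent = self_improvement_cycle(agent, task)
print(agent.version, len(agent.dataset))  # 2 20200
```

Each pass through the loop leaves the agent with a larger, more varied dataset, which is the mechanism the article credits for RoboCat getting better at learning further tasks.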

Schematic diagram of RoboCat's training cycle, which can generate additional training data by itself, source: Google DeepMind

Taken together, this training means RoboCat's dataset contains millions of trajectories from both real and simulated robotic arms, including data that RoboCat generated itself.

RoboCat learns from a variety of training data types and tasks, source: Google DeepMind

In total, the researchers used four different types of robots and various robotic arms to collect vision-based data.

RoboCat uses real and virtual robotic arms to accumulate training data, source: Google DeepMind

RoboCat: A "self-improving generalist"

Thanks to this diverse training, RoboCat can learn to operate different robotic arms in a matter of hours, including more complex ones it has never seen before.

RoboCat can operate these arms to complete previously seen tasks such as placing hoops and picking up fruit, and even fitting shaped objects into matching slots, which tests its precision, its understanding, and its ability to solve a shape-matching puzzle.

RoboCat uses a new robotic arm to complete previously learned tasks, source: Google DeepMind

In DeepMind's words, RoboCat is a "self-improving generalist" because it learns new tasks through a virtuous training cycle. Put simply, the more new tasks it learns, the better it gets at learning and solving additional new tasks.

The original version of RoboCat, given 500 demonstrations per new task, succeeded only 36% of the time on tasks it had never seen before. The latest version raises that success rate to 74%.
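
To put the two figures in perspective, the short calculation below works out the absolute and relative improvement. The function name `improvement` is just an illustrative helper; the only inputs are the 36% and 74% figures quoted above.

```python
def improvement(before_pct, after_pct):
    """Return the gain in percentage points and the relative factor."""
    gain_points = after_pct - before_pct
    factor = after_pct / before_pct
    return gain_points, factor


gain_points, factor = improvement(36, 74)
print(f"{gain_points} percentage points, {factor:.2f}x")  # 38 percentage points, 2.06x
```

In other words, the success rate on previously unseen tasks slightly more than doubled.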

These improvements are attributed to RoboCat's growing breadth of experience, much as people develop a more diverse range of skills as they deepen their learning in a given field.

Robots are already widely used in everyday life, but most can only complete specific tasks that were programmed in advance.

Progress in building "general-purpose robots" that can accomplish a wider variety of tasks has been slow because collecting training data in the real world is time-consuming and laborious.

RoboCat's ability to learn skills independently, improve itself quickly, and adapt quickly to different hardware devices will play an important role in promoting the development of a new generation of general-purpose robot AI agents.

Conclusion: Multimodal AI models have further advanced research on general-purpose robots

At a time when AI research worldwide is flocking to large models, Google DeepMind seems less interested in the large-model race. It remains focused on how AI interacts with the physical world and on improving foundation models for robotics.

The newly released RoboCat is an impressive AI model. It solves a variety of pick-and-place tasks across different platforms using visual goal conditioning, can learn a range of tasks on different robots from as few as 100 demonstrations, and stands out for improving its skills from self-generated training data.

Multimodal AI models have added another exciting development on the road to general-purpose robots.

Source: Google DeepMind