A world first! Google's DeepMind demonstrates RoboCat, an AI agent for general-purpose robots: it needs only about 100 demonstrations to learn a task and can improve itself

Author: National Business Daily

National Business Daily reporter: Cai Ding; editor: Lan Suying

Robots are quickly becoming part of our daily lives, but they are usually built for specific tasks only. Although recent advances in AI could make robots useful in many more ways, progress toward general-purpose robots remains slow worldwide, in part because collecting real-world training data takes a great deal of time. New research from DeepMind, Google's AI lab, may address this "pain point" in the field.

On June 20, US Eastern Time, DeepMind demonstrated RoboCat, an AI agent for robots. DeepMind calls it the world's first agent capable of solving and adapting to multiple tasks. What's more, RoboCat is a self-improving agent: it can operate different robotic arms, learn a new task from as few as 100 demonstrations, and then improve using data it generates itself.

Google showcases the world's first multi-tasking AI agent

DeepMind's latest paper introduces an AI agent that can improve itself. The agent is essentially an AI-powered software program that serves as the robot's "brain". Unlike traditional robots, a robot equipped with RoboCat is more "versatile" and can keep improving on its own.

Image source: DeepMind screenshot

In previous research, DeepMind explored how to develop robots that learn to multitask at scale and how to combine the understanding of language models with the real-world capabilities of helper robots. RoboCat, its new robotic agent, is described as the world's first AI agent that can solve and adapt to multiple tasks: it learns to perform a variety of tasks across different robotic arms and then generates new training data by itself to improve its own performance.

RoboCat learns much faster than other advanced models – with just 100 demonstrations or so, RoboCat can learn to manipulate a robotic arm to complete a wide variety of tasks, and then iterate on self-generated data. This capability will help accelerate robotics research, as it reduces the need for human-supervised training and is an important step in creating general-purpose robots.

Alex Lee, a research scientist at DeepMind and a co-author of the RoboCat paper, said, "We show that a large model can solve a variety of tasks on multiple real robots and quickly adapt to new tasks."

According to DeepMind, RoboCat is based on its multimodal model Gato (Spanish for "cat"), which can process language, images, and actions in both simulated and physical environments. DeepMind combined Gato's architecture with a large training dataset made up of sequences of images and actions from various robotic arms solving hundreds of tasks.

In DeepMind's demonstration video, RoboCat already controls a robotic arm, through self-learning, to complete tasks such as placing rings, stacking blocks, and picking up fruit. These tasks may seem simple, but they test the arm's precision, its understanding of the task, and its ability to solve shape-matching problems. RoboCat's success rate on new tasks has doubled from an initial 36%.

Image source: DeepMind screenshot

With the original dataset plus the data generated during new training, RoboCat's dataset will contain millions of training trajectories. The more new tasks it learns, the better it becomes at learning and solving further new tasks. DeepMind's paper attributes the sharp increase in task success to RoboCat's growing experience, much as people develop more diverse skills as they deepen their learning in a particular area. RoboCat's ability to learn skills and improve itself quickly, especially when applied to different robotic devices, will help pave the way for future research.
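The cycle described above (learn from a small set of demonstrations, practice, then add the self-generated data to the dataset) can be pictured with a toy sketch. This is only an illustration of the data flow, not DeepMind's implementation: every name here (`Trajectory`, `Agent`, `train`, `practice`, `self_improvement_cycle`) is hypothetical, the "training" is a placeholder that just stores the data, and the figure of 1,000 practice episodes is an arbitrary example value.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Trajectory:
    task: str
    source: str  # "demo" (human-collected) or "self" (agent-generated)


@dataclass
class Agent:
    # Placeholder for a trained model: it only records what it was trained on.
    trained_on: List[Trajectory] = field(default_factory=list)


def train(dataset: List[Trajectory]) -> Agent:
    # Stand-in for real model training (hypothetical).
    return Agent(trained_on=list(dataset))


def practice(agent: Agent, task: str, episodes: int) -> List[Trajectory]:
    # Stand-in for the agent attempting the task and logging its own attempts.
    return [Trajectory(task, "self") for _ in range(episodes)]


def self_improvement_cycle(
    dataset: List[Trajectory],
    task: str,
    demos: List[Trajectory],
    practice_episodes: int,
) -> Tuple[Agent, List[Trajectory]]:
    # 1. Adapt to the new task using existing data plus a few demonstrations.
    specialist = train(dataset + demos)
    # 2. Let the adapted agent practice and generate new trajectories.
    self_data = practice(specialist, task, practice_episodes)
    # 3. Fold the demonstrations and the self-generated data into the dataset.
    new_dataset = dataset + demos + self_data
    # 4. Train the next version of the agent on the enlarged dataset.
    return train(new_dataset), new_dataset


# Example: 100 demonstrations, then 1,000 practice episodes (illustrative value).
demos = [Trajectory("stack_blocks", "demo") for _ in range(100)]
agent, dataset = self_improvement_cycle([], "stack_blocks", demos, practice_episodes=1000)
print(len(dataset))
```

Each pass through the cycle enlarges the dataset, which is how, in the article's account, the dataset can grow to millions of trajectories over many tasks.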

Image source: DeepMind screenshot

Embodied intelligence will lead the next wave of AI

A National Business Daily reporter noted that giants including Tesla, Google, Amazon, NVIDIA, and Tencent have already staked out positions in robotics. However, as DeepMind pointed out above, because training robots takes so much time, their level of intelligence is still insufficient for large-scale commercialization. The advent of RoboCat may address this "pain point".

In fact, DeepMind's RoboCat is just one prominent example of AI-enabled robots. Since the beginning of this year, several companies have applied large models to robots: in early 2023, Google launched the visual language model PaLM-E and applied it to industrial robots; in April, Alibaba connected its Qianwen model to industrial robots; in May, Tesla's humanoid robot Optimus demonstrated precise control and perception capabilities; and in the same month, NVIDIA released a new autonomous mobile robot platform.

As a result, embodied intelligence, meaning robots powered by artificial intelligence, has attracted widespread attention around the world.

Musk said at Tesla's 2023 shareholder meeting that humanoid robots will be Tesla's main long-term source of value: "If the ratio of humanoid robots to people is about 2 to 1, then people's demand for robots may be 10 billion or even 20 billion, far exceeding the number of electric vehicles." NVIDIA founder Jensen Huang also said at the ITF World 2023 semiconductor conference that the next wave of AI will be "embodied intelligence".

Image source: Screenshot of Soochow Securities Research Report

A Soochow Securities research report pointed out that an embodied intelligence must first understand human language, then decompose the task, plan subtasks, identify objects while moving, interact with the environment, and finally complete the task. Soochow Securities believes that humanoid robots are well suited to the requirements of embodied intelligence and are expected to become its benchmark application. "The key to robot research is to adapt robots to the human environment so that they eventually enter the lives of ordinary households (as well as industry, catering, medical and other fields). Humanoid robots are expected to take off first in the business market and eventually open up the consumer market. The long-term market space is considerable."

Soochow Securities projects that by 2035, assuming a humanoid robot price of 200,000 yuan, care and companionship functions will add cumulative penetration rates of 5%, 7%, and 4% in the US, European, and Asian markets respectively, which corresponds to single-year penetration rates of 1%, 1.4%, and 0.8%. Under its pessimistic, neutral, and optimistic scenarios, the household market would reach 3.00 trillion, 3.66 trillion, and 4.26 trillion yuan respectively.
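As a quick consistency check on the figures quoted above (this is not a calculation taken from the report), the short sketch below compares the cumulative and single-year penetration rates. In each market the cumulative rate is five times the single-year rate. One possible reading is that the cumulative figure is built up evenly over five years, but that interpretation is an assumption; the article does not say how the single-year rate is derived.

```python
# Penetration rates quoted in the article, in percent.
cumulative = {"US": 5.0, "Europe": 7.0, "Asia": 4.0}
single_year = {"US": 1.0, "Europe": 1.4, "Asia": 0.8}

# Ratio of cumulative to single-year penetration for each market,
# rounded to absorb floating-point noise.
ratios = {market: round(cumulative[market] / single_year[market], 6)
          for market in cumulative}
print(ratios)
```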
