
Stanford's shrimp-frying robot goes viral, and Google DeepMind releases its latest robotics progress

Author: Xi Xiaoyao Technology said

2024 has only just begun, and intelligent robots are already giving us boundless room for imagination.

Just yesterday, the "shrimp-frying" robot from a Chinese team at Stanford went viral:


It is no exaggeration to say that once this open-source project, which costs about 220,000 yuan to build, has been widely adopted and its cost has come down over time, 2024 may truly become year one of the robot era!


Yesterday the new mobile robot Mobile ALOHA surprised everyone, and today the project lead, Zipeng Fu, a Chinese PhD student at Stanford, posted a round of follow-up videos. He took Mobile ALOHA home and tried a series of household chores: doing laundry, taking out the trash, watering plants, and more.


Here's a sneak peek at how Mobile ALOHA performs:


Google DeepMind releases its latest progress on intelligent robots

Also yesterday, Google DeepMind struck while the iron was hot, announcing a series of cutting-edge research advances on intelligent robots in a technical report titled "Shaping the future of advanced robotics".


For humans, tasks such as "tidying up the room" or "cooking" are inherently simple. Even a five-year-old who is told to take out the trash quickly understands the request and acts on it. A robot, however, is a purely mechanical creation: to get from a plain natural-language instruction to actions in the physical world, it needs a high-level understanding of the world.

So where does this high-level understanding of the world come from? The answer is close at hand: large models. As early as 2022, Google open-sourced Robotics Transformer (RT-1) for the robotics field, and in 2023 it upgraded it to Robotics Transformer 2 (RT-2), which is expected to bring embodied intelligence within reach.


Building on the RT series, in this technical report Google officially announced three technologies, AutoRT, SARA-RT and RT-Trajectory, to help intelligent robots deployed in the real world collect data more effectively, learn faster, and generalize better.

The first is AutoRT. As the name suggests, AutoRT is an AI agent system that combines large models with a robot control model (RT-1 or RT-2). Google describes it as a "data collection system" that aims to scale up robot learning so that robots are better trained for, and adapted to, the real world.

AutoRT can direct multiple robots at the same time, each equipped with a camera and an end effector. The system uses a visual language model (VLM) to understand the environment and the objects in view, and a large language model (LLM) then proposes a series of tasks for the robot to carry out, such as moving objects or wiping tables. Over a seven-month evaluation period, AutoRT coordinated up to 20 robots simultaneously across a variety of environments and collected a large amount of diverse data. The overall process is shown in the following diagram:

[Figure: overview of the AutoRT process]
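The AutoRT loop described above can be sketched roughly as follows. This is a toy illustration, not Google's implementation: the names `Robot`, `Episode`, `describe_scene`, `propose_tasks`, `execute` and `collect_episodes` are all hypothetical, and the VLM, the LLM and the RT-1/RT-2 control policy are replaced by trivial stand-in functions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Robot:
    name: str
    camera_view: str  # stand-in for a camera image, e.g. "cup, table"


@dataclass
class Episode:
    robot: str
    task: str
    success: bool


def describe_scene(camera_view: str) -> List[str]:
    """Stand-in for the VLM: list the objects the robot can see."""
    return [obj.strip() for obj in camera_view.split(",") if obj.strip()]


def propose_tasks(objects: List[str]) -> List[str]:
    """Stand-in for the LLM: suggest one task per visible object."""
    return ["wipe the table" if obj == "table" else f"move the {obj}"
            for obj in objects]


def execute(robot: Robot, task: str) -> bool:
    """Stand-in for the control policy (RT-1 or RT-2); always succeeds here."""
    return True


def collect_episodes(robots: List[Robot]) -> List[Episode]:
    """Run the perceive -> propose -> act loop and log every attempt as data."""
    episodes = []
    for robot in robots:
        objects = describe_scene(robot.camera_view)
        for task in propose_tasks(objects):
            episodes.append(Episode(robot.name, task, execute(robot, task)))
    return episodes


robots = [Robot("r1", "cup, table"), Robot("r2", "sponge")]
episodes = collect_episodes(robots)
for ep in episodes:
    print(ep.robot, "->", ep.task)
```

The point of the sketch is only the division of labor: perception and task proposal come from large models, execution comes from the control model, and every attempt is recorded as training data.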

SARA-RT, or Self-Adaptive Robust Attention for Robotics Transformers, is an attention architecture proposed by Google to make RT models leaner and more efficient. When applied to RT-2, SARA-RT made the model more than 10% more accurate and 14% faster. SARA-RT targets a long-standing problem of the Transformer: the quadratic complexity of the attention module. It proposes a new fine-tuning method called up-training, which converts quadratic complexity into linear complexity, greatly reducing computational requirements and providing a general method for speeding up Transformers.
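To see why attention can drop from quadratic to linear cost, here is a minimal sketch of generic kernelized (linear) attention in plain Python. It is not SARA-RT's architecture or its up-training procedure, which the article does not detail. The ELU + 1 feature map and all function names are assumptions chosen for illustration. The two functions compute the same output, but the second summarizes keys and values once instead of comparing every query with every key.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def feature_map(x: Vector) -> Vector:
    """Positive feature map (ELU + 1); an assumed choice for this sketch."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def quadratic_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Kernel attention computed pairwise: O(n^2) in sequence length n."""
    out = []
    for q in Q:
        fq = feature_map(q)
        weights = [dot(fq, feature_map(k)) for k in K]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / total
                    for c in range(len(V[0]))])
    return out


def linear_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Same result, but keys and values are summed once: O(n) in n."""
    d, dv = len(K[0]), len(V[0])
    kv = [[0.0] * dv for _ in range(d)]  # sum over j of phi(k_j) v_j^T
    k_sum = [0.0] * d                    # sum over j of phi(k_j)
    for k, v in zip(K, V):
        fk = feature_map(k)
        for i in range(d):
            k_sum[i] += fk[i]
            for c in range(dv):
                kv[i][c] += fk[i] * v[c]
    out = []
    for q in Q:
        fq = feature_map(q)
        norm = dot(fq, k_sum)
        out.append([sum(fq[i] * kv[i][c] for i in range(d)) / norm
                    for c in range(dv)])
    return out


Q = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
K = [[1.0, 0.0], [-0.5, 2.0], [0.3, 0.3]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
slow = quadratic_attention(Q, K, V)
fast = linear_attention(Q, K, V)
max_diff = max(abs(a - b) for r1, r2 in zip(slow, fast) for a, b in zip(r1, r2))
```

The trick is associativity: instead of forming the n-by-n matrix of query-key scores, the key-value summary is built once and reused for every query.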


Finally, there is RT-Trajectory, a system that helps intelligent robots generalize better. As noted at the beginning, many tasks that are self-evident to humans require a robot to translate instructions into actual physical movements, which can be done in many different ways. To learn this translation, the model needs an effective dataset to learn from, and it must then generalize well to unseen tasks.

With this in mind, Google designed RT-Trajectory to take each video in the training dataset and overlay it with a 2D sketch of the trajectory that the robot arm's gripper follows as it performs the task. These trajectories, provided as RGB images, give the model visual cues. By turning vague natural-language control instructions, such as "move left" or "move right", into specific robot motions, RT-Trajectory greatly improves the model's generalization ability.
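The idea of overlaying a 2D gripper trajectory on a frame can be illustrated with a small sketch. This is not RT-Trajectory's actual pipeline: the frame is a toy grid of RGB tuples, the waypoints are made up, and the names `blank_frame` and `overlay_trajectory` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def blank_frame(width: int, height: int) -> Image:
    """Create an all-black RGB frame as a stand-in for a video frame."""
    return [[(0, 0, 0) for _ in range(width)] for _ in range(height)]


def overlay_trajectory(frame: Image, waypoints: List[Tuple[int, int]],
                       color: Pixel = (255, 0, 0)) -> Image:
    """Return a copy of `frame` with the 2D gripper path drawn onto it."""
    out = [row[:] for row in frame]
    height, width = len(out), len(out[0])
    # Draw a straight segment between each pair of consecutive waypoints.
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for s in range(steps + 1):
            x = round(x0 + (x1 - x0) * s / steps)
            y = round(y0 + (y1 - y0) * s / steps)
            if 0 <= x < width and 0 <= y < height:
                out[y][x] = color
    return out


frame = blank_frame(5, 5)
# Hypothetical gripper path in (x, y) pixel coordinates: right, then down.
sketch = overlay_trajectory(frame, [(0, 0), (4, 0), (4, 4)])
```

The resulting image carries the intended motion as a visual cue, so the model can read "how to move" from pixels rather than from words alone.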


Taken together, AutoRT, SARA-RT and RT-Trajectory suggest that more capable and more practical intelligent robots are waiting for us in the near future. Whether through Mobile ALOHA, with its striking visual impact, or through more foundational technologies such as AutoRT, SARA-RT and RT-Trajectory, I hope 2024 brings the field of robotics a new shock on the scale of GPT-4!
