
It can cook and clean up: a Stanford team spent only 230,000 yuan to build a "robot nanny" | Titanium Media Focus

Author: Titanium Media APP

Recently, a robot called Mobile ALOHA has quickly become popular on the Internet due to its excellent cooking and housekeeping skills.

The robot shown in the video looks fairly basic: it consists mainly of two robotic arms, a host computer that supplies computing power, and a mobile base. Overall it is still at an early stage, more a rough "prototype" than a finished product. Compared with the "robot butlers" of science fiction, it has a lot of room for improvement. Still, its potential should not be underestimated.

Many service robots have already demonstrated capabilities such as cooking, providing assistance, and cleaning floors, so why is Mobile ALOHA seen as having such great potential? The main reason is that it is cheap and open source. In other words, anyone who spends $32,000 (about 230,000 yuan) and has strong hands-on and learning skills can reproduce a robot with these capabilities at home.


The Stanford trio who developed Mobile ALOHA

Mobile ALOHA, a general-purpose robot developed by a three-person team at Stanford, can perform a variety of complex tasks through imitation learning. In terms of control, it supports whole-body teleoperation in addition to autonomous operation.

When will the "machine dream" of mankind come true?

In terms of overall development direction, robots can be divided roughly into two categories: special-purpose robots and general-purpose robots. The former focus on improving production and work efficiency in a single scenario. People encounter such products in daily life to varying degrees: the robotic arms used in smart factories, the robot vacuums that clean floors at home, and the delivery robots that carry items in hotels all count as special-purpose robots.

General-purpose robots, by comparison, have a wider range of applicability, especially when it comes to "providing services for people", and one of their major structural characteristics is that they are more "anthropomorphic". Because their main use is to take over part of the user's daily work, such as housework and cooking, many technology companies and research institutions treat "humanoid robots" as a direction for sustained future investment from the very start of product design.


WABOT-1

WABOT-1, the world's first full-scale humanoid "intelligent" robot, was born in 1972, but the motors, drives, and computing power of that era could hardly meet the needs of robot applications, so WABOT-1 was human-like only in form. In 2009, Boston Dynamics began developing the PETMAN humanoid robot and later launched Atlas, which it has iterated on continuously and which was the most talked-about humanoid robot of the 2010s.

In 2022, Tesla demonstrated its Optimus robot for the first time. A robot that can walk, wave, lift weights, and even assemble another copy of itself once again reset the public's perception of how fast humanoid robots are developing.


CyberOne

Beyond foreign companies, Chinese technology companies have also accelerated their push into humanoid robots over the past two years, with examples including exrobots of TIACE Technology, the WALKER X robot released by UBTECH, and CyberOne launched by Xiaomi. However, all of these robots share one problem: they belong only to the "future". Tesla, for example, does not expect to complete mass production and bring Optimus to market until 2030.

Besides showcasing Mobile ALOHA's capabilities on its project web page, the Stanford team also fully open-sourced the parameters and data for the robot's software and hardware, including the hardware components used and their detailed specifications.

A "robot nanny" with the ability to learn

Mobile ALOHA carries two wrist cameras and one top camera, along with an onboard power bank and a local computing module; when running autonomously it uses only two ViperX 300s. The arms can reach a minimum height of 65 cm and a maximum height of 200 cm, and can extend up to 100 cm from the base in all directions.

As for computing power, the video shows that Mobile ALOHA's "brain" is actually a laptop. According to the official information, it is configured with Intel's 12th-generation Core i7-12800H processor and an Nvidia RTX 3070 Ti graphics card. At 2024 market prices, that means a gaming laptop costing about 8,000 yuan is enough to meet Mobile ALOHA's computing needs.

To give the robot a larger range of movement, the R&D team chose the AgileX Tracer AGV (Tracer) as the mobile base when building Mobile ALOHA. Originally a mobile platform designed for warehouse logistics, it has a travel speed of 1.6 m/s and a maximum payload of 100 kg.
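To make the quoted figures easier to reason about, here is a minimal Python sketch that collects them in one place. The numbers come from the article; the class name `MobileAlohaSpec`, the helper functions, and the simplified "box" reach model are illustrative assumptions, not part of the team's open-source release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MobileAlohaSpec:
    """Hardware figures quoted in the article (the class itself is hypothetical)."""
    min_arm_height_cm: float = 65.0
    max_arm_height_cm: float = 200.0
    max_reach_from_base_cm: float = 100.0
    base_speed_m_per_s: float = 1.6
    base_payload_kg: float = 100.0


def within_reach(spec: MobileAlohaSpec, height_cm: float, distance_cm: float) -> bool:
    """Assumed box model: ignores joint limits, obstacles, and arm geometry."""
    height_ok = spec.min_arm_height_cm <= height_cm <= spec.max_arm_height_cm
    distance_ok = 0.0 <= distance_cm <= spec.max_reach_from_base_cm
    return height_ok and distance_ok


def drive_time_s(spec: MobileAlohaSpec, distance_m: float) -> float:
    """Lower bound on travel time at the quoted speed, ignoring acceleration and turns."""
    return distance_m / spec.base_speed_m_per_s


spec = MobileAlohaSpec()
print(within_reach(spec, height_cm=90.0, distance_cm=60.0))  # a countertop-height target
print(drive_time_s(spec, distance_m=8.0))                    # crossing an 8 m room
```

Under this simplified model, a countertop about 90 cm high and 60 cm from the base falls inside the stated working range, while anything below 65 cm would require a different approach.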

Beyond hardware, Mobile ALOHA also adds data-driven and vision-based learning capabilities, in line with the currently popular "large model" style of artificial intelligence. Its learning capabilities consist of two parts: the static ALOHA dataset and an "imitation capability" based on visual recognition or manual manipulation.

The open-sourced static ALOHA dataset contains 825 demonstrations in total, covering tasks such as sealing a bag, picking up a fork, packing candy, tearing paper towels, opening a plastic cup with a lid, playing table tennis, using a coffee maker, flipping a pencil, fixing a Velcro cable, attaching batteries, operating a screwdriver, and more.

In their experiments, the researchers needed only 50 demonstrations per task for Mobile ALOHA to learn it; the robot then succeeded, for example, at wiping up wine spilled on a table nine times in a row and at riding the elevator five times in a row. The overall results were good: Mobile ALOHA completed every step from food preparation and cooking to the final cleanup.
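The idea of learning a task from a few dozen demonstrations can be illustrated with a toy example. The Python sketch below is not the Stanford team's method or code: it is a deliberately simple nearest-neighbor form of imitation learning on a made-up 2-D task, in which a simulated "operator" moves a point halfway toward the origin at each step. All names (`record_demonstrations`, `imitate`, `rollout`) and the task itself are assumptions for illustration.

```python
import math
import random
from typing import List, Tuple

Vec = Tuple[float, float]
Dataset = List[Tuple[Vec, Vec]]  # (observation, action) pairs


def record_demonstrations(num_demos: int = 50, steps: int = 10, seed: int = 0) -> Dataset:
    """Stand-in for teleoperated data: a scripted 'operator' halves the distance to the origin."""
    rng = random.Random(seed)
    pairs: Dataset = []
    for _ in range(num_demos):
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        for _ in range(steps):
            action = (-0.5 * x, -0.5 * y)
            pairs.append(((x, y), action))
            x, y = x + action[0], y + action[1]
    return pairs


def imitate(observation: Vec, demos: Dataset) -> Vec:
    """Copy the action from the recorded observation closest to the current one."""
    _, action = min(demos, key=lambda pair: math.dist(pair[0], observation))
    return action


def rollout(start: Vec, demos: Dataset, steps: int = 20) -> Vec:
    """Run the imitating policy on its own from a start position it has not seen."""
    x, y = start
    for _ in range(steps):
        dx, dy = imitate((x, y), demos)
        x, y = x + dx, y + dy
    return (x, y)


demos = record_demonstrations()          # 50 demonstrations, as in the article
end = rollout((0.8, -0.6), demos)
print(math.hypot(*end))                  # distance from the target after the rollout
```

A real system replaces the 2-D point with camera images and joint positions, and the lookup with a trained neural network, but the structure is the same: record what a human did, then reproduce it in similar situations.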

Beyond the fixed motion of chopping vegetables, Mobile ALOHA also learned to toss food in the pan and plate it. When tidying up, it can move large, bulky furniture such as chairs and also grasp and clean small, smooth items such as plates, which shows that, as a general-purpose robot, it has a very wide range of potential application scenarios.

Mobile ALOHA is therefore also significant for the robot industry as a whole. First, as an open-source project, it will naturally attract many companies to build on it and explore the commercial viability of similar robots. As parts shift from today's one-off purchases to large-scale supply, the cost of finished products will fall further, which in turn will promote the wider adoption of robots.

At the same time, Mobile ALOHA also shows that, as mechanical components continue to mature, the learning capabilities that large-model artificial intelligence gives robots can unlock their application potential faster.

Previously, Oussama Khatib, director of the Stanford University Robotics Laboratory, professor of computer science, IEEE Fellow, and president of the International Robotics Research Foundation, said: "One of the new environments and new challenges that robots face is the cost of learning brought about by complex environments."

Large-model artificial intelligence and an increasingly mature supply chain for robot parts have clearly accelerated the process of "robots entering ordinary homes to serve the public". (This article was first published on the Titanium Media App; author: Deng Jianyun, editor: Zhong Yi)