Robots plus large models: the buffs stack! AI academic star Li Feifei releases embodied intelligence results

Author: Head Technology

Written by Congerry

Is the day when AI rules the world near?

A research team led by AI scientist Li Feifei at Stanford University has announced a new result in embodied intelligence: using large language models (LLMs) and vision-language models (VLMs) to drive robots.

The robot can plan and execute manipulation tasks from complex instructions given by humans in natural language. Put plainly, you can command the robot in everyday speech.

Open the top drawer and watch out for that vase!

What's more, with the support of large models, the robot can not only interact effectively with its environment but also complete a variety of tasks without additional data or training: bypassing obstacles, opening bottles, pressing switches, unplugging charging cables, and so on.

The system, which Li Feifei's team named VoxPoser, needs no additional pre-training process of the kind traditional methods require, directly sidestepping the scarcity of robot training data.

Paper: https://voxposer.github.io/voxposer.pdf

How does VoxPoser work?

How does VoxPoser manage to understand natural language instructions without the need for predefined motion primitives or additional data and training?

First, the robot uses a camera to collect environmental information.

Second, given the language instruction, a large language model (LLM) generates code that interacts with a vision-language model (VLM).
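
As a rough illustration of the idea (not the paper's actual code), the snippet below mimics what such LLM-written code might look like: `detect` stands in for an open-vocabulary VLM query, and the composed result is a voxel value map that attracts the gripper toward the drawer handle while repelling it from the vase. All names here are hypothetical.

```python
import numpy as np

# Toy stand-in for the perception call: in a real system this would query
# an open-vocabulary detector (VLM) rather than a hard-coded dictionary.
SCENE = {"drawer handle": (12, 5, 8), "vase": (20, 5, 8)}

def detect(name):
    """Return the voxel coordinate of a named object (stubbed perception)."""
    return SCENE[name]

def make_value_map(shape, target, obstacle=None, sigma=3.0):
    """Voxel map: a Gaussian bump attracts the gripper toward `target`;
    a stronger negative bump repels it from `obstacle`."""
    grid = np.indices(shape).astype(float)  # shape (3, X, Y, Z)

    def bump(center):
        d2 = sum((grid[i] - center[i]) ** 2 for i in range(3))
        return np.exp(-d2 / (2 * sigma ** 2))

    value = bump(detect(target))
    if obstacle is not None:
        value -= 2.0 * bump(detect(obstacle))  # strong repulsion
    return value

# "Open the top drawer and watch out for that vase!"
vmap = make_value_map((32, 16, 16), "drawer handle", obstacle="vase")
best = tuple(int(i) for i in np.unravel_index(np.argmax(vmap), vmap.shape))
print(best)  # (12, 5, 8) — the peak sits at the handle, away from the vase
```

The point of the sketch is the division of labor: the LLM writes glue code like this, while the VLM grounds object names in 3D space.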

A 3D value map of the scene is then generated.

Finally, the robot plans a trajectory over the map and executes the action.
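
A minimal sketch of what planning over such a map could look like, assuming a simple greedy hill-climb (the real system uses a more capable motion planner; `greedy_plan` and the toy map below are illustrative only):

```python
import numpy as np

def greedy_plan(value_map, start, max_steps=200):
    """Greedy hill-climb over a 3D value map: from `start`, repeatedly move
    to the best-valued neighboring voxel until no neighbor improves."""
    pos = tuple(start)
    path = [pos]
    for _ in range(max_steps):
        x, y, z = pos
        neighbors = [
            (x + dx, y + dy, z + dz)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
            if (dx, dy, dz) != (0, 0, 0)
        ]
        neighbors = [  # discard voxels outside the grid
            n for n in neighbors
            if all(0 <= n[i] < value_map.shape[i] for i in range(3))
        ]
        step = max(neighbors, key=lambda n: value_map[n])
        if value_map[step] <= value_map[pos]:
            break  # local maximum reached: treat this voxel as the goal
        pos = step
        path.append(pos)
    return path

# Tiny demo: value increases toward voxel (5, 5, 5).
grid = np.indices((8, 8, 8)).astype(float)
vmap = -((grid[0] - 5) ** 2 + (grid[1] - 5) ** 2 + (grid[2] - 5) ** 2)
route = greedy_plan(vmap, (0, 0, 0))
print(route[-1])  # (5, 5, 5)
```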

The planned trajectory is then mapped back to the real world, and the robot completes the drawer-opening operation.

This process makes the robot more human-like: it no longer relies on a database entered in advance, achieving zero-shot capability. (Receive the instruction → perceive with the "eyes" → act.)

In addition to opening drawers, the robot can "sort the garbage into a blue tray", "remove the bread from the toaster", "take out a napkin", "open the vitamin bottle", "measure the weight of the apple", "close the top drawer", "sweep the trash into the dustpan", "unplug the phone charger", "hang the towel on the shelf", "press the moisturizer pump", "put down the spoon", "turn on the light", etc.

And even when disturbed partway through, the robot can still complete the task.

In addition, VoxPoser exhibits four emergent behavioral capabilities.

  • Evaluating physical properties: given two blocks of unknown mass, the robot must run a physical experiment with the available tools to determine which block is heavier.
  • Behavioral commonsense reasoning: when setting the table, the user can state a preference such as "I am left-handed," and the robot must work out what that implies for the task.
  • Fine-grained language correction: for tasks demanding high precision, such as "put a lid on a teapot," the user can give corrective instructions like "you are off by 1 centimeter."
  • Multi-step visual programs: given the task "open the drawer exactly halfway," and lacking an object model, the robot can propose a multi-step strategy from visual feedback: first open the drawer fully while recording the handle's displacement, then push it back to the midpoint.
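
That last strategy can be sketched in a few lines, assuming a hypothetical robot API (`handle_pos_sensor`, `pull`, and `push` are stand-ins for perception and control calls, simulated here with a one-dimensional drawer):

```python
def open_drawer_halfway(handle_pos_sensor, pull, push):
    """Multi-step strategy sketch: fully open the drawer while recording
    the handle's displacement, then push it back to the midpoint."""
    start = handle_pos_sensor()
    pull()                       # open the drawer fully
    fully_open = handle_pos_sensor()
    travel = fully_open - start  # measured via vision, not an object model
    push(travel / 2)             # push back half the measured travel
    return handle_pos_sensor()

# Simulated 1-D drawer for the demo: closed at 0.0 m, fully open at 0.30 m.
state = {"pos": 0.0}
final = open_drawer_halfway(
    handle_pos_sensor=lambda: state["pos"],
    pull=lambda: state.update(pos=0.30),
    push=lambda d: state.update(pos=state["pos"] - d),
)
print(final)  # 0.15
```

The key design point is that the midpoint is never looked up anywhere: it is derived at run time from two sensor readings, which is what lets the robot act without a model of the drawer.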

At present, VoxPoser has some limitations: it depends on an external perception module, prompts for the built-in large model must be written by hand, and a general dynamics model would be needed to achieve more diverse actions.

Embodied intelligence: Li Feifei points to a development direction for computer vision

Who is Li Feifei?

Li Feifei is one of the world's top Chinese female AI experts: a tenured professor at Stanford University and director of its Artificial Intelligence Lab, and a former vice president of Google and chief scientist of Google Cloud. Her research spans computer vision, machine learning, deep learning, and cognitive neuroscience.

She has also trained many outstanding AI researchers, such as Andrej Karpathy, a founding member of OpenAI who went on to lead Tesla's Autopilot vision team.

VoxPoser came about because Li Feifei knows both how important data is to machine learning and how hard it is to obtain.

Li Feifei began leading the creation of the ImageNet dataset in 2006, the first large-scale labeled image dataset for computer vision. With over ten million labeled images capable of training complex machine learning models, it is considered a milestone in the history of artificial intelligence.

But the data was time-consuming to collect and label: nearly 50,000 crowdworkers from 167 countries spent close to three years completing it.

In 2022, Li Feifei and Ranjay Krishna published a paper in the journal Daedalus titled "Searching for Computer Vision North Stars."

In the paper, Li Feifei argues that after the success of ImageNet and object recognition, computer vision still has many exciting research directions and challenges, such as embodied intelligence, visual reasoning, and scene understanding.

Li Feifei believes that embodied intelligence is an important and challenging direction of artificial intelligence, which requires robots or other agents to be able to interact with the physical world in a complex and changeable environment, combining vision, language, reasoning and other capabilities.

Moreover, embodied intelligence is not limited to humanoid robots: any intelligent machine that moves through and acts on the physical world counts as embodied intelligence.

In addition to Li Feifei, NVIDIA founder Jensen Huang and Tesla CEO Elon Musk are also very optimistic about the prospects of embodied intelligence.

Now that Li Feifei's team has taken the first step, does this mean the day AI rules the world is one step closer?

If you have anything to say, welcome to leave a comment below to discuss! Likes, comments, and follows are much appreciated~
