"Terminator" into reality? Microsoft's ambition: control bots with ChatGPT!

Shin Ji Won reports

Edit: LRS

ChatGPT not only moves your mouth, but also helps you control a drone!

Although ChatGPT has been tuned to conform to human preferences, it can still force some "unethical content" under various reverse operations, such as ChatGPT can give you a detailed list of plans to destroy the world, specific to each step.

However, the current ChatGPT only moves its lips, and does not have any ability to contact the real physical world, at most it is a science fiction novel.

But what if ChatGPT could really control bots?

Recently, Microsoft published a paper announcing their research on applying ChatGPT to robots.

Paper Link:

https://www.microsoft.com/en-us/research/uploads/prod/2023/02/ChatGPT___Robotics.pdf

However, Microsoft's goal is not to "destroy the world", but to speed up the development of robots.

In fact, in the modern life and production process, robots are inseparable everywhere, from the robotic arm that manufactures products in the factory to the vacuum cleaner used in the home, which can be counted as a robot.

Every time you want to develop a new product, or have an existing machine execute a new feature, you need a senior engineer to write code, while writing tests to cover all scenarios as much as possible.

In its paper, Microsoft proposes a new set of design principles that use large language models such as ChatGPT to provide instructions to robots.

ChatGPT: Robot controller

The fundamental reason why ChatGPT exploded is that AI can finally "understand people" to a certain extent, rather than just generating content according to syntax; And it's also powerful, Q&A, writing essays, writing poems, writing code, as long as the prompt is well written, ChatGPT's performance will be even more amazing.

If this ability is transferred to robots, assuming that in a few decades, every household will have a robot, just say "warm me lunch", it can find the microwave on its own, and then bring the dish back, and human-computer interaction directly enters the new era.

Although "natural language" is concise, existing robot development still relies on "programming languages".

ChatGPT is a language model trained on a large amount of text and human feedback to produce coherent and grammatically correct responses to a wide variety of prompts and questions.

The goal of this study was to see if ChatGPT was able to think beyond text and reason out the physical world to help the robot complete its task.

The researchers expect ChatGPT to help users interact with bots more easily without having to learn the details of complex programming languages or robotic systems, and the key challenge is to teach ChatGPT how to use the laws of physics, the context of the operating environment, and understand how the robot's physical behavior changes the state of the world to solve assigned tasks.

Experiments have shown that ChatGPT can do a lot of work independently, but it still needs some assistance, and the paper describes a series of design principles that can be used to guide language models to solve robot tasks, including but not limited to special prompt structures, high-level APIs and text-based human feedback, etc., a revolution in the development of robot systems is coming.

A new code design process

Writing prompts for large language models is a highly empirical science, and through trial and error, researchers have developed a set of methodologies and design principles specifically for writing prompts for robotic tasks:

1. Define a set of high-level bot APIs or libraries.

This library can be designed for a specific robot type and should be mapped from the robot's control stack or perception library to existing low-level concrete implementations.

Descriptive names used for high-level APIs are important to help ChatGPT infer the functionality of the function.

2. Write a text prompt for ChatGPT that describes the task goal and explicitly states which functions in the high-level library are available.

The prompt can also contain information about task constraints, or how ChatGPT should organize its answers, including using a specific programming language, using auxiliary resolution components, etc.;

3. The user evaluates the code output of ChatGPT in a loop, can execute the code directly to check the correctness, or use the emulator.

Users can use natural language to provide ChatGPT with feedback on the quality and safety of answers if needed.

When the user is satisfied with the solution, the final code can be deployed to the bot.

What can ChatGPT+ bots do?

Here are a few examples, see the repository for a more complete list of ChatGPT capabilities.

Code link: https://github.com/microsoft/PromptCraft-Robotics

Zero-shot mission planning

When ChatGPT meets a drone, the researchers first let ChatGPT control the full function of a real drone, and then follow the dialogue in the video below, and the experimental results prove that a user who does not understand technology at all, only needs to control the drone through dialogue, "natural language" is a very intuitive and efficient user interface.

When the user's instructions are ambiguous, ChatGPT asks the user to explain the problem further and writes complex code structures for the drone, such as zig-zag patterns, in order to visually inspect the shelves; You can even give users a selfie.

The researchers also simulated industrial inspection scenarios using ChatGPT in the Microsoft AirSim simulator, and the results showed that the model was able to effectively parse the user's high-level intentions and geometric clues to accurately control the drone.

Complex tasks require user feedback

When using ChatGPT for robotic arm operation scenarios, the researchers used "conversational feedback" to teach the model how to combine the initially provided API into more complex high-level functions, namely functions encoded internally by ChatGPT itself.

Using a curriculum-based strategy, ChatGPT is able to logically link these learned skills together to perform actions such as stacking blocks.

In addition, in one example of ChatGPT's power, the researchers asked the model to build the Microsoft logo from wooden blocks, i.e. to connect the text domain with the physical domain.

ChatGPT was able to recall the Microsoft logo from an internal knowledge base, but it was also able to "draw" the logo with SVG code, and then use the skills learned above to determine the physical form in which existing robot actions could compose it.

In another example, researchers asked ChatGPT to write an algorithm that allowed the drone to reach its target in the air without hitting an obstacle.

Just tell the model that the drone it controls has a forward distance sensor, and ChatGPT is immediately able to encode most of the key components for the algorithm, a task that requires some dialogue with humans, and ChatGPT can make local code modifications through natural language feedback alone.

Perceive the world before you act

Being able to perceive the world before an algorithm decides to do something is fundamental to building a robotic system.

To test ChatGPT's understanding of specified concepts, the researchers designed a framework that required ChatGPT to continuously explore the environment, allowing the model to access functions such as object detection and target distance APIs until it found a user-specified object, a process called perception-action loops.

In the experimental session, the researchers conducted additional experiments to evaluate whether ChatGPT could decide where the robot should go based on real-time feedback from sensors, rather than having ChatGPT generate a code loop to make those decisions.

The experimental results verify that the user can enter a text description of a camera image at each step of the conversation, and the model can figure out how to control the robot and drive the robot to a specific object.

Open Source PromptCraft: Collect valuable prompts

"Good prompting engineering" is essential for large language models such as ChatGPT to successfully perform robotic tasks.

But prompting is an entirely empirical science, lacks a comprehensive summary, and has few resources to help researchers and enthusiasts in the field determine what makes a good prompt

To compensate for this disadvantage, the researchers open-sourced a platform, PromptCraft, on which any user can share examples of prompt strategies for different robot categories.

All tips and conversations for this research project have been placed in the repository, and interested readers can continue to contribute!

In addition to the rapid design, the researchers plan to develop multiple robot simulators and interfaces in the future to allow users to test the performance of algorithms generated by ChatGPT, and an AirSim environment with integrated ChatGPT has been released.

Take the robot out of the lab and into the world

Microsoft's goal in releasing these technologies is to bring robotics to a wider audience, and researchers believe that language-based robot control systems are the foundation for bringing robots from science labs to everyday users.

That said, the output of ChatGPT should not be deployed directly on the bot without careful analysis.

By obtaining experimental results in a simulated environment, it is possible to evaluate algorithms and take necessary safety precautions before future real-world deployment.

Resources:

https://www.microsoft.com/en-us/research/group/autonomous-systems-group-robotics/articles/chatgpt-for-robotics/

"Terminator" into reality? Microsoft's ambition: control bots with ChatGPT!

Read on

Microsoft Open Source Deep Speed Chat: The era of ChatGPT for everyone is here

Musk does not talk about Wude: while publicly calling for the suspension of AI research, while secretly developing an "AI version of WeChat"?

Microsoft seeks to transform its digital advertising business with ChatGPT

Ten thousand layoffs turned around and embraced AI, and Meta was going to change its name again

Microsoft Google wants to reinvent the business with AI, Musk said that AI will destroy humanity... Talk about AI

Microsoft Azure OpenAI International Edition integrates ChatGPT and other five large model services

Samsung "backstabbed" Google

Musk threatened to sue Microsoft, saying it "illegally used Twitter data for AI training."

Keep up with Microsoft! Google's generative AI Bard can program and debug code bugs too

Bing chat improvement report: Correctly display math formulas to reduce abnormal ending of conversations

Gates: AI will disrupt education, but in the short term "there will be far more failures than successes"

"Red Sky Island" debut rollover was bombarded with bad reviews, and the president of Xbox apologized

GPT-4 Windows Fried Field! The whole system is a conversational robot, and Microsoft has built an AI universe

Game information: Microsoft is determined to win and settle with Sony Nintendo for mergers and acquisitions!

Sony Hong Kong service PS+ one, two and three levels of membership officially increased the price, and the national service annual membership has risen to 309 yuan

Microsoft today officially launched the XGP Core service: replacing the Gold membership and providing a mini-game library

Humanoid robots usher in a new singularity: Tesla "paints a pie" and Boston "earns tears"

"The era of killer robots has arrived"?AI's "Oppenheimer moment"

Tesla robots enter the factory to work 24 hours a day without pay

Tesla's humanoid robot enters the factory to "work" to sort cells: relying on pure vision, it can make corrections independently

One-stop batch building of satellites and more than 700 robots to build vehicles...... A glimpse of the gigafactory from the perspective of the traverser

In-depth exploration: the multi-purpose of domestic robot protective clothing

Domestic robot protective clothing

Discuss the application field of domestic robot protective clothing

In-depth discussion: some potential problems of domestic robot protective clothing

Overhaul of domestic robot protective clothing

Prof. Zheng Liu|Qingchao Tang: Progress and Prospects of Robotic Surgical System

Efficient, accurate and all-weather: analysis of the advantages of telemarketing customer service robots

Helan County Artificial Intelligence Innovation Challenge stimulates the innovation of young people

Tesla's humanoid robot enters the factory to "work" to pick up battery cells: there will be no work in the future, are you afraid?

Instead of children, pension robots have become a new pension model, but the price is a big problem

341 minutes 326 minutes: Brunson bluntly says that Hart is crazy Boat Notes Robot Heart-piercing Little Card