laitimes

"Terminator" into reality? Microsoft's ambition: control bots with ChatGPT!

  Shin Ji Won reports  

Editor: LRS

ChatGPT can do more than talk: it can also help you fly a drone!

Although ChatGPT has been tuned to align with human preferences, various workarounds can still coax "unethical content" out of it. For example, ChatGPT can give you a detailed, step-by-step plan for destroying the world.

However, ChatGPT currently only talks. It has no way to act on the real physical world, so at most such a plan reads like a science fiction novel.

But what if ChatGPT could really control robots?

Recently, Microsoft published a paper describing its research on applying ChatGPT to robotics.

Paper Link:

https://www.microsoft.com/en-us/research/uploads/prod/2023/02/ChatGPT___Robotics.pdf

However, Microsoft's goal is not to "destroy the world", but to speed up the development of robots.

In fact, robots are everywhere in modern life and industry: from the robotic arms that manufacture products in factories to the vacuum cleaners used at home, all of these count as robots.

Every time you want to develop a new product, or have an existing machine perform a new function, a senior engineer has to write the code, along with tests that cover as many scenarios as possible.

In its paper, Microsoft proposes a new set of design principles that use large language models such as ChatGPT to provide instructions to robots.

ChatGPT: Robot controller

The fundamental reason ChatGPT took off is that AI can finally "understand people" to a certain extent, rather than just generating content that follows syntax. It is also versatile: it can answer questions and write essays, poems, and code. As long as the prompt is well written, ChatGPT's performance is even more impressive.

If this ability were transferred to robots, and assuming that in a few decades every household has one, you could just say "warm up my lunch" and the robot would find the microwave on its own and bring the dish back. Human-computer interaction would enter a new era.

Although "natural language" is concise, existing robot development still relies on "programming languages".

ChatGPT is a language model trained on a large amount of text and human feedback to produce coherent and grammatically correct responses to a wide variety of prompts and questions.

The goal of this study was to see whether ChatGPT can think beyond text and reason about the physical world to help robots complete their tasks.

The researchers expect ChatGPT to help users interact with robots more easily, without having to learn complex programming languages or the details of robotic systems. The key challenge is teaching ChatGPT to solve assigned tasks while accounting for the laws of physics, the context of the operating environment, and how the robot's physical actions change the state of the world.

Experiments have shown that ChatGPT can do a lot of work independently, but it still needs some assistance. The paper describes a series of design principles that can be used to guide language models in solving robotics tasks, including but not limited to special prompt structures, high-level APIs, and text-based human feedback. A revolution in the development of robot systems is coming.

A new code design process

Writing prompts for large language models is a highly empirical science. Through trial and error, the researchers developed a set of methodologies and design principles specifically for writing prompts for robotics tasks:

1. Define a set of high-level robot APIs or a function library.

This library can be designed for a specific robot type and should map onto existing low-level implementations in the robot's control stack or perception library.

Descriptive names for the high-level APIs matter, because they help ChatGPT infer what each function does.

2. Write a text prompt for ChatGPT that describes the task goal and explicitly states which functions in the high-level library are available.

The prompt can also contain information about task constraints, or about how ChatGPT should organize its answers, such as using a specific programming language or using auxiliary parsing components.

3. The user stays in the loop to evaluate ChatGPT's code output, either by executing the code directly to check its correctness or by using a simulator.

Users can use natural language to provide ChatGPT with feedback on the quality and safety of answers if needed.

When the user is satisfied with the solution, the final code can be deployed onto the robot.
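The three steps above can be sketched in Python. This is only an illustration under stated assumptions, not the code from the paper or the PromptCraft repository: `SimulatedDrone`, its methods, and `build_prompt` are hypothetical names, and the "model reply" is a hard-coded string standing in for ChatGPT's output.

```python
import inspect


class SimulatedDrone:
    """Hypothetical high-level API (step 1); it only tracks a position."""

    def __init__(self):
        self.position = [0.0, 0.0, 0.0]

    def takeoff(self):
        """Take off and hover at 1 meter altitude."""
        self.position[2] = 1.0

    def fly_to(self, x, y, z):
        """Fly to the point (x, y, z), given in meters."""
        self.position = [float(x), float(y), float(z)]

    def land(self):
        """Land at the current horizontal position."""
        self.position[2] = 0.0


def build_prompt(api, task):
    """Step 2: state the task and list the functions the model may call."""
    lines = ["You control a drone through a Python object named `drone`.",
             "Use only these methods:"]
    for name, func in inspect.getmembers(api, inspect.ismethod):
        if not name.startswith("_"):
            # Descriptive names and docstrings help the model infer behavior.
            lines.append(f"- {name}{inspect.signature(func)}: {inspect.getdoc(func)}")
    lines.append("Reply with Python code only.")
    lines.append(f"Task: {task}")
    return "\n".join(lines)


drone = SimulatedDrone()
prompt = build_prompt(drone, "Take off, fly to (2, 0, 1), then land.")
print(prompt)

# Step 3: a hard-coded stand-in for the model's reply, run against the
# simulated drone only. Real model output must be reviewed before it runs.
generated_code = "drone.takeoff()\ndrone.fly_to(2, 0, 1)\ndrone.land()"
exec(generated_code, {"drone": drone})
print(drone.position)
```

Listing the functions from the API object itself keeps the prompt and the library in sync, which is one plausible way to make sure the model is only told about functions that really exist.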

What can ChatGPT + robots do?

Here are a few examples; see the repository for a more complete list of ChatGPT's capabilities.

Code link: https://github.com/microsoft/PromptCraft-Robotics

Zero-shot mission planning

When ChatGPT met a drone, the researchers first gave ChatGPT access to the full set of functions that control a real drone and then issued instructions through dialogue. The results show that a user with no technical background can control the drone through conversation alone; natural language proved to be a very intuitive and efficient user interface.

When the user's instructions are ambiguous, ChatGPT asks clarifying questions. It also writes complex code structures for the drone, such as a zig-zag pattern for visually inspecting shelves, and it can even take a selfie for the user.
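To make the zig-zag pattern concrete, here is a minimal sketch of how such a sweep could be generated. It is an assumption-laden illustration rather than the code ChatGPT produced in the experiment: the function name `zigzag_waypoints`, the (x, z) coordinate convention, and the dimensions are all hypothetical.

```python
def zigzag_waypoints(width, height, row_spacing, start=(0.0, 0.0)):
    """Return (x, z) waypoints that sweep a vertical face row by row.

    Even rows run left to right and odd rows run right to left, so the
    drone never flies back across a row it has already inspected.
    """
    if row_spacing <= 0:
        raise ValueError("row_spacing must be positive")
    x0, z0 = start
    rows = int(height // row_spacing) + 1
    waypoints = []
    for i in range(rows):
        z = z0 + i * row_spacing
        if i % 2 == 0:
            row = [(x0, z), (x0 + width, z)]
        else:
            row = [(x0 + width, z), (x0, z)]
        waypoints.extend(row)
    return waypoints


# A shelf 4 m wide and 2 m tall, inspected in rows 1 m apart.
for point in zigzag_waypoints(4.0, 2.0, 1.0):
    print(point)
```

Each waypoint would then be passed to whatever "fly to" function the high-level library exposes.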

The researchers also used ChatGPT in a simulated industrial inspection scenario in the Microsoft AirSim simulator. The results showed that the model could effectively parse the user's high-level intent and geometric cues to control the drone accurately.

Complex tasks require user feedback

When applying ChatGPT to a robotic arm manipulation scenario, the researchers used "conversational feedback" to teach the model how to compose the originally provided APIs into more complex high-level functions, that is, functions that ChatGPT coded by itself.

Using a curriculum-based strategy, ChatGPT is able to logically link these learned skills together to perform actions such as stacking blocks.
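The idea of composing low-level primitives into higher-level skills can be sketched as follows. Everything here is hypothetical and chosen for illustration: `FakeArm`, its `move_to`/`grab`/`release` primitives, the block height, and the composed functions `pick_up`, `place_on`, and `stack` are assumptions, not the API used in the paper.

```python
BLOCK_HEIGHT = 0.04  # assumed block height in meters


class FakeArm:
    """Hypothetical arm with three primitives; it only tracks positions."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # block name -> (x, y, z)
        self.position = (0.0, 0.0, 0.0)
        self.held = None

    def move_to(self, x, y, z):
        self.position = (x, y, z)
        if self.held is not None:
            self.blocks[self.held] = self.position  # held block moves along

    def grab(self, name):
        if self.blocks[name] != self.position:
            raise ValueError(f"gripper is not at block {name!r}")
        self.held = name

    def release(self):
        self.held = None


def pick_up(arm, name):
    """First learned skill: move to a block and grab it."""
    arm.move_to(*arm.blocks[name])
    arm.grab(name)


def place_on(arm, base):
    """Second learned skill: put the held block on top of another block."""
    x, y, z = arm.blocks[base]
    arm.move_to(x, y, z + BLOCK_HEIGHT)
    arm.release()


def stack(arm, names):
    """Composed skill: build a tower, bottom block first."""
    for base, top in zip(names, names[1:]):
        pick_up(arm, top)
        place_on(arm, base)


arm = FakeArm({"red": (0.1, 0.0, 0.0), "green": (0.2, 0.0, 0.0), "blue": (0.3, 0.0, 0.0)})
stack(arm, ["red", "green", "blue"])
print(arm.blocks)
```

The curriculum is visible in the call structure: `stack` is written only in terms of the two skills defined before it, which in turn use only the primitives.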

In another striking example, the researchers asked the model to build the Microsoft logo from wooden blocks, a task that requires connecting the textual domain with the physical domain.

ChatGPT was not only able to recall the Microsoft logo from its internal knowledge base, it was also able to "draw" the logo as SVG code and then use the skills learned above to work out which existing robot actions could compose its physical form.

In another example, the researchers asked ChatGPT to write an algorithm that lets a drone reach a target in the air without hitting obstacles.

Once told that the drone it controls has a forward-facing distance sensor, ChatGPT immediately coded most of the key building blocks of the algorithm. The task required some dialogue with a human, and ChatGPT was able to make localized code changes from natural language feedback alone.
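As a rough idea of what a building block for such an algorithm might look like, here is a simple reactive rule in 2D. It is not the algorithm from the paper: the function name `avoidance_step`, the thresholds, and the "turn when blocked, otherwise advance and re-aim" rule are assumptions chosen to keep the sketch short.

```python
import math


def avoidance_step(position, heading_deg, goal, forward_distance,
                   safe_distance=2.0, step=0.5, turn_deg=30.0):
    """Return the next (position, heading_deg) from one sensor reading.

    forward_distance is the reading of the forward-facing distance sensor,
    in meters, taken along the current heading.
    """
    x, y = position
    if forward_distance < safe_distance:
        # Path ahead is blocked: yaw in place and wait for a new reading.
        return (x, y), (heading_deg + turn_deg) % 360.0
    # Path ahead is clear: advance one step along the current heading...
    x += step * math.cos(math.radians(heading_deg))
    y += step * math.sin(math.radians(heading_deg))
    # ...then point back at the goal for the next reading.
    bearing = math.degrees(math.atan2(goal[1] - y, goal[0] - x)) % 360.0
    return (x, y), bearing


# Blocked at 1 m: the drone turns without moving.
print(avoidance_step((0.0, 0.0), 0.0, (10.0, 0.0), forward_distance=1.0))
# Clear for 5 m: the drone advances and keeps facing the goal.
print(avoidance_step((0.0, 0.0), 0.0, (10.0, 0.0), forward_distance=5.0))
```

A rule this simple can get stuck behind large obstacles, which is the kind of shortcoming that natural language feedback in the dialogue would be used to correct.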

Perceive the world before you act

Being able to perceive the world before an algorithm decides to do something is fundamental to building a robotic system.

To test ChatGPT's understanding of these concepts, the researchers designed a framework in which ChatGPT continuously explores the environment, with access to functions such as object detection and object distance APIs, until it finds a user-specified object. This process is called a perception-action loop.
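A perception-action loop of this kind can be sketched in a few lines. The robot interface below is hypothetical: `FakeRobot`, `detect_objects`, `turn`, and `move_forward` are stand-ins for whatever detection and distance APIs a real system exposes, and the step sizes are arbitrary.

```python
class FakeRobot:
    """Hypothetical robot that sees one object only when facing it."""

    def __init__(self, object_name, bearing_deg, distance_m):
        self.object_name = object_name
        self.bearing_deg = bearing_deg
        self.distance_m = distance_m
        self.heading_deg = 0

    def detect_objects(self):
        """Return {name: distance in meters} for objects currently in view."""
        if self.heading_deg == self.bearing_deg:
            return {self.object_name: self.distance_m}
        return {}

    def turn(self, degrees):
        self.heading_deg = (self.heading_deg + degrees) % 360

    def move_forward(self, meters):
        self.distance_m = max(0.0, self.distance_m - meters)


def search_for(robot, target, max_steps=20, arrive_within=1.0):
    """Alternate perceiving and acting until the target is reached."""
    for _ in range(max_steps):
        detections = robot.detect_objects()      # perceive
        if target not in detections:
            robot.turn(45)                       # act: keep exploring
        elif detections[target] <= arrive_within:
            return True                          # close enough: done
        else:
            robot.move_forward(0.5)              # act: approach the target
    return False


print(search_for(FakeRobot("chair", 90, 2.0), "chair"))
print(search_for(FakeRobot("chair", 90, 2.0), "lamp"))
```

The `max_steps` bound is a design choice for the sketch: it guarantees the loop terminates even when the requested object is never detected.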

The researchers also ran additional experiments to evaluate whether ChatGPT itself could decide where the robot should go based on real-time sensor feedback, rather than generating a code loop that makes those decisions.

The results confirm that when the user enters a text description of the camera image at each step of the conversation, the model can figure out how to steer the robot until it reaches a specific object.

Open-source PromptCraft: collecting valuable prompts

Good prompt engineering is essential for large language models such as ChatGPT to successfully perform robotics tasks.

But prompting is an entirely empirical science: there is no comprehensive summary of it, and few resources exist to help researchers and enthusiasts in the field determine what makes a good prompt.

To fill this gap, the researchers open-sourced PromptCraft, a platform on which any user can share examples of prompting strategies for different categories of robots.

All prompts and conversations from this research project have been placed in the repository, and interested readers are welcome to contribute more.

In addition to prompt design, the researchers plan to add multiple robot simulators and interfaces in the future so that users can test the performance of algorithms generated by ChatGPT. An AirSim environment with ChatGPT integration has already been released.

Take the robot out of the lab and into the world

Microsoft's goal in releasing these technologies is to bring robotics to a wider audience. The researchers believe that language-based robot control systems are fundamental to bringing robots out of science labs and into the hands of everyday users.

That said, the output of ChatGPT should not be deployed directly on a robot without careful analysis.

By obtaining experimental results in a simulated environment, it is possible to evaluate algorithms and take necessary safety precautions before future real-world deployment.

Resources:

https://www.microsoft.com/en-us/research/group/autonomous-systems-group-robotics/articles/chatgpt-for-robotics/