Using the Python programming language to prompt a large language model with the knowledge a robot needs


Author丨Spit dissatisfied phlegm entertainment

Editor丨Spit dissatisfied phlegm

Preface

Task planning can require defining a great deal of domain knowledge about the world in which the robot must act. To reduce that effort, large language models (LLMs) can be used to score potential next actions during task planning, or even to generate action sequences directly from a natural language instruction with no additional domain information.

Such methods, however, either require enumerating every possible next step for scoring or generate free-form text that may contain actions the given robot cannot perform in its current context. The work summarized here proposes a programmatic LLM prompt structure that can generate plans across situated environments, robot capabilities, and tasks.

Introducing situational awareness into robot task planning

Everyday household tasks require both a commonsense understanding of the world and a situated understanding of the current environment. To plan a task such as making dinner, an agent needs to know which objects are available, what a sensible and logical order of actions is, and which objects and actions are relevant to the task. This reasoning is not feasible without state feedback.

Autoregressive LLMs trained on large corpora generate text sequences conditioned on an input prompt and show significant multitask generalization. In robot task planning, this ability can be used to produce plausible action plans, either by scoring candidate next steps or by generating new steps directly.

In scoring mode, the model evaluates an enumeration of actions and their arguments drawn from the space of what is possible. In text generation mode, the model produces the next few words, which must then be mapped to the actions and world objects available to the agent. If the generated text is "reach for a pickle jar", for example, that string has to be mapped carefully to an executable action such as picking up the jar.

A key component missing from LLM-based task planning is state feedback from the environment. The approach described here brings that situational awareness into robot task planning.
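The mapping step in text generation mode can be sketched as follows. This is a minimal illustration, not the method used by any particular system: simple word overlap stands in for whatever similarity measure a real planner would use, and the action names and descriptions in ADMISSIBLE are hypothetical.

```python
def tokenize(text):
    """Lowercase a phrase and split it into a set of words."""
    return set(text.lower().split())

# Hypothetical admissible actions, keyed by a plain-language description.
ADMISSIBLE = {
    "pick up the jar": "grab('jar')",
    "open the fridge": "open('fridge')",
    "walk to the kitchen": "walk('kitchen')",
}

def map_to_action(generated, admissible=ADMISSIBLE):
    """Map free-form generated text to the closest admissible action.

    Word overlap (Jaccard similarity) is an assumption used here for
    illustration only. Returns None when nothing overlaps at all.
    """
    words = tokenize(generated)

    def overlap(description):
        other = tokenize(description)
        return len(words & other) / len(words | other)

    best = max(admissible, key=overlap)
    return admissible[best] if overlap(best) > 0 else None

print(map_to_action("reach for a pickle jar"))
```

Even in this toy form the fragility is visible: the match rests on the single shared word "jar", which is one reason free-form generation is hard to ground in executable actions.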

The prompt scheme goes beyond conditioning on natural language by leveraging programming language constructs, relying on the fact that LLMs are trained on vast web corpora containing many programming tutorials and much code documentation. The LLM is given import statements that declare the available actions and their expected parameters, a list of the objects in the environment, and function definitions whose bodies are sequences of actions operating on those objects.

Situated state feedback from the environment is incorporated by asserting the preconditions of the plan and responding to failed assertions with recovery actions. Including natural language comments in the program to explain the goal of the upcoming actions further improves the task success rate of the generated plan programs.
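As a rough illustration of this layout, the sketch below stores a prompt as a Python string and parses it with the standard ast module. The module name actions, the action names, the objects, and the task microwave_salmon are assumptions made for the example and are not taken from the source.

```python
import ast

# Hypothetical prompt: an import line, an object list, and one example plan.
PROMPT = """\
from actions import walk, grab, open, close, putin, switchon

objects = ['salmon', 'microwave', 'fridge']

def microwave_salmon():
    # 1: take the salmon out of the fridge
    walk('fridge')
    open('fridge')
    grab('salmon')
    close('fridge')
    # 2: heat the salmon in the microwave
    walk('microwave')
    open('microwave')
    putin('salmon', 'microwave')
    close('microwave')
    switchon('microwave')
"""

tree = ast.parse(PROMPT)  # raises SyntaxError if the prompt is not valid Python

# Names made available by the import statement.
imported = {alias.name for node in ast.walk(tree)
            if isinstance(node, ast.ImportFrom) for alias in node.names}

# Names of the functions called inside the example plan.
called = {node.func.id for node in ast.walk(tree)
          if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)}

print(sorted(called - imported))  # actions used without being imported
```

Because the prompt is ordinary Python, the same check could be run on a completed plan to flag any action the agent does not actually offer.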
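The assert-then-recover pattern can be sketched in plain Python. The source does not show the exact assertion syntax used in the prompts, so the helper ensure, the toy state dictionary, and the actions find and grab below are all hypothetical stand-ins.

```python
# Toy environment state; every name here is a hypothetical stand-in.
state = {"near": set(), "holding": set()}
log = []

def find(obj):
    """Recovery action: move next to the object."""
    state["near"].add(obj)
    log.append(f"find({obj!r})")

def grab(obj):
    """Grabbing only succeeds when the object is nearby."""
    if obj not in state["near"]:
        raise RuntimeError(f"cannot grab {obj!r}: not nearby")
    state["holding"].add(obj)
    log.append(f"grab({obj!r})")

def ensure(condition, recover):
    """Check a precondition and run the recovery action if it fails."""
    if not condition():
        recover()

def fetch_salmon():
    # 1: make sure the robot is next to the salmon before grabbing it
    ensure(lambda: "salmon" in state["near"], lambda: find("salmon"))
    grab("salmon")

fetch_salmon()
print(log)
```

Without the ensure line, grab would raise because the precondition is not met; with it, the plan recovers by finding the object first.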

Constructing the large language model prompt

The robot plan is represented as a Python program. Following the usual LLM prompting paradigm, a prompt is created with the structure of Python code, and the LLM is asked to complete the code. Features available in Python are used to build the prompt so that it elicits a situated robot task plan conditioned on a natural language instruction.

Comments help decompose a high-level task into logical subtasks: each comment gives a natural language summary of the sequence of actions that follows. This division lets the LLM express its knowledge about tasks and subtasks in natural language, which in turn aids planning.

Comments also inform the LLM about the immediate goal, which reduces the likelihood of incoherent or repetitive output. Similar intermediate summaries, known as chains of thought, have been shown to improve LLM performance on a range of arithmetic and symbolic reasoning tasks. Assertions provide an environment feedback mechanism that encourages the preconditions to be met and allows error recovery when they are not.

The LLM is given information about the environment and the primitive actions, together with example tasks and plans, through prompt construction: a prompt builder takes all of this information and generates a Python prompt for the LLM to complete.

To inform the LLM of the agent's action primitives, they are provided as Python import statements. This encourages the LLM to restrict its output to functions that are available in the current context. Switching to a different agent requires only a new list of imported functions representing that agent's actions.

The objects available in the environment are provided as a list of strings. Because the prompt scheme explicitly lists the functions and objects available to the model, the generated plans typically contain only actions the agent can take and objects that are present in the environment.

Each example task demonstrates how to complete a given task using the actions and objects available in the environment. The examples show the relationship between the task name, given as the function handle, and the actions to take, as well as the constraints on the actions and objects involved.

The plan for the target task is inferred entirely by the LLM from the prompt. The generated plan is then executed on a virtual agent or a physical robot system by an interpreter that runs each action command against the environment. Assertion checks are performed in a closed loop during execution, providing feedback on the current state of the environment.
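A prompt builder along these lines might look like the sketch below. The function build_prompt, the module name actions, and the example task are assumptions for illustration; the source describes the ingredients of the prompt but not this exact code.

```python
def build_prompt(actions, objects, examples, task_name):
    """Assemble a Python-style prompt for the language model to complete."""
    header = f"from actions import {', '.join(actions)}"
    object_line = f"objects = {objects!r}"
    # The prompt ends with an open function header for the new task,
    # which the model is expected to continue with a plan body.
    parts = [header, object_line, *examples, f"def {task_name}():"]
    return "\n\n".join(parts) + "\n"

# One hypothetical example plan, written as the model would see it.
example = (
    "def throw_away_lime():\n"
    "    # 1: pick up the lime\n"
    "    find('lime')\n"
    "    grab('lime')\n"
    "    # 2: put it in the garbage can\n"
    "    find('garbagecan')\n"
    "    putin('lime', 'garbagecan')"
)

prompt = build_prompt(
    actions=["find", "grab", "putin"],
    objects=["lime", "garbagecan", "salmon", "microwave"],
    examples=[example],
    task_name="microwave_salmon",
)
print(prompt)
```

Changing the agent or the scene only means passing a different actions or objects list; the example plans and the builder stay the same.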
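The execution step can be sketched with a small interpreter that walks the generated plan and sends each action call to the environment. ToyEnv, run_plan, and the plan text are hypothetical; a real system would dispatch to a simulator or robot API, and this sketch covers only action dispatch, not assertion handling.

```python
import ast

class ToyEnv:
    """Stand-in environment that records the actions it executes."""

    def __init__(self, objects):
        self.objects = set(objects)
        self.history = []

    def step(self, action, args):
        # Reject commands that refer to objects missing from the scene.
        if any(arg not in self.objects for arg in args):
            return False
        self.history.append((action, args))
        return True

def run_plan(plan_source, env, allowed_actions):
    """Execute each action call in the plan body, one command at a time."""
    func = ast.parse(plan_source).body[0]
    results = []
    for stmt in func.body:
        call = stmt.value if isinstance(stmt, ast.Expr) else None
        if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
            continue  # skip anything that is not a plain action call
        if call.func.id not in allowed_actions:
            results.append(False)  # action the agent does not support
            continue
        args = tuple(ast.literal_eval(arg) for arg in call.args)
        results.append(env.step(call.func.id, args))
    return results

# Hypothetical generated plan containing one unsupported action.
plan = (
    "def put_salmon_in_fridge():\n"
    "    grab('salmon')\n"
    "    fly('fridge')\n"
    "    putin('salmon', 'fridge')\n"
)
env = ToyEnv(["salmon", "fridge"])
results = run_plan(plan, env, allowed_actions={"grab", "putin"})
print(results, env.history)
```

Parsing the plan rather than calling exec on it keeps the interpreter in control: each command is checked against the allowed actions before anything reaches the environment.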

Evaluation in the VirtualHome environment and on a physical robot manipulator

In the virtual environment, executing the generated program involves responding to assertions using feedback about the environment state. The environment provides observations in the form of a state graph containing object properties and relations. To check an assertion, information about the relevant objects is extracted from the state graph, and the LLM is prompted with the state graph and the assertion as text and asked to return whether the assertion holds.

The physical experiments use a robot with a parallel-jaw gripper and a pick-and-place policy. The policy takes two point clouds as input, one of the target object and one of the target container, and performs a pick-and-place operation that puts the object on or inside the container, generating a grasp pose while avoiding collisions.

A single import statement is specified for this action. The open-vocabulary object detection model ViLD identifies and segments the objects in the scene and is used to build the list of available objects in the prompt. Unlike the virtual environment, where the object list is a global variable shared by all tasks, here the object list is local to each plan function, which gives more flexibility to accommodate new objects.

Point clouds are obtained by mapping the ViLD segmentation masks and their text labels onto the depth image. Because of real-world uncertainty, the assertion-based closed-loop option is not implemented for the tabletop plans.
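The assertion check against a state graph can be sketched as below. The graph layout, the relation names, and both helper functions are assumptions for illustration; the source only says that relevant object information is extracted from the graph and passed to the LLM as text together with the assertion.

```python
# Hypothetical state graph: nodes carry states, edges carry relations.
state_graph = {
    "nodes": {
        "agent": {"states": []},
        "fridge": {"states": ["CLOSED"]},
        "salmon": {"states": []},
    },
    "edges": [
        ("agent", "CLOSE", "fridge"),
        ("salmon", "INSIDE", "fridge"),
    ],
}

def holds(graph, subject, relation, target=None):
    """Return True if the relation (or object state) holds in the graph."""
    if target is None:
        return relation in graph["nodes"].get(subject, {}).get("states", [])
    return (subject, relation, target) in graph["edges"]

def assertion_prompt(graph, assertion, relevant):
    """Render the relevant slice of the graph and the assertion as text."""
    facts = [f"{s} {r} {t}" for s, r, t in graph["edges"]
             if s in relevant or t in relevant]
    facts += [f"{name} is {state}" for name in relevant
              for state in graph["nodes"].get(name, {}).get("states", [])]
    return "State: " + "; ".join(facts) + f"\nAssertion: {assertion}\nHolds:"

print(holds(state_graph, "agent", "CLOSE", "fridge"))
print(assertion_prompt(state_graph, "'fridge' is 'opened'", {"fridge"}))
```

The holds function is a direct lookup, included only to show what the check means; the text produced by assertion_prompt is the kind of input that would be handed to the LLM to decide instead.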

Showing that LLM prompting is an effective way to generate plans for virtual and physical agents

GPT-3 is used as the language model backbone that receives the prompts and generates the plans. Variation in performance across runs stems from sampling the LLM output. Results with the more recent GPT-4 backbone are also included. Unlike the GPT-3 language model, GPT-4 is a chat model trained with reinforcement learning from human feedback to act as a helpful digital assistant.

Rather than simply autocompleting the code in the prompt, GPT-4 interprets the user prompt as a question and generates an answer in the role of an assistant.

GPT-3 remains the main LLM backbone in the ablation experiments. The recommendation to the community is to use program-like prompts for LLM-based task planning and execution: base GPT-3 already works well, and LLMs further fine-tuned on programming language data may do even better.

The feedback mechanism in the example programs, namely assertions and recovery actions, improves performance across the metrics, with the exception that one result is slightly better without feedback when the prompt example code contains no comments. Removing the comments from the prompt code substantially reduces performance on all metrics, highlighting the usefulness of natural language guidance within programming language constructs.

A comparison approach builds prompts from natural language text describing the available objects and example task plans. GPT-2 is then fine-tuned to learn a policy that maps the generated sequences to executable actions in the simulation environment; tasks from the training set, annotated with text steps and the corresponding action sequences, supply the data points for training and validating the policy.

Although this approach achieves reasonable partial success, it does not match the executability of the program-based prompts and produces no fully successful task executions.

The task-by-task results report the performance on each test-set task. Tasks similar to the prompt examples have a higher goal conditions recall (GCR), because the ground-truth prompt examples hint at good stopping points. GCR can be low for other tasks because they have multiple appropriate goal states but are evaluated against only a single ground-truth goal.

Common failure modes include decisions the program makes independently of the deployed environment and its peculiarities, which could be addressed by communicating those peculiarities explicitly. Examples are the VirtualHome (VH) agent being unable to find or interact with nearby objects while seated, and some commonsense actions on objects not being available in VH.

The generated assertions may be insufficient when an object is inaccessible. Action success feedback is not provided to the agent, which can cause subsequent actions to fail; the assertion recovery modules in the plan help, but because they are written at generation time they do not cover every possibility. Some plans are also cut short by the LLM API length cap.

Beyond these failure modes, the strict final-state check means a failure can be recorded even when the agent completes the task, because the resulting environment state does not match the precomputed ground-truth final goal state. Likewise, some task descriptions are ambiguous and admit multiple plausible correct programs.

Although the reasoning ability of current LLMs is impressive, the proposed method makes no claim to provide guarantees. It is effective at preventing the LLM from generating unavailable actions or objects, but hallucinations remain possible, depending on the generation quality and reasoning ability of the LLM.

All physical-robot results shown use prompts with comments but without feedback. The physical robot setup does not allow reliable tracking of system state or checking of assertions, and it is prone to random failures such as grasp slippage. The randomness of the real world complicates quantitative comparison between systems.

The physical results are intended as a qualitative demonstration that the prompting method can easily constrain and ground LLM generation in a physical robot system. Across a variety of tasks, with and without distractor objects, the system almost always succeeds; it fails only on the sorting task, in the run without distractors, because of a random gripper failure.
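The source does not define GCR. The sketch below assumes it stands for goal conditions recall and is computed as the fraction of ground-truth goal conditions that hold in the final state; the function name and the example conditions are hypothetical.

```python
def goal_conditions_recall(goal, achieved):
    """Fraction of ground-truth goal conditions present in the final state.

    Assumed definition: 1 - |goal - achieved| / |goal|.
    An empty goal set is treated as fully satisfied.
    """
    goal, achieved = set(goal), set(achieved)
    if not goal:
        return 1.0
    return 1 - len(goal - achieved) / len(goal)

# Hypothetical goal conditions for a "microwave salmon" task.
goal = {
    ("salmon", "INSIDE", "microwave"),
    ("microwave", "CLOSED"),
    ("microwave", "ON"),
}
# Final state reached by a plan that forgot to switch the microwave on.
achieved = {
    ("salmon", "INSIDE", "microwave"),
    ("microwave", "CLOSED"),
    ("fridge", "CLOSED"),
}

print(goal_conditions_recall(goal, achieved))
```

Under this definition a plan that reaches a different but equally reasonable end state is still penalized, which matches the point above about tasks being scored against a single ground-truth goal.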

Epilogue

An LLM prompting scheme for robot task planning is proposed that brings together the LLM's strengths in commonsense reasoning and code understanding. The prompts include a situated understanding of the world and of the robot's capabilities, enabling the LLM to directly generate executable plans as programs.

The community has only scratched the surface of task planning as the generation and completion of robot programs, and broader use of programming language features remains to be studied. LLMs can do arithmetic and understand numbers, but their ability to generate complex robot behavior is still relatively unexplored.

