Academic sharing | Designing a robotic tomato-picking gripper with the large language model ChatGPT

Author: Chinese Association for Artificial Intelligence (CAAI)

A recent work published in Nature Machine Intelligence uses the large language model (LLM) ChatGPT to guide the robot design process at both the conceptual and the technical level, and proposes a new human-AI collaborative design strategy.

https://www.nature.com/articles/s42256-023-00669-7#Sec1

Large language models will fundamentally change the robotics landscape by giving robots an unprecedented ability to understand and analyze natural language. The main advantage of LLMs is their ability to process and internalize large amounts of textual data, such as instructions, technical manuals, and academic articles, and to use this implicit knowledge to answer questions truthfully and coherently. The potential of these powerful AI tools in robotics has already been demonstrated by their ability to synthesize code from text prompts. Research efforts at companies such as Microsoft and Google have translated natural language instructions into actions that robots can perform.

However, recent improvements in the availability and capabilities of LLMs have opened up new opportunities to address another bottleneck in robotics: design. Leveraging their emerging capabilities, LLMs can hold a conversation that educates and guides humans in building robots. These capabilities could fundamentally change the way we design robots, changing the role of humans while enriching and simplifying the design process. So, how are LLMs changing the robot design process, and what are the associated opportunities and risks?

To explore this question, the authors consider a case study in which humans, driven by the desire to "help the world with robots," design a robot together with ChatGPT-3. The task is accomplished in two stages. In the first, high-level stage, the computer and the human collaborate at a conceptual level, discussing ideas and outlining specifications for the robot design. In the second stage, the design is physically implemented.

In the ideation phase, the human first asks the LLM what the future challenges of humanity are and quickly gets an overview of the main hazards, as shown in Figure 1. Next, the human chooses the most interesting and promising directions, narrowing the design space through further dialogue. This interaction can span multiple knowledge domains and levels of abstraction, from concept to technical implementation. In the process, the human relies on the AI partner to gain knowledge beyond their individual expertise. The AI model helps the human explore the intersection between research areas such as agriculture and robotics, and consider factors that are not typically part of an engineer's training, such as which crop is most economically valuable. Once an application has been selected through dialogue, the LLM and the human converge on technical design specifications, including software, material selection, mechanism design, and manufacturing methods.

Figure 1: Overview of the discussion between the human designer and the LLM, with the questions asked by the human above and the options offered by the LLM below. Green highlights the human's decision tree, in which the human gradually narrows the questions toward the goal.
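
As a rough illustration of the narrowing process in Figure 1, the dialogue can be pictured as a tree in which each node holds a question and the options the LLM returned, and the human's choices trace one path through it. The sketch below is only an assumption about how such a record could be kept; the class `DialogueNode`, the function `decision_path`, and the placeholder options are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DialogueNode:
    """One question put to the LLM, with the options it offered (hypothetical structure)."""
    question: str
    # Maps each offered option to the follow-up question, or None if the dialogue stops there.
    options: Dict[str, Optional["DialogueNode"]] = field(default_factory=dict)
    chosen: Optional[str] = None  # the option the human designer picked

    def choose(self, option: str) -> Optional["DialogueNode"]:
        """Record the human's choice and return the follow-up node, if any."""
        if option not in self.options:
            raise KeyError(f"{option!r} was not offered for: {self.question}")
        self.chosen = option
        return self.options[option]


def decision_path(root: DialogueNode) -> List[str]:
    """Follow the chosen options from the root (the green path in Figure 1)."""
    path: List[str] = []
    node: Optional[DialogueNode] = root
    while node is not None and node.chosen is not None:
        path.append(node.chosen)
        node = node.options[node.chosen]
    return path


# Illustrative tree; the "placeholder" options are invented stand-ins, not the LLM's real output.
crop = DialogueNode(
    "Which crop is most economically valuable?",
    {"tomato": None, "placeholder crop": None},
)
root = DialogueNode(
    "What are the future challenges of humanity?",
    {"agriculture and robotics": crop, "placeholder challenge": None},
)

root.choose("agriculture and robotics").choose("tomato")
print(decision_path(root))  # ['agriculture and robotics', 'tomato']
```

Keeping the unchosen options in the tree, rather than discarding them, is one simple way to record which alternatives the LLM offered and which decisions were made by the human.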

In the second, lower-level stage of the design process, these directions need to be translated into a physical, functional robot. LLMs are currently unable to generate entire CAD models, evaluate code, or manufacture robots automatically, but recent advances have shown that AI algorithms can support the technical implementation of software. While we expect AI approaches to take on these tasks in the future, for now the technical implementation remains a collaborative effort between the AI model and the human. The human plays the role of "technician," optimizing the code proposed by the LLM, completing the CAD model, and manufacturing the robot. The robot can then be tested in a real-world scenario, and further dialogue with the LLM, based on experimental evidence, can be used to iterate on the design. As an example of this second stage, Figure 2 shows the main outputs generated by the LLM and the actual deployment of an AI-designed robotic gripper for crop harvesting.

Figure 2: a. Some of the technical recommendations produced by the LLM, including shape indications, code, component and material selection, and mechanism design. Guided by these inputs, a gripper was built and tested on a practical task, tomato picking, as shown on the right.

This case study demonstrates the potential of LLMs to transform the design process, and shows how the human-AI relationship may need to change depending on an individual's expertise, the stage of the design process, and the end goal. By properly combining multiple approaches to human-AI collaboration, the design process can be enhanced and simplified.

At one extreme of human-AI interaction, the LLM provides all the inputs needed for robot design, and humans follow those inputs blindly. The AI is then the inventor, solving human problems and providing "creativity," technical knowledge, and expertise, while humans handle the technical implementation. This can facilitate the transfer and democratization of knowledge by enabling non-specialists to build robotic systems. For the first time, AI agents not only solve human-specified technical problems but also present conceptual options to humans. In this sense, the LLM acts as a researcher, harnessing knowledge and finding interdisciplinary connections, while humans act as managers, providing direction for the design.

A gentler but more powerful approach is collaborative exploration between the LLM and the human, which enhances human expertise by leveraging the LLM's ability to provide broad, interdisciplinary knowledge. In this second mode, the role of the LLM is to help humans gather knowledge effectively from areas outside their personal experience, enriching the conceptual process. This has great potential for an inherently interdisciplinary field such as robotics. By augmenting human knowledge, this approach eases the limitations of an individual's education and helps humans find relevant connections between fields, making interdisciplinary research more accessible. However, the knowledge provided by an LLM can be narrow and error-prone. In areas far from their own expertise, engineers may not be able to fact-check AI-generated answers. This risk is shown in Figure 3. Because it offers only a small window onto a large and complex topic, interaction with an LLM can lead to misunderstandings and oversimplifications, ultimately introducing errors into the design and biases into the field.

Figure 3: The left panel shows the two phases of the design process: first the human and the LLM discuss specific applications and designs, and then the human implements them. The right panel illustrates how knowledge is accessed during the high-level discussion. With an LLM, human designers can effectively reach areas of knowledge beyond their personal expertise and link different areas through questions. However, this comes at the risk of accepting incorrect inputs in areas far removed from the designer's own knowledge. In the traditional learning process, designers expand radially outward from their personal experience, whereas in LLM-based exploration they access only limited patches of knowledge and therefore risk misunderstandings.

Finally, we can consider a third approach, in which the LLM helps refine the design process and provides technical input, while humans remain the inventors or scientists driving the process. AI can help with debugging, troubleshooting, and method selection, speeding up tedious and time-consuming tasks. In this AI-human relationship, human knowledge and intuition steer the discussion, and humans work within their own area of expertise so that they can critically assess the answers and suggestions.

Robot design is a creative, interdisciplinary process that generates intellectual property (IP) and currently relies on highly skilled professionals. We believe that careful integration of these approaches could revolutionize this process. However, introducing LLMs into robot design raises questions about potential negative effects. An LLM must be seen as an evolution of the search engine, generating the "most likely" answer for a given prompt. These answers may be incorrect, and if they are not properly fact-checked or verified, LLM output can be misleading or, in the worst case, dangerous. Unlike search engines, however, LLMs can suggest ways to integrate "knowledge" and apply it to unseen problems, potentially giving the false impression that new knowledge is being generated. This may discourage humans from taking responsibility for the solutions developed, which could hinder and stall advances in robotics and design. Another problem with the widespread use of LLMs in robot design is the model's bias toward statistically preferred solutions, which may hinder the exploration of new technological solutions.

Finally, there are key issues related to plagiarism, traceability, and intellectual property. Can a design created through an LLM be considered novel, given that it is based only on prior knowledge, and how can this prior knowledge be referenced? As the technology matures, there are longer-term considerations, including data privacy, the frequency of retraining, and how to incorporate new knowledge to keep the tool usable and relevant. Human-AI interaction in robot design also has significant social and ethical implications. If LLMs are used to automate high-level cognitive design tasks, humans may take on more of the technical work. This could redefine the skills required of engineers and their role in the economy and society.

In summary, the robotics community must determine how to leverage these powerful tools to accelerate the development of robots in an ethical, sustainable, and socially empowering way. We must develop methods for acknowledging the use of LLMs and for tracing the lineage of LLM-aided designs.

Looking ahead, we strongly believe that LLMs will open up many exciting possibilities that, if managed properly, can become a force for good. By combining collaborating LLMs that ask and answer questions, with one helping to refine the other, the design process could be fully automated. This approach could also be extended with automated manufacturing, allowing fully autonomous pipelines to create customized and optimized robotic systems. Ultimately, the future of the field depends on the open question of how these tools can be harnessed to help robotics developers without limiting the creativity, innovation, and scientific effort needed for robots to meet the challenges of the twenty-first century.

This paper was contributed by the CAAI Cognitive Systems and Information Processing Committee