
With generative AI, robots will also have an "iPhone moment"

Author: The World of Electronic Engineering

The low-cost robot system Mobile ALOHA has recently gone viral, once again focusing attention on robots and raising the market's expectations for them.

Robotics is a highly interdisciplinary field, spanning mechanics, electronics, computing, and perception. This breadth means that progress in robotics depends on simultaneous advances across multiple fields, and development can therefore be held back by a technical bottleneck in any one of them.

In recent years, however, advances in information technology have accelerated the pace at which robotics absorbs techniques from other disciplines. Image recognition, visual processing, and speech recognition, for example, have all been adopted rapidly by the robot industry.

In 2023, large language models (LLMs) were undoubtedly the most dazzling technology, and the process of porting LLMs from the cloud to the edge is accelerating, with AI PCs and AI phones already appearing. Now the embedded industry is ushering in a new era of AI.

Deepu Talla, vice president of embedded and edge computing at NVIDIA, recently gave a presentation at CES about the convergence of AI and robotics.

Talla predicts that the impact of generative AI will extend beyond text and image generation into homes and offices, farms and factories, hospitals, and labs. The key is that large language models (LLMs), which resemble the language centers of the human brain, enable robots to understand and respond to human instructions more naturally.

"AI-driven autonomous robots are increasingly being used to improve efficiency, reduce costs, and address labor shortages. Talla said.

What generative AI can bring to the robotics industry

Generative AI will be a game-changer for the robotics industry: natural-language interaction will make robots easier to use, more efficient, and more trustworthy.

Boston Dynamics has integrated ChatGPT into its Spot robot dog to support all kinds of human-robot interaction, letting Spot act as a tour guide that walks guests through the company's facilities.

Boston Dynamics' robot dog

Collaborative Robotics is developing collaborative robots (cobots) designed to operate around humans. The company says its system automates the task of moving items around places such as warehouses, handling boxes, bags, and carts. Many companies already use robots in their logistics facilities to move goods automatically, but the more complex parts of the job still require human intervention. Collaborative Robotics says it is designing cobots that can handle these tasks end to end, without human intervention; a key enabler is the ability to leverage LLMs for semantic understanding, a pattern sketched below.
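
To make "LLMs for semantic understanding" concrete, here is a minimal sketch of the pattern: a free-form warehouse instruction is translated into a structured action plan that a robot controller could execute. The `call_llm` stub and the action schema are hypothetical placeholders, not Collaborative Robotics' actual stack.

```python
import json

# Hypothetical stand-in for a hosted LLM API. It returns a canned response
# so the sketch runs offline; a real system would call an actual model here.
def call_llm(prompt: str) -> str:
    return json.dumps([
        {"action": "navigate", "target": "aisle_3"},
        {"action": "pick", "object": "box_12"},
        {"action": "navigate", "target": "loading_dock"},
        {"action": "place", "object": "box_12"},
    ])

SYSTEM_PROMPT = (
    "Translate the operator's instruction into a JSON list of robot "
    "actions. Allowed actions: navigate, pick, place."
)

def plan_from_instruction(instruction: str) -> list[dict]:
    """Turn a free-form instruction into a structured, executable plan."""
    raw = call_llm(f"{SYSTEM_PROMPT}\nInstruction: {instruction}")
    return json.loads(raw)

if __name__ == "__main__":
    plan = plan_from_instruction("Move box 12 from aisle 3 to the loading dock")
    for step in plan:
        print(step)  # each step would be dispatched to the motion stack
```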

Peter Chen, founder of Covariant, which builds AI-powered picking robots, published an article last year titled "The GPT Moment for AI Robots is Coming," in which he argued that "the core technology that enables GPT to see, think, and even speak also enables machines to see, think, and act. Robots driven by foundation models can understand their physical environment, make informed decisions, and adapt their behavior to a changing environment."

"Robot GPT" is built in the same way as GPT – laying the groundwork for a revolution that will once again redefine artificial intelligence as we know it.

The Phoenix humanoid robot developed by Sanctuary AI is special not only for its physical capabilities but also for its cognitive abilities. Equipped with a comprehensive cognitive architecture and software designed specifically for humanoids, the robot can understand natural-language commands and act on them, much as a human employee follows verbal instructions. Phoenix's cognitive architecture spans reasoning, tasks, and actions, ensuring full transparency and accountability in its decision-making. It combines symbolic and logical reasoning with large language models, including OpenAI's ChatGPT, to provide both broad general-purpose knowledge and domain-specific knowledge. Relying on deep learning and reinforcement learning, Phoenix can exhibit autonomous, goal-seeking behavior: deep learning lets the robot extract patterns from data, while reinforcement learning lets it discover the best strategies for different tasks through trial and error, as the toy sketch below illustrates.
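
The division of labor described above can be illustrated with a toy reinforcement-learning loop. This is a generic tabular Q-learning sketch on a one-dimensional "reach the goal" task, not Sanctuary's cognitive architecture; it only shows how trial and error converges on a strategy.

```python
import random

# Toy corridor: states 0..5 with the goal at state 5; actions: 0 = left, 1 = right.
N_STATES, GOAL = 6, 5
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration

q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-value table

def step(state: int, action: int) -> tuple[int, float]:
    """Environment dynamics: move left or right, reward 1.0 on reaching the goal."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0)

for episode in range(500):
    state = 0
    while state != GOAL:
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        if random.random() < EPSILON:
            action = random.randrange(2)
        else:
            action = 0 if q[state][0] > q[state][1] else 1
        nxt, reward = step(state, action)
        # Standard Q-learning update from trial-and-error experience.
        q[state][action] += ALPHA * (reward + GAMMA * max(q[nxt]) - q[state][action])
        state = nxt

print("Learned policy:",
      ["right" if q[s][1] >= q[s][0] else "left" for s in range(N_STATES)])
```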

Unitree also launched the Unitree Go2 last year, a new quadruped robot empowered by a large GPT model.

Agility Robotics, NTT, and other companies are incorporating generative AI into their robots to help them understand text or voice commands. Dreame's robot vacuums are being trained in simulated living spaces created by generative AI models, and Electric Sheep is developing an autonomous lawn mower that uses generative AI.

"It's all a no-brainer, with a growing number of partners using GPU-accelerated large language models to bring unprecedented intelligence and adaptability to machines of all kinds," Talla said. ”

NVIDIA is accelerating the robotics industry's adoption of generative AI

NVIDIA technologies, such as the NVIDIA Isaac and Jetson platforms, power the development and deployment of AI robots and are relied upon by more than 1.2 million developers and 10,000 customers and partners.

Many of these companies participated in this week's CES, including Analog Devices, Aurora Labs, Canonical, Dreame Innovative Technologies, DriveU, e-con Systems, Ecotron, Enchanted Tools, GlüxKind, Hesai Technology, Leopard Imaging, Ninebot, Nodar, Orbbec, QT Group, RoboSense, Spartan Radar, TDK, Telit, Unitree, Voyant Photonics, and Yijing Technology, among others.

In his presentation, Talla showcased the dual-computer model necessary to deploy AI into robotics (below), demonstrating NVIDIA's comprehensive approach to AI development and deployment.

[Image: NVIDIA's dual-computer model for building and deploying AI robots]

The first computer, known as the "AI Factory," is at the heart of creating and continuously improving AI models.

The AI Factory uses NVIDIA data center computing infrastructure, as well as NVIDIA AI and NVIDIA Omniverse platforms, to simulate and train AI models.

The second computer represents the environment in which the robot runs.

The operating environment varies with the application: it might be the cloud or a data center, a local server for tasks such as defect detection in semiconductor manufacturing, or an autonomous machine equipped with multiple sensors and cameras. A minimal sketch of this split follows.
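
A minimal way to picture the dual-computer model in code: one process (standing in for the "AI Factory") trains and exports a model artifact, and a second process (standing in for the robot's onboard computer) loads that artifact and runs a perceive-infer-act loop. Everything here is generic, hypothetical scaffolding, not NVIDIA's Isaac or Omniverse APIs.

```python
import json

MODEL_PATH = "policy.json"  # hypothetical exported model artifact

# --- Computer 1: the "AI Factory" (data center) ---------------------------
def train_and_export() -> None:
    """Stand-in for large-scale simulation and training, ending in an export."""
    policy = {"obstacle_ahead": "turn_left", "clear": "go_forward"}
    with open(MODEL_PATH, "w") as f:
        json.dump(policy, f)

# --- Computer 2: the robot's onboard runtime ------------------------------
def run_on_robot(sensor_frames: list[str]) -> None:
    """Load the exported model and run the perceive-infer-act loop."""
    with open(MODEL_PATH) as f:
        policy = json.load(f)
    for frame in sensor_frames:
        action = policy.get(frame, "stop")  # inference on each perception
        print(f"perceived={frame!r} -> act={action!r}")

if __name__ == "__main__":
    train_and_export()                                  # in the data center
    run_on_robot(["clear", "clear", "obstacle_ahead"])  # on the robot
```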

Talla also highlighted the role of LLMs in breaking down technical barriers. LLMs can turn ordinary users into technical artists, capable of creating complex robotic work cells or entire warehouse simulations.

With generative AI tools like NVIDIA Picasso, users can generate photorealistic 3D assets based on simple text prompts and add them to digital scenes for a dynamic, comprehensive robot training environment.

This capability extends to creating diverse, physics-based scenarios in Omniverse, enhancing the testing and training of robots and helping ensure their real-world suitability; the sketch below shows the underlying idea.
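
The "thousands of scenarios" idea is essentially domain randomization: sample many scene variations so that a policy trained across them generalizes to the real world. Below is a generic sketch with made-up parameter ranges; it does not call Omniverse or Picasso, which expose this through their own tooling.

```python
import random

def sample_scenario(rng: random.Random) -> dict:
    """Sample one randomized, physics-flavored training scene description."""
    n_obstacles = rng.randint(0, 12)
    return {
        "lighting_lux": rng.uniform(100, 2000),      # dim warehouse to daylight
        "floor_friction": rng.uniform(0.3, 1.0),     # slick to grippy
        "obstacles": [(rng.uniform(0, 10), rng.uniform(0, 10))  # x, y in meters
                      for _ in range(n_obstacles)],
        "camera_noise_std": rng.uniform(0.0, 0.05),  # sensor imperfection
    }

rng = random.Random(42)  # seeded so the scenario set is reproducible
scenarios = [sample_scenario(rng) for _ in range(1000)]
print(f"generated {len(scenarios)} scenarios; first: {scenarios[0]}")
```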

This dovetails with the transformative potential of generative AI to reimagine how robots are deployed.

Previously, robots were purpose-built for specific tasks, and modifying them for different tasks was time-consuming.

Talla also explained that advances in LLMs and vision-language models are removing this bottleneck, allowing us to interact with robots more intuitively through natural language.

"When testing or training a bot, the diversity of the environment is critical to ensuring that the bot can be generalized to the real world, and ChatGPT-like tools allow users to create thousands of accurate bot scenarios in minutes instead of days. ”

Final thoughts

Last October, NVIDIA unveiled an AI system called Eureka. Built on OpenAI's GPT-4, it enables robots to perform more than 30 complex actions such as spinning a pen, opening a drawer, using scissors, and passing a ball from hand to hand. Eureka's reward programs reportedly drive robots' trial-and-error learning, outperforming reward programs written by human experts on more than 80% of tasks; according to the NVIDIA team, this improved robot performance by more than 50%. These results come from AI agents that use GPT-4 and generative AI to write the software code that rewards robots during reinforcement learning, a loop sketched below.
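
The pattern behind Eureka can be sketched in a few lines: an LLM proposes reward-function source code, each candidate is compiled into a callable, candidates are scored by running a crude training loop, and the best one survives. The `ask_llm_for_reward` stub below returns canned candidates so the sketch runs offline; NVIDIA's actual system queries GPT-4 and trains in GPU-accelerated simulation.

```python
import random

# Canned "LLM" output: two candidate reward functions as source strings.
# Eureka actually queries GPT-4 and refines candidates using training feedback.
def ask_llm_for_reward(task: str) -> list[str]:
    return [
        "def reward(dist):\n    return -dist",                        # dense shaping
        "def reward(dist):\n    return 1.0 if dist < 0.05 else 0.0",  # sparse
    ]

def compile_reward(src: str):
    """Turn LLM-written source into a callable (safe here: source is canned)."""
    ns: dict = {}
    exec(src, ns)
    return ns["reward"]

def train_and_score(reward_fn, steps: int = 200) -> float:
    """Crude stand-in for RL training on a 1-D reach task: a greedy agent
    keeps moves the candidate reward approves of; fitness is the true task
    metric (ending near the goal), mirroring Eureka's outer loop."""
    rng, pos = random.Random(0), 1.0
    for _ in range(steps):
        move = rng.uniform(-0.05, 0.05)
        if reward_fn(abs(pos + move)) >= reward_fn(abs(pos)):
            pos += move  # keep the move only if the reward says it helped
    return -abs(pos)  # higher is better: the goal sits at position 0

candidates = ask_llm_for_reward("move gripper to target")
best = max(candidates, key=lambda src: train_and_score(compile_reward(src)))
print("selected reward function:\n" + best)
```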

ChatGPT has shown that large models can enable computers to understand and express human-like thinking and judgment; large models can likewise transform the robotics industry, from development workflows to user experience.

At this year's CES we saw many more examples of the convergence of generative AI and robots; the "iPhone moment" for robots may be upon us.

"This adaptable, perceptual machine will soon be available all over the world. Talla said.
