OmBot, a homegrown autonomous agent, has just been released

Andrej Karpathy, the former Tesla AI director who returned to OpenAI this year, said at a recent developer event that AI agents represent the future of AI.

He is not alone: leading figures and technology giants across the global AI field have shown strong interest in AI agents and hold high expectations for their development.

The emergence of large language models has undoubtedly opened new possibilities for AI agents. Although most agents are still far from fully simulating human intelligence, they are attracting global attention, because their emergence marks an important step toward the long-term goal of artificial general intelligence.

On this new track, built on large-model technology and applications, whoever moves first gains a first-mover advantage.

The birth of the OmBot autonomous agent

At today's 2023 World Artificial Intelligence Conference, Lianhui Technology released the OmBot Ohm agent, an autonomous AI agent built on large-model capabilities, and launched a first batch of applications for typical scenarios.

Behind the birth of the OmBot Ohm agent lies years of groundwork by Lianhui Technology's technical team.

The company's core team comes from Carnegie Mellon University, one of the world's leading centers of computer science, whose labs have explored autonomous agents since the 1990s. In 2014, while studying for his Ph.D., Zhao Tiancheng, now chief scientist of Lianhui Technology, developed DialPort, the world's first multimodal agent platform, which brought agents (bots) from different universities together on one platform and let them cooperate to help humans complete various tasks.

These agents each specialize in different areas.

For example, some book restaurants, some analyze movies, and some handle copywriting. As agents grew more capable, DialPort came to host more than 100 of them, served as the base platform for over 100 academic research projects, and influenced the design of many mature interactive agents, including Amazon Alexa.

Preliminary exploration of autonomous agents

So, what is an autonomous agent?

Lianhui Technology gives a clear answer: an agent is a computer model that can perceive its environment, make decisions autonomously, and maintain both short-term and long-term memory, imitating the working mechanism of the human brain to actively complete tasks in pursuit of a goal.

In its simplest form, an autonomous agent runs in a loop, generating self-directed instructions and actions at each iteration. It therefore does not depend on humans for step-by-step guidance, which makes it highly scalable.
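To make this loop concrete, here is a minimal Python sketch of the idea; the Agent class, llm(), and execute() are hypothetical placeholders for illustration, not OmBot's actual interfaces.

```python
def llm(prompt: str) -> str:
    # Placeholder for a real large-model completion call.
    return "DONE"

class Agent:
    def __init__(self, goal: str):
        self.goal = goal
        self.memory: list[str] = []  # working memory of past steps

    def step(self) -> str:
        # Each iteration: derive the next instruction from the goal plus
        # accumulated memory, act on it, and remember the outcome.
        history = "\n".join(self.memory[-10:])
        instruction = llm(f"Goal: {self.goal}\nHistory:\n{history}\nNext action:")
        result = self.execute(instruction)
        self.memory.append(f"{instruction} -> {result}")
        return result

    def execute(self, instruction: str) -> str:
        # Placeholder actuator; a real agent would dispatch to tools here.
        return f"executed: {instruction}"

    def run(self, max_steps: int = 5) -> None:
        # The loop needs no human guidance; it stops on a termination signal.
        for _ in range(max_steps):
            if "DONE" in self.step():
                break

Agent("summarize today's store events").run()
```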

Core capabilities of autonomous agents

Cognition is the process by which an agent obtains information about its environment, transforming raw data into a form computers can understand and process; for humans, about 80% of information input comes from vision.

Memory is the agent's ability to store and retrieve information. It includes short-term memory for temporary information and long-term memory for lasting knowledge and experience, both of which ultimately add value to decisions and actions.

Thinking is the process by which an agent analyzes, reasons about, and makes decisions based on perception and memory. Various algorithms and techniques process perceptual data and stored information to produce sound decisions and action plans. Here, language is the core logic of thinking.

Actions are the concrete steps an agent takes based on the outcomes of perception, memory, and thinking. They involve control mechanisms and actuators that translate decisions into physical actions or other forms of output.
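As an illustration of how the four capabilities fit together, the following toy sketch wires perception, memory, thinking, and action into one pipeline; all names and logic here are hypothetical, not Lianhui's design.

```python
from dataclasses import dataclass, field

@dataclass
class SimpleAgent:
    short_term: list = field(default_factory=list)  # temporary information
    long_term: dict = field(default_factory=dict)   # lasting knowledge and experience

    def perceive(self, raw: str) -> str:
        # Cognition: transform raw data into a processable form.
        return raw.strip().lower()

    def remember(self, percept: str) -> None:
        # Memory: buffer recent events and accumulate lasting counts.
        self.short_term.append(percept)
        self.long_term[percept] = self.long_term.get(percept, 0) + 1

    def think(self, percept: str) -> str:
        # Thinking: decide using both the current percept and memory.
        seen_before = self.long_term.get(percept, 0) > 1
        return "routine event, log it" if seen_before else "new event, raise an alert"

    def act(self, decision: str) -> None:
        # Action: turn the decision into output.
        print(decision)

agent = SimpleAgent()
for event in ["Door opened", "door opened", "Window broken"]:
    percept = agent.perceive(event)
    agent.remember(percept)
    agent.act(agent.think(percept))
```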

The first batch of autonomous agents

With these four core capabilities in place, the first batch of agents for different industries, needs, and scenarios naturally followed: Video Xiaoou, Document Xiaoou, and AIGC Xiaoou.

Yes, for its debut Lianhui has launched not just one autonomous agent, but a whole batch.

What can they do?

Video Xiaoou can serve as a smart store manager in new-retail settings. Drawing on the camera's visual feed, it uses the Ohm model to recognize everything that happens in the store, builds up a memory of those events, and autonomously decides when to surface information. It keeps an eye on noteworthy events and raises prompts when necessary. Through dialogue, users can ask about anything that has happened in the store at any time and get help managing and operating it.
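A rough sketch of such a store-manager loop might look like the following, where detect() and should_alert() are hypothetical stand-ins for the Ohm model's recognition and decision capabilities.

```python
import time

def detect(frame) -> list[str]:
    # Placeholder recognition: a real system runs the Ohm model on the frame.
    return ["customer enters", "shelf item removed"]

def should_alert(event: str) -> bool:
    # Placeholder decision rule; a real system reasons over goals and memory.
    return "removed" in event

memory: list[tuple[float, str]] = []  # the store's "robot memory"

def process(frame) -> None:
    for event in detect(frame):
        memory.append((time.time(), event))  # remember everything that happened
        if should_alert(event):
            print(f"ALERT: {event}")         # proactive prompt to the manager

process(frame=None)  # in practice, called per camera frame
```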

Video Xiaoou becomes a smart store manager that thinks independently

Document Xiaoou can serve as a learning assistant for individuals and enterprises. In industries such as power, petroleum, and medicine, where professional knowledge is costly to learn and hard to query, the document question-answering bot integrates domain knowledge into a vector database as stored memory, forming a specialist bot that understands multimodal content, generates responses, and gives professional answers to user questions.
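The underlying pattern is retrieval-augmented question answering: embed domain documents into a vector store, retrieve the most relevant passages, and answer with them as context. The sketch below uses a hypothetical embed() placeholder (a real system would call a trained embedding model) and is not Lianhui's implementation.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder embedding: hash-seeded random vector, stable within a run.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

class VectorStore:
    def __init__(self):
        self.texts: list[str] = []
        self.vecs: list[np.ndarray] = []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vecs.append(embed(text))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Retrieve the k passages most similar to the query embedding.
        q = embed(query)
        scores = [float(q @ v) for v in self.vecs]
        top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        return [self.texts[i] for i in top]

def answer(store: VectorStore, question: str) -> str:
    # A real system would pass the retrieved context to a large model.
    context = "\n".join(store.search(question))
    return f"[LLM answer grounded in]:\n{context}"

store = VectorStore()
store.add("Transformer maintenance requires de-energizing the line first.")
store.add("Pipeline pressure must stay below the rated threshold.")
print(answer(store, "What must be done before transformer maintenance?"))
```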

Document Xiaoou helps industry newcomers solve professional problems

AIGC Xiaoou can serve as an editing assistant in media, culture, gaming, and other industries. Using AIGC, it turns raw media footage into a finished video with one click: the language module generates copy for the video's theme, splits it into detailed shot descriptions, then uses language understanding to search the video library, edit, and assemble the result, dramatically lowering the threshold for video production.
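Conceptually, the pipeline chains copy generation, shot splitting, footage search, and assembly. The sketch below is purely illustrative; every function is a hypothetical placeholder for a model or search call.

```python
def generate_copy(theme: str) -> str:
    return f"Opening scene for {theme}. Closing scene for {theme}."  # language model call

def split_into_shots(copy_text: str) -> list[str]:
    # Split narration into finer-grained shot descriptions.
    return [s.strip() for s in copy_text.split(".") if s.strip()]

def search_footage(shot: str) -> str:
    return f"clip_for({shot})"  # semantic search over the video library

def assemble(clips: list[str]) -> str:
    return " + ".join(clips)    # editing and rendering step

def one_click_video(theme: str) -> str:
    copy_text = generate_copy(theme)
    shots = split_into_shots(copy_text)
    clips = [search_footage(s) for s in shots]
    return assemble(clips)

print(one_click_video("city food tour"))
```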

AIGC Xiaoou: one-click video generation

At the event, Lianhui Technology also released an industry-level smart cultural tourism platform built on the OmBot Ohm agent and large-model technology, rapidly empowering typical scenarios across the cultural tourism industry, including the metaverse, AIGC, and smart assistants.

Facing the ever-changing needs of industries, enterprises, and individuals, OmBot Ohm agents will enable the rapid generation and evolution of personalized agents through efficient tuning; in the future, autonomous agents will be counted not in ones or batches, but per person.

Autonomous agents will be like the Monkey King's hairs: conjured up on demand, as many as needed.

Ohm Large Model 3.0 is here!

Looking closely at this first batch of autonomous agents, it is clear that cognition and thinking are their core capabilities in practice.

For cognition and thinking, Lianhui relies on the multimodal large model behind them.

As early as 2019, Lianhui Technology launched Ohm Model 1.0, achieving cross-modal retrieval around the same time as OpenAI's CLIP, and followed with Ohm Model 2.0, which focused on open object recognition and moved from image-text retrieval to object understanding.

Now Lianhui Technology has officially launched Ohm Model 3.0, aiming squarely at industry-leading performance and genuinely practical large models.

What leaps does Ohm Model 3.0 make?

OmModel V3 is officially released

In open recognition, the Ohm model supports fully open-label recognition for images and video. Pre-training covered billions of high-quality image-text pairs spanning a wide range of backgrounds, object types, object attributes, and behaviors, combined with multi-task training in fine-grained whole-image understanding, image-text semantic matching, and visual question answering, giving Ohm Model 3.0 the foundation for emergent capabilities.

Ohm Model 3.0 is no longer limited to a fixed list of object categories: through semantic understanding it can recognize arbitrary objects in a visual scene, and targets can even be defined by description.
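This is the general recipe popularized by CLIP-style models: score an image against free-form textual descriptions in a shared embedding space, so any phrase can serve as a label. The sketch below uses dummy hash-seeded encoders as placeholders and is not the Ohm model's actual code.

```python
import numpy as np

def encode_image(image) -> np.ndarray:
    # Placeholder: a real system uses a pretrained vision encoder.
    rng = np.random.default_rng(abs(hash(str(image))) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

def encode_text(text: str) -> np.ndarray:
    # Placeholder: a real system uses a pretrained text encoder.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

def recognize(image, descriptions: list[str]) -> str:
    # Open-vocabulary recognition: rank free-form descriptions by their
    # embedding similarity to the image; no fixed label list is needed.
    img = encode_image(image)
    sims = [float(encode_text(d) @ img) for d in descriptions]
    return descriptions[int(np.argmax(sims))]

print(recognize("frame_001.jpg",
                ["a person wearing a hard hat", "a forklift", "spilled liquid"]))
```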

Open recognition

In visual question answering, Lianhui built a private corpus of billions of media and IoT data items, including drone and surveillance viewpoints. Through multi-task training, Ohm Model 3.0 deeply integrates natural language analysis, logical reasoning, image understanding, and natural language generation, with fine-grained alignment between the vision model and the language model so that it can understand human instructions and answer sensibly.

In addition, the Ohm model supports multi-turn dialogue after answering questions about an image, reasoning about and expanding on information beyond what is directly visible.
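A minimal way to picture multi-turn visual Q&A is a conversation object that conditions every answer on the image plus the dialogue history, as sketched below; vlm() is a hypothetical vision-language model call, not the Ohm model's API.

```python
def vlm(image, prompt: str) -> str:
    return "[model answer]"  # placeholder for a real vision-language model

class VisualChat:
    def __init__(self, image):
        self.image = image
        self.history: list[tuple[str, str]] = []

    def ask(self, question: str) -> str:
        # Each turn conditions on the image plus all previous turns, so
        # follow-up questions can build on earlier answers.
        transcript = "\n".join(f"Q: {q}\nA: {a}" for q, a in self.history)
        answer = vlm(self.image, f"{transcript}\nQ: {question}\nA:")
        self.history.append((question, answer))
        return answer

chat = VisualChat(image="store_frame.jpg")
print(chat.ask("How many customers are at the counter?"))
print(chat.ask("What were they doing before that?"))  # relies on dialogue history
```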

Visual Q&A

In cognitive reasoning, by continuously improving the Ohm model's content understanding and multimodal semantic alignment, and combining them with the language model's capabilities, the Ohm model can reason on top of visual cognition, supporting the cognition and reasoning an agent requires.

For example, seeing a child fall, the model can reason that someone should immediately check the child for injuries. Seeing a small child by a window, it can issue a safety reminder. Seeing a cracked bottle and a spilled drink, it can prompt an immediate cleanup to prevent slips.

Built on open recognition and visual question answering, cognitive reasoning lets agents move from passive recognition to active reasoning, thinking, and decision-making, proposing appropriate intelligent responses.

Cognitive reasoning

In efficient fine-tuning, given that traditional full-parameter fine-tuning consumes large amounts of GPU compute and storage, Lianhui tackled both model training and model inference to make the Ohm large model practical and affordable.

For training, Lianhui independently designed a PEFT "featherweight" fine-tuning technique that updates only a small fraction of the model's parameters: the trainable parameter count is under 1% of standard full-parameter fine-tuning, greatly reducing compute and storage costs while achieving comparable performance. This genuinely lowers the barrier to fine-tuning large models and adapts quickly to the long-tail training needs of users.
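Lianhui's featherweight fine-tuning method has not been published, but a LoRA-style adapter illustrates the general PEFT idea: freeze the pretrained weights and train only small low-rank matrices, keeping trainable parameters well under 1%. A minimal sketch:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Only these two small matrices are trained.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus a trainable low-rank update: W x + (B A) x.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable / total:.2%} of parameters")  # well under 1%
```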

Less than 1% of training parameters

For inference, Lianhui launched Hydra, a deployment architecture and inference runtime for multimodal large models. The "snake body", a set of shared base models, is deployed once across a multi-GPU cluster; each algorithm task then deploys only a featherweight "snake head" model, yielding a MaaS architecture. At inference time, any snake head can be combined with any shared snake body to produce recognition results, and a new algorithm task requires only a new featherweight head. This makes efficient use of GPU cluster resources and breaks the GPU-memory ceiling on the number of deployable algorithm tasks.
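The shared-backbone pattern can be sketched as one "snake body" whose features are reused by many featherweight "snake heads"; the module shapes and names below are illustrative, not Hydra's actual implementation.

```python
import torch
import torch.nn as nn

class SnakeBody(nn.Module):
    """Shared base model, deployed once across the GPU cluster."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    @torch.no_grad()
    def forward(self, x):
        return self.encoder(x)  # features reused by every task head

class SnakeHead(nn.Module):
    """Featherweight per-task model, cheap to add or remove."""
    def __init__(self, dim: int = 512, num_classes: int = 10):
        super().__init__()
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, features):
        return self.classifier(features)

body = SnakeBody()
heads = {"helmet_detection": SnakeHead(), "smoke_alarm": SnakeHead(num_classes=2)}

x = torch.randn(1, 512)
features = body(x)                     # computed once per input
for task, head in heads.items():
    print(task, head(features).shape)  # each task reuses the shared features
```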

Hydra deployment architecture

As a mature large model, the Ohm model performs well while continuing to evolve. The R&D team has built a complete human-in-the-loop instruction learning evolution system.

Human-in-the-loop instruction learning evolution system

Each new model version produced by an iterative upgrade first passes through the quality team, which runs capability tests on internal quantitative datasets and then configures and tests the various algorithm tasks to confirm the upgrade succeeded. After deployment, the team continuously tracks how algorithm tasks run, recording and feeding back potential defects and optimization points.

Building on this, the data team uses a complete data-reflow system to perform targeted data collection, data cleaning, and instruction-learning dataset generation for key areas such as new algorithm tasks, long-tail scenarios, and recognition defects.

Once an instruction-learning dataset completes an accumulation cycle, the algorithm team trains a new iteration of the Ohm model based on the quality team's feedback and the data team's collected data, improving the model on business algorithm tasks and strengthening its generalization.

Through this human-in-the-loop instruction learning evolution system of effect evaluation, upgrade strategy, data reflow, and optimization, the Ohm large model keeps learning and iterating on its base model, performing ever better on existing algorithm tasks.

In practice, this means the Ohm model grows noticeably more capable with each iteration every few months.

A complete toolchain and engineering framework

For a large model to succeed, it must be brought to the application level with a supporting toolchain and engineering framework.

To help users adopt large-model technology and products better and faster, Lianhui officially released its Ohm large model software tool collection, which rethinks AI agent development tools from an AI-native perspective so that developers can quickly build the breakout agents of the future.

Ohm large model application system

Over the past few years, Lianhui Technology has built a complete toolchain for visual understanding scenarios. With platforms and systems such as OmVision Studio and OmVision OS, developers can express recognition needs flexibly in natural language, improving algorithm production efficiency, lowering the barrier to applying artificial intelligence, and empowering more enterprises and industries.

OmVision application architecture

Today, Lianhui Technology released OmBot OS, an operating system for agents, for the first time. Through flexible module configuration, developers can deeply integrate multimodal large models, vector databases, and human-computer interaction cognitive architectures, laying the foundation for agents that perceive, remember, think, and act on multimodal data.

OmBot OS architecture

OmBot OS provides a built-in long-term memory module and lets developers write active-thinking and interactive-response modules, supporting both responsive question answering and proactive recommendation tasks. It also supports a memory reflection module that simulates how humans actively compress and reflect on long-term memory, distilling higher-level abstract information from complex raw memories and making the agent more human-like.
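One way to picture the memory reflection idea: raw memories accumulate, and a periodic reflection step compresses the most recent ones into higher-level abstract memories. The sketch below is hypothetical; summarize() stands in for a large-model call, and none of this is OmBot OS's actual API.

```python
def summarize(items: list[str]) -> str:
    return "summary of: " + "; ".join(items)  # placeholder for an LLM call

class LongTermMemory:
    def __init__(self, reflect_every: int = 5):
        self.raw: list[str] = []       # complex original memories
        self.abstract: list[str] = []  # high-dimensional abstract memories
        self.reflect_every = reflect_every

    def store(self, event: str) -> None:
        self.raw.append(event)
        if len(self.raw) % self.reflect_every == 0:
            self.reflect()

    def reflect(self) -> None:
        # Compress the most recent raw memories into one abstract memory,
        # mimicking how humans consolidate experience.
        recent = self.raw[-self.reflect_every:]
        self.abstract.append(summarize(recent))

    def recall(self, query: str) -> list[str]:
        # A real system would use vector search; substring matching keeps
        # this sketch self-contained.
        return [m for m in self.raw + self.abstract if query.lower() in m.lower()]

mem = LongTermMemory(reflect_every=3)
for e in ["customer asked about milk", "shelf restocked", "spill near aisle 2",
          "spill cleaned", "delivery arrived"]:
    mem.store(e)
print(mem.recall("spill"))
```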

Embrace the AGI era in a more open way

A complete product matrix and the human-in-the-loop instruction learning evolution system have laid a solid technical foundation for Lianhui Technology, and the openness it builds on top of them is equally promising.

Dr. Zhao Tiancheng, Lianhui's chief scientist, said: "We believe that in the future, every person and every company can be empowered by AI to have better memory, cognition, and decision-making. Our current technical direction is to keep aligning machines with us humans and keep evolving them, so that they ultimately and truly serve people."

Throughout this process, Lianhui Technology has stayed user-centric, continually evolving its capabilities, iterating its products, and opening up its ecosystem, lowering the barrier to using artificial intelligence and accelerating inclusive AI that empowers thousands of industries.

At the dawn of the AGI era, the paradigm shift in artificial intelligence is accelerating, and what was once a story is becoming reality.