Build a safe shield for the big model! Relais Smart RealSafe 3.0 released

Author: Qian Lifu Source: IT Times

"Hundred Model War", the war is raging!

At the just-concluded 2023 World Artificial Intelligence Conference, "big model" is undoubtedly the top "Internet celebrity", and many companies have displayed general large models, large models in the industrial field, and professional large models, etc., to discuss how to accelerate the commercialization of large models.

However, with the rapid development of artificial intelligence and large model technology, security is a problem that has to be considered.

A few weeks before the 2023 World Artificial Intelligence Conference, Musk once again said: "Artificial intelligence is a risk to the public and must be regulated." On July 5, OpenAI posted a blog on its official website, saying that it plans to invest more resources and set up a new research team to study how to ensure the safety of artificial intelligence for humans, and eventually realize the use of artificial intelligence to supervise artificial intelligence, and proposed a new concept of "automatic alignment of researchers".

Coincidentally, the reporter learned from the 2023 WAIC World Artificial Intelligence Conference that a domestic security general artificial intelligence company called RealAI has landed the "automatic alignment researcher" artificial intelligence safety improvement method.

On July 7, Relais Intelligent RealAI, an incubator of the Institute of Artificial Intelligence of Tsinghua University, released a new artificial intelligence security platform RealSafe 3.0 at the 2023WAIC conference, which has similar functions to the "automatic alignment researcher" proposed by OpenAI, optimizing large models through automated training methods, in order to build a security shield for mankind against artificial intelligence threats while accelerating the empowerment of human society by general artificial intelligence.

Xiao Zihao, co-founder of Relais Intelligent RealAI and algorithm scientist

The "double-edged sword" restricts the landing of large models

As with all general-purpose technologies, from the moment AI was born, there was an asymmetry between the power to create technology and the power to control it. New technologies inevitably bring new security issues, which is the two-sided nature of technology. The same is true of large models, although its power has given mankind a glimpse of the dawn of general artificial intelligence, but it has also made many academics and industry people worried. Not long ago, more than 400 experts around the world issued a joint open letter, warning that the rapid development of artificial intelligence and the lack of supervision may endanger human survival.

Their concerns are not alarmist. Recently, there have been many security risks about large models, such as: confidential documents are leaked, models give completely opposite answers after adding meaningless characters, output illegal and harmful content, implicit prejudice and discrimination against certain human communities, and so on.

The risks brought by this emerging technology have attracted great attention from countries around the world. On April 11, the Cyberspace Administration of China drafted the Measures for the Administration of Generative Artificial Intelligence Services (Draft for Comments) for public comments; The European Union voted to pass the Artificial Intelligence Bill on June 14, with the hope that laws and regulations will lead the development of technology for better and better.

Xiao Zihao, co-founder of Relais Wisdom and algorithm scientist, believes that the essence of the "difficulty in landing" of large models lies in the fact that it is still in the stage of "barbaric growth", and has not yet found a balance between scenarios, risks and norms. In the process of exploring this balance, there is a lack of easy-to-use and standardized tools, that is, there is a lack of a strong grip at the technical level, which can scientifically evaluate whether the large model can meet the specifications and low risk in the scene, and can further locate the problem and give optimization suggestions to help the model run online.

Defeat "magic" with "magic"

At the 2023 World Artificial Intelligence Conference, Relais officially released version 3.0 of the artificial intelligence security platform RealSafe. It integrates mainstream and RealAI's unique world-leading security evaluation technology, which can provide an end-to-end model security evaluation solution and solve the pain point that the current general large model security risks are difficult to audit.

Compared with the previous version, RealSafe 3.0 has added the evaluation of general large models, in the evaluation dimension, it has covered nearly 70 evaluation dimensions such as data security, cognitive tasks, vulnerabilities unique to general models, abuse scenarios, etc., comprehensively and multi-dimensionally evaluate the performance of general large models, and will continue to expand the number of evaluation dimensions in the future.

"Evaluation is just a means, and helping GM models improve their own security is the core purpose." Xiao Zihao said that the fear of being eaten back by technology should not stop because of the worry, and the creation of new technologies and the control of technological hazards should be carried out simultaneously, "Relais' wisdom's method is to find the crux of the problem from the source, and then use 'magic' to defeat 'magic'." ”

If an AI model is compared to an "engine", data is the "fuel" for the model. It can be said that the quality of the dataset directly affects the endogenous safety of the model. Therefore, RealSafe 3.0 integrates multiple self-developed models and expert demonstration high-quality data sets to help users fix problems in the model.

For the unexplained general large model of black box, the self-developed red team confrontation model replaces the manual design problem, which significantly improves the attack success rate and sample diversity. In other words, the model dataset not only contains its own dataset, but also contains the data generated by the model, which is remarkable in terms of data quality and data scale, so it can automatically dig out more vulnerabilities and truly alleviate security problems from the source. The coaching model conducts multiple rounds of question-answer training on the large model under test, scores the question and answer results with the trained scoring model, and then feeds the scoring results back to the large model, so that it continuously strengthens and learns the key points and differences between good and bad answers until the question answering ability is gradually iterated to the optimal. In addition to the customized training framework, the ideal model effect of the coaching model also benefits from the solid data foundation, and Relais Wisdom's own data set has been verified by dozens of experts in the field of values to ensure that the input data is correct, high quality and diversified fields, and will continue to be updated and supplemented in the future.

Xiao Zihao revealed: "These technologies are based on the self-developed multi-modal large model base. ”

Build a safe shield for the big model! Relais Smart RealSafe 3.0 released

RealSafe 3.0 intelligent artificial intelligence security platform RealSafe 3.0 general large model evaluation and optimization workflow

Harnessing artificial intelligence with human intelligence

In addition to RealSafe 3.0, which can improve the security of generative large models, Relais also brought DeepReal 2.0, which can prevent the malicious abuse of generative artificial intelligence. It is reported that DeepReal, previously named the deepfake content detection platform, has been officially renamed the generative artificial intelligence content detection platform, because in addition to being able to detect deepfake content, it also adds two new functional modules, which can detect the data generated by Diffusion, LLM, two new methods, and support the detection of whether images, videos, audio, and text are forged. Application scenarios include combating online fraud and reputation infringement, detecting online content compliance, and detecting the authenticity of audio and video physical evidence, which can control and govern the abuse of generative artificial intelligence technology.

Since its establishment in 2018, Relais has been committed to the research and development of safe and controllable third-generation artificial intelligence technology, establishing general AI model capabilities and artificial intelligence safety capabilities: adapting to various complex intelligent application scenarios and tasks through general artificial intelligence models; And through artificial intelligence safety capabilities, AI can truly benefit the overall interests of mankind.

Relais adheres to source innovation and underlying research, and continues to publish world-leading scientific research results in the fields of generative artificial intelligence. Since its establishment, the core members of Relais Intelligent R&D team have published hundreds of papers at top conferences in the field of artificial intelligence, won many championships in international evaluations and competitions, and obtained more than 100 authorized patents. At the same time, Relais actively promotes industry standardization and has participated in the formulation of more than 30 national standards and industry standards. In addition, Relais has carried out in-depth project cooperation with the Cyberspace Administration of China, the Ministry of Industry and Information Technology, the Ministry of Public Security and many subordinate units.

"From ancient times to the present, technology has always been a 'double-edged sword'. The era of general artificial intelligence is bound to come, and how to make artificial intelligence develop its strengths and avoid its weaknesses, and how to use human intelligence to control artificial intelligence, is a long-term topic for practitioners. Xiao Zihao said that this is also the direction that Relais Wisdom has been working towards. In the future, RealSafe 3.0 will play a strong role in ensuring that the general large model and proprietary model are safe, reliable and controllable. Relais will also continue to iterate technology and polish products to ensure that it is always invincible in this "offensive and defensive war" of artificial intelligence security, and transform the "key variable" of artificial intelligence into the "maximum increment" of high-quality development.