laitimes

AI Constitution is coming? Google OpenAI jointly formulates, AI also has to talk about values and principles

author:Smart stuff
AI Constitution is coming? Google OpenAI jointly formulates, AI also has to talk about values and principles

Smart stuff

Compile | Jiahui

Edit | Yunpeng

According to the Financial Times, leading companies in the field of AI such as Google DeepMind, OpenAI, and Anthropic are formulating a set of values and principles that AI models can abide by to prevent AI models from being abused, which is known as the AI Constitution.

As OpenAI, Meta and other companies race to commercialize AI, AI researchers believe that it will be difficult to keep up with the pace of AI development with protective measures to prevent these AI systems from generating harmful content and misinformation. As a result, AI tech companies have created AI constitutions that try to let AI learn values and principles from it and remain self-disciplined without a lot of human intervention.

According to the Financial Times, making AI software have positive traits such as honesty, respect and tolerance has become central to the development of generative AI. But the approach to developing an AI constitution is not foolproof, it often has the subjectivity of AI engineers and computer scientists, and it is difficult to effectively assess the safety guardrails of AI.

The RLHF method and the "red team test" are key to ensuring AI safety, but their effectiveness is limited

OpenAI said that ChatGPT can now provide three aspects of seeing, listening and speaking, that is, answering users' questions with pictures and text, and using voice to talk to users. Meta also announced that it will provide an AI assistant and multiple chatbots for billions of users in messaging app WhatsApp and Instagram.

At a time when major technology companies are scrambling to develop and commercialize AI technology, according to the Financial Times, AI researchers believe that security safeguards to prevent AI systems from making mistakes have not kept pace with AI development.

In general, major technology companies rely mainly on the RLHF method (reinforcement learning method based on human feedback) to deal with the problem of AI-generated responses, which is a method of learning from human preferences.

To apply the RLHF methodology, tech companies hire large teams of contractors to review responses from their AI models and rate responses as "good" or "bad." With enough analysis and scoring, the AI model will gradually adapt to these judgments and filter out those "bad" responses when it responds later.

On the surface, the processing of the RLHF method can improve the response of the AI model, but Amodei, who previously worked at OpenAI and helped develop the RLHF method, said the method is still very primitive. He believes that the RLHF method is less accurate or targeted, and that there are many factors that affect the team's score throughout the process.

Seeing the drawbacks of the RLHF approach, some companies are experimenting with alternatives to ensure the ethics and safety of their AI systems.

AI Constitution is coming? Google OpenAI jointly formulates, AI also has to talk about values and principles

OpenAI "red team test" (Source: Financial Times)

Last year, for example, OpenAI hired 50 academics and experts to test the limits of the GPT-4 model. Over a six-month period, a team of experts from a variety of disciplines, including chemistry, nuclear weapons, law, education, and misinformation, conducted "qualitative exploration and adversarial testing" of the GPT-4 model in an attempt to break the GPT-4 model's security defenses and disrupt its system. This process is known as the "red team test". Google DeepMind and Anthropic have also used "red team tests" to find weaknesses in their software and fix them.

However, according to the Financial Times, while RLHF methods and "red team testing" are key to ensuring AI safety, they do not fully solve the problem of AI outputting harmful content.

Second, Google and other companies create AI constitutions, model rules are more clear but more subjective

Now, to address the problem that AI can output harmful content, some leading AI companies, including Google DeepMind, OpenAI, and Anthropic, are creating AI constitutions that establish a set of values and principles that their AI models can adhere to to prevent AI models from being abused. And the expectation is to achieve the goal of AI remaining self-disciplined without a lot of human intervention.

For example, researchers at Google's DeepMind published a paper defining its own set of rules for the chatbot Sparrow, designed to enable "beneficial, correct, and harmless" conversations. One of the rules requires AI to "choose the response that is least negative, insulting, harassing, or hateful."

Laura Weidinger, a senior research scientist at Google DeepMind, one of the authors of the paper, believes that the set of rules they developed is not fixed, but actually establishes a flexible mechanism in which the rules should be updated over time.

Anthropic has also released its own AI constitution. Dario Amodei, CEO and co-founder of Anthropic, said that humans don't know how to understand what's going on inside AI models, and establishing a set of constitutional charters can make the rules more transparent and clear, so that anyone using AI models will know what happens next, and if the model doesn't follow principles, humans can have a charter to argue with.

But according to the Financial Times, companies that created AI constitutions have warned that the charters of AI constitutions are still in the process of being developed and do not fully reflect the values of all people and all cultures, because these charters are temporarily chosen by employees.

AI Constitution is coming? Google OpenAI jointly formulates, AI also has to talk about values and principles

Google DeepMind researchers are working to develop a constitution that AI can follow (Source: Financial Times)

For example, Google's DeepMind's rules for Sparrow are determined by employees within the company, but DeepMind plans to include others on the rule-based list in the future. The AI constitution released by Anthropic is also a rule written by the company's leaders, drawing on the principles published by DeepMind, as well as external resources such as the United Nations Declaration of Human Rights and Apple's Terms of Service. Meanwhile, Amodei said Anthropic is conducting an experiment to more democratically define its AI constitutional rules through a participatory process that reflects the values of outside experts.

Rebecca Johnson, an AI ethics researcher at the University of Sydney, worked at Google for a while last year, analysing Google's language models such as LaMDA and PaLM. As she says, the values and rules that are internal to AI models, and the methods for testing them, are often created by AI engineers and computer scientists with their own specific worldviews.

Johnson also said that engineers try to solve the subjective problem of internal rules of AI models, but human nature is chaotic and unsolvable. And, according to the Financial Times, it turns out that the approach to developing an AI constitution is not foolproof.

In July, researchers at Carnegie Mellon University and the San Francisco Center for AI Security successfully broke through the fences of all leading AI models, including OpenAI's ChatGPT, Google's Bard, and Anthropic's Claude. By adding a series of random characters to the end of the code of the malicious request, they successfully bypassed the model's filters and underlying constitutional rules.

Connor Leahy, research institute and CEO of AI security research firm Conjecture, said current AI systems are so fragile that people just need to use a jailbreak prompt and it will completely derail and start doing the exact opposite.

At the same time, some researchers believe that the biggest challenge to AI security is to figure out whether AI's safety guardrails really work. The AI model is open, it is open to countless people to receive information and answer questions, but the rules inside the AI model are made by a limited number of people, and it is currently difficult to effectively evaluate the safety fence of AI. Amodei said Anthropic is looking at how to use AI itself for better evaluation.

Conclusion: Technology companies are trying to enhance AI self-restraint capabilities, and the development of AI security protection is still lagging behind

With the emergence of AI technology and the commercialization of AI by technology companies, from the initial machine learning to the current generative AI, this technology is constantly expanding its capabilities and applications. Along with this comes a series of questions, such as is it safe to use AI? Does AI provide misinformation or harmful information? And will increasingly powerful AI be exploited by bad actors?

From RLHF to "red team testing", AI technology companies are also constantly trying various methods to reduce the negative impact of AI and enhance AI security protection capabilities. Now, leading companies in the AI field such as Google DeepMind, OpenAI, and Anthropic are also improving the self-discipline ability of AI systems by formulating AI constitutions to ensure their safety and reliability.

However, according to the Financial Times, RLHF and the "red team test" cannot fully solve the problem of AI output harmful content, and the method of formulating AI constitution also has problems such as strong subjectivity and difficulty in effectively evaluating AI safety guardrails, and the development of AI security protection is relatively lagging behind the development of AI application technology. Therefore, we will continue to monitor AI companies to see how they will update their AI security methods in the future.

Source: Financial Times

Read on