AI Constitution is coming? Google OpenAI jointly formulates, AI also has to talk about values and principles

Smart stuff

Compile | Jiahui

Edit | Yunpeng

According to the Financial Times, leading companies in the field of AI such as Google DeepMind, OpenAI, and Anthropic are formulating a set of values and principles that AI models can abide by to prevent AI models from being abused, which is known as the AI Constitution.

As OpenAI, Meta and other companies race to commercialize AI, AI researchers believe that it will be difficult to keep up with the pace of AI development with protective measures to prevent these AI systems from generating harmful content and misinformation. As a result, AI tech companies have created AI constitutions that try to let AI learn values and principles from it and remain self-disciplined without a lot of human intervention.

According to the Financial Times, making AI software have positive traits such as honesty, respect and tolerance has become central to the development of generative AI. But the approach to developing an AI constitution is not foolproof, it often has the subjectivity of AI engineers and computer scientists, and it is difficult to effectively assess the safety guardrails of AI.

The RLHF method and the "red team test" are key to ensuring AI safety, but their effectiveness is limited

OpenAI said that ChatGPT can now provide three aspects of seeing, listening and speaking, that is, answering users' questions with pictures and text, and using voice to talk to users. Meta also announced that it will provide an AI assistant and multiple chatbots for billions of users in messaging app WhatsApp and Instagram.

At a time when major technology companies are scrambling to develop and commercialize AI technology, according to the Financial Times, AI researchers believe that security safeguards to prevent AI systems from making mistakes have not kept pace with AI development.

In general, major technology companies rely mainly on the RLHF method (reinforcement learning method based on human feedback) to deal with the problem of AI-generated responses, which is a method of learning from human preferences.

To apply the RLHF methodology, tech companies hire large teams of contractors to review responses from their AI models and rate responses as "good" or "bad." With enough analysis and scoring, the AI model will gradually adapt to these judgments and filter out those "bad" responses when it responds later.

On the surface, the processing of the RLHF method can improve the response of the AI model, but Amodei, who previously worked at OpenAI and helped develop the RLHF method, said the method is still very primitive. He believes that the RLHF method is less accurate or targeted, and that there are many factors that affect the team's score throughout the process.

Seeing the drawbacks of the RLHF approach, some companies are experimenting with alternatives to ensure the ethics and safety of their AI systems.

OpenAI "red team test" (Source: Financial Times)

Last year, for example, OpenAI hired 50 academics and experts to test the limits of the GPT-4 model. Over a six-month period, a team of experts from a variety of disciplines, including chemistry, nuclear weapons, law, education, and misinformation, conducted "qualitative exploration and adversarial testing" of the GPT-4 model in an attempt to break the GPT-4 model's security defenses and disrupt its system. This process is known as the "red team test". Google DeepMind and Anthropic have also used "red team tests" to find weaknesses in their software and fix them.

However, according to the Financial Times, while RLHF methods and "red team testing" are key to ensuring AI safety, they do not fully solve the problem of AI outputting harmful content.

Second, Google and other companies create AI constitutions, model rules are more clear but more subjective

Now, to address the problem that AI can output harmful content, some leading AI companies, including Google DeepMind, OpenAI, and Anthropic, are creating AI constitutions that establish a set of values and principles that their AI models can adhere to to prevent AI models from being abused. And the expectation is to achieve the goal of AI remaining self-disciplined without a lot of human intervention.

For example, researchers at Google's DeepMind published a paper defining its own set of rules for the chatbot Sparrow, designed to enable "beneficial, correct, and harmless" conversations. One of the rules requires AI to "choose the response that is least negative, insulting, harassing, or hateful."

Laura Weidinger, a senior research scientist at Google DeepMind, one of the authors of the paper, believes that the set of rules they developed is not fixed, but actually establishes a flexible mechanism in which the rules should be updated over time.

Anthropic has also released its own AI constitution. Dario Amodei, CEO and co-founder of Anthropic, said that humans don't know how to understand what's going on inside AI models, and establishing a set of constitutional charters can make the rules more transparent and clear, so that anyone using AI models will know what happens next, and if the model doesn't follow principles, humans can have a charter to argue with.

But according to the Financial Times, companies that created AI constitutions have warned that the charters of AI constitutions are still in the process of being developed and do not fully reflect the values of all people and all cultures, because these charters are temporarily chosen by employees.

Google DeepMind researchers are working to develop a constitution that AI can follow (Source: Financial Times)

For example, Google's DeepMind's rules for Sparrow are determined by employees within the company, but DeepMind plans to include others on the rule-based list in the future. The AI constitution released by Anthropic is also a rule written by the company's leaders, drawing on the principles published by DeepMind, as well as external resources such as the United Nations Declaration of Human Rights and Apple's Terms of Service. Meanwhile, Amodei said Anthropic is conducting an experiment to more democratically define its AI constitutional rules through a participatory process that reflects the values of outside experts.

Rebecca Johnson, an AI ethics researcher at the University of Sydney, worked at Google for a while last year, analysing Google's language models such as LaMDA and PaLM. As she says, the values and rules that are internal to AI models, and the methods for testing them, are often created by AI engineers and computer scientists with their own specific worldviews.

Johnson also said that engineers try to solve the subjective problem of internal rules of AI models, but human nature is chaotic and unsolvable. And, according to the Financial Times, it turns out that the approach to developing an AI constitution is not foolproof.

In July, researchers at Carnegie Mellon University and the San Francisco Center for AI Security successfully broke through the fences of all leading AI models, including OpenAI's ChatGPT, Google's Bard, and Anthropic's Claude. By adding a series of random characters to the end of the code of the malicious request, they successfully bypassed the model's filters and underlying constitutional rules.

Connor Leahy, research institute and CEO of AI security research firm Conjecture, said current AI systems are so fragile that people just need to use a jailbreak prompt and it will completely derail and start doing the exact opposite.

At the same time, some researchers believe that the biggest challenge to AI security is to figure out whether AI's safety guardrails really work. The AI model is open, it is open to countless people to receive information and answer questions, but the rules inside the AI model are made by a limited number of people, and it is currently difficult to effectively evaluate the safety fence of AI. Amodei said Anthropic is looking at how to use AI itself for better evaluation.

Conclusion: Technology companies are trying to enhance AI self-restraint capabilities, and the development of AI security protection is still lagging behind

With the emergence of AI technology and the commercialization of AI by technology companies, from the initial machine learning to the current generative AI, this technology is constantly expanding its capabilities and applications. Along with this comes a series of questions, such as is it safe to use AI? Does AI provide misinformation or harmful information? And will increasingly powerful AI be exploited by bad actors?

From RLHF to "red team testing", AI technology companies are also constantly trying various methods to reduce the negative impact of AI and enhance AI security protection capabilities. Now, leading companies in the AI field such as Google DeepMind, OpenAI, and Anthropic are also improving the self-discipline ability of AI systems by formulating AI constitutions to ensure their safety and reliability.

However, according to the Financial Times, RLHF and the "red team test" cannot fully solve the problem of AI output harmful content, and the method of formulating AI constitution also has problems such as strong subjectivity and difficulty in effectively evaluating AI safety guardrails, and the development of AI security protection is relatively lagging behind the development of AI application technology. Therefore, we will continue to monitor AI companies to see how they will update their AI security methods in the future.

Source: Financial Times

AI Constitution is coming? Google OpenAI jointly formulates, AI also has to talk about values and principles

The RLHF method and the "red team test" are key to ensuring AI safety, but their effectiveness is limited

Second, Google and other companies create AI constitutions, model rules are more clear but more subjective

Conclusion: Technology companies are trying to enhance AI self-restraint capabilities, and the development of AI security protection is still lagging behind

Read on

Japan is borrowing the hand of the United States to lift the seal of the "Peace Constitution" and revive "militarism"

Title: Childless: Traditional Chinese Ideas and Practical ChoicesIn China, a country with a large population and deep traditional concepts, sometimes there may be something that seems to be taken for granted

The male guest on CNN bluntly said that he only believes in the First Amendment and tits

The German Constitutional Protection Agency demanded that the electronic products carried by Scholz's delegation return to Germany be scrapped

Success or failure will be seen in 15 days! Thailand's Constitutional Court announced new progress in dissolving the Move Forward Party!

The Chinese version of the Constitution of the Republic of Cuba was first published in Beijing: Promoting Global Governance and Enhancing International Understanding

[Forge the sense of community of the Chinese nation] Carry forward the spirit of the Constitution and establish the authority of the Constitution

Bask in the sunshine of the constitution and convey the power of the rule of law

As soon as Blinken arrived in China, Biden signed a showdown with China, breaking the constitutional axioms, and China will accompany him to the end

【Constitution and Law Publicity Month】Carry forward the spirit of the Constitution and weave a grid of rule of law

The Central School of Zhoutang Town, Sui County, held a speech contest of "Learning the Constitution and Speaking about the Constitution".

The Thai Constitutional Yuan gives the Move Forward Party a second chance! 15 days grace!

International Observation|Getting Farther and Away from the Peace Constitution How Japan is Facing the Constitution Memorial Day

TikTok is formally suing the U.S. government, asking the court to rule that the ban bill violates the U.S. Constitution

Immerse yourself in the power of the Constitution! This junior high school youth ceremony in Jianye District formulates exclusive memories for students

Zhou Hongyi suggested that Google open source all products! Open source has a chance to win with OpenAI, if it only relies on closed source, there will not be such a rapid development [with a comprehensive comparison of leading companies in China's AI framework industry]

The AI explosion week that started with OpenAI is finally Tencent's turn to show its musclesAI players

Report: OpenAI has disbanded its AI risk team led by former chief scientist Ilya

Outburst! OpenAI is facing personnel turmoil again! The head of security has resigned, and the "super-smart alignment team" has been disbanded

The press conference was tragic, and Ultraman posted an article satirizing Google! Google's crazy restructuring to take on OpenAI

It was revealed that the OpenAI super alignment team was disbanded!

70B模型秒出1000token，代码重写超越GPT4o，来自OpenAI参投团队

OpenAI's Super Alignment Team Disbanded Insiders Revealed: Trust in Ultraman Collapsed

Google released a new upgraded large model to face off against OpenAI; Meizu released the new Flyme AIOS system

changes in the senior management of pharmaceutical companies Novartis and GSK in China; OpenAI's Chief Scientist Leaves | Executive Updates: May 5-17, 2024

The Conservative Rout? The driving force behind OpenAI's infighting left Altman: It makes me sad

OpenAI is shockingly exposed! Executives angrily denounced the suppression, and the 710 billion AI giant was embarrassed at home and abroad|Titanium Media AGI

GPT-4o sparks heated discussions about OpenAI's organizational innovation! Heavy responsibilities for fresh graduates and undergraduates, the ranks are all floating clouds

Ilya left OpenAI insider exposure: Ultraman cut his team's computing power and prioritized products to make money

Thailand's 40 Lords demand that the Constitutional House remove Prime Minister Saitha - Minister of State Piji from his post!

In the second act of OpenAI's palace fight, the core security team was disbanded, and the person in charge blew up the inside story of his resignation

OpenAI forces departing employees to sign shut-up agreements: GPT can talk, but former employees can't