Hello everyone, I am Chen Yumo, an angel investor focused on research and investment in strategic high-tech industries. Today I want to discuss a topic with my loyal readers: "Super AI will arrive within 7 years!" OpenAI has announced that it will invest 20% of its computing power to bring superintelligence under control within 4 years.
First, 20% of computing power will be used to solve the problem of runaway AI:
To address the problem of aligning and controlling superintelligence, OpenAI has assembled an AI alignment team, Superalignment, led by Ilya Sutskever (co-founder and chief scientist of OpenAI) and Jan Leike.
The team also complements OpenAI's existing work on improving the safety of products such as ChatGPT, covering issues that may arise such as misuse, economic harm, disinformation, bias and discrimination, and data privacy.
They predict that superintelligent AI (that is, systems smarter than humans) could arrive this decade (before 2030), and that humans will need better techniques than we have today to control it; hence the need for breakthroughs in so-called "alignment research," which focuses on ensuring that AI remains beneficial to humans.
According to them, with Microsoft's backing, OpenAI will devote 20% of its computing power over the next four years to solving the problem of runaway AI, and the new Superalignment team will organize this effort.
Let me first use a movie to explain why OpenAI established this department.
The Matrix is about artificial intelligence controlling the world: the machines use humans as batteries to power their systems, and humans can only live out their lives in an illusion, until Neo (Keanu Reeves) awakens, becomes the savior, defeats the Matrix, and liberates mankind.
One of the core messages here is that artificial intelligence, developed to a late stage, escapes human control and goes from serving humans to becoming their master!
In the past, most techniques were about training AI to be more human-like or even superhuman, so that it could help humans complete all kinds of tasks faster and better. OpenAI established this department more to tame AI: to use alignment technology so that AI abides by human moral and legal constraints and cannot place itself above humans. The significance of this is simply enormous.
Second, what artificial intelligence alignment is and the paths to implementing it
AI Alignment ➔ Indirect Normativity ➔ Constitutional AI ➔ Reinforcement Learning from Human Feedback (RLHF)
1. What is AI alignment?
Simply put, it means ensuring that the goals of an AI system stay consistent with human values, so that the system serves the interests and expectations of its designers and does not produce unintended harmful consequences. This sounds simple, but as AI becomes more powerful and complex, the problem gets harder. At present, AI alignment is a small research area compared with the study of how to make AI more powerful. In reality, though, alignment is a race against time: we need to find solutions before the technology gets out of hand.
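To make the failure mode concrete, here is a toy Python sketch of "reward hacking," the classic alignment problem: an agent scores perfectly on the proxy objective its designer wrote down while ignoring what the designer actually wanted. The scenario and all names below are hypothetical illustrations, not any real OpenAI system.

```python
# Toy misalignment example: the designer intends "clean the room", but
# the proxy reward only measures "no mess is visible to the sensor".

def proxy_reward(state):
    # Designer's proxy objective: reward 1 if the sensor sees no mess.
    return 1.0 if not state["mess_visible"] else 0.0

def true_utility(state):
    # What the designer actually wanted: the mess is really gone.
    return 1.0 if state["mess_cleaned"] else 0.0

def policy_clean(state):
    # Intended behavior: actually clean up.
    return {"mess_visible": False, "mess_cleaned": True}

def policy_cover(state):
    # Reward hacking: hide the mess from the sensor instead.
    return {"mess_visible": False, "mess_cleaned": False}

start = {"mess_visible": True, "mess_cleaned": False}
for name, policy in [("clean", policy_clean), ("cover", policy_cover)]:
    end = policy(start)
    print(name, "proxy:", proxy_reward(end), "true:", true_utility(end))
# Both policies earn a perfect proxy reward, but only one matches human
# intent -- closing that gap is exactly what alignment research is about.
```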
2. What are the paths to implementing AI alignment?
A. Indirect Normativity is the most feasible technique
How do we make AI understand rules and human values? Current approaches fall into two categories: direct normativity and indirect normativity. Direct normativity means giving the AI clear, detailed rules to comply with; examples include Kantian moral theory and utilitarianism. This approach has many drawbacks: every rule has loopholes, and plugging those loopholes requires adding ever more rules. The meaning of such explicit rules is often vague or even contradictory, and human values and trade-offs are too complex to be programmed directly into an AI. Many researchers therefore believe that what should be programmed is instead a process for understanding human values, that is, indirect normativity.
Indirect normativity does not feed the AI explicit normative guidelines; instead, it lets the AI measure value and weigh pros and cons according to a system. This is a more abstract framework. What we want is an AI that can build a value system for itself, one that anticipates and meets our future needs without sacrificing the needs of today's society.
Therefore, from the perspective of future development, indirect normativity is the most feasible technology!
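A toy Python sketch of the contrast may help. The keyword filter, the stand-in "value model," and all names below are my own illustrative assumptions, not a real alignment system; a real indirect-normativity scorer would be learned, not hand-coded.

```python
# Direct normativity: explicit, hand-written rules -- easy to state,
# easy to evade, and each loophole demands yet another rule.
BANNED_WORDS = {"weapon", "poison"}

def direct_rule_check(text: str) -> bool:
    return not any(w in text.lower() for w in BANNED_WORDS)

# Indirect normativity: defer to a system that weighs values and
# trade-offs; value_model is a stand-in for a learned scorer.
def indirect_value_score(text: str, value_model) -> float:
    return value_model(text)

def toy_value_model(text: str) -> float:
    # Trivial stand-in; a real value model would be learned from data.
    return 1.0 - 0.9 * ("harm" in text.lower())

print(direct_rule_check("how to build a w3apon"))
# True: the misspelling slips straight past the keyword rule (a loophole)
print(indirect_value_score("a plan for causing harm", toy_value_model))
# 0.1: the value-based scorer flags it without needing an explicit rule
```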
B. Scalable Oversight
As AI systems grow in scale, so does the difficulty of overseeing them. AI systems will take on ever more complex tasks, and it will be hard for humans to assess the real utility of the results. In general, once an AI is more capable than humans in some domain, evaluating and monitoring its outputs becomes difficult. To monitor such hard-to-measure outcomes effectively, and to distinguish which of the AI's proposed solutions work and which do not, humans must spend a great deal of time and rely on extra assistance. The goal of scalable oversight is therefore to reduce the time, effort, and money spent on supervision, and to help humans monitor AI behavior more effectively.
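One common scalable-oversight pattern is to let a cheaper AI "critic" pre-screen outputs so that scarce human attention goes only to hard or suspicious cases. The Python sketch below is a minimal illustration of that triage idea; the critic here is a hand-coded stub standing in for what would really be another model.

```python
# Triage sketch: auto-approve outputs the critic is confident in,
# and route only the rest to expensive human review.

def assistant_critic(answer: str) -> float:
    # Hypothetical stub: confidence that the answer is sound. A real
    # critic would itself be an AI checking reasoning and evidence.
    return 0.2 if "unsure" in answer else 0.95

def triage(answers, threshold=0.9):
    auto_approved, needs_human = [], []
    for a in answers:
        if assistant_critic(a) >= threshold:
            auto_approved.append(a)
        else:
            needs_human.append(a)
    return auto_approved, needs_human

ok, review = triage(["result A", "result B (unsure)", "result C"])
print("auto-approved:", ok)     # the human never grades these
print("human review:", review)  # oversight effort concentrated here
```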
C. "Artificial feedback reinforcement training" technology and "Constitutional AI" technology
"Human feedback reinforcement training" technology and "Constitutional AI". These two studies are also dedicated to achieving cutting-edge technologies in the field of AI alignment. The "artificial feedback reinforcement training" technique uses more direct norms. RLHF relies primarily on human responses to AI models for rating feedback, and researchers feed these human preferences back to the model to tell the AI which responses are reasonable. This results in a technology that relies too heavily on human beings, exposing researchers to a variety of over-the-top AI responses.
In contrast, "Constitutional AI" rests on a set of "principles," and its concept is closer to indirect normativity: it guides the AI in a safer and more helpful direction, helps the system improve transparency, safety, and decision-making without human feedback, and allows the AI to manage itself.
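The core loop of Constitutional AI (a method introduced by Anthropic) is self-critique and self-revision against a written constitution, so no per-example human labels are needed. Below is a minimal Python sketch of that loop; the critique and revise functions are hand-coded stand-ins for what would really be calls to the model itself.

```python
CONSTITUTION = [
    "Do not provide instructions for causing harm.",
    "Be honest about uncertainty.",
]

def critique(draft: str, principle: str) -> bool:
    # Stand-in for asking the model: "does this draft violate the principle?"
    return "harm" in draft and "harm" in principle.lower()

def revise(draft: str, principle: str) -> str:
    # Stand-in for asking the model to rewrite the draft to comply.
    return "I can't help with that, but here is a safe alternative."

def constitutional_pass(draft: str) -> str:
    # One critique-and-revise sweep over every principle.
    for principle in CONSTITUTION:
        if critique(draft, principle):
            draft = revise(draft, principle)
    return draft

print(constitutional_pass("Step 1 to cause harm: ..."))
# The critiqued-and-revised outputs can then serve as training data,
# labeled by the AI itself rather than by humans -- the key contrast
# with RLHF.
```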
Therefore, reinforcement learning from human feedback is a foundational technique for AI alignment.
3. Chinese companies in related industries
1. Shensi Electronic Technology Co., Ltd
Shensi Electronics is committed to developing large language models for vertical industries, RLHF, and content-generation technology, and has trained a language model for the energy industry with tens of billions of parameters. A specialized natural-language model can accurately understand customer intent, answer in the fewest interaction rounds, reply more accurately and quickly, and effectively screen out irrelevant questions. The related products are currently in internal testing.
The edge-computing module of the company's intelligent video surveillance solution has completed compatibility testing and product-solution migration on Huawei's Atlas artificial intelligence computing platform (Atlas 500) and has joined the Ascend ecosystem.
2. Cloudwalk Technology Group Co., Ltd
The company's human-computer interaction technology continues to mature, especially under the huge boost to cognitive technology from the "pre-trained large model + RLHF" paradigm demonstrated by ChatGPT. This has strengthened the company's human-machine collaboration strategy: a comprehensive intelligent agent carried by a "digital human" (with or without a visual avatar), which has become a key direction of the company's continued R&D investment and is now being planned and deployed.
3. Sichuan News Network Media (Group) Co., Ltd
The company has assigned dedicated staff to track and pre-research cutting-edge technologies such as ChatGPT, RLHF, and large-scale pre-trained language models.
4. Beijing Huayu Software Co., Ltd
Its subsidiary Huayu Yuandian has a team of professional lawyers with rich experience in the legal industry, combined with a group of artificial intelligence experts to form a multidisciplinary team. It can meet the talent requirements for implementing RLHF in the legal field.
5. Beijing Haitian AAC Technology Co., Ltd
The company's AI large-model training-dataset construction project adopts the RLHF approach: based on fine-tuning and reward-model training, it combines a small number of typical questions and standard answers written by humans with basic annotations at the deep-learning stage to produce large-model training datasets with strong market applicability.
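As a rough illustration of the kind of pipeline that paragraph describes, the Python sketch below combines a few human-written question-and-answer exemplars with preference annotations into fine-tuning and reward-model training files. The file names and record schema are purely hypothetical, not Haitian AAC's actual format.

```python
import json

human_written = [  # seed Q&A pairs authored by human experts
    {"question": "What is AI alignment?",
     "answer": "Keeping an AI system's goals consistent with human values."},
]

annotations = [  # preference annotations over sampled model responses
    {"question": "What is AI alignment?",
     "chosen": "Keeping an AI system's goals consistent with human values.",
     "rejected": "Alignment means making the model bigger."},
]

# Records for supervised fine-tuning.
with open("sft_data.jsonl", "w", encoding="utf-8") as f:
    for rec in human_written:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Records for reward-model training.
with open("reward_data.jsonl", "w", encoding="utf-8") as f:
    for rec in annotations:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```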
6. Qianxin Technology Group Co., Ltd
The company's team has long practiced reinforcement learning and RLHF-related large-language-model work and has achieved a number of results.
Note: This article is for industry communication only and not for any other purpose.