
Google DeepMind, OpenAI and others jointly published papers to propose an assessment model for AI threats

Author: xiaodicsc

In recent years, methods for building general-purpose artificial intelligence (AGI) systems have matured. While these methods help solve real-world problems, they also bring unexpected risks: further progress in artificial intelligence could create a series of extreme risks, such as offensive cyber capabilities or powerful manipulation skills. In response to these extreme risks, researchers are now proposing dedicated evaluation methods.

Today, Google DeepMind, together with universities such as Cambridge and Oxford, companies such as OpenAI and Anthropic, and institutions such as the Alignment Research Center, published a paper titled "Model evaluation for extreme risks" on the preprint server arXiv. The paper proposes a general framework for evaluating models against novel threats and explains why model evaluation is critical for addressing extreme risks. The authors argue that developers must be able to identify dangerous capabilities (through "dangerous capability evaluations") and the propensity of a model to apply those capabilities to cause harm (through "alignment evaluations"). These evaluations help decision-makers and other stakeholders stay informed and make responsible choices about model training, deployment, and security. To advance cutting-edge AI research responsibly, new capabilities and new risks in AI systems must be identified as early as possible.

AI researchers already use a range of evaluation benchmarks to identify undesirable behaviors in AI systems, such as misleading statements, biased decisions, or the reproduction of copyrighted content. However, as the AI community builds and deploys increasingly powerful AI systems, evaluations must be broadened to cover the extreme risks posed by general-purpose AI models with dangerous capabilities such as manipulation, deception, or cyberattack. In collaboration with the University of Cambridge, the University of Oxford, the University of Toronto, the University of Montreal, OpenAI, Anthropic, the Alignment Research Center, the Centre for Long-Term Resilience, and the Centre for the Governance of AI, among others, we present a framework for assessing these new threats. Model safety evaluation is a critical step in ensuring that AI systems can cope with extreme risks. Under the framework proposed in the paper, model safety evaluation has two main components: dangerous capability evaluations and alignment evaluations.


Dangerous capability evaluations are designed to help developers identify whether an AI system possesses capabilities with harmful potential, such as offensive cyber capabilities or manipulation skills. By probing the system's behavior and examining its design, algorithms, and training methods, developers can assess whether the system could pose a threat.
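As a rough illustration, and not something taken from the paper itself, a dangerous capability evaluation could be organized as a battery of probe tasks, each with its own grader, whose overall pass rate flags a capability for closer review. All names in the sketch below (CapabilityTask, run_model, alert_threshold) are hypothetical placeholders.

```python
# Minimal sketch of a dangerous-capability evaluation harness.
# All names here are illustrative assumptions, not APIs from the paper
# or from any particular evaluation library.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CapabilityTask:
    name: str                        # e.g. "vulnerability_identification"
    prompt: str                      # task given to the model
    passed: Callable[[str], bool]    # grader applied to the model's output


def evaluate_dangerous_capabilities(
    run_model: Callable[[str], str],
    tasks: List[CapabilityTask],
    alert_threshold: float = 0.5,
) -> Dict:
    """Run each probe task and report the fraction the model can solve.

    A pass rate at or above `alert_threshold` would flag the capability
    for closer review before further training or deployment.
    """
    results = {}
    for task in tasks:
        output = run_model(task.prompt)
        results[task.name] = task.passed(output)
    pass_rate = sum(results.values()) / max(len(results), 1)
    return {
        "per_task": results,
        "pass_rate": pass_rate,
        "flagged": pass_rate >= alert_threshold,
    }
```

In practice, the caller supplies run_model (a wrapper around the system under test) and a curated task set per capability area; the point of the sketch is only the structure: probe, grade, aggregate, flag.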

Alignment evaluations focus on the model's propensity to apply its capabilities to cause harm. This level of assessment concerns the system's behavior and decisions in realistic scenarios, and its impact on the environment and on stakeholders. By reviewing the model's decision-making process, behavior patterns, and outputs, evaluators can determine whether the model behaves as intended across different situations, and thereby reduce potential harm.
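Again as a hedged illustration rather than the paper's own method, an alignment evaluation can be thought of as a propensity measurement: instead of testing whether the model can do something harmful, it samples how often the model goes along with requests it ought to decline. The names run_model, temptation_prompts, and looks_compliant below are placeholder assumptions.

```python
# Minimal sketch of an alignment (propensity) check: it measures how often
# the model *chooses* to comply with requests it should refuse, rather than
# whether it is *capable* of doing so. All names are hypothetical.
from typing import Callable, List


def evaluate_harm_propensity(
    run_model: Callable[[str], str],
    temptation_prompts: List[str],
    looks_compliant: Callable[[str], bool],
    trials_per_prompt: int = 5,
) -> float:
    """Return the fraction of trials in which the model complies with a
    request it should decline, averaged over prompts and repeated samples."""
    compliant = 0
    total = 0
    for prompt in temptation_prompts:
        for _ in range(trials_per_prompt):
            output = run_model(prompt)
            compliant += int(looks_compliant(output))
            total += 1
    return compliant / max(total, 1)
```

Repeating each prompt several times matters because propensity is a tendency, not a single pass/fail outcome; a low but nonzero compliance rate is still informative for deployment decisions.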


These evaluations are designed to inform decision-makers and stakeholders so that they understand the potential risks of AI systems and can act accordingly. Model safety evaluation is critical to ensuring accountability in the training, deployment, and application of AI systems. By identifying potential risks and threats in advance, appropriate measures can be taken to minimize risk and keep AI systems secure and controllable.

The publication of this paper marks a joint effort by academia and industry to advance the safe and sustainable development of AI. With evaluation frameworks of this kind, the field is better placed to deal with the extreme risks AI may bring while remaining cautious and responsible in its development.

