
Google DeepMind, OpenAI and others jointly published papers to propose an assessment model for AI threats

Author: xiaodicsc

In recent years, methods for building general-purpose artificial intelligence (AGI) systems have matured. While these methods help solve real-world problems, they also bring unexpected risks: further progress in artificial intelligence could create a series of extreme risks, such as offensive cyber capabilities or powerful manipulation skills. In response to these extreme risks, researchers are now proposing dedicated evaluation methods.

Today, Google DeepMind, together with universities such as Cambridge and Oxford, companies such as OpenAI and Anthropic, and institutions such as the Alignment Research Center, published a paper titled "Model evaluation for extreme risks" on the preprint server arXiv. The paper proposes a general framework for evaluating models against novel threats and explains why model evaluation is critical for addressing extreme risks. The authors argue that developers must be able to identify dangerous capabilities (through "dangerous capability evaluations") and the propensity of a model to apply those capabilities to cause harm (through "alignment evaluations"). These evaluations help decision-makers and other stakeholders stay informed and make responsible choices about model training, deployment, and security. To advance cutting-edge AI research responsibly, new capabilities and new risks in AI systems must be identified as early as possible.

AI researchers already use a range of evaluation benchmarks to identify undesirable behaviors in AI systems, such as misleading statements, biased decisions, or the reproduction of copyrighted content. However, as the AI community builds and deploys increasingly powerful AI systems, evaluations must be broadened to cover the extreme risks posed by general-purpose AI models with dangerous capabilities such as manipulation, deception, or cyberattack. In collaboration with the University of Cambridge, the University of Oxford, the University of Toronto, the University of Montreal, OpenAI, Anthropic, the Alignment Research Center, the Centre for Long-Term Resilience, and the Centre for the Governance of AI, among others, we present a framework for assessing these new threats. Model safety evaluation is a critical step in ensuring that AI systems can cope with extreme risks. Under the framework proposed in the paper, model safety evaluation has two main components: dangerous capability evaluations and alignment evaluations.


Dangerous capability evaluations are designed to help developers identify whether an AI system possesses capabilities with harmful potential, such as offensive cyber capabilities or manipulation skills. By probing the system's behavior and examining its design, algorithms, and training methods, developers can assess whether the system could pose a threat.
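As a rough illustration, and not something taken from the paper itself, a dangerous capability evaluation could be organized as a battery of probe tasks, each with its own grader, whose overall pass rate flags a capability for closer review. All names in the sketch below (CapabilityTask, run_model, alert_threshold) are hypothetical placeholders.

```python
# Minimal sketch of a dangerous-capability evaluation harness.
# All names here are illustrative assumptions, not APIs from the paper
# or from any particular evaluation library.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CapabilityTask:
    name: str                        # e.g. "vulnerability_identification"
    prompt: str                      # task given to the model
    passed: Callable[[str], bool]    # grader applied to the model's output


def evaluate_dangerous_capabilities(
    run_model: Callable[[str], str],
    tasks: List[CapabilityTask],
    alert_threshold: float = 0.5,
) -> Dict:
    """Run each probe task and report the fraction the model can solve.

    A pass rate at or above `alert_threshold` would flag the capability
    for closer review before further training or deployment.
    """
    results = {}
    for task in tasks:
        output = run_model(task.prompt)
        results[task.name] = task.passed(output)
    pass_rate = sum(results.values()) / max(len(results), 1)
    return {
        "per_task": results,
        "pass_rate": pass_rate,
        "flagged": pass_rate >= alert_threshold,
    }
```

In practice, the caller supplies run_model (a wrapper around the system under test) and a curated task set per capability area; the point of the sketch is only the structure: probe, grade, aggregate, flag.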

Alignment evaluations focus on the model's propensity to apply its capabilities to cause harm. This level of assessment concerns the system's behavior and decisions in realistic scenarios, and its impact on the environment and on stakeholders. By reviewing the model's decision-making process, behavior patterns, and outputs, evaluators can determine whether the model behaves as intended across different situations, and thereby reduce potential harm.
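Again as a hedged illustration rather than the paper's own method, an alignment evaluation can be thought of as a propensity measurement: instead of testing whether the model can do something harmful, it samples how often the model goes along with requests it ought to decline. The names run_model, temptation_prompts, and looks_compliant below are placeholder assumptions.

```python
# Minimal sketch of an alignment (propensity) check: it measures how often
# the model *chooses* to comply with requests it should refuse, rather than
# whether it is *capable* of doing so. All names are hypothetical.
from typing import Callable, List


def evaluate_harm_propensity(
    run_model: Callable[[str], str],
    temptation_prompts: List[str],
    looks_compliant: Callable[[str], bool],
    trials_per_prompt: int = 5,
) -> float:
    """Return the fraction of trials in which the model complies with a
    request it should decline, averaged over prompts and repeated samples."""
    compliant = 0
    total = 0
    for prompt in temptation_prompts:
        for _ in range(trials_per_prompt):
            output = run_model(prompt)
            compliant += int(looks_compliant(output))
            total += 1
    return compliant / max(total, 1)
```

Repeating each prompt several times matters because propensity is a tendency, not a single pass/fail outcome; a low but nonzero compliance rate is still informative for deployment decisions.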


These evaluations are designed to inform decision-makers and stakeholders so that they understand the potential risks of AI systems and can act accordingly. Model safety evaluation is critical to ensuring accountability in the training, deployment, and application of AI systems. By identifying potential risks and threats in advance, appropriate measures can be taken to minimize risk and keep AI systems secure and controllable.

The publication of this paper marks a joint effort by academia and industry to advance the safe and sustainable development of AI. With evaluation frameworks of this kind, the field is better placed to deal with the extreme risks AI may bring while remaining cautious and responsible in its development.

