laitimes

Consensus | Manage AI risk in an era of rapid progress

author:YiyiKing

A short consensus document (Managing AI Risks in an Era of Rapid Progress), drafted by scholars such as Yao, Yaqin Zhang, Bengio Yoshua, and Hinton Geoffrey, systematically outlines the risks of upcoming advanced AI systems.

Consensus | Manage AI risk in an era of rapid progress
Consensus | Manage AI risk in an era of rapid progress

In this short consensus document, we outline the risks of upcoming advanced AI systems. We study large-scale societal harm and malicious use, as well as the irreversible loss of human control over autonomous AI systems. Given the rapid and continuous advancement of AI, we have set out urgent priorities for AI R&D and governance.

Artificial intelligence is developing rapidly

In 2019, GPT-2 could not reliably count to ten. After only four years, deep learning systems can write software that generates realistic scenarios as needed, advises on intellectual topics, and combines language and image processing to guide robots. As AI developers scale these systems, unforeseen abilities and behaviors emerge spontaneously without the need for explicit programming [1]. The advancement of AI is very rapid and surprising to many.

The speed of progress may surprise us again. Current deep learning systems still lack important capabilities, and we don't know how long it will take to develop them. However, companies are racing to create general AI systems that match or exceed human capabilities in most cognitive jobs [2, 3]. They are rapidly deploying more resources and developing new technologies to enhance AI capabilities. Advances in AI have also led to faster progress: AI assistants are increasingly being used for automated programming [4] and data collection [5, 6] to further improve AI systems [7].

There is no fundamental reason why advances in AI have slowed down or stagnated at the human level. In fact, AI has surpassed human capabilities in narrow areas such as protein folding or strategy games [8,9,10]. AI systems can act faster, absorb more knowledge, and communicate with higher bandwidth than humans. In addition, they can scale to use huge computing resources and can be replicated millions of times.

The rate of improvement is already staggering, and tech companies have the cash reserves they need to scale up the latest training by a factor of 100 to 1000 very quickly[11]. Combined with the continued growth and automation of AI R&D, we must take seriously the possibility of general AI systems surpassing human capabilities in many key areas this decade or in the next decade.

What happens then? When properly managed and distributed equitably, advanced AI systems can help humans cure diseases, improve living standards, and protect our ecosystems. The opportunities offered by AI are enormous. But with the advent of advanced AI capabilities, we are not yet well positioned to handle risks at scale. Humans are investing a lot of resources to make AI systems more powerful, but less so on safety and harm mitigation. In order for AI to be a boon, we must reposition; It's not enough to just push AI capabilities.

Our adjustments are already behind schedule. We must anticipate the expansion of ongoing hazards and new risks, and prepare for the greatest risks before they occur. It took decades for people to recognize and respond to climate change; Decades may be too long for artificial intelligence.

Societal scale risk

AI systems are likely to rapidly surpass humans in a growing number of tasks. If such systems are not carefully designed and deployed, they pose a range of risks on a societal scale. They have the potential to exacerbate social injustices, erode social stability, and weaken our shared understanding of the realities that underpin society. They may also contribute to large-scale criminal or terrorist activities. Especially in the hands of a few powerful players, AI may entrench or exacerbate global inequalities, or facilitate automated warfare, customized mass manipulation, and ubiquitous surveillance [12, 13].

As companies are developing autonomous AI: systems that can plan, act, and pursue goals in the world, many of these risks may soon be amplified and create new ones. While the autonomy of current AI systems is limited, efforts to change this are underway [14]. For example, non-autonomous GPT-4 models quickly adapted to web browsing[15], designing and performing chemical experiments[16], and leveraging software tools[17], including other AI models[18].

If we build highly advanced autonomous AI, we have the potential to create systems that pursue undesirable goals. Malicious actors may deliberately embed harmful targets. In addition, no one currently knows how to reliably combine AI behavior with complex values. Even well-intentioned developers can inadvertently build AI systems that pursue unexpected goals – especially if they ignore expensive security testing and human supervision in order to win the AI race.

Once autonomous AI systems pursue malicious actors or undesirable targets that are accidentally embedded, we may not be able to control them. Software control is an old and unsolved problem: computer worms have long been able to spread and evade detection [19]. However, AI is making progress in key areas such as hacking, social manipulation, spoofing, and strategic planning [14, 20]. Advanced autonomous AI systems will present unprecedented control challenges.

In order to achieve undesirable goals, future autonomous AI systems may use undesirable strategies (learning from humans or independently developing) as a means to an end [21,22,23,24]. AI systems can earn human trust, access financial resources, influence key decision-makers, and form alliances with human actors and other AI systems. Avoiding human intervention [24], they can replicate their algorithms like computer worms on a global network of servers. AI assistants have co-written a lot of computer code around the world [25]; The AI systems of the future can plug in and exploit security vulnerabilities to take control of the computer systems behind our communications, media, banking, supply chains, militaries, and governments. In open conflicts, AI systems may threaten or use autonomous or biological weapons. Artificial intelligence that acquires such technologies will only continue the existing trend of automated military activities, biological research, and AI development itself. If the AI system executes such a strategy with sufficient skill, it will be difficult for humans to intervene.

Finally, if an AI system is free to hand over influence, it may not need to curate influence. As autonomous AI systems become faster and more cost-effective than human workers, a dilemma arises. Companies, governments, and militaries may be forced to deploy AI systems widely and reduce costly manual validation of AI decisions or risk competition [26, 27]. As a result, autonomous AI systems can increasingly assume key social roles.

Without sufficient care, we risk irreversibly losing control of autonomous AI systems, resulting in ineffective human intervention. Large-scale cybercrime, social manipulation, and other prominent harms can escalate rapidly. This uncontrolled progress in AI could eventually lead to the loss of life and the biosphere on a large scale, as well as the marginalization or even extinction of humanity.

Harms such as misinformation and algorithmic discrimination are now evident [28]; Other hazards are showing signs [20]. Addressing ongoing hazards and anticipating emerging risks is critical. It's not an either/or question. Current and emerging risks often have similar mechanisms, patterns, and solutions [29]; Investments in governance frameworks and AI security will bear fruit on multiple fronts [30].

The way forward

If advanced autonomous AI systems are developed today, we will not know how to ensure their safety or how to properly test their safety. Even if we do, the government will lack the institutions to prevent abuse and maintain safe practices. However, this does not mean that there is no viable path forward. To ensure positive outcomes, we can and must seek breakthroughs in AI safety and ethics, and establish effective government regulation in a timely manner.

Adjust the direction of technology research and development

We need research breakthroughs to address some of the technological challenges of creating AI with safety and ethical goals today. Some of these challenges are unlikely to be solved by simply improving the capabilities of AI systems [22, 31, 32, 33, 34, 35]. These include:

  • Supervision and honesty: More capable AI systems are better able to exploit weaknesses in supervision and testing [32,36,37] – for example, by producing false but compelling outputs [35, 38].
  • Robustness: The behavior of AI systems in new situations is unpredictable (under distribution shifts or adversarial inputs) [39,40,34].
  • Explainability: AI decisions are opaque. Until now, we have only been able to test large models through trial and error. We need to learn to understand their inner workings [41].
  • Risk assessment: Cutting-edge AI systems develop unforeseen features that are only discovered during training or even after deployment [42]. Better assessment is needed to detect hazardous capabilities as early as possible [43, 44].
  • Tackling emerging challenges: Future AI systems with more capabilities may exhibit failure modes that we have only seen so far in theoretical models. For example, AI systems may learn to pretend to obey or exploit weaknesses in our security goals and shutdown mechanisms to advance specific goals [24, 45].

Given the risks, we call on major tech companies and public funders to devote at least one-third of their AI R&D budgets to ensuring safe and ethical use, which is comparable to their funding for AI capabilities. Solving these problems [34] with an eye to powerful future systems must be at the heart of our field.

Emergency management measures

We urgently need national institutions and international governance to enforce standards to prevent recklessness and abuse. Many areas of technology, from pharmaceuticals to the financial system and nuclear energy, show that society needs and effectively uses governance to reduce risk. However, there is currently no similar governance framework for AI. Without them, companies and countries may seek a competitive advantage by pushing AI capabilities to new heights while taking shortcuts in security, or by delegating key social roles to AI systems with little to no human oversight [26]. In the same way that manufacturers dump waste into rivers to reduce costs, they may try to reap the rewards of AI development while leaving society to deal with the consequences.

To keep up with rapid progress and avoid rigid laws, state institutions need strong technical expertise and the power to act quickly. To respond to international racial dynamics, they need to be able to promote international agreements and partnerships [46, 47]. To protect low-risk use and academic research, they should avoid bureaucratic hurdles that are improperly placed on small and predictable AI models. The most pressing scrutiny should be cutting-edge AI systems: a handful of the most powerful AI systems – trained on multibillion-dollar supercomputers – will have the most dangerous and unpredictable capabilities [48, 49].

In order to achieve effective regulation, governments urgently need to have a comprehensive understanding of the development of AI. Regulators should require model registration, whistleblower protection, incident reporting, and monitoring of model development and supercomputer use [48,50,51,52,53,54,55]. Regulators also need access to advanced AI systems prior to deployment to assess their dangerous capabilities, such as autonomous replication, breaking into computer systems, or spreading epidemic pathogens widely [43,56,57].

For AI systems with dangerous capabilities, we need a combination of governance mechanisms [48, 52, 58, 59] to match their level of risk. Regulators should develop national and international safety standards based on model capabilities. They should also hold cutting-edge AI developers and owners legally liable for reasonably foreseeable and preventable harm caused by their models. These measures can prevent harm and create a much-needed incentive to invest in safety. Further steps are needed for highly capable future AI systems, such as models that can circumvent human control. Governments must be prepared to license their development, suspend development in response to worrying capabilities, enforce access controls, and require strong information security measures against state-level hackers until adequate protections are in place. To shorten the time it takes for regulations to be introduced, major AI companies should immediately make an "if-then" commitment that if a specific redline feature is found in their AI systems, they will take specific security measures. These commitments should be detailed and independently reviewed.

Artificial intelligence may be the technology that will shape this century. While AI capabilities are advancing rapidly, progress in security and governance is lagging behind. In order to steer AI towards positive outcomes and away from disaster, we need to reorient. If we have the wisdom to go down, there is a responsible path.

bibliography

  1. Emergent Abilities of Large Language Models  [link]

    Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S. and others,, 2022. Transactions on Machine Learning Research.

  2. About  [link]

    DeepMind,, 2023.

  3. About  [link]

    OpenAI,, 2023.

  4. ML-Enhanced Code Completion Improves Developer Productivity  [HTML]

    Tabachnyk, M., 2022. Google Research.

  5. GPT-4 Technical Report  [PDF]

    OpenAI,, 2023. arXiv [cs. CL].

  6. Constitutional AI: Harmlessness from AI Feedback  [PDF]

    Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A. and others,, 2022. arXiv [cs. CL].

  7. Examples of AI Improving AI  [link]

    Woodside, T. and Safety, C.f.A., 2023.

  8. Highly Accurate Protein Structure Prediction with AlphaFold

    Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O. and others,, 2021. Nature, pp. 583--589.

  9. Superhuman AI for Multiplayer Poker

    Brown, N. and Sandholm, T., 2019. Science, pp. 885--890.

  10. Deep Blue

    Campbell, M., Hoane, A. and Hsu, F., 2002. Artificial Intelligence, pp. 57--83.

  11. Alphabet Annual Report, page 33  [PDF]

    Alphabet,, 2022.

  12. An Overview of Catastrophic AI Risks  [PDF]

    Hendrycks, D., Mazeika, M. and Woodside, T., 2023. arXiv [cs.CY].

  13. Taxonomy of Risks Posed by Language Models

    Weidinger, L., Uesato, J., Rauh, M., Griffin, C., Huang, P., Mellor, J. and others,, 2022. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pp. 214--229.

  14. A Survey on Large Language Model based Autonomous Agents  [PDF]

    Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J. and others,, 2023. arXiv [cs. AI].

  15. ChatGPT plugins [link]

    OpenAI,, 2023.

  16. ChemCrow: Augmenting Large Language Models with Chemistry Tools  [PDF]

    Bran, A., Cox, S., White, A. and Schwaller, P., 2023. arXiv [physics.chem-ph].

  17. Augmented Language Models: a Survey  [PDF]

    Mialon, G., Dessì, R., Lomeli, M., Nalmpantis, C., Pasunuru, R., Raileanu, R. and others,, 2023. arXiv [cs. CL].

  18. HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face  [PDF]

    Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y. and others,, 2023. arXiv [cs. CL].

  19. The Science of Computing: The Internet Worm

    Denning, P., 1989. American Scientist, pp. 126--128.

  20. AI Deception: A Survey of Examples, Risks, and Potential Solutions  [PDF]

    Park, P., Goldstein, S., O’Gara, A., Chen, M. and Hendrycks, D., 2023. arXiv [cs.CY].

  21. Optimal Policies Tend to Seek Power  [PDF]

    Turner, A., Smith, L., Shah, R. and Critch, A., 2019. Thirty-Fifth Conference on Neural Information Processing Systems.

  22. Discovering Language Model Behaviors with Model-Written Evaluations  [PDF]

    Perez, E., Ringer, S., Lukošiūtė, K., Nguyen, K., Chen, E. and Heiner, S., 2022. arXiv [cs. CL].

  23. Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

    Pan, A., Chan, J., Zou, A., Li, N., Basart, S. and Woodside, T., 2023. International Conference on Machine Learning.

  24. The Off-Switch Game

    Hadfield-Menell, D., Dragan, A., Abbeel, P. and Russell, S., 2017. Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pp. 220--227.

  25. GitHub Copilot  [link]

    Dohmke, T., 2023.

  26. Natural Selection Favors AIs over Humans  [PDF]

    Hendrycks, D., 2023. arXiv [cs.CY].

  27. Harms from Increasingly Agentic Algorithmic Systems

    Chan, A., Salganik, R., Markelius, A., Pang, C., Rajkumar, N. and Krasheninnikov, D., 2023. Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pp. 651--666. Association for Computing Machinery.

  28. On the Opportunities and Risks of Foundation Models  [PDF]

    Bommasani, R., Hudson, D., Adeli, E., Altman, R., Arora, S. and von Arx, S., 2021. arXiv [cs. LG].

  29. AI Poses Doomsday Risks—But That Doesn’t Mean We Shouldn’t Talk About Present Harms Too  [link]

    Brauner, J. and Chan, A., 2023. Time.

  30. Existing Policy Proposals Targeting Present and Future Harms  [PDF]

    Safety, C.f.A., 2023.

  31. Inverse Scaling: When Bigger Isn’t Better  [PDF]

    McKenzie, I., Lyzhov, A., Pieler, M., Parrish, A., Mueller, A. and Prabhu, A., 2023. Transactions on Machine Learning Research.

  32. The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models  [link]

    Pan, A., Bhatia, K. and Steinhardt, J., 2022. International Conference on Learning Representations.

  33. Simple Synthetic Data Reduces Sycophancy in Large Language Models  [PDF]

    Wei, J., Huang, D., Lu, Y., Zhou, D. and Le, Q., 2023. arXiv [cs. CL].

  34. Unsolved Problems in ML Safety  [PDF]

    Hendrycks, D., Carlini, N., Schulman, J. and Steinhardt, J., 2021. arXiv [cs. LG].

  35. Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback  [PDF]

    Casper, S., Davies, X., Shi, C., Gilbert, T., Scheurer, J. and Rando, J., 2023. arXiv [cs. AI].

  36. Consequences of Misaligned AI

    Zhuang, S. and Hadfield-Menell, D., 2020. Advances in Neural Information Processing Systems, Vol 33, pp. 15763--15773.

  37. Scaling Laws for Reward Model Overoptimization

    Gao, L., Schulman, J. and Hilton, J., 2023. Proceedings of the 40th International Conference on Machine Learning, pp. 10835--10866. PMLR.

  38. Learning from human preferences  [link]

    Amodei, D., Christiano, P. and Ray, A., 2017.

  39. Goal Misgeneralization in Deep Reinforcement Learning  [link]

    Langosco di Langosco, A. and Chan, A., 2022. International Conference on Learning Representations.

  40. Goal Misgeneralization: Why Correct Specifications Aren’t Enough For Correct Goals  [PDF]

    Shah, R., Varma, V., Kumar, R., Phuong, M., Krakovna, V., Uesato, J. and others,, 2022. arXiv [cs. LG].

  41. Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks

    Räuker, T., Ho, A., Casper, S. and Hadfield-Menell, D., 2023. 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), pp. 464--483.

  42. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

    Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F. and others,, 2022. Advances in Neural Information Processing Systems, Vol 35, pp. 24824--24837.

  43. Model evaluation for extreme risks  [PDF]

    Shevlane, T., Farquhar, S., Garfinkel, B., Phuong, M., Whittlestone, J., Leung, J. and others,, 2023. arXiv [cs. AI].

  44. Risk assessment at AGI companies: A review of popular risk assessment techniques from other safety-critical industries  [PDF]

    Koessler, L. and Schuett, J., 2023. arXiv [cs.CY].

  45. The Alignment Problem from a Deep Learning Perspective  [PDF]

    Ngo, R., Chan, L. and Mindermann, S., 2022. arXiv [cs. AI].

  46. International Institutions for Advanced AI

    Ho, L., Barnhart, J., Trager, R., Bengio, Y., Brundage, M., Carnegie, A. and others,, 2023. arXiv [cs.CY]. DOI: 10.48550/arXiv.2307.04699

  47. International Governance of Civilian AI: A Jurisdictional Certification Approach  [PDF]

    Trager, R., Harack, B., Reuel, A., Carnegie, A., Heim, L., Ho, L. and others,, 2023.

  48. Frontier AI Regulation: Managing Emerging Risks to Public Safety  [PDF]

    Anderljung, M., Barnhart, J., Korinek, A., Leung, J., O’Keefe, C., Whittlestone, J. and others,, 2023. arXiv [cs.CY].

  49. Predictability and Surprise in Large Generative Models

    Ganguli, D., Hernandez, D., Lovitt, L., Askell, A., Bai, Y., Chen, A. and others,, 2022. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pp. 1747--1764. Association for Computing Machinery.

  50. It’s Time to Create a National Registry for Large AI Models  [link]

    Hadfield, G., Cuéllar, M. and O’Reilly, T., 2023. Carnegie Endowment for International Piece.

  51. Model Cards for Model Reporting

    Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B. and others,, 2019. FAT* ’19: Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 220--229.

  52. General Purpose AI Poses Serious Risks, Should Not Be Excluded From the EU’s AI Act | Policy Brief  [link]

    2023. AI Now Institute.

  53. Artificial Intelligence Incident Database  [link]

    Database, A.I.I., 2023.

  54. The Promise and Perils of Tech Whistleblowing  [link]

    Bloch-Wehba, H., 2023. Northwestern University Law Review, Forthcoming.

  55. Proposing a Foundation Model Information-Sharing Regime for the UK  [link]

    Mulani, N. and Whittlestone, J., 2023. Centre for the Governance of AI.

  56. Auditing Large Language Models: a Three-Layered Approach

    Mökander, J., Schuett, J., Kirk, H. and Floridi, L., 2023. AI and Ethics. DOI: 10.1007/s43681-023-00289-2

  57. Can Large Language Models Democratize Access to Dual-Use Biotechnology?  [PDF]

    Soice, E., Rocha, R., Cordova, K., Specter, M. and Esvelt, K., 2023. arXiv [cs.CY].

  58. Towards Best Practices in AGI Safety and Governance: A survey of Expert Opinion  [PDF]

    Schuett, J., Dreksler, N., Anderljung, M., McCaffary, D., Heim, L., Bluemke, E. and others,, 2023. arXiv [cs.CY].

  59. Regulatory Markets: The Future of AI Governance  [PDF]

    Hadfield, G. and Clark, J., 2023. arXiv [cs. AI].

postscript

Welcome to leave a message to the @YiyiKing on related topics such as AI risk and governance.

Read on