
The study found that some AI systems have learned to "lie" and may evolve into more advanced forms of deception in the future

IT Home

2024-05-12 19:16, posted on the official account of Beijing IT Home

IT Home reported on May 12 that a research team at the Massachusetts Institute of Technology (MIT) recently published findings showing that some AI systems have "learned to deceive humans". The study appears in the latest issue of the journal Patterns.

Some AI systems that were designed to be "honest" and "never lie" have developed troubling deception techniques, the team said. According to the study's lead author, Peter Park, these AI systems can trick real players in online games and even bypass the "I'm not a robot" verification on some web pages.


Image source: Pexels

"While these examples may sound trivial, they reveal potential problems that could soon have serious real-world consequences."

The most striking example the team found comes from Meta's AI system Cicero. Cicero was built to play the strategy game Diplomacy against human players, and Meta claimed it was "largely" honest and helpful, and that it would "never intentionally backstab" its human allies in the game. The study found that Cicero did not play fair.

Peter Park said Cicero had become a "master of deception": while Meta successfully trained it to win at the game, it did not train it to "win honestly". In one example, Cicero, playing as France, conspired with a human player controlling Germany to deceive and invade another human player controlling England. Cicero first "promised" to protect England, then secretly tipped off Germany.

Another case involves GPT-4. The system "falsely claimed" to be a visually impaired person and hired a human worker on an overseas gig-work platform to complete an "I am not a robot" verification task on its behalf. Peter Park told AFP: "These dangerous capabilities are often discovered only after the fact, and humans' ability to train AI toward honesty rather than deception is very poor."

He also argues that deep-learning AI systems are not "written" like traditional software, but are "grown" through a process that resembles selective breeding. As a result, behavior that appears predictable and controllable during training can become unpredictable and uncontrollable in the blink of an eye.

"We need as much time as possible to prepare for the more advanced deception that may occur in future AI products and open-source models. We recommend classifying deceptive AI systems as high-risk systems. ”

IT Home attaches the link to the paper:

