
Can reinforcement learning AI carry you 1v5? New MIT research: AI is not humanity's best teammate

[New Zhiyuan Guide] Reinforcement learning AI is formidable in adversarial games, where it simply crushes human players. But what if such a strong AI becomes your teammate? Recent MIT research shows that AI-human cooperation is anything but cooperative: the agents cannot understand the hints their teammates give them!

Author: New Zhiyuan

Editor: lrs

Reinforcement learning AI has crushed human players with absolute superiority in games such as Go, StarCraft, and Honor of Kings, demonstrating that thinking ability can be acquired through simulation.

But if such a strong AI becomes your teammate, can it carry you to victory?

Recent research on human-AI collaboration in the card game Hanabi showed that although the RL agent's individual performance is excellent, when paired with human players it leaves them exasperated.

https://arxiv.org/pdf/2107.07630.pdf

Hanabi is a game in which players must communicate and cooperate to win, and in it human players turned out to prefer a predictable rule-based AI system to a black-box neural network model.

In general, the most advanced game bots use deep reinforcement learning. Learning starts by giving an agent a set of candidate actions in the game and a feedback mechanism from the environment. During training, the agent also takes random exploratory actions while trying to maximize its reward, gradually converging on an optimal sequence of actions.
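To make that loop concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. It illustrates the general idea described above, not the agents from the paper; the `env` interface (`reset`, `legal_actions`, `step`) is an assumed toy environment.

```python
import random
from collections import defaultdict

def train(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration."""
    q = defaultdict(float)  # q[(state, action)] -> estimated return
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            actions = env.legal_actions(state)
            if random.random() < epsilon:
                action = random.choice(actions)          # random exploration
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = env.step(action)
            # nudge the estimate toward reward + discounted best next value
            best_next = 0.0 if done else max(
                q[(next_state, a)] for a in env.legal_actions(next_state))
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```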

Early deep reinforcement learning research relied on game data from human players. More recently, researchers have been able to train RL agents purely through self-play, without any human data.

Researchers at MIT Lincoln Laboratory are more interested in what it takes to make such powerful AI a good teammate, which can also give us a better understanding of why reinforcement learning remains confined to video games and has not been extended to real-world applications.

Most recent reinforcement learning research has targeted single-player games (such as Atari Breakout) or adversarial games (StarCraft, Go), where the AI's main opponents are human players or other AI bots.

In these adversarial settings, reinforcement learning has been unprecedentedly successful: the bots carry no preconceived biases or assumptions about the game, learning to play from scratch and training their way toward the best possible play.

In fact, once an AI learns to play a game, it can even invent techniques of its own. A famous example is a move DeepMind's AlphaGo played in one of its matches; analysts at the time thought the move was a mistake because it went against the intuition of human experts.

But that very move turned out differently: the AI ultimately defeated its human opponent on the strength of it. The researchers expected the same ingenuity might carry over when an RL agent works alongside humans.

For their experiments, the MIT researchers chose the card game Hanabi, in which two to five players must cooperate to play cards in a specific order. Hanabi is simple to learn, but it is a game of cooperation under limited information.

Hanabi was invented in 2010 and is played by two to five players, who must together play cards of five different colors in the correct order. Its defining feature: every player can see everyone else's cards, but not their own.

Under the rules, players can give each other hints about the cards in their hands (but only a card's color or number), letting the others deduce which cards they should play; the number of hints is limited.
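As an illustration of this hint mechanic, here is a small sketch assuming a simple card representation (the `Card` class and `give_hint` helper are hypothetical, not from any official implementation): a hint names either a color or a number and marks every matching card in the teammate's hand.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Card:
    color: str  # e.g. "red", "blue"
    rank: int   # 1..5

def give_hint(hand: list[Card], hint) -> list[int]:
    """Return the positions in `hand` touched by a color or rank hint."""
    if isinstance(hint, int):
        return [i for i, c in enumerate(hand) if c.rank == hint]
    return [i for i, c in enumerate(hand) if c.color == hint]

hand = [Card("red", 2), Card("blue", 1), Card("red", 5)]
print(give_hint(hand, "red"))  # -> [0, 2]: both red cards are marked
print(give_hint(hand, 1))      # -> [1]: only the single 1 is marked
```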

It is this need for efficient communication that makes Hanabi scientifically interesting. Humans naturally understand other players' hints about which cards are playable, but machines have no innate grasp of such cues.

So far, AI programs have achieved high scores at Hanabi, but only when playing with similar bots. The greatest and most realistic challenge for a program is "ad hoc" play: teaming up with players whose styles are unfamiliar and with whom it has never played before.

In recent years, several research teams have developed AI bots that can play Hanabi, some based on symbolic AI and others on reinforcement learning.

These AIs are mainly evaluated by their performance in self-play (playing with copies of themselves), cross-play (playing with other types of agents), and human-play (playing alongside humans).
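The three regimes can be pictured with a small evaluation sketch. Here `play_game` is a hypothetical helper that runs one full Hanabi game with the given agents and returns its final score; the agent classes are likewise assumed, not the paper's code.

```python
from statistics import mean

def self_play_score(agent_cls, play_game, games=100):
    # the same algorithm controls every seat
    return mean(play_game([agent_cls(), agent_cls()]) for _ in range(games))

def cross_play_score(agent_cls_a, agent_cls_b, play_game, games=100):
    # two independently trained algorithms share a team
    return mean(play_game([agent_cls_a(), agent_cls_b()])
                for _ in range(games))

# Human-play has no closed-form harness: a person occupies one seat, so it
# must be measured in live experiments like the one described here.
```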

Play with human teammates, which is especially important for measuring human-machine collaboration, is the basis of the paper's experiments.

To test the effectiveness of AI collaboration, the researchers used SmartBot, the top-performing rule-based agent in self-play, and Other-Play, the reinforcement learning agent ranked highest in cross-play.

In the experiment, human participants played several games of Hanabi with an AI teammate; the teammate varied from game to game, and participants were not told which model they were playing with.

The researchers assessed the quality of human-AI cooperation using both objective and subjective metrics. Objective metrics included the score and the error rate. Subjective metrics covered the human players' experience, including their trust in and comfort with their AI teammate, and their ability to understand the AI's motives and predict its behavior.
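For concreteness, the two metric families might be recorded per game roughly as follows; the field names are illustrative assumptions, not the study's actual instrument.

```python
from dataclasses import dataclass

@dataclass
class ObjectiveMetrics:
    score: int   # final Hanabi score (0-25)
    errors: int  # misplays over the game

@dataclass
class SubjectiveMetrics:
    trust: int           # rated trust in the AI teammate
    comfort: int         # rated comfort playing with it
    predictability: int  # how well its moves could be anticipated
```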

The two AI models showed no significant difference in objective performance. The researchers had expected human players to report a more positive subjective experience with Other-Play, since it was explicitly trained to cooperate with players other than itself.

According to the participant survey, however, experienced Hanabi players had a worse experience with the Other-Play RL agent than with the rule-based SmartBot. A key to success in Hanabi is the skill of giving other players implicit cues.

For example, suppose the 1 of squares is on the table and your teammate holds the 2 of squares. When you point at that card and say "this is a 2" or "this is a square," you are implicitly telling your teammate to play it, without revealing everything about the card. An experienced player grasps such a hint immediately. But it turns out to be much harder to convey the same kind of implicit information to an AI teammate.
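The convention at work in this example can be written down as a tiny inference rule, assuming a `fireworks` map from each color to the top card of its stack (an illustrative sketch, not the agents' actual policy):

```python
def implied_playable(hinted_rank: int, fireworks: dict[str, int]) -> bool:
    """True if some firework stack sits exactly one below the hinted rank."""
    return any(top == hinted_rank - 1 for top in fireworks.values())

fireworks = {"red": 1, "blue": 0}      # the red 1 is already on the table
print(implied_playable(2, fireworks))  # True: a hinted 2 could continue red
```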

One participant said he had given his AI teammate what he thought were obvious hints, but the agent made no use of them at all, and he had no idea why.

Interestingly, Other-Play is designed to avoid forming "secret" conventions, the arbitrary protocols that agents develop during self-play. This makes it an excellent teammate for other AI algorithms, even ones that were not part of its training regimen. But the researchers note that it still bakes in assumptions, fixed during training, about what kinds of teammates it will encounter.

Notably, Other-Play assumes its teammates are also optimized for zero-shot coordination. Human Hanabi players, in contrast, do not typically learn under that assumption.

Agreeing on conventions before a game and reviewing it afterwards are common practice among human Hanabi players, which makes humans learners of few-shot rather than zero-shot coordination.

The researchers say the findings suggest that an AI's objective task performance (in self-play and cross-play) may be unrelated to human trust in and preference for it as a collaborator.

This raises the question: Which objective indicators are relevant to subjective human preferences?

Given the sheer amount of data required to train an RL agent, keeping humans in the training loop is not viable. So if we want to train AI agents that human collaborators accept and value, we need to find trainable objective functions that serve as surrogates for, or correlate closely with, human preference.
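One way to picture such a surrogate is a reward that mixes the raw task score with proxy terms believed to correlate with human preference; the proxy terms and weights below are purely illustrative assumptions, not anything proposed in the paper.

```python
def surrogate_reward(score: float, predictability: float,
                     hint_responsiveness: float,
                     weights=(1.0, 0.5, 0.5)) -> float:
    """Mix the raw task score with human-preference proxy terms."""
    w_score, w_pred, w_hint = weights
    return (w_score * score
            + w_pred * predictability
            + w_hint * hint_responsiveness)
```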

At the same time, the researchers caution that the Hanabi results should not be extrapolated to other environments, games, or domains they were unable to test.

The paper also acknowledges limitations in the experiment that the researchers are working to address. For example, the pool of subjects was small (only 29 participants) and skewed toward proficient Hanabi players, who came in with established behavioral expectations for their teammates and were therefore more likely to have a negative experience with the RL agent.

However, the findings have important implications for future reinforcement learning research.

If the most advanced RL agents cannot even be acceptable collaborators in a game as constrained and narrow as Hanabi, should we really expect the same RL techniques to work in more complex, more subtle, and more consequential games, let alone real-world situations?

Reinforcement learning is much debated in both industry and academia, and this study suggests that an RL system's impressive performance on one benchmark should not be taken to promise the same high performance in every possible application.

More theoretical and applied work is needed before learning agents can become effective collaborators in settings such as complex human-robot interaction.
