Meta’s algorithm tackles both language and strategy in a classic board game that involves negotiation
Diplomacy, many a statesperson has argued, is an art: one that requires not just strategy, but also intuition, persuasion, and even subterfuge—human skills that have long been off-limits to even the most powerful artificial intelligence (AI) approaches. Now, an AI algorithm from the company Meta has shown it can beat many humans in the board game Diplomacy, which requires both strategic planning and verbal negotiations with other players. The work, researchers say, could point the way toward virtual exercise coaches and dispute mediators. International chatbot diplomacy may not be far behind.
“These are spectacular new results,” says Yoram Bachrach, a computer scientist at DeepMind who has worked on the game Diplomacy but was not involved in the new research. “I’m particularly excited about Diplomacy because it’s an exceptional environment for studying cooperative AI,” in which machines don’t just compete, but collaborate.
AI has already bested humans in games of strategy such as chess, Go, poker, and the video game Dota 2. It is also proving powerful at natural-language processing, in which it can generate humanlike text and carry on conversations. The game of Diplomacy requires both. It involves seven players vying for control of Europe. On each turn, players issue orders regarding the movement of army and naval units, following discussion with other players, whom they can attack or support. Success typically requires building trust—and occasionally abusing it. Both former President John F. Kennedy and former Secretary of State Henry Kissinger were fans of the game.
Previous AI research has focused on a version of the game called no-press Diplomacy, in which players do not communicate. That itself is a challenge for computers because the game’s combination of cooperation and competition requires pursuing conflicting goals. The new work, published this week in Science, is the first to achieve respectable results in the full game. Noam Brown, a computer scientist at Meta who co-authored the paper, says when he started on the project, in 2019, he thought success would require a decade. “The idea that you can have an AI that’s talking strategy with another person and planning things out and negotiating and building trust seemed like science fiction.”
Meta’s AI agent, CICERO, welds together a strategic reasoning module and a dialogue module. As in other machine-learning systems, the modules were trained on large data sets, in this case 125,261 games that humans had played online: both the moves and the transcripts of player negotiations.
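The plan-then-talk wiring of the two modules can be sketched as a minimal turn loop. Everything below is an illustrative assumption, not Meta's actual interfaces: the function names, the order format, and the canned outputs are stand-ins for what are, in CICERO, a learned planner and a large language model.

```python
def plan_moves(state, dialogue_history):
    """Strategic reasoning module (stub). CICERO's real planner searches
    over candidate moves several turns ahead; this placeholder returns a
    fixed order just to illustrate the interface."""
    return {"A PAR": "HOLD"}  # army in Paris holds (hypothetical order)

def write_messages(state, dialogue_history, planned_orders):
    """Dialogue module (stub): generate negotiation messages that stay
    consistent with what the planner intends to do."""
    return [("FRANCE->GERMANY", "I'm holding Paris this turn.")]

def take_turn(state, dialogue_history):
    # Each turn: decide on orders first, then negotiate in a way that
    # is grounded in that plan, rather than chatting free-form.
    orders = plan_moves(state, dialogue_history)
    messages = write_messages(state, dialogue_history, orders)
    return orders, messages
```

The key design point the sketch captures is the direction of dependence: dialogue is conditioned on the plan, which is what lets the agent's talk and its play stay consistent.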
The researchers trained the strategic reasoning module by having the agent play against copies of itself. It learned to choose actions based on the state of the game, any previous dialogue, and the predicted actions of other players, looking several moves ahead. During training, the researchers also rewarded it for humanlike play so that its actions wouldn’t confound other players. In any domain, whether dinner-table manners or driving, conventions tend to ease interactions.
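One way to realize the "reward for humanlike play" described above is to regularize the learned policy toward a policy imitated from human games, in the spirit of the paper's piKL-style anchoring. The toy below is a sketch under that assumption, not Meta's code: it mixes an action's estimated value with a pull toward the human imitation policy, with `lam` controlling how strongly the agent sticks to human convention.

```python
import math

def pikl_policy(q_values, human_policy, lam=1.0):
    """Regularized action distribution: prefer high-value actions while
    staying close to a human imitation policy (names and exact form are
    illustrative assumptions). Large lam -> play like the humans;
    small lam -> chase value regardless of convention."""
    logits = {a: q_values[a] / lam + math.log(human_policy[a])
              for a in q_values}
    m = max(logits.values())          # stabilize the softmax
    exps = {a: math.exp(v - m) for a, v in logits.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}
```

For example, if "attack" has higher estimated value but humans almost always "hold" in a given position, a strongly anchored policy keeps holding, which is exactly the property that stops the agent from confounding human partners.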
The dialogue module also required tuning. It was trained not only to imitate the kinds of things people say in games, but to do so within the context of the state of the game, previous dialogue, and what the strategic planning module intended to do. On its own, the agent learned to balance deception and honesty. In an average game, it sent and received 292 messages that mimicked typical game slang. For example, one message read, “How are you thinking Germany is gonna open? I may have a shot at Belgium, but I’d need your help into Den[mark] next year.”
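The conditioning described above can be made concrete with a toy prompt builder. CICERO's dialogue module is a large neural language model, and the field names and layout below are assumptions for illustration; the point is only that each message is generated from game state plus prior dialogue plus the planner's intended actions, not from chat history alone.

```python
def dialogue_prompt(game_state, history, intents, recipient):
    """Flatten the dialogue model's inputs into one conditioning string
    (illustrative format, not Meta's actual encoding)."""
    lines = [f"STATE: {game_state}"]
    lines += [f"{speaker}: {msg}" for speaker, msg in history]
    lines.append(f"INTENT: {'; '.join(intents)}")  # the planner's intended orders
    lines.append(f"{recipient} <- ")               # slot for the generated message
    return "\n".join(lines)

prompt = dialogue_prompt(
    "Spring 1901, France: A PAR, A MAR, F BRE",
    [("GERMANY", "How are you thinking about Burgundy?")],
    ["A PAR - BUR", "support ENGLAND into BEL"],
    "GERMANY",
)
```

Because the intent line sits in the prompt, a message like the Denmark example above can reference moves the agent actually plans to make.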
Jonathan Gratch, a computer scientist at the University of Southern California who studies negotiation agents—and provided early guidance for a Defense Advanced Research Projects Agency program that is also trying to master Diplomacy—notes two technical innovations. First, CICERO grounds its communication in multistep planning, and second, it keeps its remarks and game play within the realm of human convention.
To test its skill, the researchers had CICERO play 40 online games against humans (who mostly assumed it was a human). It placed in the top 10% of players who’d played at least two games. “In a game that involves language and negotiation, that agents can reach human parity is very exciting,” says Zhou Yu, a computer scientist at Columbia University who studies dialogue systems.
Gratch says the work is “impressive” and “important.” But he questions how much CICERO’s dialogue, as opposed to its strategic planning, contributed to its success. According to the paper, Diplomacy experts rated about 10% of CICERO’s messages as inconsistent with its plan or game state. “That suggests it’s saying a lot of crap,” Gratch says. Yu agrees, noting that CICERO sometimes utters non sequiturs.
Brown says the work could lead to practical applications in niches that now require a human touch. One concrete example: Virtual personal assistants might help consumers negotiate for better prices on plane tickets. Gratch and Yu both see opportunities for agents that persuade people to make healthy choices or open up during therapy. Gratch says negotiation agents could help resolve disputes between political opponents.
Researchers also see risks. Similar agents could manipulate political views, execute financial scams, or extract sensitive information. “The idea of manipulation is not necessarily bad,” Gratch says. “You just have to have guardrails,” including letting people know they are interacting with an AI and that it will not lie to them. “Ideally people are consenting, and there’s no deception.”