In multi-agent reinforcement learning experiments, researchers try to optimize the performance of a learning agent on a given task, in cooperation or competition with one or more agents. These agents learn by trial and error, and researchers may choose to have the learning algorithm play the role of two or more of the agents. When successfully executed, this technique has a double advantage: it removes the need for human opponents or hand-crafted adversaries, and it supplies the learner with opposition whose skill level grows in step with its own.
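The basic loop can be sketched as follows. This is a minimal, illustrative example rather than a description of any cited system: the toy environment, the agent, and its learning rule are all hypothetical stand-ins, and in a real experiment the environment would be the game under study and the update a full reinforcement learning algorithm.

```python
import random

class TwoPlayerEnv:
    """Toy zero-sum game: both players pick a move; the higher move wins."""
    def play(self, action_a, action_b):
        if action_a == action_b:
            return 0                                 # draw
        return 1 if action_a > action_b else -1      # +1 means player A wins

class Agent:
    def __init__(self, n_actions=10):
        self.values = [0.0] * n_actions              # crude action-value estimates

    def act(self):
        if random.random() < 0.1:                    # epsilon-greedy exploration
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, action, reward, lr=0.1):
        self.values[action] += lr * (reward - self.values[action])

env, agent = TwoPlayerEnv(), Agent()
for episode in range(1000):
    a, b = agent.act(), agent.act()                  # one learner plays both roles
    r = env.play(a, b)
    agent.update(a, r)                               # reward from player A's side
    agent.update(b, -r)                              # zero-sum: player B sees -r
```

Because the same policy generates both sides of every game, the agent's opponent is always exactly as strong as the agent itself.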
Czarnecki et al.1 argue that most of the games that people play for fun are "Games of Skill", meaning games whose space of all possible strategies looks like a spinning top. In more detail, the space of strategies can be partitioned into sets $L_1, L_2, \ldots, L_n$ such that for any $i < j$ with $\pi_i \in L_i$ and $\pi_j \in L_j$, the strategy $\pi_j$ beats the strategy $\pi_i$. Then, in population-based self-play, if the population is larger than $\max_i |L_i|$, the algorithm converges to the best possible strategy.
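A minimal sketch of population-based self-play under this assumption follows. The train_against placeholder and the population size are hypothetical choices for illustration; Czarnecki et al. do not prescribe a specific implementation.

```python
import random

def train_against(opponents):
    """Placeholder for a full reinforcement learning run: returns a new
    agent trained to beat the given opponents."""
    return {"trained_vs": [id(o) for o in opponents]}   # toy stand-in

POPULATION_SIZE = 32      # assumed to exceed max_i |L_i| for the game at hand
population = [train_against([])]                        # seed agent

for generation in range(100):
    # Each newcomer trains against opponents drawn from the whole
    # population, so it cannot overfit to a single non-transitive rival.
    opponents = random.sample(population, min(8, len(population)))
    population.append(train_against(opponents))
    if len(population) > POPULATION_SIZE:
        population.pop(0)                               # retire the oldest agent
```

Keeping the population larger than every layer $L_i$ is what prevents training from cycling among mutually counter-beating strategies within a single layer.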
Self-play is used by the AlphaZero program to improve its performance in the games of chess, shogi, and Go.2
Self-play was also used to train the Cicero AI system to outperform humans at the game of Diplomacy, and the DeepNash system to play the game Stratego.34
Self-play has been compared to the epistemological concept of tabula rasa, which describes how humans acquire knowledge from a "blank slate".5
Czarnecki, Wojciech M.; Gidel, Gauthier; Tracey, Brendan; Tuyls, Karl; Omidshafiei, Shayegan; Balduzzi, David; Jaderberg, Max (2020). "Real World Games Look Like Spinning Tops". Advances in Neural Information Processing Systems. 33. Curran Associates, Inc.: 17443–17454. arXiv:2004.09468. https://proceedings.neurips.cc/paper/2020/hash/ca172e964907a97d5ebd876bfdd4adbd-Abstract.html
Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis (5 December 2017). "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm". arXiv:1712.01815 [cs.AI].
Snyder, Alison (1 December 2022). "Two new AI systems beat humans at complex games". Axios. Retrieved 29 December 2022. https://www.axios.com/2022/12/01/ai-beats-humans-complex-games
Grunewald, Erich (22 December 2022). "Notes on Meta's Diplomacy-Playing AI". LessWrong. https://www.lesswrong.com/posts/oT8fmwWddGwnZbbym/notes-on-meta-s-diplomacy-playing-ai
Laterre, Alexandre (2018). "Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization". arXiv:1807.01672 [cs.AI].