In multi-agent reinforcement learning experiments, researchers try to optimize the performance of a learning agent on a given task, in cooperation or competition with one or more agents. These agents learn by trial and error, and researchers may choose to have the learning algorithm play the role of two or more of the agents. When successfully executed, this technique has a double advantage: it removes the need for human opponents or hand-crafted adversaries, and it supplies the learner with opposition whose skill level grows in step with its own.
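The basic loop can be sketched as follows. This is a minimal, illustrative example rather than a description of any cited system: the toy environment, the agent, and its learning rule are all hypothetical stand-ins, and in a real experiment the environment would be the game under study and the update a full reinforcement learning algorithm.

```python
import random

class TwoPlayerEnv:
    """Toy zero-sum game: both players pick a move; the higher move wins."""
    def play(self, action_a, action_b):
        if action_a == action_b:
            return 0                                 # draw
        return 1 if action_a > action_b else -1      # +1 means player A wins

class Agent:
    def __init__(self, n_actions=10):
        self.values = [0.0] * n_actions              # crude action-value estimates

    def act(self):
        if random.random() < 0.1:                    # epsilon-greedy exploration
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, action, reward, lr=0.1):
        self.values[action] += lr * (reward - self.values[action])

env, agent = TwoPlayerEnv(), Agent()
for episode in range(1000):
    a, b = agent.act(), agent.act()                  # one learner plays both roles
    r = env.play(a, b)
    agent.update(a, r)                               # reward from player A's side
    agent.update(b, -r)                              # zero-sum: player B sees -r
```

Because the same policy generates both sides of every game, the agent's opponent is always exactly as strong as the agent itself.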
Czarnecki et al.1 argue that most of the games that people play for fun are "Games of Skill", meaning games whose space of all possible strategies looks like a spinning top. In more detail, the space of strategies can be partitioned into sets $L_1, L_2, \ldots, L_n$ such that for any $i < j$ with $\pi_i \in L_i$ and $\pi_j \in L_j$, the strategy $\pi_j$ beats the strategy $\pi_i$. Then, in population-based self-play, if the population is larger than $\max_i |L_i|$, the algorithm converges to the best possible strategy.
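A minimal sketch of population-based self-play under this assumption follows. The train_against placeholder and the population size are hypothetical choices for illustration; Czarnecki et al. do not prescribe a specific implementation.

```python
import random

def train_against(opponents):
    """Placeholder for a full reinforcement learning run: returns a new
    agent trained to beat the given opponents."""
    return {"trained_vs": [id(o) for o in opponents]}   # toy stand-in

POPULATION_SIZE = 32      # assumed to exceed max_i |L_i| for the game at hand
population = [train_against([])]                        # seed agent

for generation in range(100):
    # Each newcomer trains against opponents drawn from the whole
    # population, so it cannot overfit to a single non-transitive rival.
    opponents = random.sample(population, min(8, len(population)))
    population.append(train_against(opponents))
    if len(population) > POPULATION_SIZE:
        population.pop(0)                               # retire the oldest agent
```

Keeping the population larger than every layer $L_i$ is what prevents training from cycling among mutually counter-beating strategies within a single layer.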
Self-play is used by the AlphaZero program to improve its performance in the games of chess, shogi, and Go.2
Self-play was also used to train the Cicero AI system to outperform humans at the game of Diplomacy, and the DeepNash system to play the game Stratego.34
Self-play has been compared to the epistemological concept of tabula rasa, which describes how humans acquire knowledge from a "blank slate".5
Czarnecki, Wojciech M.; Gidel, Gauthier; Tracey, Brendan; Tuyls, Karl; Omidshafiei, Shayegan; Balduzzi, David; Jaderberg, Max (2020). "Real World Games Look Like Spinning Tops". Advances in Neural Information Processing Systems. 33. Curran Associates, Inc.: 17443–17454. arXiv:2004.09468. https://proceedings.neurips.cc/paper/2020/hash/ca172e964907a97d5ebd876bfdd4adbd-Abstract.html
Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis (5 December 2017). "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm". arXiv:1712.01815 [cs.AI].
Snyder, Alison (1 December 2022). "Two new AI systems beat humans at complex games". Axios. Retrieved 29 December 2022. https://www.axios.com/2022/12/01/ai-beats-humans-complex-games
Grunewald, Erich (22 December 2022). "Notes on Meta's Diplomacy-Playing AI". LessWrong. https://www.lesswrong.com/posts/oT8fmwWddGwnZbbym/notes-on-meta-s-diplomacy-playing-ai
Laterre, Alexandre (2018). "Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization". arXiv:1807.01672 [cs.AI].