Is AlphaZero really a scientific breakthrough in AI?

Garry Kasparov vs IBM Deep Blue. 1997. Source: Reuters
  • Availability/Reproducibility. None of the AlphaZero systems developed by DeepMind is accessible to the public: the code has not been released and there is not even a commercial version for users to test. This is an important impediment because, from a scientific point of view, these approaches can be neither validated nor built upon by other experts. This lack of transparency also makes it almost impossible to reproduce their experiments.
  • 4-hour training. The amount of training behind AlphaZero has been one of the most confusing elements in the general media coverage. According to the paper, after 4 hours of training on 5,000 TPUs, AlphaZero was already stronger than the open-source chess engine Stockfish (the fully-trained AlphaZero took a few more hours). In other words, the total computation amounted to roughly two years on a single TPU, and would be considerably more on a normal CPU (see the back-of-the-envelope calculation below). So, even though the 4-hour figure is indeed impressive, it is mainly a reflection of the large amount of computing power available nowadays compared with a few years ago, especially for a company like DeepMind that invests heavily in it. For example, by 2012 all chess positions with seven pieces or fewer had already been mathematically solved, using significantly less computing power [9]. This growth in computing power paves the way for newer algorithms, and in a few years a game like chess could probably be almost solved by relying heavily on brute force.
  • Experimental setting versus Stockfish. To prove the superiority of AlphaZero over previous chess engines, a 100-game match against Stockfish was played (AlphaZero won 64–36). The selection of Stockfish as the rival engine seems reasonable: it is open-source and one of the strongest chess engines nowadays, having finished 3rd (behind Komodo and Houdini) in the most recent TCEC (Top Chess Engine Championship) [10], which is considered the world championship of chess engines. However, the experimental setting does not seem fair. The version of Stockfish used was not the latest one and, more importantly, it was run in its released PC version, while AlphaZero ran with considerably more processing power; in the TCEC competition, by contrast, engines play against each other on the same processor. Additionally, the choice of time control seems odd: each engine was given one minute per move. In the vast majority of human and engine competitions, however, each player is given a fixed amount of time for the whole game and manages that time individually (the difference between the two schemes is illustrated in a short sketch below). As Tord Romstad, one of the original developers of Stockfish, declared, this was another questionable decision to the detriment of Stockfish, as a "lot of effort has been put into making Stockfish identify critical points in the game and decide when to spend some extra time on a move" [10]. Romstad also pointed out that Stockfish "was playing with far more search threads than has ever received any significant amount of testing". In general, AlphaZero's large margin of victory against Stockfish came as a huge surprise to some top chess players, as it challenges the common belief that chess engines had already reached an almost unbeatable strength (e.g. Hikaru Nakamura, #9 chess player in the world, expressed some scepticism about the low draw rate in the AlphaZero-Stockfish match [11]).
  • 10 games against Stockfish. Along with the paper, only 10 sample games were shared, all of them victories for AlphaZero [12]. These games have been widely praised by the chess community for the seemingly deep understanding displayed by AlphaZero: Peter Heine Nielsen [13], chess Grandmaster and coach of world champion Magnus Carlsen, and Maxime Vachier-Lagrave [11], #5 chess player in the world, are two examples of the many positive reactions to AlphaZero's performance in these games. However, the decision to release only ten victories of AlphaZero raises other questions. It is customary in scientific papers to also show examples where the proposed system displays weaknesses or underperforms, in order to give a more complete picture and to allow other researchers to build upon the work. Another point that is not clear from the paper is whether the games started from particular openings or from the initial position. Given the variety of openings displayed in these ten games, it seems that some initial positions were predetermined.
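
To put the 4-hour figure from the training point above in perspective, here is a rough back-of-the-envelope calculation. The 4-hour and 5,000-TPU numbers come from the paper; the conversion into single-TPU years is just the arithmetic referred to above, not an official DeepMind figure.

```python
# Back-of-the-envelope estimate of the compute behind AlphaZero's "4 hours".
# The 4-hour and 5,000-TPU figures come from the paper; the rest is a rough
# illustration, not an official DeepMind number.

HOURS_OF_TRAINING = 4
NUM_TPUS = 5_000
HOURS_PER_YEAR = 24 * 365

tpu_hours = HOURS_OF_TRAINING * NUM_TPUS    # 20,000 TPU-hours
tpu_years = tpu_hours / HOURS_PER_YEAR      # ~2.3 years on a single TPU

print(f"{tpu_hours:,} TPU-hours, i.e. ~{tpu_years:.1f} years on a single TPU")
```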
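As for the time control used in the match, the sketch below contrasts the two schemes discussed above: a fixed budget per move (as in the AlphaZero-Stockfish games) versus a whole-game clock that the engine manages itself. The function names and the "criticality" factor are hypothetical simplifications for illustration only; they do not reflect Stockfish's actual time-management code.

```python
def per_move_budget(seconds_per_move: float) -> float:
    """Fixed time per move, as in the AlphaZero-Stockfish match:
    every position gets the same budget, critical or not."""
    return seconds_per_move


def whole_game_budget(time_left: float, moves_left_estimate: int,
                      criticality: float) -> float:
    """Hypothetical whole-game time management, as used in most human and
    engine competitions: the remaining clock time is spread over the
    expected remaining moves, with extra time spent on positions the
    engine judges to be critical (criticality in [0, 1])."""
    base = time_left / max(moves_left_estimate, 1)
    return base * (1.0 + 2.0 * criticality)   # up to 3x the base budget


# Example: 30 minutes left, ~40 moves to go, sharp tactical position.
print(per_move_budget(60.0))                  # 60.0 seconds, always
print(whole_game_budget(30 * 60, 40, 0.9))    # ~126 seconds on this move
```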
Game between AlphaZero and Stockfish. Last move: 26. Qh1! Top Grandmaster Francisco Vallejo Pons defined this game as “science-fiction”. Source: chess24
  • Self-play. Does AlphaZero learn entirely from self-play? This seems to be true according to the details provided in the paper, but with two important nuances: the rules and the typical number of moves have to be given to the system before it starts playing against itself. The first nuance, although it may look obvious, is not as trivial as it seems. A lot of work has to be devoted to finding a suitable neural network architecture in which these rules are encoded, as also explained in the AlphaZero paper. The initial architecture based on convolutional neural networks used in AlphaGo was suitable for Go, but not for other games: unlike Go, chess and shogi are asymmetric, and some pieces behave differently depending on their position. In the newest AlphaZero, a more generic version of the AlphaGo algorithm was introduced, encompassing games like chess and shogi. The second nuance (i.e. the typical number of moves was given to AlphaZero to "scale the exploration noise") also requires some prior knowledge of the game. Moreover, games that exceeded a maximum number of steps were terminated with a draw outcome (this maximum number of steps is not provided), and it is not clear whether this heuristic was also used in the games against Stockfish or only during training (a minimal sketch of a self-play loop with these two pieces of prior knowledge is given after this list).
  • Generalization. The use of a general-purpose reinforcement learning algorithm that can succeed in many domains is one of the main claims of AlphaZero. However, following the previous point on self-play, there has been a lot of debate about the capability of the AlphaGo and AlphaZero systems to generalize to other domains [14]. It seems unrealistic to think that many real-life situations can be reduced to a fixed, predefined set of rules, as is the case for chess, Go or shogi. Additionally, not only do these games have a fixed set of rules, but, although with different degrees of complexity, they are also finite, i.e. the number of possible configurations is bounded. This differs even from other games that do have a fixed set of rules. In tennis, for instance, the variables that would have to be taken into account are difficult even to quantify: speed and direction of the wind, speed of the ball, angle between the ball and the surface, surface type, material of the racket, imperfections on the court, etc.
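
To make the two self-play nuances above more concrete, the sketch below shows the general shape of a self-play training loop with the two pieces of prior knowledge mentioned in the paper: exploration noise scaled to the typical number of legal moves, and games cut off with a draw after a maximum number of steps. This is a minimal illustration written from the paper's description, not DeepMind's code (which is not public); the constants, names and the assumed Game interface are illustrative assumptions.

```python
import random

# Minimal self-play sketch following the description in the AlphaZero paper.
# DeepMind's code is not public, so the constants, names and the Game
# interface below are illustrative assumptions, not their implementation.

MAX_GAME_LENGTH = 512        # assumed cut-off; the paper does not give the value
TYPICAL_LEGAL_MOVES = 30     # prior knowledge: typical branching factor of chess
DIRICHLET_ALPHA = 10.0 / TYPICAL_LEGAL_MOVES   # noise scaled to the typical number of moves


def dirichlet_noise(num_moves: int, alpha: float = DIRICHLET_ALPHA) -> list:
    """Exploration noise over the legal moves, built from gamma samples."""
    samples = [random.gammavariate(alpha, 1.0) for _ in range(num_moves)]
    total = sum(samples)
    return [s / total for s in samples]


def self_play_game(game, choose_move):
    """Play one self-play game and return +1, -1 or 0 (draw).

    `game` is assumed to expose is_over(), result(), legal_moves() and play();
    `choose_move` stands in for the network-guided tree search, which mixes
    the exploration noise into its move probabilities at the root.
    """
    for _step in range(MAX_GAME_LENGTH):
        if game.is_over():
            return game.result()
        noise = dirichlet_noise(len(game.legal_moves()))
        game.play(choose_move(game, noise))
    # Second nuance discussed above: games that exceed the maximum
    # number of steps are scored as draws during training.
    return 0
```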

Jose Camacho Collados

Mathematician, AI/NLP researcher and chess International Master. http://www.josecamachocollados.com