Is AlphaZero really a scientific breakthrough in AI?

As you probably know, DeepMind recently published a paper on AlphaZero [1], a system that learns by playing against itself and is able to master games like chess and Shogi.

Before getting into details, let me introduce myself. I am a researcher in the broad field of Artificial Intelligence (AI), specializing in Natural Language Processing. I am also a chess International Master, currently the top player in South Korea, although practically inactive for the last few years due to my full-time research position. Given this background, I have tried to build a reasoned opinion on the subject, as constructive as I could make it. For obvious reasons I focus on chess, although some of the arguments are general and may extend to Shogi or Go as well. This post represents solely my own view, and I may have misinterpreted some particular details on which I am not an expert; if so, I apologize in advance.

Chess has arguably been the most widely studied game in the context of “human vs machine” and in AI in general. One of the first breakthroughs in this area was the 1997 victory of IBM's Deep Blue over the then world champion, Garry Kasparov [2]. At the time, machines were considered inferior to humans at chess, but from that point onwards the “battle” has clearly been won by machines.

Garry Kasparov vs IBM Deep Blue. 1997. Source: Reuters

On a related note, a couple of years ago DeepMind released AlphaGo, a Go engine that was able to beat some of the best human Go players [3]. Note that the complexity of Go is significantly larger than that of chess, which is one of the main reasons why, even with the more advanced computational power available nowadays, Go had remained a game in which humans were stronger than machines. That result may therefore be considered a breakthrough in itself. It was then improved upon by AlphaGo Zero which, as claimed by the authors, learnt to master Go entirely through self-play [4]. More recently, AlphaZero, a similar model that trains a neural network with a more generic reinforcement learning algorithm, has beaten some of the best engines in chess and Shogi [1].
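
For readers less familiar with how these systems are trained: the papers describe a single neural network with a policy output (move probabilities) and a value output (expected game result), trained on self-play games in which the tree search provides the policy target and the final result provides the value target. The snippet below is a minimal, illustrative sketch in plain NumPy of a per-position loss of the form reported in the AlphaGo Zero and AlphaZero papers; the function name and the regularisation constant are my own illustrative choices, not DeepMind's code.

```python
import numpy as np

def alphazero_loss(p, v, pi, z, theta, c=1e-4):
    """Per-position training loss of the form described in the AlphaGo Zero /
    AlphaZero papers: (z - v)^2 - pi . log(p) + c * ||theta||^2.

    p     : move probabilities predicted by the network (1-D array summing to 1)
    v     : predicted game outcome in [-1, 1]
    pi    : visit-count distribution from the tree search (policy target)
    z     : actual self-play game outcome (-1, 0 or +1)
    theta : network parameters, here just a flat array for illustration
    c     : L2 regularisation constant (illustrative value)
    """
    value_loss = (z - v) ** 2
    policy_loss = -np.dot(pi, np.log(p + 1e-12))  # cross-entropy against the search policy
    l2_penalty = c * np.sum(theta ** 2)
    return value_loss + policy_loss + l2_penalty

# Toy usage: three legal moves, network slightly disagreeing with the search.
p = np.array([0.5, 0.3, 0.2])
pi = np.array([0.6, 0.3, 0.1])
print(alphazero_loss(p, v=0.1, pi=pi, z=1.0, theta=np.zeros(10)))
```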

This feat has been extensively covered by the mainstream media [5,6] and by chess-specialized media [7,8], often in bombastic terms about the importance of the breakthrough. However, a careful reading of the AlphaZero paper raises reasonable doubts about the validity of its overarching claims. Some of these concerns may not seem important on their own and may well be addressed by the authors. Taken together, however, they cast reasonable doubt on the current scientific validity of the main claims. In what follows I enumerate some general concerns:

  • Availability/Reproducibility. None of the AlphaZero systems developed by DeepMind are accessible to the public: the code is not publicly available, and there is not even a commercial version for users to test. This is an important impediment since, from a scientific point of view, these approaches can neither be validated nor built upon by other experts. This lack of transparency also makes it almost impossible for the experiments to be reproduced.
Game between AlphaZero and Stockfish. Last move: 26. Qh1! Top Grandmaster Francisco Vallejo Pons defined this game as “science-fiction”. Source: chess24
  • Self-play. Does AlphaZero learn entirely from self-play? According to the details provided in the paper this seems to be true, but with two important nuances: both the rules and the typical number of moves of the game have to be given to the system before it starts playing against itself. The first nuance, although it looks obvious, is not as trivial as it seems. A lot of work has to go into finding a suitable neural network architecture in which these rules are encoded, as also explained in the AlphaZero paper. The initial architecture based on convolutional neural networks used in AlphaGo was suitable for Go, but not for other games: unlike Go, chess and Shogi are asymmetric and some pieces behave differently depending on their position. In the newest AlphaZero, a more generic version of the AlphaGo algorithm was introduced, encompassing games like chess and Shogi. The second nuance (i.e., the typical number of moves was given to AlphaZero to “scale the exploration noise”) also requires some prior knowledge of the game. Moreover, games that exceeded a maximum number of steps were terminated and scored as draws (this maximum number of steps is not provided), and it is not clear whether this heuristic was also used in the games against Stockfish or only during training. A minimal sketch of how these two pieces of prior knowledge might enter the self-play loop is given right after this list.
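
To make the second nuance more concrete, the sketch below shows how this kind of prior game knowledge can enter the self-play loop: the Dirichlet noise mixed into the move priors at the search root uses a per-game constant (the alpha values below are the ones reported in the paper, and the noise fraction is the one used in AlphaGo Zero), and games that run past a maximum length are simply scored as draws. The function names, the placeholder game interface and the move cap itself are hypothetical, since the actual maximum number of steps is not disclosed.

```python
import numpy as np

# Per-game Dirichlet noise parameters reported in the AlphaZero paper,
# scaled in inverse proportion to the typical number of legal moves.
DIRICHLET_ALPHA = {"chess": 0.3, "shogi": 0.15, "go": 0.03}
MAX_PLIES = 512  # assumed cap; the real maximum game length is not disclosed

def root_priors_with_noise(priors, game="chess", eps=0.25, rng=np.random):
    """Mix Dirichlet exploration noise into the network's move priors at the root."""
    alpha = DIRICHLET_ALPHA[game]
    noise = rng.dirichlet([alpha] * len(priors))
    return (1 - eps) * np.asarray(priors) + eps * noise

def self_play_game(play_move, is_terminal, result, initial_state=None):
    """Play one self-play game, adjudicating a draw once MAX_PLIES is exceeded.

    play_move, is_terminal and result stand in for the engine's actual
    search and game-state functions.
    """
    state, plies = initial_state, 0
    while not is_terminal(state):
        if plies >= MAX_PLIES:
            return 0.0          # over-long games are scored as draws
        state = play_move(state)
        plies += 1
    return result(state)        # +1, 0 or -1 for the final position

# Example: add exploration noise to uniform priors over 20 legal chess moves.
print(root_priors_with_noise(np.full(20, 1 / 20), game="chess"))
```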

We should scrutinize alleged breakthroughs scientifically and carefully, especially in the current period of AI hype. It is the responsibility of researchers in this area to describe and advertise our achievements accurately, and to try not to contribute to the growing (often self-interested) misinformation and mystification of the field. In fact, at NIPS this early December, arguably the most prestigious AI conference, some researchers voiced serious concerns about the lack of rigour in this scientific community in recent years [15].

In this case, given the relevance of the claims, I hope these concerns will be clarified and resolved so that the actual scientific contribution of this feat can be accurately judged, a judgement that is not possible to make right now. Perhaps with a better experimental design, as well as an effort towards reproducibility, the conclusions would turn out somewhat weaker than originally claimed. Or perhaps not, but it is hard to assess unless DeepMind puts some effort in this direction. I personally have a lot of hope in DeepMind's potential to achieve relevant discoveries in AI, but I hope these achievements will be developed in a way that can be easily judged by peers and can contribute to society.

— — — — — -

[1] Silver et al. “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm.” arXiv preprint arXiv:1712.01815 (2017). https://arxiv.org/pdf/1712.01815.pdf

[2] https://en.wikipedia.org/wiki/Deep_Blue_versus_Garry_Kasparov

[3] https://www.theguardian.com/technology/2016/mar/15/googles-alphago-seals-4-1-victory-over-grandmaster-lee-sedol

[4] Silver et al. “Mastering the game of go without human knowledge.” Nature 550.7676 (2017): 354–359. https://www.gwern.net/docs/rl/2017-silver.pdf

[5] https://www.theguardian.com/technology/2017/dec/07/alphazero-google-deepmind-ai-beats-champion-program-teaching-itself-to-play-four-hours

[6] http://www.bbc.com/news/technology-42251535

[7] https://chess24.com/en/read/news/deepmind-s-alphazero-crushes-chess

[8] https://www.chess.com/news/view/google-s-alphazero-destroys-stockfish-in-100-game-match

[9] http://chessok.com/?page_id=27966

[10] https://hunonchess.com/houdini-is-tcec-season-10-champion/

[11] https://www.chess.com/news/view/alphazero-reactions-from-top-gms-stockfish-author

[12] Link to reproduce the 10 games of AlphaZero against Stockfish: https://chess24.com/en/watch/live-tournaments/alphazero-vs-stockfish/1/1/1

[13] https://www.twitch.tv/videos/207257790

[14] https://medium.com/@karpathy/alphago-in-context-c47718cb95a5

[15] Ali Rahimi compared current machine learning practices with “alchemy” in his NIPS 2017 talk, given after receiving the Test of Time Award: https://www.youtube.com/watch?v=ORHFOnaEzPc

Mathematician, AI/NLP researcher and chess International Master. http://www.josecamachocollados.com
