Thank you for the comments. In general, I think we have very different opinions on what a scientific contribution (and even more so an alleged scientific breakthrough) entails:
1) Reproducibility: It is not a matter of whether others have computational resources or not. AlphaZero could be evaluated because Stockfish is open-source (the same as your engine, if I am not mistaken). It can therefore be used by anyone to run comparisons and to probe its weaknesses and strengths. If another researcher has a new groundbreaking idea, how can they test their hypothesis? If we were to believe AlphaZero is the state of the art in game playing, what could we do to test our systems against it? According to your comment, each researcher should develop AlphaZero from scratch just by following the indications given in the paper (which also lacks important details for its reproduction, but that is another discussion). With no code to train AlphaZero, no trained version available, and only a few cherry-picked victories released, there is little we can test or verify about AlphaZero. In fact, if a trained version of AlphaZero were released, testing it could be considered affordable by some well-resourced institutions/companies, as “only” 4 TPUs were used for evaluation, and probably less processing power would suffice by sacrificing some speed.
2) Fairness of results vs. Stockfish: I would pose your question differently. Hypothetically, what would be the point of evaluating a system against another system that is not used at its best, and under a questionable setting? Proper evaluation settings and solid baselines help provide better scientific insights and draw more reliable conclusions. In general I very much agree with a reflection made by Rodney Brooks on this subject: “We examine and poke holes so that we understand and draw accurate conclusions, rather than simply ‘appreciate’. It’s called science.” https://twitter.com/rodneyabrooks/status/941065409502912513
3) Being pre-programmed with the rules: My point here is that, for each new game, it is not trivial to encode the rules in a neural network architecture. In fact, this could arguably be considered one of the main achievements of this paper, as the authors found a suitable architecture for encoding the particularities of chess and shogi. Each game has its own specific features (this was also argued in AlphaZero’s paper), and therefore the approach may not be easily generalizable to other domains, or even other games (I gave the example of tennis here), despite what is claimed by the authors, who describe their methodology as general-purpose.