Referee Report

The manuscript at hand gives an overview of the computational alchemy approach to evaluating catalysts and provides a standard procedure for performing such computations for specific applications. The paper does not present new research; rather, it outlines the computational alchemy method and offers an open-source, user-friendly tool with which the community can efficiently perform its own catalyst searches. Computational alchemy is an approximate method that allows large amounts of data to be obtained from a single DFT calculation, thus enabling large-scale screening of catalysts. The method was first described by Straatsma and McCammon in 1992 and has recently been shown to hold great promise for accelerating catalyst searches at reduced computational effort. The current work gives a detailed review of this approach and describes all the approximations that enter the method. The manuscript provides two example implementations: prediction of binding energies for OH adsorption on the Pt(111) surface, and prediction of reaction barriers for CH4* dehydrogenation on the Pt(111) surface. These two examples serve as benchmarks of the method for binding-energy and reaction-barrier estimation, respectively. Additionally, the authors have developed a web interface, using Jupyter notebooks, that makes the implementation of the method easy to understand.

In conclusion, the manuscript is predominantly well written, presents a clear description of the methodology, benchmarks the method on two examples, and provides clear and user-friendly software. I believe the tools provided will be of significant use to the materials-design community and will aid in the design of new catalyst materials. My recommendation, therefore, is to accept the manuscript for publication in the International Journal of Quantum Chemistry.

I identified a few typos in the manuscript, which I noted as comments on the Authorea platform.
Referee Report

The paper by Folmsbee and Hutchison is a great example of a reproducible benchmark paper and is well suited for the IJQC special issue. Both the code and the data are available on GitHub. The paper compares the accuracy of various computational methods for evaluating single-point energies of molecular conformers. The authors used DLPNO-CCSD(T) as the reference level of theory and benchmarked small-molecule force fields, semiempirical methods, DFT, and several emerging machine learning (ML) techniques. The paper provides computational chemists with a substantial body of high-accuracy data, and overall it could serve as a solid practical guideline for applying approximate computational methods to the problem of conformer search. However, I identified several problems that should be addressed before the paper can be accepted for publication. Specific comments (in no particular order) are below.

The availability of the code for review allowed me to run and reproduce some of the results of this paper. As one of the developers of the ANI ML potential, I naturally tested our method first. Overall, I applaud the authors for advocating open science and open data.

1. All ANI models are fitted to wB97x DFT functional data *minus* the D dispersion term. This is done because dispersion is an analytical ad hoc correction; the intention is that dispersion is added back at run time. D3 can easily be computed with the ASED3 code referenced in our GitHub.

2. The ANI timings are simply wrong, and therefore the TOC graphic and Figure 4 are misleading. ANI is at least 100 times faster. The authors' script reloads all Python dependencies and recompiles the neural network model for every conformer; this accounts for 2.45 of the 2.5 seconds of the run. Even with sequential energy evaluation on a CPU, the time should be around 0.05 s for the 2x model and probably ~0.025 s for 1x/ccx. Additionally, the model can be pre-compiled with JIT and embedded into applications for even faster runs.
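The cost structure behind point 2 can be illustrated with a generic sketch (the stand-in "model", its setup cost, and the timings are illustrative, not the actual ANI code): the expensive step is one-time setup, so it must be paid once, not once per conformer.

```python
import time

def load_model():
    """Stand-in for expensive one-time setup (imports, model compilation)."""
    time.sleep(0.01)                         # pretend per-load setup cost
    return lambda conformer: sum(conformer)  # trivial stand-in "energy"

def score_reloading(conformers):
    # Anti-pattern: pay the setup cost again for every single conformer.
    return [load_model()(c) for c in conformers]

def score_loaded_once(conformers):
    # Recommended: load once, then score every conformer with the same model.
    model = load_model()
    return [model(c) for c in conformers]

conformers = [[0.1 * i, 0.2, 0.3] for i in range(50)]

t0 = time.perf_counter()
bad = score_reloading(conformers)
t_bad = time.perf_counter() - t0

t0 = time.perf_counter()
good = score_loaded_once(conformers)
t_good = time.perf_counter() - t0

assert bad == good   # identical energies
assert t_bad > t_good  # but very different wall time
```

With 50 conformers the reloading variant pays the setup cost 50 times, which is exactly the kind of overhead that dominated the reported 2.5 s per conformer.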
Overall, our code is natively GPU-based and naturally supports batch evaluation with multidimensional tensors; the recommended use is therefore to load all conformers and evaluate them at once. In principle, all conformers of the same molecule could be computed in 0.1 seconds. I am happy to share my scripts; most of them are already in our GitHub repos anyway.

3. The authors write: "In this work, in order to expand our range of computational methods, we only consider the relative single point energies from the same set of density-functional optimized geometries, comparing multiple current methods to a high-quality coupled cluster baseline." I think there is a fundamental flaw of logic here that ultimately hurts the value of this paper. In practical research settings where conformer sampling is used, there is no access to 3D geometries obtained with high-level QM methods. I therefore think the meaningful comparison would be conformer energies at geometries obtained by the respective approximate methods.

4. The comparison between BOB/BAT/BATTY and ANI is also one-sided. The BOB models are just molecular scorers; they only give you a number. In contrast, ANI and the force fields are true potentials with forces and analytic Hessians: we can do geometry minimization, MD, etc.

5. There is also a small but pesky bug in the authors' scripts: they use different a.u.-to-kcal/mol conversion factors in different places, so some of the energies are inaccurate by ~0.2 kcal/mol.
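The size of the error in point 5 is easy to reproduce with simple arithmetic; the two conversion factors and the total energy below are illustrative values of typical magnitude, not the exact numbers extracted from the authors' scripts.

```python
# Two slightly different hartree -> kcal/mol factors, as one might find
# hard-coded in different places of a code base (illustrative values).
HARTREE_TO_KCAL_A = 627.509474  # CODATA-derived value
HARTREE_TO_KCAL_B = 627.503     # truncated value from older literature

# A total electronic energy of roughly this magnitude (in hartree) is
# typical for a small organic molecule (illustrative value).
e_total = -40.5

discrepancy = abs(e_total) * abs(HARTREE_TO_KCAL_A - HARTREE_TO_KCAL_B)
print(f"{discrepancy:.2f} kcal/mol")  # → 0.26 kcal/mol
```

Mixing factors thus shifts total energies by a few tenths of a kcal/mol, consistent with the ~0.2 kcal/mol inaccuracy noted above; the cure is a single named constant used everywhere.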

Anonymous IJQC Reviewer

Referee Report

The paper describes a computational framework that can automate simulations of adsorbates on a given surface. Starting from an initial cell containing a few layers of a substrate and an adsorbate, the script can detect the adsorbate and then create a new simulation cell in which the adsorbate is placed onto a different metal slab. Various methods are employed to determine the correct adsorbate position, automating many of the required steps. The authors mention other relevant databases and tools existing in the field and actually use some of them. Interestingly, they show how results can be deposited in the ioChem-BD online database, facilitating access to the computational results.

The general concept is interesting, and the use of the interactive features of the Authorea platform can facilitate understanding (e.g., Fig. 1, where one can click on the boxes intuitively describing the steps of the workflow and see the crystal structure at that specific stage, or the nice animated visualization of normal-mode frequencies in Fig. 2). However, I think the manuscript has a few weaknesses that should be addressed by the authors.

At the beginning of Sec. 3.4, the authors describe the results of the workflows as "solid". However, in the case of MER, for example, 88% of the VASP relaxations simply work without any need for error management, and the workflow only deals with 0.1% more; in many cases (the remaining ~23%) there is still the need for manual preparation. This strikes me as surprising, since one of the focuses of the paper is to describe how the platform can remove human intervention. In this case, human intervention is still needed, and it is not even significantly reduced compared with running VASP directly (only 0.1% of the total cases are recovered automatically, while in 23% of cases manual preparation is needed anyway).
I'd like to stress that I understand that humans are helped in creating the input cells; still, the results do not strike me as "solid", and the authors themselves acknowledge in the abstract that the performance is only "good" or even just "decent". Therefore, I do not believe that, for instance, the sentence in the conclusions, "Our framework has proven to successfully automate two different ...", accurately describes the advantage of the framework over VASP itself.

In addition to the point above, one way they mention to achieve convergence is to replace the metal slab with another one. But isn't this a different system? What if I really want to simulate that specific material?

The authors mention that putting data on ioChem-BD makes their research FAIR. However, I could only find fewer than 10 systems in the database, while in Fig. 3 they report over 300 runs. Do they intend to make these data public to make the paper really FAIR? Otherwise, this is just a proof-of-concept demonstration and not really a FAIR paper.

It is also not clear to me what amount of reproducibility ioChem-BD guarantees. Can the authors describe the advantages and limitations of the database? For instance:
- What is available to allow an external researcher to reproduce a simulation, and what is not? Are the VASP input files available (I think some of them are, but some only in parsed form, like the initial coordinates)?
- Are the VASP output files available (is only a parsed .cml provided? Are raw outputs available? Is it possible to add a link to the CML specification/schema? Is it possible to provide information on the code and version that performed the parsing?)
- Is it possible to get the inputs and outputs of the other computational steps as well (I think only the final VASP ones are provided)?
- Is it possible to retrieve information on how the inputs of the simulations were obtained (e.g., if a VASP input was obtained from a relaxation, or a simulation was the restart of another one, is this specified somewhere)?

Also, at the end of Sec. 3.4 the authors speak about NEB calculations; is it possible to inspect them and see the results?

I believe the paper requires an overall revision of the English. There are quite a few grammar mistakes, e.g., "Our framework show" instead of "Our framework shows" in the abstract, "that can be search" instead of "that can be searched" at the end of Sec. 1, "a Gamma-centered mesh have" instead of "a Gamma-centered mesh has" in Sec. 2, "Niquel" instead of "Nickel" in one of the captions of Fig. 1, etc. (there are quite a few more occurrences later). Also, the expansion of "FAIR" as "functional, accessible, interoperable, recyclable" is incorrect (F is findable, R is reusable). Moreover, I have never encountered (nor could I find) the term "avoidhuman", which also sounds as if it has a negative connotation; I would therefore suggest replacing it with some other term ("automation"?).

In addition, the wording is sometimes unusual or incorrect, and in some cases this makes it hard to understand the actual meaning of a sentence. Some examples: "infinite xyz coordinate listing" in Sec. 1 (I guess "very long" is meant rather than infinite); the mention in the abstract that the framework performs an "experimental" procedure is very confusing (I understood only much later that this is a computational paper, not one describing an experimental protocol); and some sentences are long and unclear, such as, in Sec. 1, "As the applications grow and the access to massive computers and robust codes extends worldwide structural data, spectroscopic fingerprints, general properties can be generated as databases for molecules, nanostructures and materials." or, in Sec. 2, "All the intermediates belong to the same reaction network, being the transition states all the possible elemental steps involving the intermediates.".

At the end of Sec. 3.1, the authors say "After a few tests, further improvements were integrated to the transfer algorithm.", but it is not clear which improvements were integrated or what their technical details are (i.e., with the information provided it is not possible to try to reproduce the results).

Is the code described in the paper available somewhere? In order to have a really "FAIR" and reusable dataset, it would be important to be able to rerun the same simulations/workflow.