In silico chemistry: Pursuit of chemical accuracy

November 16, 2017

Kirk A. Peterson, from the Department of Chemistry at Washington State University, discusses the fundamentals of in silico chemistry

In silico chemistry simply refers to carrying out investigations of chemical processes entirely by computational methods. Over the last few decades, computational chemistry has been an invaluable tool in understanding chemical reactivity, structure, and thermodynamics. This is particularly true for short-lived species such as free radicals and reaction intermediates, as well as novel species that have yet to be observed by experiment. Computer modelling also allows a chemical system to be studied in a pristine, well-defined environment, free of some of the additional complexities of an experiment that might complicate the interpretation of a fundamental process.

With the increasing power of modern computing resources, in silico chemistry has seen significant success in the prediction of thermochemical properties of molecules in the gas phase, e.g., bond enthalpies, heats of formation, ionization potentials, etc. The benchmark standard has long been the so-called “chemical accuracy” threshold, loosely defined as an accuracy of 1 kcal/mol (~4 kJ/mol).

For molecules consisting solely of atoms ranging from hydrogen to chlorine, this threshold can now almost routinely be met, and for relatively small molecules (perhaps not more than 5 non-hydrogen atoms) accuracies on the order of 0.25 kcal/mol (1 kJ/mol) are possible. The latter certainly then becomes competitive with or even exceeds the accuracy attainable with many experimental approaches to these quantities.
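To make the thresholds above concrete, here is a minimal Python sketch that converts between the two units and classifies an error against the 1 kcal/mol and 0.25 kcal/mol targets. The function names, the category labels, and the example numbers are illustrative assumptions, not part of any standard package.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie: 1 kcal = 4.184 kJ by definition

CHEMICAL_KCAL = 1.0        # "chemical accuracy" threshold quoted in the text
SUB_CHEMICAL_KCAL = 0.25   # tighter target quoted for small molecules


def kcal_to_kj(value_kcal: float) -> float:
    """Convert an energy per mole from kcal/mol to kJ/mol."""
    return value_kcal * KJ_PER_KCAL


def accuracy_class(calculated_kcal: float, reference_kcal: float) -> str:
    """Label the absolute error of a calculated value (labels are illustrative)."""
    error = abs(calculated_kcal - reference_kcal)
    if error <= SUB_CHEMICAL_KCAL:
        return "sub-chemical"
    if error <= CHEMICAL_KCAL:
        return "chemical"
    return "outside chemical accuracy"


# Hypothetical numbers, chosen only to exercise the functions.
print(kcal_to_kj(CHEMICAL_KCAL))      # 1 kcal/mol expressed in kJ/mol
print(accuracy_class(100.0, 100.6))   # an error of about 0.6 kcal/mol
```

The exact conversion factor shows that the "~4 kJ/mol" and "1 kJ/mol" figures quoted for the two thresholds are rounded values.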

However, as the number of electrons increases and the electronic structure of the elements becomes more complicated, e.g., for transition metals and heavy elements such as the lanthanides and actinides, achieving 1 kcal/mol accuracy becomes much more difficult for purely first-principles, or ab initio, methods. It does seem clear from current research, however, that accuracies of ~3 kcal/mol are possible even in these instances. So, what exactly is required to attain a level of accuracy that is high and reliable enough to perhaps even replace experiment in some cases?

Schrödinger’s equation

Just as in calculating the physics of everyday macroscopic particles where Newton’s equation, F=ma, must be solved, the relevant equation of quantum mechanics that describes the properties of atoms, electrons, and hence molecules, is the Schrödinger equation (SE), HΨ=EΨ. This modest-looking equation yields the possible energy states (E) of the system, as well as the wavefunction (Ψ), which is related to the probability of finding the quantum mechanical particles at some location in space.

For a given molecule (or collection of molecules), this equation describes the motion of the individual nuclei together with all their associated electrons. Unfortunately, for all but the very simplest of molecules, this makes the equation impossible to solve exactly, and even approximate solutions are intractable.

Fortunately, the Born-Oppenheimer approximation, which recognises that nuclei and the much lighter electrons move at very different speeds, allows their motion to be separated. This leads to two separate Schrödinger equations, one for the nuclei and one for the electrons moving in the presence of the nuclei fixed in space. One then solves the latter electronic SE at different positions of the nuclei (bond lengths, angles, etc.) and the resulting potential energy function is used in the nuclear SE to obtain energies of molecular rotation, vibration, etc.
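The two-step procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a Morse function stands in for real solutions of the electronic SE, the parameters (in atomic units) are hypothetical rather than fitted to any molecule, and the nuclear SE is treated in the harmonic approximation instead of being solved in full.

```python
import math

# Hypothetical Morse parameters in atomic units (illustrative, not fitted values).
D_E = 0.17    # well depth, hartree
A = 1.0       # range parameter, 1/bohr
R_E = 1.4     # equilibrium bond length, bohr
MU = 918.0    # reduced mass of the two nuclei, electron masses


def electronic_energy(r):
    """Stand-in for solving the electronic SE with the nuclei clamped at distance r."""
    return D_E * (1.0 - math.exp(-A * (r - R_E))) ** 2


def scan_potential(r_min, r_max, n_points):
    """Step 1: evaluate the electronic energy on a grid of bond lengths."""
    step = (r_max - r_min) / (n_points - 1)
    grid = [r_min + i * step for i in range(n_points)]
    return [(r, electronic_energy(r)) for r in grid]


def harmonic_levels(curve, n_levels=3):
    """Step 2: harmonic treatment of the nuclear SE around the lowest grid point."""
    i_min = min(range(1, len(curve) - 1), key=lambda i: curve[i][1])
    h = curve[1][0] - curve[0][0]
    # Central finite difference for the force constant k = d2V/dr2 at the minimum.
    k = (curve[i_min - 1][1] - 2.0 * curve[i_min][1] + curve[i_min + 1][1]) / h ** 2
    omega = math.sqrt(k / MU)  # harmonic frequency; hbar = 1 in atomic units
    levels = [omega * (v + 0.5) for v in range(n_levels)]
    return curve[i_min][0], omega, levels


curve = scan_potential(0.8, 3.0, 221)
r_eq, omega, levels = harmonic_levels(curve)
print("equilibrium bond length (bohr):", r_eq)
print("harmonic vibrational levels (hartree):", levels)
```

In a real calculation each call to `electronic_energy` would be a full electronic structure computation, which is why the cost of the electronic step dominates.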

Relevant to the present discussion, thermodynamic properties can also be extracted from these calculations – with the main limitation to the final accuracy coming from solutions of the electronic SE. Unfortunately, even this cannot be solved exactly for any molecule larger than H₂⁺, and approximate solutions yielding the desired accuracy can be very computationally demanding. In particular, this cost increases very steeply with the size of the chemical system, in terms of both the number of electrons and nuclei.

The way forward for accurate ab initio thermochemistry is via so-called composite methods [1, 2]. In these calculations, the results of a series of smaller, tractable calculations are combined to approximate the results of a single large target calculation that would presumably be impossible or impractical to carry out. In order to achieve chemical or sub-chemical accuracy, all appreciable sources of error in a calculation must be accounted for in a systematic way. The two central ones are related to how the wavefunction Ψ is approximated in the solution of the SE, and they are strongly coupled: (a) how is Ψ represented in terms of the underlying atomic orbitals and (b) how are these orbitals numerically represented.

The first is generally referred to as the quantum mechanical method, while the second refers to what is called the basis set, generally consisting of Gaussian-type functions. A major breakthrough for quantitative in silico chemistry was made more than 25 years ago when Dunning [3] introduced the family of correlation consistent Gaussian basis sets, which had the unique property of providing systematic convergence towards the complete basis set (CBS) limit, i.e., a limiting result that corresponds to the exact solution of the chosen quantum mechanical method. This effectively eliminates one of the major sources of error in a very systematic way. With contributions from our research group at Washington State University, correlation consistent families of basis sets are now available for nearly the entire periodic table [4, 5].

Hence a composite thermochemistry calculation with a goal of chemical accuracy begins with the use of an accurate, but not exact, quantum mechanical method with a sequence of correlation consistent basis sets of increasing size. These individual solutions to the electronic SE are then extrapolated to the CBS limit to remove basis set errors. Smaller contributions are then accounted for, chosen according to their appropriateness for the chemical system under study. These generally include the effects of special relativity and molecular vibrations, but could also involve more esoteric contributions such as Born-Oppenheimer breakdown terms (when hydrogen atoms are involved) or quantum electrodynamics (QED). The resulting accuracy is then nearly completely dictated by the initial choice of quantum mechanical method. Coupled cluster methods are often the best choice since they can in principle be extended towards the exact solution, albeit with high computational cost.
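As a rough illustration of the extrapolate-then-correct recipe, the Python sketch below applies a two-point inverse-cubic extrapolation and then adds a set of small corrections. The inverse-cubic form E(n) = E_CBS + A/n³, with n the cardinal number of the basis set, is one commonly used choice and is an assumption here, since the text does not specify a formula. All function names, correction labels, and numbers are hypothetical, and the energies are synthetic values generated from the model itself rather than results of any calculation.

```python
def cbs_two_point(n_small, e_small, n_large, e_large, power=3.0):
    """Two-point extrapolation assuming E(n) = E_CBS + A * n**(-power)."""
    w_small = n_small ** power
    w_large = n_large ** power
    return (w_large * e_large - w_small * e_small) / (w_large - w_small)


def composite_energy(e_cbs, corrections):
    """Add the smaller contributions to the extrapolated base energy."""
    return e_cbs + sum(corrections.values())


# Synthetic data: energies generated from the assumed model, not from a calculation.
E_LIMIT = -1.0   # hypothetical CBS limit, hartree
SLOPE = 0.5      # hypothetical basis set error coefficient


def synthetic_energy(n):
    return E_LIMIT + SLOPE / n ** 3


e_cbs = cbs_two_point(3, synthetic_energy(3), 4, synthetic_energy(4))

# Hypothetical correction terms (hartree); which ones matter depends on the system.
corrections = {
    "scalar_relativistic": -0.0004,
    "zero_point_vibration": 0.0210,
}
total = composite_energy(e_cbs, corrections)
print("extrapolated CBS energy:", e_cbs)
print("composite total energy:", total)
```

Because the synthetic energies follow the assumed model exactly, the extrapolation recovers the chosen limit by construction. Real energies only approximate such a form, which is one reason the basis set sequence must converge systematically for the extrapolation to be trusted.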

The key to chemically accurate ab initio thermochemistry is clear – a systematic approach that in principle leads towards the exact solution of a relativistic SE is mandatory, and fortuitous error compensation must be avoided at all costs. This is what leads to truly predictive in silico chemistry.

1 Peterson, K. A.; Feller, D.; Dixon, D. A. Chemical accuracy in ab initio thermochemistry and spectroscopy: current strategies and future challenges. Theoretical Chemistry Accounts 2012, 131, 1079.

2 Dixon, D. A.; Feller, D.; Peterson, K. A. A practical guide to reliable first principles computational thermochemistry predictions across the periodic table. In Annual Reports in Computational Chemistry; Elsevier, 2012; Vol. 8, pp 1-28.

3 Dunning, T. H., Jr. Gaussian-basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen. The Journal of Chemical Physics 1989, 90, 1007.

4 Feng, R.; Peterson, K. A. Correlation consistent basis sets for actinides. II. The atoms Ac and Np–Lr. The Journal of Chemical Physics 2017, 147, 084108.

5 Figgen, D.; Peterson, K. A.; Dolg, M.; Stoll, H. Energy-consistent pseudopotentials and correlation consistent basis sets for the 5d elements Hf-Pt. The Journal of Chemical Physics 2009, 130, 164108.

Please note: this is a commercial profile

Kirk A Peterson

Edward R Meyer

Professor of Chemistry

Department of Chemistry