Computational Chemistry v2
September 3, 2022
Contents
Preface
Introduction
4 Thermochemistry
4.1 Introduction
4.2 Procedure & Examples
4.3 Exercises
8 Chemical Reaction Mechanisms: Intrinsic Reaction Coordinate (IRC)
8.1 Introduction
8.2 Procedure & Examples
8.3 Exercises
17 Ab-initio Molecular Dynamics (AIMD) Simulations
17.1 Introduction
17.2 Procedure & Examples
17.3 Exercises
18 Appendices
18.1 Programs & Software for Computational Chemistry
18.2 Atomic Units
18.3 Common Acronyms Used in Computational Chemistry
18.4 Miscellaneous Methods In Computational Chemistry
18.5 Energy Gaps & Reactivity Descriptors
18.6 Excited Electronic States
18.7 Redox Potentials
18.8 Dye-Sensitized Solar Cells (DSSC’s) Efficiency Variables
18.9 Notation in Quantum Chemistry (IUPAC Recom.)
18.9.1 Ab-initio HF-SCF & Hückel Molecular Orbital (HMO) Theory
18.9.2 HF-Roothaan SCF using LCAO-MO
18.9.3 Other Commonly Used Notation
18.10 Other Useful Data
18.10.1 Fundamental Physical Constants
18.10.2 The Greek Alphabet
18.10.3 Periodic Table of the Elements
18.10.4 Atomic Masses for Selected Isotopes
18.10.5 Energy Conversion Factors
Bibliography
Preface
This manual is intended to cover various experiments in computational chemistry. Students and
readers are assumed to have a reasonable background in quantum chemistry, acquired through an
advanced undergraduate or introductory postgraduate course in quantum chemistry. However, a
brief exposition of the basic ideas needed to follow the material of many experiments is
presented where needed in blue boxes entitled “Theoretical Background”. In general, the procedure
and the technical details of the experiments can be carried out without the theoretical background
found in these blue boxes. Nevertheless, a good understanding of the underlying theory will
help in better “interpretation” and “control” of the methods and techniques of the corresponding
experiment. More details about the theoretical models used in modern quantum and computational
chemistry can be found in many excellent textbooks, some of which are listed in the bibliography at
the end of the manual.
In the first part of the present manual, some of the principal features of the Gaussview and
Gaussian programs are highlighted to enable the student to start working productively with these
two related programs. The examples and descriptions are inevitably brief and do not aim to be a
comprehensive guide. Experience in using the programs and consulting the manuals supplied with
them is the only means of achieving proficiency. It is hoped that this training manual and the
accompanying exercises will ease the initial learning curve. The experiments of this part,
especially the first five, are designed for those with no previous experience or background in
any computational chemistry technique.
Some more advanced experiments using the ORCA program are presented in the second part of the
manual. The reader of these experiments is assumed to have some basic computational background,
acquired through working experience in previous computational projects or by working
through at least the first five experiments of Part I. It should be emphasized that the present
manual is far from comprehensive and does not cover all aspects of the field.
Wissam Helal
Introduction
Computational Chemistry
Computational chemistry methods and tools are widely used in chemistry, not just by theoretical
chemists but by many experimental chemists as well. Theoretical chemical models and methods help
to predict and interpret experimental results.
Chemistry is knowing the energy as a function of nuclear coordinates [1]. In fact, structure,
chemical reactions and reactivity, thermodynamics, spectroscopy, and many other chemical and
physical properties can be investigated by studying how the energy changes with structure or,
more precisely, how the energy changes along the reaction coordinate.
The experiments in this manual try to demonstrate how we can understand chemistry by
relating energy to changes in structural geometry. One useful way to do so is by solving the
Schrödinger equation for the chemical system of interest.
The stationary electronic states are determined by the time-independent Schrödinger equation:
Ĥψ = Eψ (1)
where ψ is the wavefunction written in space-spin coordinates, E is the associated energy, and Ĥ
is the Hamiltonian of the system. The molecular Hamiltonian operator for a system of M nuclei and
N electrons, described by position vectors R_A and r_i respectively, is the sum of the kinetic
energy operators of the electrons (T̂e) and nuclei (T̂n) and the potential energy operators for
nucleus-electron attraction (V̂ne), interelectronic repulsion (V̂ee), and internuclear repulsion
(V̂nn). In atomic units:
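\[
\hat{H} =
\underbrace{-\sum_{i=1}^{N} \frac{1}{2}\nabla_i^2}_{\hat{T}_e}
\;\underbrace{-\sum_{A=1}^{M} \frac{1}{2M_A}\nabla_A^2}_{\hat{T}_n}
\;\underbrace{-\sum_{i=1}^{N}\sum_{A=1}^{M} \frac{Z_A}{r_{iA}}}_{\hat{V}_{ne}}
\;\underbrace{+\sum_{i=1}^{N}\sum_{j>i}^{N} \frac{1}{r_{ij}}}_{\hat{V}_{ee}}
\;\underbrace{+\sum_{A=1}^{M}\sum_{B>A}^{M} \frac{Z_A Z_B}{R_{AB}}}_{\hat{V}_{nn}}
\tag{2}
\]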
M_A is the ratio of the mass of nucleus A to the mass of an electron, and Z_A is the atomic number
of nucleus A. The restrictions j > i and B > A in V̂ee and V̂nn avoid counting each interparticle
repulsion twice and exclude the 1/r_ii and 1/R_AA terms.
[1] F. Jensen, Introduction to Computational Chemistry, 3rd ed., Wiley, 2017.
Theoretical Background: The Born-Oppenheimer Approximation
Born and Oppenheimer pointed out that, since nuclei are thousands of times more massive than
electrons, they move much more slowly and can be treated as stationary when considering the
motions of the electrons in molecules. According to the Born-Oppenheimer (BO) approximation, the
electrons in a molecule are considered to be moving in the field of fixed nuclei. Within this
approximation, T̂n = 0 and the V̂nn term is a constant. The remaining terms in (2) are called the
electronic Hamiltonian (Ĥel),
\[
\hat{H}_{el} =
\underbrace{-\sum_{i=1}^{N} \frac{1}{2}\nabla_i^2}_{\hat{T}_e}
\;\underbrace{-\sum_{i=1}^{N}\sum_{A=1}^{M} \frac{Z_A}{r_{iA}}}_{\hat{V}_{ne}}
\;\underbrace{+\sum_{i=1}^{N}\sum_{j>i}^{N} \frac{1}{r_{ij}}}_{\hat{V}_{ee}}
\tag{3}
\]
The solution to the Schrödinger equation involving the electronic Hamiltonian,
\[ \hat{H}_{el}\Psi_{el} = E_{el}\Psi_{el} \tag{4} \]
is the electronic wavefunction,
\[ \Psi_{el} = \Psi_{el}(\{\mathbf{r}_i\}; \{\mathbf{R}_A\}) \tag{5} \]
which describes the motion of the electrons. It depends explicitly on the electronic coordinates
and parametrically on the nuclear coordinates, as does the electronic energy, Eel.
The total energy for fixed nuclei must include the nuclear repulsion constant Vnn:
\[ E_{tot} = E_{el} + V_{nn} \tag{6} \]
where Vnn is
\[ V_{nn} = \sum_{A=1}^{M}\sum_{B>A}^{M} \frac{Z_A Z_B}{R_{AB}} \tag{7} \]
Apart from making calculations on molecular systems feasible, the BO approximation introduces
the concept of the potential energy surface (PES), given by Etot of (6) as a function of the
nuclear coordinates. Within the adiabatic approximation, the nuclei move on a PES
obtained by calculating Ψel and Eel using the BO approximation at each nuclear configuration.
The BO and adiabatic approximations are extensively used in quantum chemical methods.
However, non-adiabatic processes are present in many important chemical systems. This is
particularly true in some charge transfer reactions and near avoided crossings (non-crossing
regions).
Recall that the Schrödinger equation can be solved exactly only for systems that contain one
electron. For atoms and molecules that contain two or more electrons, the Schrödinger equation
can only be solved approximately. Therefore, almost every aspect of chemistry has been described
by a qualitative or approximately quantitative (semiquantitative) computational scheme.
The biggest mistake that chemists can make is to assume that any molecular property (a number)
obtained from a quantum chemical calculation is exact! Nevertheless, a qualitative or approximate
computation can often give useful and valuable insight into chemistry, provided you understand
what it tells you and what it does not. The accuracy of the computed molecular properties depends
on both the method and the basis set used.
1. Basis Set
A basis set is a set of functions used to represent the shapes of the atomic orbitals from which
the molecular orbitals of a molecule are built (recall that molecular orbitals can be
approximated as a linear combination of atomic orbitals, LCAO).
As a general guide, the “larger” the basis set, the “better” it can represent the atoms and bonds
that constitute a given molecule.
We will explain more about the meaning of these basis sets later, but for the moment we can
say that 6-31G(d,p) is a widely used basis set in computational quantum chemistry
calculations, with acceptable accuracy for many applications. The larger basis set 6-311G(d,p)
can be used if more accuracy is desired, at the cost of a longer calculation time.
2. Computational Methods
Many methods can be used to calculate the energy and other properties. Basically,
most computational methods belong to one of two groups:
1. Non-Quantum Chemical Methods: Such as Molecular Mechanics (MM), which uses classical
mechanics to predict molecular properties. MM is a theory of molecules “without electrons”,
since electrons are not treated explicitly. Therefore the results are very approximate,
even qualitatively, but MM has the advantage of being quick and simple. MM can be applied
to very large molecules, e.g. proteins and DNA.
2. Quantum Chemical Methods: These are based on solving the Schrödinger equation using
different levels of approximation.
2. Density Functional Theory (DFT) Methods: Much more accurate than semiempirical
methods, but more computationally demanding. Some functionals include empirical
parametrization. DFT became very popular due to its reasonable accuracy relative to its cost
(time of calculation).
DFT methods use different functionals. Some of the most common functionals are B3LYP,
PBE0, and M06-2X.
Ab initio quantum chemical methods can be classified as:
1. Hartree-Fock (HF) Methods: The starting point for more accurate ab-initio methods. Generally
less accurate than DFT.
2. Electron Correlation Methods (or Post-HF): Very demanding computationally, in particular for
large molecules.
These methods calculate the electron correlation using different approximation schemes:
(a) variational, such as Configuration Interaction methods (CIS, CID, CISD, CISDT, etc.);
(b) perturbative, such as Møller-Plesset methods (MP2, MP3, MP4, etc.);
(c) or other, such as coupled cluster methods (CCSD, CCSDT, etc.).
We will briefly introduce and explain some of these methods and basis sets later on, when
needed in the corresponding experiments.
The notation “method/basis set” is widely used in computational chemistry, e.g. HF/6-31G(d,p).
This notation defines the level of theory, sometimes also called the model chemistry, of the
calculation.
Part I
Experiment 1
Molecule Building & Basic Calculations
1.1 Introduction
Gaussian calculations are best prepared using the Gaussview software. Gaussview is the graphical
user interface (GUI), or molecular editor, of the Gaussian program.
Gaussview allows you to build the required molecule on screen and, using the pull-down menus,
to load the input file into the Gaussian program for execution. After the Gaussian run has
finished, you can use Gaussview to view and read the output file written by Gaussian, and to
generate various graphical surfaces from particular binary files.
Input files have the extension .gjf or .com, and output files have the extension .log or .out.
The binary (checkpoint) files have the extension .chk.
1.2 Procedure & Examples
After installing Gaussian and Gaussview on your computer, a desktop icon is created for the Gaussview
program. Double clicking on the desktop icon starts the program as shown below. Here we can see
two windows: the main “gray” window, and the viewer “blue” window.
At the top of the main window, we see the GaussView control panel, containing the menu bar
(File, Edit, Calculate, Results, etc.) and a variety of toolbars.
The gray area in the main window is the fragment builder. The current (default) fragment in the
builder is tetrahedral carbon (methane). You can move the mouse in this area to rotate the fragment
in the builder. If you click on the benzene ring icon in the main window, you will see that a
benzene ring is now the fragment in the builder. Click on the atom type icon 6 C in the main
window to obtain the tetrahedral carbon again as the fragment in the builder. The different
fragments are used to build any molecule or reaction system in the viewer.
The blue empty window entitled NEW is the viewer window where the required molecule is built.
To build toluene, for example, click on the benzene ring icon on the main window. Place the cursor
in the viewer window and click. A benzene ring will appear in the viewer.
Then click on the atom type icon 6 C in the main window. If you click this icon twice, a periodic
table will appear; click on C and select the tetrahedral atom type (tetrahedral carbon is,
however, the default atom type).
Now click on any H atom of the benzene ring in the viewer and the toluene molecule should be
built. Be sure that the methyl group is properly attached to the benzene ring. Undo and repeat if
you make a mistake in building the molecule.
It is useful to practice the mouse actions in Gaussview. The following table summarizes these
actions. Of course, there is no need to memorize them. The best way to learn any software
or program is practice! The more you practice, the more you will master the fine details of the
software.
Mouse Button        Action             Function
Left                Click              Selects or inserts item
Left                Drag Left/Right    Rotates about Y-axis
Left                Drag Up/Down       Rotates about X-axis
Center/Left-Right   Drag               Translation of molecule
Right               Drag Left/Right    Rotates about Z-axis
Right               Drag Up/Down       Zooms in and out
You can change the molecule display from the View menu by selecting Display Format. A variety
of formats (Ball and Bond, Wireframe, and Tube) are available. Choose Tube and note the change in
the viewer window. Click OK to save this display. Of course, you can select the display format that
you feel comfortable with, but sometimes it is more convenient to work with a particular format.
Calculations using the Gaussian program are set up and run from Gaussview using the Calculate
menu (Calculate > Gaussian Calculation Setup), or simply Ctrl+G. Upon opening you will see:
This allows various types of electronic structure (or quantum chemical) calculations to be
performed using Gaussian. As an example, you can choose Energy under the Job Type sub-menu. Note
that one can also choose a variety of other jobs, such as Optimization for geometry optimizations
or Frequency for vibrational frequency calculations. In this experiment and the next we will only
calculate energies, but we will encounter other job types in subsequent experiments.
Under the Method sub-menu we can choose the method of calculation we wish to perform. This
can be a Hartree-Fock (HF) calculation, a simpler method such as a semi-empirical one (e.g. PM6,
AM1, ...), or a more accurate method such as Density Functional Theory (DFT) using an
appropriate functional (e.g. B3LYP, BP86, ...).
For HF, DFT, and all ab-initio methods, we also have to select an appropriate basis set from
the Method sub-menu (see Figure below). For the toluene example, choose the HF method with the
3-21G basis set (abbreviated HF/3-21G, or method/basis). Both the method and the basis set define
the level of theory of the calculation. The default level of theory in Gaussview is HF/3-21G.
It is also necessary to define the charge and multiplicity of the molecule or system to be calculated
(multiplicity = 2S + 1, where S is the total spin quantum number). For instance, in our toluene
example we use a charge of zero and a multiplicity of 1 (singlet). If for example we needed a calculation
for the toluene cation radical we would use a charge of +1 and a multiplicity of 2 (doublet).
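The charge and multiplicity settings can be sanity-checked with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and `lowest_multiplicity` assumes the smallest possible number of unpaired electrons, which is not always the ground state (it is a poor guess for many transition-metal complexes, for example).

```python
# Atomic numbers for the elements used in this example.
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}

def electron_count(formula, charge=0):
    """Number of electrons for a formula given as {element: count}."""
    protons = sum(ATOMIC_NUMBER[el] * n for el, n in formula.items())
    return protons - charge

def multiplicity(n_unpaired):
    """Spin multiplicity 2S + 1, with S = n_unpaired / 2."""
    return n_unpaired + 1

def lowest_multiplicity(n_electrons):
    """Multiplicity with the fewest unpaired electrons (0 or 1)."""
    return multiplicity(n_electrons % 2)

toluene = {"C": 7, "H": 8}
for charge in (0, +1):
    n = electron_count(toluene, charge)
    print(charge, n, lowest_multiplicity(n))
# prints:
# 0 50 1
# 1 49 2
```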
Theoretical Background: Hartree-Fock (HF) and HF-SCF Calculations
The solution of the time-independent Schrödinger equation Ĥψ = Eψ gives the wavefunction ψ and
the total energy E of any system if the Hamiltonian operator Ĥ is known. However, this famous
equation cannot be solved exactly except for one-electron chemical systems. In order to find a
solution, we may use different approximations. One basic approximation used in many quantum
chemical methods, including HF, is based on the variational principle: the energy of a normalized
trial, or guess, wavefunction Φ, calculated as
\[ \int \Phi^{*} \hat{H} \Phi \, d\tau \ge E \tag{1.1} \]
is never lower than the true ground-state energy of the system, E. Therefore, the lower the
energy we find for the variational integral ∫Φ*ĤΦ dτ, the closer we get to the true E. This can be
done by “searching” for the best trial wavefunction Φ, the one that gives the lowest possible
value of the energy.
The trial wavefunction Φ used in the variational integral (1.1) should be antisymmetric upon
interchange of electron coordinates; in the HF method it is written as a single Slater
determinant (SD):
\[
\Phi = \begin{vmatrix}
\phi_1(1)\alpha(1) & \phi_1(1)\beta(1) & \phi_2(1)\alpha(1) & \cdots & \phi_m(1)\beta(1) \\
\phi_1(2)\alpha(2) & \phi_1(2)\beta(2) & \phi_2(2)\alpha(2) & \cdots & \phi_m(2)\beta(2) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\phi_1(n)\alpha(n) & \phi_1(n)\beta(n) & \phi_2(n)\alpha(n) & \cdots & \phi_m(n)\beta(n)
\end{vmatrix}
\tag{1.2}
\]
where m = n/2 and n is the number of electrons. The wavefunction (1.2) as written is not
normalized (the normalization factor is 1/√(n!)). The functions ϕiα and ϕiβ are the spin
orbitals, each spin orbital being a product of a spatial orbital (or molecular orbital, MO) ϕi
and a spin function (either α or β). The spatial MOs ϕi are expressed as linear combinations of
one-electron orbitals, χs:
\[ \phi_i = \sum_{s=1}^{b} c_{si} \chi_s \tag{1.3} \]
To represent the MOs ϕi exactly, the basis functions χs should form a complete set (an infinite
number of basis functions). However, if b is large enough and the functions χs are well chosen,
one can represent the MOs with negligible error. In equation (1.3), the coefficients csi are the
“unknown” molecular orbital coefficients. Because the χs are usually centered at the nuclear
positions, they are referred to as atomic orbitals, hence (1.3) is a linear combination of atomic
orbitals (LCAO).
In one sentence, the Hartree-Fock (HF) method is a numerical procedure to find the coefficients
csi in (1.3) that minimize the variational integral (1.1). The derivation of such a numerical
procedure is long and complicated, so we will now summarize and simplify. The “best”
wavefunction, the one that corresponds to the lowest total energy, must satisfy a modified
version of the Schrödinger equation, which is written as a set of HF equations:
\[ \hat{F} \phi_i = \varepsilon_i \phi_i \tag{1.4} \]
for each molecular orbital ϕi, where εi is the energy of orbital i and F̂ is the Fock operator,
which has a complicated form. To solve the HF equations one starts with an initial guess for the
orbitals ϕi using (1.3) (i.e. an initial guess of the coefficients of the AOs), which allows one
to calculate an initial guess for F̂. One uses this initial estimate of F̂ to solve (1.4) for an
improved set of orbitals, and then uses these orbitals to calculate an improved F̂, which is then
used to solve for further improved orbitals, and so on. The process is continued until no further
significant change in the orbitals occurs from one iteration (repeating cycle of steps) to the
next, i.e. self-consistency is reached and the energy is said to be converged. This iterative
procedure is therefore referred to as a self-consistent field (SCF) procedure, hence the name
HF-SCF.
Where is the big error in the HF model? The wavefunction (1.2) assumes that the electrons move
independently of each other, i.e. each of the n electrons feels the presence of an average field
made up of all of the other (n − 1) electrons. This approximation is severe, because it assumes
that the motion of the electrons is uncorrelated, i.e. the instantaneous electron-electron
interactions are neglected. In fact, electrons do repel one another and correlate their motions
to avoid being close together; this phenomenon is called electron correlation. This crude
approximation in the HF model is improved upon in post-HF ab-initio methods (see box in §??),
which add electron correlation to an HF wavefunction.
We may use a restricted HF (RHF) wavefunction for closed-shell (singlet) systems, and an
unrestricted HF (UHF) wavefunction for open-shell (doublet, triplet, etc.) systems (see box in
§5.2).
HF is seldom used on its own; nevertheless, it is of great importance since most correlated
ab-initio methods and many DFT methods “use” HF wavefunctions.
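The self-consistent cycle described in this box can be illustrated on a deliberately tiny model. The sketch below is not Gaussian's algorithm: it assumes a two-site model with two electrons (site energies `e1` and `e2`, hopping `t`, on-site repulsion `U`, all hypothetical parameters in arbitrary units), for which the 2x2 Fock matrix can be diagonalized analytically. Each cycle builds F from the current site populations, solves for the doubly occupied orbital, and recomputes the populations until they stop changing.

```python
import math

def lowest_eigenpair(a, b, d):
    """Lowest eigenvalue and normalized eigenvector of [[a, b], [b, d]] (b != 0)."""
    eps = 0.5 * (a + d) - math.hypot(0.5 * (a - d), b)
    c1, c2 = b, eps - a          # satisfies (a - eps)*c1 + b*c2 = 0
    norm = math.hypot(c1, c2)
    return eps, (c1 / norm, c2 / norm)

def scf(e1, e2, t, U, tol=1e-10, max_iter=200):
    """Restricted mean-field SCF for two electrons on two sites (toy model)."""
    n1, n2 = 1.0, 1.0                      # initial guess: equal populations
    for iteration in range(1, max_iter + 1):
        # Build the Fock matrix from the current populations: each electron
        # feels the average repulsion of the opposite-spin electron (n/2).
        f11 = e1 + 0.5 * U * n1
        f22 = e2 + 0.5 * U * n2
        # Solve F c = eps c for the doubly occupied orbital.
        eps, (c1, c2) = lowest_eigenpair(f11, -t, f22)
        new_n1, new_n2 = 2.0 * c1 * c1, 2.0 * c2 * c2
        change = max(abs(new_n1 - n1), abs(new_n2 - n2))
        n1, n2 = new_n1, new_n2
        if change < tol:                   # self-consistency reached
            return n1, n2, eps, iteration
    raise RuntimeError("SCF did not converge")

n1, n2, eps, cycles = scf(e1=0.0, e2=1.0, t=1.0, U=1.0)
print(f"populations: {n1:.4f}, {n2:.4f}; orbital energy: {eps:.4f}; cycles: {cycles}")
```

In a real HF-SCF program the Fock matrix is built from one- and two-electron integrals over the basis functions of (1.3), but the loop has the same structure: guess, build F, solve, compare, repeat.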
In order to execute, or run, the calculation, we have to submit our job by clicking on the Submit
button found in the bottom left corner of the Gaussian Calculation Setup window. You are asked
for a file name to save the job. Choose an appropriate and informative name, e.g. toluene_hf321g.
Save the file and continue with the submission. A confirmation window will open; click OK, and a
new window will open where the progress of the run can be monitored. This new window is the
Gaussian program running and solving very long and complicated numerical equations
(approximations of the Schrödinger equation) for our example molecule. You are notified when the
calculation has finished successfully.
This particular calculation (toluene at the HF/3-21G level) will finish within a couple of
seconds on a regular PC. However, other calculations (big molecules at high levels of theory) may
take many hours or even many days on powerful workstations or supercomputers.
Close the Gaussian window (by clicking OK in the small notification window) and you will be
asked whether you wish to view a results file. The output data and results are written in two
types of files:
A .log file, which is a text file listing all the steps and the results calculated by Gaussian.
A .chk file, which is a binary file that can be used to generate various orbitals, electron
densities, etc. We will deal with checkpoint files (.chk) in the next experiment.
Choose the .log file and the output of the toluene calculation will appear in a new viewer
window. You can also open the file using File > Open.
Our job was an energy calculation on the toluene molecule. So the question is: where can we find
the energy value and the other results of this calculation? One way is to open Results > Summary
(while the .log viewer window is still open), which gives the following window (or table):
Take a look at the information contained in this table. Among other things, it gives the level
of calculation used (method and basis set) together with the charge and multiplicity. It also
gives the energy, E(RHF), in atomic units (a.u.). RHF stands for “Restricted Hartree-Fock”. It is
worth mentioning that the energy calculated here, or by any quantum chemical (electronic
structure) program, is the total electronic energy. The word “electronic” here refers to
electronic structure: the value includes the nuclear repulsion, i.e. it corresponds to Etot in
(6), and should not be confused with the electronic energy Eel that arises from the solution of
the electronic Schrödinger equation (4) in the BO approximation. You can save the data of this
table using Save Data.
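Energies in atomic units (hartree) are often converted to more familiar units when comparing results. The sketch below uses rounded CODATA conversion factors; the function name and the example energy difference are illustrative assumptions, not values from a calculation in this manual.

```python
# Conversion factors for 1 hartree (rounded CODATA values).
HARTREE_TO_EV = 27.211386
HARTREE_TO_KJ_PER_MOL = 2625.4996
HARTREE_TO_KCAL_PER_MOL = 627.5095

def convert_hartree(energy_hartree):
    """Return the energy in eV, kJ/mol and kcal/mol."""
    return {
        "eV": energy_hartree * HARTREE_TO_EV,
        "kJ/mol": energy_hartree * HARTREE_TO_KJ_PER_MOL,
        "kcal/mol": energy_hartree * HARTREE_TO_KCAL_PER_MOL,
    }

# Hypothetical energy difference between two calculations, in hartree.
delta_e = 0.001
for unit, value in convert_hartree(delta_e).items():
    print(f"{value:10.4f} {unit}")
```

Only energy differences computed at the same level of theory are chemically meaningful; a difference of 0.001 hartree already corresponds to about 2.6 kJ/mol.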
If you need more details than are printed in the summary, open the .log file as a text file by
clicking on View File. You can also view the text file by opening Results > View File. A
window containing the whole text file should appear in WordPad. Scroll down through
the file to see the information it contains. Finally, you can open any input or output file using
any text editor (such as Notepad on Windows, or vi or gedit on Linux).
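Because the .log file is plain text, the energy can also be extracted with a short script. This sketch assumes the energy is reported on a line of the form `SCF Done:  E(RHF) = <value>  A.U. after <n> cycles`, as Gaussian normally prints it. The sample text below is made up for illustration (including the numerical value) and is not the output of a real run; the function name is hypothetical.

```python
import re

# Matches lines such as " SCF Done:  E(RHF) =  -1.23456789  A.U. after  5 cycles"
SCF_DONE = re.compile(r"SCF Done:\s+E\((\w+)\)\s*=\s*(-?\d+\.\d+)\s+A\.U\.")

def last_scf_energy(log_text):
    """Return (method, energy in hartree) from the last 'SCF Done' line, or None."""
    matches = SCF_DONE.findall(log_text)
    if not matches:
        return None
    method, energy = matches[-1]
    return method, float(energy)

# Made-up sample text, for illustration only.
sample = """
 Some earlier output ...
 SCF Done:  E(RHF) =  -1.23456789     A.U. after    5 cycles
 Some later output ...
"""
print(last_scf_energy(sample))  # ('RHF', -1.23456789)
```

For a real file, read its contents with `open(path).read()` and pass the text to `last_scf_energy`.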
1.3 Exercises
Exercise 1
Repeat the calculation on toluene using the semiempirical AM1 method and the DFT B3LYP/3-21G
and WB97XD/3-21G methods. Compare the time for completion of the three jobs (AM1,
B3LYP/3-21G, and WB97XD/3-21G) with that of HF/3-21G and comment on any trend observed.
Exercise 2
Repeat the calculation on toluene using the HF/STO-3G, HF/6-31G and HF/6-31G(d,p) levels.
Compare the time for completion of the three jobs [HF/STO-3G, HF/6-31G and HF/6-31G(d,p)] with
that of HF/3-21G and comment on any trend observed. Report the results of Exercises 1 and 2 in a
table.
Exercise 3
Build each of the following molecules and perform a DFT B3LYP/STO-3G calculation: phenol,
aniline, anthracene, p-benzoquinone, p-methylphenol, benzothiazole, Mn(H2O)6, and [Cr(NH3)6]3+.
[Hint]: Double-click on the benzene ring icon and the R icon in the main window to
select from the most common aromatic systems and functional groups, respectively.
Experiment 2
Molecular Orbitals, Electron Densities & Electrostatic Potentials
2.1 Introduction
In addition to numerical quantities (energies, dipole moments, etc), quantum chemical calculations
furnish a wealth of information that is best displayed in the form of images. Among the results
of calculations that have proven to be of value to chemists are the molecular orbitals themselves,
the electron density, and the electrostatic potential. These can all be expressed as three-dimensional
functions of the coordinates.
Molecular orbitals, in particular, the highest energy occupied molecular orbital (the HOMO) and
the lowest energy unoccupied molecular orbital (the LUMO), are often quite familiar to chemists.
Both HOMO and LUMO are together referred to as the frontier molecular orbitals. The HOMO
holds the highest energy (most available) electrons and should be subject to attack by electrophiles,
whereas the LUMO provides the lowest energy space for additional electrons and should be subject
to attack by nucleophiles.
For example, the HOMO (bonding π orbital) in formaldehyde has the greater electron density
on the oxygen atom (see Figure below), indicating that attack by an electrophile (like a proton)
will occur at oxygen. On the other hand, the LUMO (antibonding π∗ orbital) has its larger lobe on
carbon, indicating that a nucleophile will add to the carbonyl carbon, consistent with the
known nucleophilic chemistry.
[Figure: HOMO and LUMO of formaldehyde (C=O)]
2.2 Procedure & Examples
Build the p-methylphenol molecule in the Gaussview viewer. Modify the molecule such that the OH
group lies in the ring plane. This can be done using the dihedral angle modifier as shown below:
Note that one can keep some atoms fixed while moving others. In the figure above, the OH group
is rotated by 90° while all the atoms of the ring are kept fixed in the plane. The dihedral angle
can be modified by moving the slider or by entering the angle in the text box provided.
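For reference, the dihedral angle shown in the modifier can be computed from the Cartesian coordinates of four atoms. The sketch below uses the common atan2 formulation; the helper names and the test coordinates are illustrative, and the sign convention of the angle may differ between programs, so only the magnitude is printed.

```python
import math

def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def dihedral(p1, p2, p3, p4):
    """Dihedral angle p1-p2-p3-p4 in degrees, in the range [-180, 180]."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    b2_len = math.sqrt(_dot(b2, b2))
    m1 = _cross(n1, tuple(c / b2_len for c in b2))
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))

# Hypothetical coordinates chosen so that the answer is obvious.
print(abs(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))))   # 0.0  (cis, in plane)
print(abs(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1))))   # 90.0 (perpendicular)
```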
Set up an HF/3-21G single point calculation using the Calculate > Gaussian Calculation
Setup menu (a single point calculation is simply an energy calculation at a given geometry). In
this experiment we will be looking at graphical surfaces, so we need to save a copy of the .chk
(checkpoint) file in our working directory. We do this by opening the Link 0 menu in the set-up
box and then ticking the checkpoint file box. Note that, by default, Gaussview saves the .chk
file under the same name as the input file. The .chk file is required for graphical
representations of orbitals and densities.
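Behind the scenes, the set-up dialog produces a plain-text Gaussian input file. The sketch below assembles one in the usual layout (Link 0 line, route line, blank line, title, blank line, charge and multiplicity, Cartesian coordinates in Å, final blank line). The function name, the file name, and the H2 geometry are illustrative assumptions; the file that Gaussview writes for p-methylphenol will contain the built geometry and may include additional keywords.

```python
def gaussian_input(chk_name, method, basis, title, charge, multiplicity, atoms):
    """Return the text of a single point Gaussian input file.

    atoms: list of (symbol, x, y, z) with coordinates in angstrom.
    """
    lines = [
        f"%chk={chk_name}",          # Link 0: where to save the checkpoint file
        f"# {method}/{basis} SP",    # route section: level of theory and job type
        "",
        title,
        "",
        f"{charge} {multiplicity}",
    ]
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append("")                 # the molecule section ends with a blank line
    return "\n".join(lines) + "\n"

# Illustrative molecule (an assumption): H2 with a 0.74 angstrom bond.
text = gaussian_input("h2_hf321g.chk", "HF", "3-21G", "H2 single point",
                      0, 1, [("H", 0.0, 0.0, 0.0), ("H", 0.0, 0.0, 0.74)])
print(text)
```

Saving this text with a .gjf or .com extension gives a file that can be opened in a text editor and compared with the one Gaussview generates.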
When the calculation is done, open the .chk file (using File > Open, or by choosing it after the
Gaussian calculation has finished); the molecule appears in a separate viewer window. To examine
the orbital energy levels and the orbital electron densities, use Edit > MOs.
An orbital energy level diagram is produced, showing the 29 occupied MOs and the unoccupied MOs
above them. The HOMO and LUMO are orbitals 29 and 30, respectively. Note that the orbital
energies in a.u. are printed next to each MO. Note also that the total number of
MOs is 88, as it should be for a 3-21G basis for 7 C, 1 O, and 8 H atoms (see the next box).
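These numbers can be checked by simple counting. The sketch below assumes the 3-21G contraction [3s2p] (3 + 2×3 = 9 functions) for C and O and [2s] (2 functions) for H, and a closed-shell molecule in which each occupied MO holds two electrons; the dictionary and function names are illustrative.

```python
# Basis functions per atom in 3-21G: [3s2p] for first-row atoms, [2s] for H.
N_BASIS_3_21G = {"H": 2, "C": 3 + 2 * 3, "O": 3 + 2 * 3}
ATOMIC_NUMBER = {"H": 1, "C": 6, "O": 8}

def count_basis_functions(formula):
    """Total number of basis functions (= number of MOs) for {element: count}."""
    return sum(N_BASIS_3_21G[el] * n for el, n in formula.items())

def homo_lumo_indices(formula, charge=0):
    """1-based indices of the HOMO and LUMO for a closed-shell molecule."""
    n_electrons = sum(ATOMIC_NUMBER[el] * n for el, n in formula.items()) - charge
    if n_electrons % 2:
        raise ValueError("odd number of electrons: not a closed-shell system")
    homo = n_electrons // 2      # two electrons per occupied MO
    return homo, homo + 1

p_methylphenol = {"C": 7, "H": 8, "O": 1}
print(count_basis_functions(p_methylphenol))  # 88
print(homo_lumo_indices(p_methylphenol))      # (29, 30)
```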
Theoretical Background: Basis Sets. Part I
The one-electron orbitals (or AOs) χ used in Eq have significant electron density at large distances
(1.3) are called basis functions. In numerical com- from the nuclei.
putations the basis functions are mainly Gaussian-
Pople Basis Sets: Minimal CGTF sets are of-
type orbitals (GTO), typically expressed as
ten formed by fitting Slater Type Orbitals STOs.
j 2 Each STO is approximated as a linear combina-
g(x, y, z)i,j,k = N xib yb zbk e−αrb (2.1)
tion of n Gaussian functions, where the Gaussian
where i, j, and k are integers, α is the orbital ex- orbital exponents and the coefficients in the lin-
ponent, xb, yb, zb are Cartesian coordinates with the origin at nucleus b, rb is the distance to
nucleus b, and N is the normalization constant.

GTOs do not properly represent the nuclear cusp or the long-range tail, but they have the advantage
that multi-center molecular integrals can be calculated quickly. To overcome the inaccuracies of a
single GTO, several GTOs, called primitive GTOs (PGTOs), are combined into one function, called a
contracted GTO (CGTO). CGTOs are constructed as linear combinations of PGTOs. The functions used in
(1.3) are the contracted GTOs.

Types of Basis Sets: A minimal basis set consists of one basis function (CGTO) for each atomic
orbital in the atom. A double-zeta (DZ) basis set consists of two basis functions for each AO. A
triple-zeta (TZ) basis set consists of three basis functions for each AO. We can continue for
quadruple-zeta (QZ), 5Z, 6Z, . . . A split-valence basis (denoted VXZ, where X = D, T, Q, . . . )
uses only one basis function for each core AO, and a larger basis (X) for the valence AOs.

Polarization functions: AOs are distorted in shape and have their centers of charge shifted upon
molecule formation. To allow for this polarization, one adds basis functions whose l quantum numbers
are greater than the maximum l of the valence shell of the ground-state atom. Any such basis set is
a polarized (P) basis set. A common example is a double-zeta plus polarization set (DZ + P or DZP),
which typically adds to a double-zeta set a set of five 3d functions on each “first-row” (Li, Be, B,
C, O, N, F, Ne) atom and a set of three 2p functions (2px, 2py, 2pz) on each H atom.

Diffuse Functions: Diffuse functions have small α exponents (typically, 0.01 to 0.1); this means the
electron is held far away from the nucleus. Diffuse functions are necessary for anions, compounds
with lone pairs, excited and Rydberg states, very electronegative atoms (like F), hydrogen-bonded
dimers, and accurate polarizabilities or binding energies of van der Waals complexes, which all

ear combination are chosen to give the best least-squares fit to the STO. The expansion of an STO in
terms of n primitive Gaussians is designated STO-nG. A common choice is n = 3, giving a set of
contracted Gaussians referred to as STO-3G. The STO-3G basis set for a compound of first-row atoms
and H is denoted by (6s3p/3s) contracted to [2s1p/1s], where parentheses indicate the primitive
Gaussians and brackets indicate the contracted Gaussians.

The 3-21G and 6-31G sets are VDZ sets of CGTFs. In the 3-21G set, each inner-shell AO (1s for Li–Ne;
1s, 2s, 2px, 2py, 2pz for Na–Ar; . . .) is represented by a single CGTF that is a linear combination
of three PGTFs. For each valence-shell AO (1s for H; 2s and the 2p’s for Li–Ne; . . . ; 4s and the
4p’s for K, Ca, Ga–Kr; 4s, the 4p’s, and the five 3d’s for Sc–Zn), there are two basis functions,
one of which is a CGTF that is a linear combination of two primitives and one that is a single
diffuse Gaussian. The designation of 3-21G for first-row atoms and H is (6s3p/3s) contracted to
[3s2p/2s]. The 6-31G set uses six primitives in each inner-shell CGTF and represents each
valence-shell AO by one CGTF with three primitives and one Gaussian with one primitive; (10s4p/4s)
contracted to [3s2p/2s]. 6-311G is a split-valence TZ basis (VTZ); it adds one GTO with one
primitive to 6-31G; (11s5p/5s) → [4s3p/3s].

Diffuse functions are normally s- and p-functions. They are denoted by +, indicating one set of
diffuse s- and p-functions on heavy atoms, or ++, indicating that a diffuse s-function is also added
to H. Polarization functions are indicated after the G, with a separate designation for heavy atoms
and hydrogen. The 6-31G(d) set is a split-valence basis with a single d-type polarization function
on heavy atoms. A 6-311++G(2df,2pd) set is similarly a triple-split-valence basis with additional
diffuse sp-functions, two d-functions and one f-function on heavy atoms, and a diffuse s-function,
two p-functions and one d-function on H. The 6-31G(d) and 6-31G(d,p) sets are sometimes denoted as
6-31G* and 6-31G**, respectively.
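The contraction notation can be turned into a quick count of basis functions for a molecule. The following is a minimal Python sketch, not part of the original procedure: the function names are hypothetical, and it assumes one function per s shell, three per p shell, and five (spherical) per d shell, whereas some programs use six Cartesian d functions.

```python
import re

# Functions per shell type (assumption: spherical d and f shells).
FUNCS_PER_SHELL = {"s": 1, "p": 3, "d": 5, "f": 7}


def count_functions(shells: str) -> int:
    """Count basis functions in a shell string such as '3s2p' or '2s1p'."""
    total = 0
    for number, shell_type in re.findall(r"(\d+)([spdf])", shells):
        total += int(number) * FUNCS_PER_SHELL[shell_type]
    return total


def molecule_basis_size(contraction: str, n_heavy: int, n_h: int) -> int:
    """Basis size for a molecule, given a '[heavy/H]' contraction pattern."""
    heavy, hydrogen = contraction.strip("[]()").split("/")
    return n_heavy * count_functions(heavy) + n_h * count_functions(hydrogen)


# Water has one heavy (first-row) atom and two hydrogens.
sto3g_water = molecule_basis_size("[2s1p/1s]", n_heavy=1, n_h=2)
split_valence_water = molecule_basis_size("[3s2p/2s]", n_heavy=1, n_h=2)
print("STO-3G:", sto3g_water, " 3-21G or 6-31G:", split_valence_water)
```

The same helper applied to the parenthesized pattern, e.g. "(6s3p/3s)", counts primitives instead of contracted functions.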
To display the contour of any MO, click on the orbital (or combination of orbitals) to highlight it.
The frontier orbitals, HOMO and LUMO, are highlighted by default, but you can add more orbitals to
the selection. Click on Visualize followed by Update (do not click OK!) and after a few seconds a
molecular orbital contour of the selected (highlighted) orbitals will be displayed. The display can
be switched between the HOMO, the LUMO, or any other selected orbital by highlighting the desired
one. Notice that a red box handle appears next to the displayed orbital. Finally, you can control
the “size” of the orbital contour by changing the isovalue. Try different isovalues to see their
effect on the MO contours. The Figure below shows the HOMO (left) and LUMO (right) of cresol. The
two orbitals are generated at isovalue = 0.02 (the default value).
Only molecular orbitals can be generated using this procedure. To display other graphical surfaces,
such as electron densities and electrostatic potentials, choose Results > Surfaces/Contours from the
main Gaussview window. When the Surfaces and Contours window is open, click on Cube Actions > New
Cube, then choose the type total density and submit. After a few seconds, the electron density cube
appears in the list of available cubes. Under Surface Actions choose new surface and the electron
density plot will be displayed in a few seconds.
The display is a contour of the electron density at a chosen value. This value can be changed by
typing the desired value in the isovalue for new surfaces text box. The “skin-like” nature of the
representation can be demonstrated by z-clipping: right-click in the viewer display window, select
View > Display Format, and then select Surface in the resulting window. Moving the Z-Clip slider
reveals the interior of the display, as demonstrated below.
Another useful graphical display is a mapped surface. Here, two properties are displayed at the same
time, providing additional information. A good example is the display of the electrostatic potential
on a total electron density surface. While the total electron density surface represents the steric
requirements of the molecule, mapping the electrostatic potential onto it shows, at the same time,
how charge is distributed over the molecule.
In our example, we first calculate the total electron density at the default contour value. Then, to
map the electrostatic potential onto this surface, click Surface Actions > New Mapped Surface,
choose type ESP (ESP stands for electrostatic potential), and click OK. The ESP mapped onto the
total density surface will be displayed after a few seconds.
The red color indicates regions of “partial negative charge” (to be more precise, negative
potential), while the blue color indicates partial positive charge (positive potential). As
expected, the oxygen atom is the most “negative” atom in our molecule. We can Z-Clip the diagram as
before or, even better, make the display transparent so that the molecule inside the surface is
visible. To make the display transparent, right-click in the viewer display window, then choose
View > Display Format > Surface > Format and select Transparent or Mesh instead of Solid.
Electrostatic potential maps serve many useful purposes. For instance, they rapidly convey which
regions of a molecule are likely to be electron rich and which are likely to be electron poor.
In addition, ESPs can be used to distinguish molecules in which charge is localized from those in
which it is delocalized.
Compare the electrostatic potential maps in the Figure below for the planar (top) and perpen-
dicular (bottom) structures of the benzyl cation. The latter reveals a heavy concentration of positive
charge (blue color) on the benzylic carbon and perpendicular to the plane of the ring. This is con-
sistent with the notion that only a single Lewis structure can be drawn. On the other hand, planar
benzyl cation shows no such buildup of positive charge on the benzylic carbon, but rather delocal-
ization onto ortho and para ring carbons, consistent with the fact that several Lewis structures can
be drawn.
Electrostatic potential maps can also be employed to characterize transition states (TS) in chemical
reactions. A good example is the pyrolysis of ethyl formate (leading to formic acid and ethylene):
For the TS shown in the reaction, the electrostatic potential map shown below (based on an electron
density surface appropriate for identifying bonds) clearly shows that the hydrogen being transferred
(from carbon to oxygen) is positively charged; that is, it is an electrophile.
Moreover, for this TS, the electron density surface (which is also mapped with the ESP in the Figure
above) offers clear evidence of a late transition state, meaning that the CO bond is nearly fully
cleaved and the migrating hydrogen is more tightly bound to oxygen (as in the product) than to
carbon (as in the reactant). In fact, one of the most important advantages of electron density
surfaces is that they can be used to elucidate bonding.
2.3 Exercises
Exercise 1
Build phenol and calculate its energy at HF/3-21G. Obtain the MO contours of the four highest-energy
occupied orbitals at isovalues of 0.09 and 0.05 e/au³.
Exercise 2
Again for the phenol molecule, obtain contours of the total electron density at values of 0.002 and
0.05 e/au³. Explain the differences between the two representations. Which representation could be
used to demonstrate the bonding regions of the molecule? Plot the mapped surface of the
electrostatic potential on the 0.09 e/au³ electron density surface and compare with the above.
Exercise 3
Compare the electron density/electrostatic potential mapped surfaces of benzene and pyridine. From
your surface representations, predict where you would expect electrophilic attack to occur for each.
Exercise 4
Build the water molecule. Perform a geometry optimization (Calculate > Gaussian Calculation
Setup > Job Type > Optimization) using three levels of theory: AM1, HF/STO-3G and HF/3-21G.
Examine the occupied molecular orbital energy levels and obtain MO contours for each.
The experimental photoelectron spectrum¹ of water has four major bands at 12.6, 14.8, 18.6, and
32.1 eV. Do you find any correspondence between the molecular orbital energy levels you have
calculated at the three levels of theory and these experimental values²? Explain any
correspondence, and explain how the choice of level of theory affects the agreement with experiment.
Exercise 5
Build ethanol (a weak acid), ethanoic acid (a moderately strong acid), nitric acid (a strong acid)
and sulfuric acid (a strong acid). Optimize their geometries at the HF/3-21G level. Map the
electrostatic potential of each molecule onto a total electron density surface. Display the maps
side by side using the same scale for each. The relative acidity of each molecule should be
reflected in the value of the electrostatic potential near the acidic H of the OH group, so the maps
should predict the relative acidities of these molecules. Explain your conclusions.
Exercise 6
Calculate the energy of ethene and cis-1,3-butadiene, then generate HOMO and LUMO contours for these
two molecules. Use the symmetry of these frontier orbitals to explain why the reaction of
cis-1,3-butadiene with ethene to form cyclohexene (Diels-Alder cycloaddition) is favorable, while
the reaction of two ethene molecules is not, so that cyclobutane would not be expected. [Hint]: In
both cases, a reaction will occur if the HOMO of one molecule can interact properly (symmetry
considerations) with the LUMO of the other. Moreover, to make the orbital interactions easier to
see, orient the H atoms of ethene perpendicular to the plane and the H atoms of butadiene in the
plane.
¹ Photoelectron spectroscopy is an experimental technique used to measure the energy of electrons
emitted from atoms and molecules by the photoelectric effect, in order to determine the binding
energies of electrons in the substance (i.e., ionization energies).
² Note that the calculated orbital energy levels are reported in a.u. in Gaussian, so you will have
to convert to eV for comparison with the experimentally reported values (1 a.u. = 27.21 eV).
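The unit conversion in the footnote can be scripted. The sketch below is illustrative only: the orbital energies are hypothetical placeholders (not Gaussian output), and it assumes the usual Koopmans-type comparison in which an ionization energy is approximated by the negative of an occupied orbital energy, an assumption the exercise does not state explicitly.

```python
HARTREE_TO_EV = 27.2114  # 1 a.u. of energy in eV (rounded to 27.21 in the text)


def ionization_energy_ev(orbital_energy_hartree: float) -> float:
    """Approximate ionization energy (eV) as minus the orbital energy."""
    return -orbital_energy_hartree * HARTREE_TO_EV


# Experimental photoelectron bands of water quoted in Exercise 4 (eV).
experimental_bands_ev = [12.6, 14.8, 18.6, 32.1]

# Hypothetical occupied orbital energies in a.u., highest orbital first.
# Replace these with the values read from your own .log file.
orbital_energies_hartree = [-0.50, -0.57, -0.71, -1.35]

for eps, band in zip(orbital_energies_hartree, experimental_bands_ev):
    estimate = ionization_energy_ev(eps)
    print(f"eps = {eps:7.3f} a.u. -> {estimate:6.2f} eV (experiment: {band} eV)")
```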
Experiment 3
Geometry Optimization & Vibrational Frequencies
3.1 Introduction
When you build a molecule using Gaussview, or any other molecular editor, the geometry (bond
lengths, bond angles and dihedral angles) will not be as accurate as that of the real molecule.
After all, Gaussview, like any other molecular editor, is essentially advanced molecular “drawing”
software.
For instance, in experiment 2 you started by building p-methylphenol and were then asked to change
the dihedral angle of the OH group from perpendicular to the benzene plane (90°) to lying in the
ring plane (0°). Experiments have shown that the OH group in phenol indeed lies much closer to the
ring plane than perpendicular to it (this is also expected from electronic structure
considerations). That is why you were asked to modify the angle; otherwise you would have worked
with a wrong geometry!
How can we know that a given molecular geometry is more reasonable, or more “likely”, than another?
The answer is simple: the geometry with the lower energy is the more stable one! As a quick check,
calculate the energy of both conformers of p-methylphenol (one with an OH group dihedral angle of
0°, the other of 90°) using the same level of theory (e.g. B3LYP/3-21G). Which of the two
geometries has the lower energy?
The computational search for the geometry with lowest energy is called geometry optimization,
also known as energy minimization. In this experiment, you will optimize the geometry (i.e. find the
structure with the minimum energy) of p-methylphenol, starting from a nonequilibrium geometry.
Then, you’ll learn how to be sure that your optimized structure is a minimum point on a curve, or
to be more precise, an equilibrium geometry on the potential energy surface (PES).
In fact, in almost all computational studies, geometry optimization is the first step that should be
done before calculating any property: we should first have the correct geometry. Build the
p-methylphenol molecule; do not modify any bond length or angle, but note the dihedral angle of the
OH group. Perform a geometry optimization calculation (Job Type > Optimization) using the density
functional (DFT) BPV86/6-31G(d,p) level of theory.
When the calculation is done, open the .log file and look at the OH group dihedral angle: it is now
in the plane of the benzene ring. In fact, not only the OH dihedral angle has changed during the
search for the optimized geometry; in principle all bond lengths, bond angles, and dihedral angles
in the molecule have changed to correspond to the new geometry of lowest energy. Click on the ? icon
in the main Gaussview window. Clicking on any 2 atoms then gives the distance (bond length) between
these atoms in angstroms in the window pane underneath. Clicking 3 atoms consecutively gives the
value of the angle in degrees, and clicking 4 atoms gives the dihedral angle. See the next Figure
for an inquiry of the OH dihedral angle, which is found to be D = 0.042, i.e., almost zero. In
addition to the OH dihedral angle, compare any bond length or bond angle before and after
optimization and observe the changes.
Now open the output .log file as a text file, scroll down through the file and examine the criteria
the program uses to test for convergence of the geometry optimization. For instance, search the text
file for the word “converged” (search any text file using the command Ctrl+F). You will find this
word 15 times in the file, which means that in this particular calculation the optimized geometry is
reached after 15 steps. Note that the geometry optimization is complete when all four tests have a
YES in the Converged column, as shown in the next Figure for the first step, a middle step, and the
final step.
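The manual search for convergence tables can also be automated. The sketch below is an illustration, not part of the original procedure: the embedded sample mimics the layout of the convergence table in a Gaussian .log file, but its numbers are made up, and the function name is hypothetical.

```python
import re

# Illustrative excerpt in the layout of a Gaussian convergence table
# (the values are invented for this example).
SAMPLE_LOG = """\
         Item               Value     Threshold  Converged?
 Maximum Force            0.002561     0.000450     NO
 RMS     Force            0.000897     0.000300     NO
 Maximum Displacement     0.011302     0.001800     NO
 RMS     Displacement     0.004519     0.001200     NO
         Item               Value     Threshold  Converged?
 Maximum Force            0.000038     0.000450     YES
 RMS     Force            0.000014     0.000300     YES
 Maximum Displacement     0.000746     0.001800     YES
 RMS     Displacement     0.000322     0.001200     YES
"""

CRITERION = re.compile(
    r"^\s*(Maximum|RMS)\s+(Force|Displacement)\s+(\S+)\s+(\S+)\s+(YES|NO)\s*$"
)


def convergence_steps(log_text: str) -> list:
    """Return one list of four booleans (YES/NO flags) per optimization step."""
    steps, current = [], []
    for line in log_text.splitlines():
        match = CRITERION.match(line)
        if match:
            current.append(match.group(5) == "YES")
            if len(current) == 4:  # four criteria make up one step
                steps.append(current)
                current = []
    return steps


steps = convergence_steps(SAMPLE_LOG)
for number, flags in enumerate(steps, start=1):
    status = "converged" if all(flags) else "not converged"
    print(f"step {number}: {status}")
```

To use it on a real job, read the file with `open("your_job.log").read()` (the file name here is a placeholder) and pass the text to `convergence_steps`.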
Theoretical Background: Density Functional Theory (DFT)
Density Functional Theory (DFT) is a computational method that derives properties of the molecule
based on a determination of the electron probability density $\rho(x, y, z)$ of the molecule. Unlike
the wavefunction, which is not a physical reality but a mathematical construct, electron density is
a physical characteristic of all molecules. DFT is based on a theorem proved by Hohenberg and Kohn:
the energy and all other properties of a ground-state molecule are uniquely determined by the
ground-state electron probability density $\rho(x, y, z)$.

Actually, the ground-state electronic energy $E_{gs}$ is a functional of $\rho$:
$E_{gs} = E_{gs}[\rho(x, y, z)]$ or simply $E_{gs} = E_{gs}[\rho]$, where the square brackets denote
a functional relation. A functional is a function of a function. For instance, $E_{gs}$ is a
functional since it is a function of $\rho(x, y, z)$, which is in turn a function of the coordinates
$(x, y, z)$.

Unfortunately, the functional $E_{gs}[\rho]$ is unknown. The Kohn and Sham (KS) method uses
approximations to the unknown functional that allow accurate calculations of $\rho$ and $E_{gs}$.
The basic approximation, and the challenge in DFT, lies in the exchange-correlation energy
functional $E_{xc}$.

For convenience in devising approximations to $E_{xc}$, $E_{xc}$ is usually split into an exchange
part and a correlation part: $E_{xc}[\rho] = E_x[\rho] + E_c[\rho]$, and people devise separate
approximations for $E_x$ and $E_c$. Kohn and Sham suggested the use of a certain form for
$E_{xc}[\rho]$ called the local (spin) density approximation (LDA or LSDA), which theory shows to be
accurate when the electron density $\rho$ varies very slowly with position, which is not the case in
molecules! One finds that LSDA KS DFT calculations give good results for molecular geometries,
dipole moments, and vibrational frequencies, but rather poor results for atomization energies. The
LSDA $E_{xc}$ is a definite integral of a certain function of $\rho$.

In the late 1980s, Becke showed that by taking $E_{xc}$ as an integral of a certain function of
$\rho$ and the derivatives $\partial\rho/\partial x$, $\partial\rho/\partial y$,
$\partial\rho/\partial z$ (these derivatives constitute the gradient of $\rho$), one gets greatly
improved results for molecular atomization energies. Such a functional is called a
gradient-corrected functional, and use of a gradient-corrected functional gives the
generalized-gradient approximation (GGA).

In 1993, Becke proposed a further improvement in $E_{xc}^{GGA}$ by adding to it a term
$aE_x^{HF}$, where $E_x^{HF}$ has the form of the expression used for the exchange energy in
Hartree-Fock calculations but is evaluated using KS rather than HF orbitals, and $a$ is an empirical
parameter whose value was chosen to optimize the performance of $E_{xc}$ in calculations on a test
series of molecules. A GGA $E_{xc}$ that includes a contribution from $E_x^{HF}$ is a hybrid GGA
functional. The most widely used functional in calculations is the hybrid GGA called B3LYP, where B
indicates that it includes a term for $E_x^{GGA}$ devised by Becke, LYP indicates a term for
$E_c^{GGA}$ devised by Lee, Yang, and Parr, and the 3 indicates that it contains three empirical
parameters whose values were chosen to optimize its performance. The hybrid functional B3PW91 is
similar to B3LYP except that it uses the Perdew-Wang 1991 expression for $E_c$ instead of the LYP
formula.

For a GGA functional, $E_{xc}$ is taken as an integral of a function of $\rho$ and its derivatives.
An improvement on GGA functionals is obtained by taking $E_{xc}$ as an integral of a function of
$\rho$, the derivatives of $\rho$, and a quantity called the kinetic-energy density $\tau$, where
$\tau$ is a certain function of the derivatives of the Kohn-Sham orbitals. Such a functional is
called a meta-GGA functional. A contribution from $E_x^{HF}$ can also be added to a meta-GGA
functional to give a hybrid meta-GGA functional.
Family [Rung]                    Dependencies                                           Examples
LDA                              ρ (density ρ)                                          VWN, GPW92
GGA                              ρ, ∇ρ (gradient of ρ)                                  BLYP, PBE, BP86
meta-GGA                         ρ, ∇ρ, ∇²ρ and/or τ (kinetic energy of ρ)              M06-L, TPSS, τ-HCTH
Hybrid GGA / Hybrid meta-GGA     ρ, ∇ρ, ∇²ρ, τ, ψocc (occupied orbitals: exact          B3LYP, PBE0, mPW1K, M06-2X,
(HGGA/HmGGA)                     exchange and compatible correlation)                   TPSSh, ωB97X-D3*, CAM-B3LYP**
Double Hybrids (DH)              ρ, ∇ρ, ∇²ρ, τ, ψocc, ψunocc (virtual orbitals:         B2PLYP, XYG3, ωB2PLYP**
                                 exact partial correlation)
* The D3 term: an empirical dispersion correction to account for dispersion forces.
** Range-Separated Hybrid (RSH) functionals, designed to account for charge-transfer electronic
transitions.
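To make the "three empirical parameters" of B3LYP concrete, the functional is commonly written in the form below. This expression and its coefficient values are quoted from the standard literature definition rather than from this manual, so treat the notation (for example the use of the VWN local correlation term) as an assumption about the specific implementation.

```latex
E_{xc}^{\mathrm{B3LYP}} = (1 - a_0)\,E_x^{\mathrm{LSDA}} + a_0\,E_x^{\mathrm{HF}}
  + a_x\,\Delta E_x^{\mathrm{B88}} + (1 - a_c)\,E_c^{\mathrm{VWN}} + a_c\,E_c^{\mathrm{LYP}},
\qquad a_0 = 0.20,\quad a_x = 0.72,\quad a_c = 0.81
```

Here $a_0$ plays the role of the parameter $a$ in the text: it sets the fraction of Hartree-Fock-type exchange mixed into the GGA functional.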
After the last step comes a long Table of all geometry parameters (bond lengths, bond angles, and
dihedral angles). The last column shows that the derivative of the energy with respect to each
coordinate is essentially zero (dE/dx ≈ 0), denoting a stationary point on the PES.
How can we confirm that the geometry obtained in a geometry optimization job is a minimum? By
performing a subsequent harmonic vibrational frequency calculation, we can characterize stationary
points on the PES. A stationary point on the PES (or along a reaction coordinate) can be a minimum
(a reactant, a product, an intermediate, an equilibrium geometry, etc.) or a maximum along the
reaction coordinate (such as a TS). Vibrational frequencies allow one to classify a stationary point
on the PES as a local minimum (all vibrational frequencies real) or a TS (one imaginary frequency).
In addition to identifying the nature of stationary points on the PES, vibrational frequencies are
also important because they are used to:
1. Compute and analyze IR and Raman spectra (frequencies, intensities, and normal modes) of
molecules.
2. Compute many important thermodynamic properties (∆H, ∆S, ∆G, and the constant-volume molar heat
capacity). It is essential to include the molecular vibrational zero-point energy $E_{ZPE}$ if
accurate quantum-mechanical estimates of energy differences are wanted. Calculation of $E_{ZPE}$
requires knowing the molecular vibrational frequencies.
3. Predict properties that depend on the second and higher derivatives of the energy, such as
polarizabilities and hyperpolarizabilities.
In this experiment, we will use vibrational frequencies to identify the nature of stationary points
on the PES and to compute vibrational IR spectra. Other important applications of vibrational
frequencies will be discussed later.
It is important to realize that frequency calculations are valid only at stationary points on the
potential energy surface. Thus frequency calculations must be performed on optimized structures.
Moreover, a frequency calculation must use the same level of theory (method/basis) as used to obtain
the optimized geometry. Frequencies computed with a different basis set or method have no validity!
Harmonic vibrational frequency calculations are also set up via the Calculate menu (Job Type >
Frequency). Since, as mentioned above, the vibrational frequency calculation must use exactly the
same method and basis set as the preceding geometry optimization, it is often better to perform the
geometry optimization and the vibrational frequency calculation in the same job. This is done most
conveniently using Job Type > Opt+Freq.
As another example, we will perform frequency analysis on the water molecule. Build the wa-
ter molecule and in the Gaussian Calculation Setup window choose Job Type > Opt+Freq and
B3LYP/3-21G level. Multiplicity is 1 and charge is zero.
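The settings chosen in the Gaussian Calculation Setup window end up in a plain-text input file. The sketch below writes such a file for this water job from Python. It is an illustration only: the helper name and the starting coordinates are hypothetical, and the layout (optional Link 0 line, route line starting with #, title, charge and multiplicity, Cartesian coordinates, each section separated by a blank line) follows the standard Gaussian input format.

```python
def gaussian_input(title, route, charge, multiplicity, atoms, nproc=None):
    """Build the text of a Gaussian input file.

    atoms is a list of (symbol, x, y, z) tuples with coordinates in angstroms.
    """
    lines = []
    if nproc is not None:
        lines.append(f"%nprocshared={nproc}")  # Link 0: shared processors
    lines.append(f"# {route}")                 # route section (job type, level)
    lines.append("")
    lines.append(title)
    lines.append("")
    lines.append(f"{charge} {multiplicity}")
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol:<2s} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append("")                           # blank line ends the geometry
    return "\n".join(lines) + "\n"


# Rough starting geometry for water (illustrative guess, not optimized).
water = [
    ("O", 0.000, 0.000, 0.117),
    ("H", 0.000, 0.757, -0.469),
    ("H", 0.000, -0.757, -0.469),
]

text = gaussian_input("water opt+freq", "opt freq b3lyp/3-21g", 0, 1, water, nproc=2)
print(text)
```

Writing `text` to a file with a .gjf or .com extension gives an input equivalent to choosing Opt+Freq, B3LYP/3-21G, charge 0 and multiplicity 1 in the setup window.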
Open the text .log file on completion and scroll down the text file. Note that the geometry opti-
mization is performed initially and, after completion, the resulting optimized geometry is submitted
automatically for a vibrational frequency calculation. The vibrational frequencies, intensities, and
the mode forms are given after the calculation is complete. Search for “Harmonic frequencies” or
simply “Harmonic” and look for the lines below this word. See the Figure below.
Water has three vibrational frequencies, or normal modes of vibration. This is consistent with the
3N − 6 normal modes of vibration for nonlinear molecules (N is the number of atoms). The three
normal modes (or frequencies) are reported together with their symmetry, intensity and some other
properties. Finally, notice that all three frequency values are positive; none of them is negative
(i.e., imaginary), meaning that the optimized geometry of H2O is a true minimum.
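The two checks used here (the expected number of modes and the sign of the frequencies) are easy to express in code. In this sketch the function names are hypothetical, the water frequencies are illustrative placeholders rather than computed results, and the 3N − 5 rule for linear molecules is added from standard theory since the text only states the nonlinear case.

```python
def n_vibrational_modes(n_atoms: int, linear: bool = False) -> int:
    """Number of normal modes: 3N - 6 (nonlinear) or 3N - 5 (linear)."""
    return 3 * n_atoms - (5 if linear else 6)


def classify_stationary_point(frequencies_cm1) -> str:
    """Classify a stationary point from its harmonic frequencies.

    Imaginary frequencies are represented as negative numbers, which is
    how they are printed in the output.
    """
    n_imaginary = sum(1 for f in frequencies_cm1 if f < 0)
    if n_imaginary == 0:
        return "minimum"
    if n_imaginary == 1:
        return "transition state"
    return f"higher-order saddle point ({n_imaginary} imaginary frequencies)"


# Placeholder frequencies (cm^-1) for water; use the values from your .log file.
water_frequencies = [1600.0, 3650.0, 3760.0]

assert len(water_frequencies) == n_vibrational_modes(3)
print(classify_stationary_point(water_frequencies))
```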
Alternatively, the vibrational data can be displayed from the main Gaussview window using the
Results/Vibrations menu. The three modes are displayed together with their frequency values and
(infrared) intensities.
Each vibrational mode can be animated by highlighting the mode and clicking the Start Animation
button. This provides an easy and convenient way of analyzing the atom displacements in the mode.
The displacement vectors can be displayed by selecting show displacement vectors. In addition, the
calculated infrared spectrum can be displayed by clicking on the Spectrum button.
Finally, we can also confirm that the optimized geometry of water is a minimum by looking at the
number of imaginary frequencies in the Results/Summary Table. You will find 0 imaginary
frequencies for water, as expected and as shown before.
3.3 Exercises
Exercise 1
Carbocations are electron-deficient species. Since methyl groups are electron donating, the
following trend is observed for carbocation stability: 3° (tert) > 2° (sec) > 1° (prim) > CH3+
(methyl carbocation).
In this exercise, you will optimize the geometries of four different carbocations and explain their
stability according to their ESPs. Start by building methane, ethane, propane, and isobutane. Remove
a H atom from methane, any H atom from ethane, a H atom from carbon 2 in propane, and a H atom from
carbon 2 in isobutane. You have now obtained a neutral radical (neutral doublet) for each molecule.
In order to obtain the positively charged cation, specify Charge = 1 and Spin multiplicity = Singlet
in the Method menu for all molecules. Now, in order to obtain the correct sp2 structure of those
carbocations, you have to optimize their geometries and verify the optimized structures through
frequency calculations. Therefore, ask for Opt+Freq in the Job Type menu. When all calculations are
finished, verify that no structure has imaginary frequencies (recall that carbocations are
intermediates, not TSs, hence the energy of each should be at a minimum). Finally, draw ESP diagrams
for all structures and observe the presence of the positive charge on the carbocation carbon in each
molecule.
Exercise 2
Build ten structures of p-methylphenol in ten Gaussview viewers (or build it once and copy and paste
it nine times). Leave the first structure as it is, with a 90° OH dihedral angle. In the following
nine structures, set the OH dihedral angle to 80°, 70°, 60°, . . . , and 0°, respectively. Then
calculate the energy, Job Type > Energy (not Optimization nor Opt+Freq)¹, for each of the ten
structures at the B3LYP/3-21G level. When all jobs are finished, plot the energy versus the dihedral
angle. Which dihedral angle is at the minimum? Is this consistent with the result of the geometry
optimization done for this molecule at the beginning of this experiment?²
¹ When you only calculate Energy for a structure, you are actually doing a single point energy (SPE)
calculation, meaning the energy is calculated at a fixed geometry. On the other hand, when you
perform a geometry optimization, through Optimization or Opt+Freq, the program generates many
geometries, performs an SPE on each geometry, and then decides, according to certain criteria, which
geometry has the lowest energy.
² In fact, the curve that you have obtained is a one-dimensional PES. It is one-dimensional because
the energy is a function of one geometry variable only: the OH dihedral angle; everything else was
fixed. In a real geometry optimization calculation, the energy is minimized on an N-dimensional PES,
because in reality the energy is a function of all geometry variables, so everything is allowed to
change! For instance, in the structure optimized at the beginning of this experiment, one of the
methyl H atoms in p-methylphenol lies in the plane of the ring, whereas none of the ten structures
calculated in this exercise has a methyl H in the plane of the ring, simply because the methyl group
was kept fixed. A PES with more than 2 geometry variables cannot be plotted in a single diagram.
Exercise 3
Construct a PES of n-butane, where the variable is the dihedral angle C1–C2–C3–C4 (rotation of C1
and C4 about the C2–C3 axis; in the language of organic chemistry, this is called the torsion
angle). Let the dihedral angle range from 0° to 360° in intervals of 10° (so you will have 37 points
in total). Choose any appropriate level of theory yourself (HF/3-21G, B3LYP/3-21G, B3LYP/6-31G,
WB97XD/6-31G(d,p), etc.). The 37 input files in this exercise can all be run in one batch file:
prepare all your input files using Gaussview, open the Gaussian program, click on Utilities > Edit
Batch Files, select add, keep adding the files that you want to include in the batch file, then
select exit. To run the batch file, click on ▶ in the Gaussian window.
After plotting the PES, can you distinguish the staggered and eclipsed structures on the PES?
Moreover, can you characterize the anti and the gauche conformers? Refer to any organic chemistry
textbook to compare with your PES, or compare with the Figure below. Calculate the energy
differences between all stationary points (all maxima and minima) in your PES, convert them to
kJ/mol (or kcal/mol), and compare with the values given in the Figure. In the language of quantum
chemistry, the anti conformer at 180° in the PES is called the global minimum, while the two gauche
conformers at angles of 60° and 300° are called local minima.
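Converting the scan energies from a.u. to relative energies in kJ/mol can be sketched as follows. The energies in the dictionary are hypothetical placeholders chosen only to show the bookkeeping (they are not results for n-butane at any level of theory), and the function name is an assumption.

```python
HARTREE_TO_KJ_MOL = 2625.4996   # 1 hartree in kJ/mol
HARTREE_TO_KCAL_MOL = 627.5095  # 1 hartree in kcal/mol


def relative_energies_kj(energies_hartree: dict) -> dict:
    """Energies relative to the lowest point of the scan, in kJ/mol."""
    e_min = min(energies_hartree.values())
    return {
        angle: (energy - e_min) * HARTREE_TO_KJ_MOL
        for angle, energy in energies_hartree.items()
    }


# Placeholder single point energies (hartree) keyed by dihedral angle (degrees).
scan = {0: -157.1920, 60: -157.1985, 120: -157.1940, 180: -157.2000}

relative = relative_energies_kj(scan)
global_minimum_angle = min(relative, key=relative.get)

for angle in sorted(relative):
    print(f"{angle:4d} deg  {relative[angle]:7.2f} kJ/mol")
print("global minimum at", global_minimum_angle, "deg")
```

With the full 37-point scan, the same dictionary can be passed to any plotting tool to draw the PES.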
Exercise 4
Take two conformers from the previous exercise, with dihedral (or torsion) angles of 90° and 150°.
Optimize both geometries and calculate their frequencies at the same level of theory used in the
previous exercise. Measure the dihedral angle of the optimized geometry in both structures. Does
either structure have any imaginary frequencies? This exercise demonstrates an important principle
in geometry optimizations: geometry optimization finds a local minimum in the neighborhood of the
initially assumed molecular geometry.
For a molecule with several conformations, one must repeat the local-minimum search procedure for
each possible conformation in order to locate the global minimum. This procedure can become
complicated for very large molecules with many conformers.
Exercise 5
Perform a B3LYP/6-31G(d) optimization and vibrational frequency calculation on the formaldehyde
(H2CO) molecule. Before the calculation, after building the molecule, impose C2v symmetry on the
structure: open Edit > Point Group and activate it by clicking Enable Point Group Symmetry in the
upper left. The current molecular symmetry will then be displayed at the upper right. You can impose
symmetry on the molecule using the controls in this window. The Tolerance popup specifies how close
the structure must be to symmetric before a point group can be applied. As you loosen the tolerance,
additional point groups will appear in the popup menu to the left. You can select the desired point
group and then click Symmetrize to modify the structure so that it attains that symmetry.
The experimental frequencies are given below. Compare the experimental with the calculated values by
completing the Table below. The experimental values are anharmonic frequencies and hence are usually
lower than the calculated harmonic frequencies. Give a reason for this. Note the symmetry of the
normal modes, which belong to the C2v point group.
Mode                        Symmetry   Experimental           IR intensity   Calculated
                                       frequency (cm−1)                      frequency (cm−1)
CH2 wag (out-of-plane)      B1         1167                   Strong
CH2 rock (in plane)         B2         1249                   Strong
CH2 scissors (in plane)     A1         1500                   Strong
C=O stretch                 A1         1746                   Very strong
C–H symmetric stretch       A1         2782                   Strong
C–H asymmetric stretch      B2         2843                   Very strong
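Once the last column of the Table is filled in, the comparison can be quantified. The sketch below is illustrative: the experimental values are those of the Table, but the "calculated" values are hypothetical placeholders (not B3LYP/6-31G(d) results), and the least-squares scale factor is one common way to define a scale factor, brought in here as an assumption.

```python
# Experimental formaldehyde frequencies from the Table (cm^-1).
experimental = [1167, 1249, 1500, 1746, 2782, 2843]

# Placeholder harmonic frequencies (cm^-1); replace with your own results.
calculated = [1200.0, 1280.0, 1560.0, 1820.0, 2900.0, 2960.0]


def percent_errors(calc, exp):
    """Signed percentage deviation of each calculated frequency."""
    return [100.0 * (c - e) / e for c, e in zip(calc, exp)]


def least_squares_scale_factor(calc, exp):
    """Factor s that minimizes the sum of (s * calc_i - exp_i)**2."""
    numerator = sum(c * e for c, e in zip(calc, exp))
    denominator = sum(c * c for c in calc)
    return numerator / denominator


errors = percent_errors(calculated, experimental)
scale = least_squares_scale_factor(calculated, experimental)

for exp_value, calc_value, error in zip(experimental, calculated, errors):
    print(f"{exp_value:6d} {calc_value:8.1f} {error:+6.2f} %")
print(f"least-squares scale factor: {scale:.4f}")
```

A scale factor below 1 reflects the tendency of harmonic frequencies to exceed the anharmonic experimental values.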
Exercise 6
Perform a geometry optimization and vibrational frequency calculation on benzoic acid at the
BPV86/3-21G level of theory³. Below is the Figure of the experimental IR spectrum of benzoic acid.
Assign the calculated vibrations of the important bands (high-intensity bands), then correlate those
bands with the experimental spectrum. Use an IR frequency Table to verify your assignments.
³ Optimizations followed by frequency calculations usually take a long time. If you work on a
workstation or a PC (desktop or laptop), the processor is most probably multicore (i3, i5, i7, i9,
Xeon, etc.), which means that you can run jobs on more than 1 core to speed up calculations. To do
so, in the Gaussian Calculation Setup window, open Link 0, change Shared Processors from Default to
Specify, and enter in the box the number of shared processors (2 or 4, depending on the number of
cores in your machine). Using more shared processors in Gaussian is possible only with the 64-bit
software or the Linux version.
Experiment 4
Thermochemistry
4.1 Introduction
Gaussian, like all major electronic structure programs, can calculate thermodynamic
(thermochemical) quantities: H, S, G, and CV. Obtaining these quantities from the total electronic
energy is possible thanks to the methods of statistical thermodynamics. The first step in a
thermochemical calculation is the calculation of the total electronic energy of an optimized
geometry of the chemical system. The second step is the calculation of the vibrational frequencies.
That’s it!
We need the vibrational frequencies in order to compute the vibrational zero-point energy (ZPE).
When we calculate the total electronic energy, the energy is the minimum on the potential energy
surface. We need the energy of the v = 0 vibrational level, which is slightly higher. This offset is the
vibrational zero-point energy.
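[Figure: potential energy curve (vertical axis: Energy) with vibrational levels v = 0, 1, 2, 3, showing the dissociation energies De and D0 and the ZPE.]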
The zero-point energy is a correction to the total electronic energy of the molecule that accounts for the effects of molecular vibrations, which persist even at 0 K.
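As a rough illustration of how the ZPE follows from the vibrational frequencies, the sketch below evaluates the harmonic expression ZPE = 1/2 Σ h c ν̃i in Python. It uses the six experimental formaldehyde fundamentals from the Table in Experiment 3 as stand-ins for harmonic frequencies; this is an assumption made only for illustration, since Gaussian uses its own calculated harmonic frequencies. The function name is hypothetical and not part of any program.

```python
# Harmonic zero-point energy: ZPE = 1/2 * sum_i (h * c * nu_i)
H_PLANCK = 6.62607015e-34   # J s
C_LIGHT = 2.99792458e10     # cm/s, so wavenumbers in cm^-1 can be used directly
N_AVOGADRO = 6.02214076e23  # 1/mol


def zero_point_energy_kjmol(wavenumbers_cm):
    """Return the harmonic ZPE in kJ/mol for a list of wavenumbers (cm^-1)."""
    joule_per_molecule = 0.5 * H_PLANCK * C_LIGHT * sum(wavenumbers_cm)
    return joule_per_molecule * N_AVOGADRO / 1000.0


# Experimental formaldehyde fundamentals (cm^-1); using them here is an assumption
formaldehyde = [1167, 1249, 1500, 1746, 2782, 2843]
zpe = zero_point_energy_kjmol(formaldehyde)
print(f"ZPE = {zpe:.1f} kJ/mol")
```

Even for a molecule this small, the ZPE is tens of kJ/mol, which is why it cannot be ignored when comparing with experimental energies.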
In addition to the ZPE, at temperatures above 0 K a thermal energy correction must also be added to the total electronic energy. This correction includes the effects of molecular translation, rotation, and vibration at the specified temperature and pressure. By default, the thermochemical analysis is carried out at 298.15 K and 1 atm of pressure, using the principal isotope of each element. Adding the ZPE and the thermal energy corrections (translational, rotational, and vibrational) to the total electronic energy gives the internal energy U. Once the internal energy U is known, the enthalpy H, entropy S, and Gibbs free energy G can also be calculated:
E0 = Eelec + EZPE (4.1)
U = E0 + Evib + Erot + Etransl (4.2)
H = U + RT (4.3)
G = H − TS (4.4)
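where Eelec is the total electronic energy from the electronic structure calculation within the Born-Oppenheimer (BO) approximation, $E_{elec} = E_{\hat{T}_e} + E_{\hat{V}_{ne}} + E_{\hat{V}_{ee}} + E_{\hat{V}_{nn}}$ [Eq (6)], EZPE is the vibrational zero-point energy (the offset of the v = 0 vibrational level above the minimum of the potential energy surface), and E0 is the energy at 0 K.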
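The bookkeeping of Eqs. (4.1) to (4.4) can be sketched in a few lines of Python. The iso-butane HF/6-31G(d) values of Eelec, E0, and U are the ones quoted in Section 4.2; the two corrections are obtained here by difference, which is an assumption of this sketch, and the function names are hypothetical.

```python
R_KJ = 8.314462618e-3      # gas constant, kJ/(mol K)
HARTREE_TO_KJMOL = 2625.5  # conversion factor used in this manual


def thermo_sums(e_elec, e_zpe, e_thermal, temperature=298.15):
    """Apply Eqs. (4.1)-(4.3); all energies in hartree.

    e_thermal is Evib + Erot + Etransl WITHOUT the ZPE, as in Eq. (4.2).
    """
    rt = R_KJ * temperature / HARTREE_TO_KJMOL  # RT in hartree
    e0 = e_elec + e_zpe   # Eq. (4.1)
    u = e0 + e_thermal    # Eq. (4.2)
    h = u + rt            # Eq. (4.3)
    return e0, u, h


def gibbs(h, entropy, temperature=298.15):
    """Eq. (4.4): G = H - T*S, with S in hartree/K."""
    return h - temperature * entropy


# iso-butane, HF/6-31G(d): corrections derived by difference (assumption)
e_elec = -157.298977837
e_zpe = -157.158207837 - e_elec
e_thermal = -157.152785 - (-157.158207837)
e0, u, h = thermo_sums(e_elec, e_zpe, e_thermal)
print(f"E0 = {e0:.6f}  U = {u:.6f}  H = {h:.6f} hartree")
```

The entropy needed for `gibbs` is not quoted in the text, so G is left for the reader to evaluate from their own output file.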
Figure 4.2: Potential Energy Surface (PES) and vibrational energy levels of a diatomic molecule
using both harmonic oscillator (green) and anharmonic (blue) models.
The following Table lists the recommended scale factors for some levels of theory (See Scale factors
at cccbdb.nist.gov for a more comprehensive list).
4.2 Procedure & Examples
We will calculate the enthalpy of reaction ∆Hr of: n-butane −→ iso-butane. This is an isomerization reaction. In order to calculate ∆Hr accurately and compare with experimental data, we will use four different levels of theory: HF/6-31G(d), B3LYP/6-31G(d), MP2/6-31G(d), and CBS-QB3. MP2 is an ab-initio post-HF correlation method based on perturbation theory (see the Box on the next page). CBS-QB3 is a composite method that aims for high accuracy by combining the results of several calculations, at significantly lower cost than a single high-level calculation (see the Box on the next page). CBS-QB3 and other composite methods do not require a basis set specification.
The enthalpy of reaction ∆Hr, sometimes called the heat of reaction, is simply calculated as the enthalpy change between the products and the reactants at a given temperature and pressure (∆Hr is designated ∆Hr◦ at standard conditions: 1 atm and 298.15 K):
∆Hr = Σ H(products) − Σ H(reactants) (4.5)
Notice that in thermodynamics and thermochemistry we do not care about the mechanism of the reaction, i.e., the steps and conditions of converting reactants to products; we do not care about TSs or intermediates. The only thing we need is the enthalpy difference, entropy difference, or Gibbs energy difference between the products and the reactants.
We start with a geometry optimization and frequency calculation for both iso-butane and n-butane at the simplest and fastest of the four levels: HF/6-31G(d). When the calculations are finished, open the output (.log) files of both molecules. Before calculating thermochemical data for the reaction (∆E0, ∆U, and ∆H), let's see what the value of ∆E (E is the total electronic energy) is for this reaction. We find E = −157.298977837 a.u. and E = −157.298409304 a.u. for iso-butane and n-butane, respectively. Therefore ∆E = −0.000568533 a.u. In order to compare with the experimental value, we have to convert to kJ/mol (see footnote 1): ∆E = −1.49 kJ/mol. The reaction is predicted to be exothermic, meaning that iso-butane is more stable than n-butane. However, the experimental value of the enthalpy change is −8.60 kJ/mol. Obviously, something is missing! Let's see whether the agreement improves when we consider the thermochemical results of our calculations.
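The arithmetic behind ∆E can be checked with a short Python sketch that uses only the two energies and the conversion factor quoted in the text; the variable names are illustrative.

```python
HARTREE_TO_KJMOL = 2625.5  # 1 a.u. = 2625.5 kJ/mol, as in the footnote

e_isobutane = -157.298977837  # HF/6-31G(d) total electronic energy, a.u.
e_nbutane = -157.298409304

# Products minus reactants for n-butane -> iso-butane
delta_e_au = e_isobutane - e_nbutane
delta_e_kjmol = delta_e_au * HARTREE_TO_KJMOL
print(f"dE = {delta_e_au:.9f} a.u. = {delta_e_kjmol:.2f} kJ/mol")
```

Keeping all available digits until the final conversion matters here, because the difference appears only in the fourth decimal place of energies of about 157 a.u.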
In the output file, the “Thermochemistry” part comes after the harmonic frequencies. Search for “Zero-point correction”. You will see the following (iso-butane output file)
The first number is the zero-point energy (EZPE), which should be added to the total electronic energy E to obtain the E0 value. The next three numbers are the thermal corrections to E, H, and G. Note that in Gaussian the thermal correction to the energy printed in the output already includes the ZPE, so do not add both of them to an energy value. The final values of U, H, and G, obtained after adding the total electronic energies, ZPEs, and thermal corrections, are printed in the last three lines.
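When many output files have to be processed, these values can be collected with a small script instead of by hand. The following is a minimal sketch in Python. The label strings are written as they typically appear in a Gaussian 09 .log file and should be checked against your own output, and the sample text is an abridged, illustrative excerpt built from the iso-butane numbers quoted in this section, not a real output file.

```python
import re

# Labels as they typically appear in the thermochemistry section of a .log file
LABELS = {
    "zpe": "Zero-point correction=",
    "e0": "Sum of electronic and zero-point Energies=",
    "u": "Sum of electronic and thermal Energies=",
    "h": "Sum of electronic and thermal Enthalpies=",
    "g": "Sum of electronic and thermal Free Energies=",
}


def parse_thermochemistry(log_text):
    """Return a dict of the thermochemical values (hartree) found in log_text."""
    values = {}
    for key, label in LABELS.items():
        match = re.search(re.escape(label) + r"\s*(-?\d+\.\d+)", log_text)
        if match:
            values[key] = float(match.group(1))
    return values


# Abridged, illustrative excerpt (not a real output file)
sample = """
 Zero-point correction=                           0.140770 (Hartree/Particle)
 Sum of electronic and zero-point Energies=           -157.158208
 Sum of electronic and thermal Energies=              -157.152785
"""
print(parse_thermochemistry(sample))
```

Labels that are absent from the text are simply skipped, so the same function can be used on partial excerpts.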
1
The conversion factor is 1 a.u. = 2625.5 kJ/mol
Theoretical Background: Electron Correlation Methods
The correlation energy is defined as the difference between the exact energy and the energy of the HF approximation for the state under consideration. The electron correlation energy Ecorr for a system is thus calculated, for a given basis set, as:
Ecorr = Eexact − EHF (4.6)
where Eexact is the exact energy of the system. As guaranteed by the variational principle, the electron correlation energy Ecorr is always negative. Typically, the correlation energy is defined within the finite basis set used, and the convergence with respect to increasing the basis set size is then considered separately. It should be noted that Ecorr is not constant over the whole PES; it becomes greater at points far from the equilibrium. The instantaneous correlation between the motions of electrons is one source of Ecorr, called dynamic correlation (see below). Another contribution to Ecorr, called nondynamic correlation, will be discussed later. The most common ab-initio post-HF correlation methods that add dynamic electron correlation to a HF wavefunction are:
Configuration Interaction (CI);
Møller-Plesset (MPn): MP2, MP3, MP4, etc.;
Coupled Cluster (CC): CCSD, CCSD(T), etc.
Configuration Interaction: First- and higher-order corrections to the wave function will mix in contributions from excited configurations, producing configuration interaction (CI). One starts by choosing a basis set of one-electron functions χi. The SCF molecular orbitals ϕi are written as linear combinations of the basis-set members, and the HF equations are solved to give the coefficients in these linear combinations. The number of MOs obtained equals the number of basis functions used. The lowest-energy orbitals are the occupied orbitals for the ground state. The remaining unoccupied orbitals are called virtual orbitals.
Using the set of occupied and virtual spin-orbitals, one can form antisymmetric many-electron functions that have different orbital occupancies. Moreover, more than one function can correspond to a given electron configuration. Each such many-electron function Φi is a Slater determinant or a linear combination of a few Slater determinants. Use of more than one Slater determinant is required for certain open-shell functions. Each Φi is called a configuration state function (CSF).
The true molecular wave function Ψ contains contributions from configurations other than the one that makes the main contribution, so we express Ψ as a linear combination of the CSFs Φi:
$$\Psi_{CI} = a_0 \Phi_{HF} + \sum_S a_S \Phi_S + \sum_D a_D \Phi_D + \cdots$$
Subscripts S, D, T, etc., indicate determinants that are Singly, Doubly, Triply, etc., excited relative to the HF configuration. Then Ψ is regarded as a linear variation function. Variation of the coefficients ai to minimize the variational integral leads to the equation $\det(H_{ij} - E S_{ij}) = 0$, where $H_{ij} = \int \Phi_i^{*} \hat{H} \Phi_j \, d\tau$ and $S_{ij} = \int \Phi_i^{*} \Phi_j \, d\tau$. The configuration functions in a CI calculation are classified as singly excited, doubly excited, triply excited, . . . , according to whether 1, 2, 3, . . . electrons are excited from occupied to virtual orbitals.
For n electrons and b basis functions, the number of configuration functions turns out to be roughly proportional to $b^n$. A CI calculation that includes all possible configuration functions with proper symmetry is called a full CI (FCI) calculation. Because of the huge number of configuration functions, FCI calculations are out of the question except for small molecules and small basis sets. In most calculations, the correlation energies involving the inner-shell electrons change only slightly. Hence one usually makes the approximation of including only configuration functions that involve excitation of valence-shell electrons. The omission of excitations of inner-shell (core) electrons is called the frozen-core (FC) approximation.
In order to develop a computationally tractable model, the number of excited determinants in the CI expansion must be truncated. CI based on single-electron excitations only, the so-called CIS method, leads to no improvement of the HF energy or wave function. The simplest procedure that actually leads to improvement over HF is the so-called CID method, which is restricted to double-electron excitations. A somewhat less restricted recipe, the so-called CISD method, considers both single- and double-electron excitations. The next level of improvement is the inclusion of the triply excited determinants, giving the CISDT method. Taking into account also quadruply excited determinants yields the CISDTQ method. An important problem of truncated CI methods is their size-inconsistency.
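To get a feeling for why full CI is out of reach for all but the smallest systems, one can count Slater determinants directly. The sketch below is a simple combinatorial count that ignores spatial and spin symmetry, so it counts determinants rather than symmetry-adapted CSFs; the function name and the orbital counts are illustrative assumptions.

```python
from math import comb


def n_determinants(n_orbitals, n_alpha, n_beta):
    """Number of Slater determinants obtained by distributing n_alpha and
    n_beta electrons over n_orbitals spatial orbitals (no symmetry used)."""
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)


# 10 electrons (5 alpha, 5 beta); orbital counts are illustrative values only
for b in (10, 20, 40):
    print(b, n_determinants(b, 5, 5))
```

Doubling the number of orbitals multiplies the count by several orders of magnitude, which is the practical reason for truncating the expansion and for the frozen-core approximation.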
Theoretical Background: Electron Correlation Methods (Cont'd)
Perturbation Theory: In a perturbation approximation, one divides Ĥ into two parts:
$$\hat{H} = \hat{H}^{0} + \lambda \hat{H}' \qquad (4.7)$$
where $\hat{H}^{0}$, the unperturbed system, can be solved exactly, $\hat{H}'$ is the perturbation, and the system with Hamiltonian $\hat{H} = \hat{H}^{0} + \lambda \hat{H}'$ is the perturbed system. When λ is zero, we have the unperturbed system. As λ increases, the perturbation grows larger, and at λ = 1 the perturbation is full. The wave function ψn and energy En of state n of the perturbed system can be written as
$$\psi_n = \psi_n^{(0)} + \lambda \psi_n^{(1)} + \lambda^2 \psi_n^{(2)} + \cdots + \lambda^k \psi_n^{(k)} + \cdots$$
$$E_n = E_n^{(0)} + \lambda E_n^{(1)} + \lambda^2 E_n^{(2)} + \cdots + \lambda^k E_n^{(k)} + \cdots$$
The Rayleigh-Schrödinger first-order energy correction is
$$E_n^{(1)} = \int \psi_n^{(0)*} \hat{H}' \psi_n^{(0)} \, d\tau = \langle \psi_n^{(0)} | \hat{H}' | \psi_n^{(0)} \rangle = H'_{nn}$$
and $E_n \approx E_n^{(0)} + E_n^{(1)} = E_n^{(0)} + H'_{nn}$. The first-
Unlike variational methods, such as CI, in which the energy is an upper bound to the exact energy, perturbation methods offer no such guarantee. Nevertheless, the size extensivity of MP methods combined with the low cost relative to CI methods make MPn a good choice for including correlation.
The convergence of MPn series is oscillatory. In many systems the electron correlation is not a small perturbation and the convergence of perturbation series MPn is not guaranteed. MPn has convergence problems if HF is a poor starting point or if spin contamination is large.
Coupled-Cluster Method: Coupled-cluster theory expresses the exact wave function within the basis set approximation as,
$$\Psi = e^{\hat{T}} \Phi_{HF} \qquad (4.8)$$
where ΦHF is a single CSF HF determinant that is used in the SCF process to generate a set of spin-orbitals. The operator T̂, called the cluster
Theoretical Background: Composite Methods
Quantum chemistry composite methods (also referred to as thermochemical recipes) are computational chemistry methods that aim for high accuracy by combining the results of several calculations. They combine methods with a high level of theory and a small basis set with methods that employ lower levels of theory with larger basis sets. They are commonly used to calculate thermodynamic quantities, and aim for chemical accuracy.
Gaussian-n theories: The first systematic model chemistry of this type with broad applicability was called Gaussian-1 (G1), introduced by John Pople. This was quickly replaced by G2, which has been used extensively. G3 and G4 were introduced later. G2 uses 6 calculations:
1. Geometry optimization is obtained by MP2/6-31G(d) including all electrons.
2. The highest level of theory is a QCISD(T)/6-311G(d). MP2 and MP4 energies are also calculated in this step.
3. The effect of polarization functions is assessed using MP4/6-311G(2df,p) level.
4. The effect of diffuse functions is assessed using MP4/6-311+G(d,p) level.
5. The largest basis set is 6-311+G(3df,2p) used at the MP2 level of theory.
6. A HF/6-31G(d) geometry optimization followed by frequency calculation to obtain the zero-point vibrational energy (ZPE).
The various energy changes are assumed to be additive, so the combined energy is given by:
$$E_{\text{QCISD(T)}}^{\text{step 2}} + \left[E_{\text{MP4}}^{\text{step 3}} - E_{\text{MP4}}^{\text{step 2}}\right] + \left[E_{\text{MP4}}^{\text{step 4}} - E_{\text{MP4}}^{\text{step 2}}\right] + \left[E_{\text{MP2}}^{\text{step 5}} + E_{\text{MP2}}^{\text{step 2}} - E_{\text{MP2}}^{\text{step 3}} - E_{\text{MP2}}^{\text{step 4}}\right]$$
The second term corrects for the effect of adding the polarization functions. The third term corrects for the diffuse functions. The final term corrects for the larger basis set, with the terms from steps 2, 3 and 4 preventing contributions from being counted twice. Two final corrections are made to this energy. The ZPE is scaled by 0.8929, and an empirical correction is then added to account for factors not considered above. This is called the higher level correction (HLC) and is given by −0.00481 × (number of valence electrons) − 0.00019 × (number of unpaired valence electrons). The two numbers are obtained by calibrating the results against many experimental results. The scaled ZPE and HLC are added to the final E.
CBS Methods: The basis set has a finite number of members and hence is incomplete. The incompleteness of the basis set produces Basis-Set Incompleteness Error (BSIE), also called Basis-Set Truncation Error (BSTE). A common procedure to reduce BSIE is to do a series of calculations using one method with two, three, or four increasingly larger basis sets and extrapolate the results to what one hopes is a value close to the complete-basis-set (CBS) limit. The extrapolation is commonly done in two steps. One first does HF calculations with a series of basis sets to estimate the CBS HF energy limit $E_\infty^{HF}$. Then one does a series of correlated calculations with the same series of basis sets and uses an extrapolation formula to estimate the CBS correlation energy $E_\infty^{corr}$. The estimate of the CBS molecular energy is then found as $E_\infty^{HF} + E_\infty^{corr}$.
The Complete Basis Set (CBS) methods are a family of composite methods, the members of which are: CBS-4M, CBS-QB3, and CBS-APNO, in increasing order of accuracy. These methods offer errors of 2.5, 1.1, and 0.7 kcal/mol when tested against the G2 test set. The CBS methods extrapolate several single-point energies to the “exact” energy. In comparison, the G-n methods perform their approximation using additive corrections.
Weizmann-n theories: The Weizmann-n ab initio methods (Wn, n = 1–4) are highly accurate composite theories with no empirical parameters. These theories are capable of sub-kJ/mol accuracies in prediction of fundamental thermochemical quantities such as heats of formation and atomization energies, and unprecedented accuracies in prediction of spectroscopic constants. The ability of these theories to successfully reproduce the CCSD(T)/CBS (W1 and W2), CCSDT(Q)/CBS (W3), and CCSDTQ5/CBS (W4) energies relies on judicious combination of very large Gaussian basis sets with basis-set extrapolation techniques. Thus, the high accuracy of Wn theories comes with the price of a significant computational cost.
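The higher level correction of the G2 recipe described in this box is plain arithmetic on electron counts. A minimal Python sketch, using only the two coefficients quoted above (the function name is hypothetical, and the electron counts for methane and the methyl radical are supplied here as examples):

```python
def g2_higher_level_correction(n_valence, n_unpaired):
    """HLC in hartree, using the two coefficients quoted in the box above."""
    return -0.00481 * n_valence - 0.00019 * n_unpaired


# Methane: 8 valence electrons, closed shell
hlc_methane = g2_higher_level_correction(8, 0)
# Methyl radical: 7 valence electrons, 1 unpaired
hlc_methyl = g2_higher_level_correction(7, 1)
print(f"{hlc_methane:.5f} {hlc_methyl:.5f}")
```

Because the correction depends only on electron counts, it cancels in reactions that conserve the number of valence electrons and of unpaired electrons, and matters most for bond breaking and atomization.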
Let's now compute ∆E0 for the reaction: ∆E0 = (−157.158207837 + 157.157239) × 2625.5 kJ/mol = −2.54 kJ/mol. This result is still very far from the experimental value (−8.60 kJ/mol), but it is closer than ∆E, which shows that ZPEs are important in determining the energies of chemical systems. Let's try ∆Ur = (−157.152785 + 157.151702) × 2625.5 kJ/mol = −2.84 kJ/mol. This is a minor improvement due to the thermal corrections, but still very far from being correct. We also find that ∆Hr = −2.84 kJ/mol. This is expected, since we know from general chemistry that ∆Ur = ∆Hr when the change in the PV term is zero, i.e., when ∆n = 0.
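The equality of ∆Ur and ∆Hr for this reaction can be written out explicitly. The short derivation below assumes ideal-gas behavior for both isomers, which is the same assumption that underlies Eq. (4.3); ∆n_gas denotes the change in the number of moles of gas.

```latex
\begin{align}
H &= U + PV \\
\Delta H_r &= \Delta U_r + \Delta(PV) = \Delta U_r + \Delta n_{\mathrm{gas}}\,RT \\
\Delta n_{\mathrm{gas}} &= 1 - 1 = 0 \;\Rightarrow\; \Delta H_r = \Delta U_r = -2.84\ \text{kJ/mol}
\end{align}
```

For a reaction that changes the number of gas-phase molecules, each unit of ∆n_gas would shift ∆Hr by RT, about 2.48 kJ/mol at 298.15 K.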
Next is B3LYP/6-31G(d). Proceeding as for HF/6-31G(d), we obtain the results in the Table below. All results improve. B3LYP and other DFT functionals contain some electron correlation, but ∆Hr = −3.45 kJ/mol is still not acceptable. We need to include more correlation. MP2 seems to be the best so far: ∆Hr = −8.40 kJ/mol compared with the experimental value of ∆Hr = −8.60 kJ/mol. Clearly, electron correlation is important for this problem.
Level of Theory    ∆E (kJ/mol)   ∆E0 (kJ/mol)   ∆U (kJ/mol)   ∆H (kJ/mol)
HF/6-31G(d)        −1.49         −2.54          −2.84         −2.84
B3LYP/6-31G(d)     −2.04         −3.29          −3.45         −3.45
MP2/6-31G(d)       −1.45         −8.16          −8.40         −8.40
CBS-QB3                          −7.98          −8.12         −8.12
Experiment                                                    −8.60
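A quick way to rank the levels of theory is to tabulate the deviation of each calculated ∆H from experiment. The sketch below does this in Python with the ∆H values from the Table above; it is only a convenience script, and the variable names are illustrative.

```python
EXPERIMENT = -8.60  # kJ/mol

delta_h = {  # kJ/mol, from the Table above
    "HF/6-31G(d)": -2.84,
    "B3LYP/6-31G(d)": -3.45,
    "MP2/6-31G(d)": -8.40,
    "CBS-QB3": -8.12,
}

# Signed error: positive means the calculated value is not negative enough
errors = {level: value - EXPERIMENT for level, value in delta_h.items()}
for level, err in sorted(errors.items(), key=lambda item: abs(item[1])):
    print(f"{level:16s} {err:+.2f} kJ/mol")
```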
To extract the thermochemical results of a CBS-QB3 calculation from a Gaussian output file,
search for CBS-QB3 Energy in the file. You’ll have the following: CBS-QB3 (0 K)= is E0 , CBS-QB3
Energy= is U , CBS-QB3 Enthalpy= is H, and CBS-QB3 Free Energy= is G.
Now ∆Hr obtained with the composite method CBS-QB3 (∆Hr = −8.12 kJ/mol) seems to be less accurate than that of MP2. CBS-QB3 is a high-accuracy composite method that seeks to extrapolate a coupled cluster calculation to the basis set limit at significantly lower cost. In principle, CBS-QB3 should be more accurate than MP2 with a double-zeta basis set. The MP2/6-31G(d) result turns out to be fortuitous: the MP2/6-311+G(2d,p) level predicts an isomerization energy of −9.94 kJ/mol. The larger basis set lowers the predicted ∆Hr further, causing MP2 to overshoot experiment.
These results show the importance of adding ZPE and thermal corrections to electronic energies in order to compare with experimental thermochemical results. They also show the importance of electron correlation. However, the need for ab-initio electron correlation depends on the system at hand. For example, hybrid DFT functionals give excellent results for thermochemical properties of some systems, but fail for other chemical systems. In most cases, it is difficult to predict which functional will perform better, unless it has been tested on similar systems. Sometimes, using ab-initio correlation methods, such as MP or CCSD, is “safer” for obtaining reliable results, but the cost of these accurate methods sometimes makes them impractical for large chemical systems.
4.3 Exercises
Exercise 1
Repeat Exercise 5 in Experiment 3. In this exercise, calculate the vibrational frequencies of formaldehyde using the same level [B3LYP/6-31G(d)] but this time with a scaling factor. Add a new column to the Table in that exercise to be able to compare. Did you obtain improved results in comparison with experimental data?
[Hint] : Use the scaling factor appropriate for this level of theory (0.960). To add the scaling factor to the input file, add Scale=0.960 in the Additional Keywords box in the Method section of the Gaussian Calculation Setup, then click on Update.
Exercise 2
Oxygen’s paramagnetic nature results from the electronic configuration of the molecule’s ground
state. Determine the spin multiplicity of the ground state of oxygen by optimizing the singlet and
triplet forms in order to determine which one is lower in energy. Perform both calculations using
CCSD/6-311+G(2d,p) level, and predict the energy at standard conditions (298.15K and 1 atm).
Which multiplicity for O2 is more stable?
Exercise 3
Naphthalene and azulene are both aromatic hydrocarbons and have the same molecular formula, C10H8, but they are structural isomers, i.e. they differ in their structural formula. Perform calculations for both molecules at the following levels: HF/6-31G(d), B3LYP/6-31G(d), B3LYP/6-311++G(d,p). Use an appropriate scaling factor for each level. Compare with the experimental value: ∆H◦ = −35.3 kJ/mol. Which of the two isomers is more stable?
Exercise 4
Calculate the standard Gibbs free energy change ∆G◦r for the following reactions using B3LYP/6-
31G(d) and using an appropriate scaling factor. Compare with experimental data reported with each
equation. Pay attention to the stoichiometric coefficients in the balanced equations!
Exercise 5
Some important quantities of interest in thermochemistry are:
Ionization potential: the energy required to remove an electron from a compound (e.g.,
forming a positive ion from a neutral species). For example, the ionization potential for water
is: E(H2 O+ ) − E(H2 O).
Electron affinity: the energy released when an electron is added to a compound (as in
forming a negative ion from a neutral species). For example, the electron affinity for CN is:
E(CN) − E(CN− ), which yields a positive value in this case. Negative electron affinities are
also possible, and they indicate that energy is required to attach the electron.
Proton affinity: the energy released when a proton (H+ ) is added to a compound, (e.g.,
forming a positive ion from a neutral species). For example, the proton affinity for water is
computed as: E(H2 O) − E(H3 O+ ) (the total energy of a proton is zero since it has no electron).
Atomization energy: the energy difference between a compound and its monoatomic com-
ponents. For example, for water, this is the difference between its energy, E(H2 O), and the
sum: E(O) + 2E(H). The opposite of atomization energy is the binding energy.
[Hint] : The enthalpy of an atom is the total energy obtained from a single point energy calculation plus a constant correction: atomic thermal enthalpy correction = 5/2 RT = 0.00236 hartree at 298.15 K (3/2 RT of translational energy plus RT for the PV term).
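The value of the atomic correction can be verified with a few lines of Python. The sketch assumes an ideal monatomic gas at 298.15 K and uses the conversion factor 1 a.u. = 2625.5 kJ/mol adopted in this manual; the variable names are illustrative.

```python
R_KJ = 8.314462618e-3      # gas constant, kJ/(mol K)
HARTREE_TO_KJMOL = 2625.5  # conversion factor used in this manual
T = 298.15                 # K

# 3/2 RT of translational energy plus RT for the PV term of an ideal gas
atomic_enthalpy_correction = 2.5 * R_KJ * T / HARTREE_TO_KJMOL
print(f"{atomic_enthalpy_correction:.5f} hartree")
```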
Exercise 6
The Bond Dissociation Energy (BDE) for a molecule A–B is calculated as the difference in the enthalpies of the products and reactants for homolysis, A–B −→ A• + B•:
BDE = H(A•) + H(B•) − H(A–B)
For example, BDE of C–H bond in benzene is the energy change of the following reaction:
benzene −→ benzene radical + neutral hydrogen atom.
The IUPAC definition of bond dissociation energy refers to the energy change that occurs at 0 K, and the symbol is D0. However, it is commonly referred to as BDE, the bond dissociation energy, and it is generally used, albeit imprecisely, interchangeably with the bond dissociation enthalpy, which generally refers to the enthalpy change at room temperature (298 K). Although there are technically differences between BDEs at 0 K and 298 K, those differences are not large and generally do not affect interpretations of chemical processes. The BDE at 298 K is sometimes denoted D°298.
Determine the bond dissociation energies D°298, using the CBS-QB3 method, of the C–H bonds in: methane (439.3 kJ/mol), ethylene (463.2 kJ/mol), acetylene (557.8 kJ/mol), benzene (472.4 kJ/mol), and formaldehyde (368.6 kJ/mol). Numbers in parentheses are experimental values.
Exercise 7
Using an appropriate level, predict ∆Hr◦ and ∆G◦r for the reaction where ethyl radical abstracts a
hydrogen atom from molecular hydrogen:
Experiment 5
Including Solvent & Solvation
5.1 Introduction
Up to now we have concentrated on essentially isolated molecules, which are in essence models of the gas phase. Arguably the most interesting chemistry occurs in the solution phase, and it is therefore important to be able to predict the influence of the solvent on the calculated property. Gaussian, and most other computational chemistry programs, allow molecules to be modeled in the solution phase by use of polarizable continuum models (PCM).
Here we use a calculation of the relative base strength of ammonia and pyridine to illustrate both the calculation of the energetics of a reaction and the inclusion of solvation effects.
We begin by posing the problem: which is the stronger base, ammonia or pyridine? Experimentally, pyridine is the stronger base in the gas phase but ammonia is the stronger base in the aqueous phase. This question, like many others regarding chemical reactivity, can be better understood if it is expressed in terms of a chemical equilibrium. In this case the equilibrium involves the transfer of a proton between ammonia and pyridine, i.e.
NH3 + C5H5NH+ ⇌ NH4+ + C5H5N
This equilibrium describes a competition by ammonia and pyridine for the proton:
If ammonia is the stronger base, the right side of the equilibrium will be favored: ∆Gr < 0.
If pyridine is the stronger base the left side will be favored: ∆Gr > 0.
In fact ∆Gr can be difficult to calculate in solution (we will treat this subject later), so in many cases one considers enthalpy changes ∆Hr, especially when entropic effects are negligible: ∆Gr ≈ ∆Hr. Moreover, we will assume another simplification: ∆Hr ≈ ∆Er, where E is the total electronic energy calculated in an electronic structure calculation1 . For our example, with ammonia and pyridinium as reactants and ammonium and pyridine as products, this will be
∆Er = [E(NH4+) + E(C5H5N)] − [E(NH3) + E(C5H5NH+)]
In this exercise we will calculate the energy of each of the reactants and products and thereby
obtain the energy change for the reaction ∆Er . We will then perform the same calculations in the
presence of a water solvent and see if any change has occurred.
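The reaction energy is a sum over products minus a sum over reactants, each weighted by its stoichiometric coefficient. A minimal Python sketch of this bookkeeping follows; the function name is hypothetical and the energies are placeholder numbers chosen only to make the arithmetic easy to follow, not calculated values for any molecule.

```python
HARTREE_TO_KJMOL = 2625.5  # conversion factor used in this manual


def reaction_energy_kjmol(reactants, products):
    """Sum(coefficient * E) over products minus the same sum over reactants.

    Each argument is a list of (stoichiometric coefficient, energy in hartree).
    """
    def total(side):
        return sum(coeff * energy for coeff, energy in side)

    return (total(products) - total(reactants)) * HARTREE_TO_KJMOL


# Placeholder energies (hartree), chosen only to show the bookkeeping
reactants = [(1, -10.000), (1, -20.010)]
products = [(1, -10.005), (1, -20.000)]
delta_e = reaction_energy_kjmol(reactants, products)
print(f"{delta_e:.1f} kJ/mol")
```

The same function can be used for the gas-phase and the solvated energies, as long as all four species are calculated at the same level of theory.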
Build each of the four species: two neutral molecules and two ions. For the pyridine molecule, you can either use the ready-made templates of the aromatics available when you double-click the benzene ring icon on the main window, or use the benzene ring template and then select an aromatic nitrogen atom type from the periodic Table. By clicking on any position on the benzene ring, a nitrogen atom will be inserted as shown below.
Prepare the four input files for geometry optimization using M06-2X/def2-svp level. M06-2X is
a hybrid meta-GGA density functional, but it is not available in the menu of DFT functionals in
Methods. The def2 family of basis sets (which has an excellent performance with DFT functionals
in general) is also not available in the menu of basis sets. In order to include these in the input file,
we have to add them “manually”. In the Gaussian Calculation Setup click on Edit, after saving
the file, the input file in a text format will open. Of course, you can also open it as a text file using
any text editor. When the file is open, you’ll see this (the following input file is for ammonia):
Here we have three sections. The line beginning with # is the route section for this job. The route section specifies the job type and the level of theory: method/basis. In our example, we are requesting an Opt followed by Freq at the HF/3-21G level (this level appears by default since we did not specify any method or basis).
The line with “Title Card Required” is the title section for the job, which provides a description of the job (it is not used by the program and can be left as it is). Gaussian requires a blank (empty) line before and after the title section.
After the title section comes the third section: the molecule specification section. The first line of the molecule specification gives the charge and spin multiplicity for the molecule (0 1). The remaining lines specify the element type and Cartesian coordinates in angstroms for each atom in the molecule. In addition to these three sections, we can also add lines that request a checkpoint file, specify memory (RAM), specify shared processors, etc.
To add the functional/basis in the text input file, write m062x/def2svp instead of hf/3-21g. The functional M06-2X is typed m062x in Gaussian, and the basis def2-svp is typed def2svp. Check the Gaussian 09 manual if you are not sure how to write a particular functional or basis set. Submit the four calculations. Record the energy for each and calculate the reaction energy in kJ/mol or kcal/mol. Do you have agreement with the experimental observation that pyridine is the stronger base in the gas phase?
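The three sections of the input file described above can also be assembled with a short script, which is convenient when the same level of theory is applied to several species. The following Python sketch is one possible way to do it. The function name is hypothetical, the ammonia coordinates are an approximate starting geometry supplied here as an assumption (the optimization will refine them), and the optional `%nprocshared` line is the Link 0 command for shared processors.

```python
def gaussian_input(route, title, charge, multiplicity, atoms, nproc=None):
    """Assemble a Gaussian input file as a string.

    atoms is a list of (element, x, y, z) with coordinates in angstroms.
    """
    lines = []
    if nproc is not None:
        lines.append(f"%nprocshared={nproc}")  # Link 0 line
    lines.append(f"# {route}")                 # route section
    lines.append("")                           # blank line before the title
    lines.append(title)                        # title section
    lines.append("")                           # blank line after the title
    lines.append(f"{charge} {multiplicity}")   # charge and spin multiplicity
    for element, x, y, z in atoms:
        lines.append(f"{element:<2s} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append("")                           # input ends with a blank line
    return "\n".join(lines) + "\n"


# Approximate starting geometry for ammonia (an assumption; it will be optimized)
ammonia = [
    ("N", 0.0, 0.0, 0.116),
    ("H", 0.0, 0.940, -0.271),
    ("H", 0.814, -0.470, -0.271),
    ("H", -0.814, -0.470, -0.271),
]
text = gaussian_input("opt freq m062x/def2svp", "ammonia gas phase", 0, 1, ammonia)
print(text)
```

For the two cations, only the charge argument and the coordinates change; the route section stays the same so that all four energies are comparable.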
Now you will proceed to model each reactant and product in aqueous solution (water as solvent).
For this purpose, open the Solvation menu in the Gaussian Calculation Setup, then choose CPCM
as the Model and Water as the Solvent.
CPCM stands for Conductor-like Polarizable Continuum Model. This and other similar models are termed implicit or continuum solvation models: the solvent is represented as a continuous medium instead of individual “explicit” solvent molecules. Explicit solvation, on the other hand, means that if you want to study the effect of water as a solvent on a molecule, you literally add as many water molecules around the solute as there are in the actual situation (see the Figure below). The problem with explicit modeling of the solvent is that the calculation becomes very large and expensive.
Instead of modeling the solvent effect explicitly, all implicit models consider the solvent as a continuum, and its effect is accounted for through its uniform dielectric constant. The solute is placed into a cavity within the solvent. Implicit models differ in how they define the cavity and the solvent polarization. Implicit models are generally computationally efficient and can provide a reasonable description of the solvent behavior, but fail to account for the local fluctuations in solvent density around a solute molecule. These density fluctuations are due to solvent ordering around a solute and are particularly prevalent when water, for example, is the solvent.
Submit all four calculations at the same level as that used for the gas phase. It is important to emphasize that when different systems are compared, the same level of theory should be used, unless the effect of the level of theory on the energy or any other property is itself the objective. When you open the text input files to add the functional and basis, pay attention to the syntax of the solvation model and the solvent in the route section.
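In the route section, the solvation model appears as an SCRF keyword with the model and solvent as options. The small Python helper below sketches how the gas-phase route can be extended; the helper name is hypothetical, and the exact spelling that GaussView writes into your input file may differ in capitalization.

```python
def add_cpcm(route, solvent="water"):
    """Append an SCRF keyword requesting CPCM with the given solvent."""
    return f"{route} scrf=(cpcm,solvent={solvent})"


gas_route = "opt freq m062x/def2svp"
solution_route = add_cpcm(gas_route)
print("# " + solution_route)
```

Everything else in the input file (title, charge, multiplicity, coordinates) is unchanged between the gas-phase and the solution-phase jobs.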
When the calculations are complete, calculate the energy of reaction and determine again which molecule is the stronger base, this time in water. Do you find any significant difference between the gas phase and aqueous phase results? Provide an explanation for any conclusions reached.
It will be instructive to compare the differences between the aqueous phase energy for each species
and the gas phase values. This gives an estimation of the solvation energy for each species. How do
they compare and how might they explain your observations?
Theoretical Background: Basis Sets. Part II
Ahlrichs and def2 Type Basis Sets: This type include DZ, TZ and QZ quality for the elements up to Kr. The Split Valence Polarized (SVP) basis set is a (7s4p) → [3s2p], while the Triple Zeta Valence (TZV) basis set is a (11s6p) → [5s3p]. Quadruple Zeta Valence (QZV) basis set, being a (15s8p) → [7s4p] contraction. The recently developed Def2-bases, form a system of segmented contracted basis sets for the elements H–Rn. The respective basis set types are named def2-SV(P) to def2-QZVPP. Ahlrichs def2 family is recommended for DFT calculations on light main-group elements and 1st row TM elements (H-Kr). For each family, SV, TZV, and QZV, there exist two sets of polarization functions leading to: def2-SV(P) and def2-SVP, def2-TZVP and def2-TZVPP, and, def2-QZVP and def2-QZVPP. For instance: def-SV(P) is for routine SCF or DFT; quality is about 6-31G*. def-TZVP is for accurate SCF or DFT; quality is slightly better than 6-311G**. def-TZVPP is for MP2 or close to basis set limit SCF or DFT; comparable to 6-311G(2df).
Atomic Natural Orbital Basis (ANO) Sets: Basis sets aimed at producing accurate wave functions often employ a general contraction scheme, such as ANOs and cc-pVXZ basis sets. In ANO a large PGTO set is contracted to a fairly small number of CGTOs by using natural orbitals from a correlated calculation on the free atom, typically at the CISD level. ANO contraction “automatically” generates balanced basis sets, e.g. for neon the ANO procedure generates the following basis set: [2s1p], [3s2p1d], [4s3p2d1f] and [5s4p3d2f1g]. Furthermore, in such a sequence the smaller ANO basis sets are true subsets of the larger.
Dunning’s Correlation-Consistent Basis Sets: Dunning and co-workers have developed the CGTF basis sets cc-pVDZ, cc-pVTZ, cc-pVQZ, cc-pV5Z, and cc-pV6Z, designed for use in electron correlation methods. Here, cc-pVDZ stands for correlation-consistent, polarized valence double-zeta. Unlike the Pople-type functions, the
+ 1(5) = 14 for cc-pVDZ]. Note that as we go from one basis set to the next, the number of sets of basis functions of each angular-momentum l value is increased by one and one set of functions with the next higher l value is added. For an H atom, the cc-pVDZ set is [2s1p] and cc-pVTZ set is [3s2p1d] etc. The cc-pVDZ set is roughly comparable to the 6-31G** set, while the cc-pVTZ set is roughly comparable to the 6-311G**.
The addition of diffuse primitive nonpolarization and polarization functions to the cc-pVXZ basis sets gives the augmented sets aug-cc-pVDZ, aug-cc-pVTZ, etc., especially suitable for calculations on anions and hydrogen-bonded species. To form the set aug-cc-pVXZ from cc-pVXZ, the number of sets of basis functions of each angular-momentum l value is increased by one by the addition of diffuse primitives. Thus, aug-cc-pVTZ is [5s4p3d2f] for a first-row atom. For calculation of the electric polarizability of molecules, the convergence rate with increase in basis-set size is greatly increased by using doubly augmented (d-aug-cc-pVXZ) sets. For a first-row atom, d-aug-cc-pVTZ is [6s5p4d3f]. The addition of certain primitive Gaussians to the cc-pVXZ sets gives the cc-pCVDZ, cc-pCVTZ, . . . sets (where CV stands for core/valence), which are designed for calculations that include correlation effects involving the core electrons. Diffuse functions can be added to CV sets to give the aug-cc-pCVnZ sets.
Effective Core Potentials: Systems involving atoms from the lower part of the periodic table have a large number of core electrons. These are unimportant in a chemical sense, but it is necessary to use a large number of basis functions to expand the corresponding orbitals, otherwise the valence orbitals will not be properly described. In the lower half of the periodic table relativistic effects further complicate matters. These two problems may be “treated” simultaneously by modeling the core electrons by a suitable function and treating only the valence electrons explicitly. The
function modeling the core electrons is usually
cc family of functions always uses five d functions,
called an Effective Core Potential (ECP). Popular
seven f functions, etc. These sets are defined
ECPs include LANL ECPs of Hay and Wadt (e.g.
for the elements H-Ar, and Ca-Kr. For first-row
LANL2DZ a double zeta (DZ) quality combined
atoms, the CGTO present in the cc basis sets are
with ECP), and the Stuttgart-Dresden ECPs de-
cc-pVDZ cc-pVTZ cc-pVQZ cc-pV5Z
veloped by Dolg. Another set for the 4p, 5p, and
[3s2p1d] [4s3p2d1f ] [5s4p3d2f 1g] [6s5p4d3f 2g1h]
14 30 55 91
6p elements has been developed by Dyall, which
where the last row gives the number of basis func- are designed to be the ECP-equivalent to the cc-
tions for a first-row atom [for example, 3(1) + 2(3) pVXZ basis sets of Dunning.
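The basis-function counts quoted for the cc-pVXZ sets follow directly from the contraction pattern: every set of functions with angular momentum l contributes 2l + 1 functions (one s, three p, five d, seven f, and so on). The short Python sketch below reproduces that arithmetic. The helper name `count_basis_functions` is hypothetical and introduced here only for illustration; it is not part of any quantum chemistry package.

```python
import re

# Angular-momentum quantum number l for each shell letter.
L_VALUES = {"s": 0, "p": 1, "d": 2, "f": 3, "g": 4, "h": 5}


def count_basis_functions(contraction):
    """Count spherical basis functions in a contraction such as '[3s2p1d]'."""
    total = 0
    for count, letter in re.findall(r"(\d+)([spdfgh])", contraction):
        # Each set with angular momentum l holds 2l + 1 functions.
        total += int(count) * (2 * L_VALUES[letter] + 1)
    return total


for name, pattern in [("cc-pVDZ", "[3s2p1d]"), ("cc-pVTZ", "[4s3p2d1f]"),
                      ("cc-pVQZ", "[5s4p3d2f1g]"), ("cc-pV5Z", "[6s5p4d3f2g1h]")]:
    print(name, pattern, count_basis_functions(pattern))
```

The same function can be applied to the augmented patterns quoted in the text, for example `count_basis_functions("[5s4p3d2f]")` for aug-cc-pVTZ.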
Theoretical Background: Restricted HF and Unrestricted HF
The Slater determinant has been written in terms of spin-orbitals, Equation (1.2), being products
of a spatial orbital and a spin function (α or β). If there are no restrictions on the form of the
spatial orbitals, the trial function is an Unrestricted Hartree-Fock (UHF) wave function. If the
interest is in systems with an even number of electrons and a singlet type of wave function (a closed
shell system), the restriction that each spatial orbital should have two electrons, one with α and one
with β spin, is normally made. Such wave functions are known as Restricted Hartree-Fock (RHF).
Open-shell systems may also be described by restricted-type wave functions, where the spatial part
of the doubly occupied orbitals is forced to be the same; this is known as Restricted Open-shell
Hartree-Fock (ROHF). For open-shell species, a UHF treatment leads to well-defined orbital
energies, which may be interpreted as ionization potentials. For an ROHF wave function, the orbital
energies are not uniquely defined and cannot be equated to ionization potentials. The differences
between these types of wave functions are illustrated in the Figure below.
The UHF wave function allows different spatial orbitals for the two electrons in an orbital. As
restricted-type wave functions put constraints on the variation parameters, the energy of a UHF
wave function is always lower than or equal to that of a corresponding R(O)HF-type wave function.
For singlet states near the equilibrium geometry, it is usually not possible to lower the energy by
allowing the α and β MOs to be different. For an open-shell system such as a doublet, however, it is
clear that forcing the α and β MOs to be identical is a restriction. If the unpaired electron has α
spin, it will interact differently with the other α electrons than with the β electrons, and
consequently the optimum α and β orbitals will be different. The UHF description, however, has the
disadvantage that the wave function is not an eigenfunction of the S² operator (unless it is equal to
the RHF solution), where the S² operator evaluates the value of the total electron spin squared.
This means that a “singlet” UHF wave function may also contain contributions from higher-lying
triplet, quintet, etc., states. Similarly, a “doublet” UHF wave function will contain spurious
contributions from higher-lying quartet, sextet, etc., states.
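Spin contamination of a UHF wave function is usually judged by comparing the computed expectation value of S² with the exact value S(S + 1) of the intended spin state, where the multiplicity is 2S + 1. The Python sketch below tabulates these reference values. The helper names are hypothetical, and the computed value passed to `spin_contamination` in the last line is an arbitrary placeholder, not a result from any calculation.

```python
def exact_s_squared(multiplicity):
    """Exact <S^2> = S(S + 1) for a pure spin state with multiplicity 2S + 1."""
    s = (multiplicity - 1) / 2.0
    return s * (s + 1.0)


def spin_contamination(computed_s_squared, multiplicity):
    """Deviation of a computed <S^2> from the pure-state value."""
    return computed_s_squared - exact_s_squared(multiplicity)


# Pure-state reference values for singlet, doublet, triplet and quartet.
for mult in (1, 2, 3, 4):
    print(mult, exact_s_squared(mult))

# Placeholder <S^2> for a "doublet" UHF wave function (illustrative number only).
print(spin_contamination(0.80, 2))
```

A deviation close to zero suggests that the UHF wave function is nearly a pure spin state; a large positive deviation points to admixture of higher multiplicities.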
5.3 Exercises
Exercise 1
Using the same level of theory as the example explained above, assess the influence of the methanol
solvent on the relative base strength of ammonia and pyridine.
Exercise 2
Using the same level of theory as the example, predict the relative acidities of sulfuric and ethanoic
acid in the gas and aqueous phases.
Exercise 3
Optimize the geometry of formaldehyde in the gas phase and in acetonitrile. Compare the bond
lengths in both geometries.
Exercise 4
The molecular structural changes that occur in solution affect the molecular properties. Even small
changes in geometry can change the chemistry of a molecule. Repeat Exercise 5 in Experiment 3 and
Exercise 1 in Experiment 4. In this exercise, calculate the vibrational frequencies of formaldehyde
using the same level [B3LYP/6-31G(d)] with the scaling factor in acetonitrile solvent. Add a new
column in the Table in that exercise to be able to compare. Add another column in order to compute
the shift in cm−1 due to solvent effect. Are all vibrational modes shifted in solvent by the same value?
Exercise 5
The Gibbs free energy can be used to compute the solvation energy of a molecule: the energy
change on going from the gas phase to solution. The solvation energy can be computed for the same
compound in several solvents in order to understand its relative solubility in different environments.
We want to compare the solubility of acetic acid in two solvents: chloroform and water.
Optimize the geometry of acetic acid in the gas phase and in both solvents (optimizations in
solvent can be started from the gas-phase optimized structure) at the M06-2X/def2-TZVP level.
Repeat all solvent calculations twice: once using the CPCM solvation model and once using the SMD
solvation model (see footnote 2).
We will compute the solvation energy as the difference between the predicted Gibbs free energy
values in solution and in the gas phase, taken from the two frequency calculations:
∆Gsolv = G(solution) − G(gas)
where ∆Gsolv is the Gibbs free energy of solvation. Fill in the Table below in order to compare with
the experimental values. In which of the two solvents is acetic acid more soluble? Does this result
agree with what you already know about the solubility of acetic acid in both solvents?
∆Gsolv (kcal/mol)
Solvent        CPCM      SMD       Experiment
Chloroform                         −4.74
Water                              −6.70
2 SMD (the Solvation Model based on Density) is a solvation model designed in particular for solvation energies.
In some cases (e.g. for very large problems and/or in computing environments where resources
are limited) a frequency calculation cannot be performed. The solvation energy can then be
estimated by subtracting the gas-phase total energy of the solute from its SMD total energy, both
computed at the gas-phase optimized geometry.
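The bookkeeping for this exercise is a single subtraction followed by a unit conversion, since the output energies are given in hartree while the Table is in kcal/mol. The Python sketch below shows one way to do it. The function name is hypothetical, and the two energies in the usage lines are arbitrary placeholders chosen only to exercise the arithmetic; they are not computed values for acetic acid.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # 1 hartree expressed in kcal/mol


def solvation_free_energy(g_solution, g_gas):
    """Return G(solution) - G(gas) in kcal/mol; both inputs are in hartree."""
    return (g_solution - g_gas) * HARTREE_TO_KCAL_PER_MOL


# Placeholder free energies in hartree (illustrative numbers only).
g_gas = -1.000000
g_solution = -1.010000
delta_g_solv = solvation_free_energy(g_solution, g_gas)
print(round(delta_g_solv, 2), "kcal/mol")
```

When no frequency calculation is available, the same function can be fed the two total electronic energies instead of the Gibbs free energies, which gives the approximate solvation energy described above.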
Exercise 6
Consider the hydrolysis of esters. Methyl acetate reacts with water (neutral hydrolysis) or with a
base, e.g. hydroxide ion (basic hydrolysis), to produce acetic acid/acetate ion and methanol.
Predict the Gibbs free energy of both reactions in the gas phase and in solution. Use an appropriate
level of theory and solvation model.
Experiment 6
Locating & Optimizing Transition States
6.1 Introduction
Locating minima for functions is fairly easy. If everything else fails, the steepest descent method is
guaranteed to lower the function value. Finding first-order saddle points, Transition Structures or
Transition States (TS), is much more difficult. There are no general methods that are guaranteed to
work!
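The contrast between minima and first-order saddle points can be illustrated on a simple two-variable model function. The Python sketch below uses E(x, y) = (x² − 1)² + y², which is an assumption chosen purely for illustration (it is not a molecular potential energy surface): it has minima at (±1, 0) and a first-order saddle point at (0, 0). Steepest descent with a small fixed step, started near the saddle point, slides down to a minimum and never converges to the saddle point itself.

```python
def energy(x, y):
    """Model surface with minima at (+1, 0) and (-1, 0) and a saddle point at (0, 0)."""
    return (x * x - 1.0) ** 2 + y * y


def gradient(x, y):
    """Analytic first derivatives of the model surface."""
    return 4.0 * x * (x * x - 1.0), 2.0 * y


def steepest_descent(x, y, step=0.05, n_steps=500):
    """Follow the negative gradient with a small fixed step length."""
    for _ in range(n_steps):
        gx, gy = gradient(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


x_min, y_min = steepest_descent(0.1, 0.5)
print(x_min, y_min, energy(x_min, y_min))

# The Hessian at the origin is diagonal: d2E/dx2 = 12x^2 - 4 = -4 and d2E/dy2 = 2.
# Exactly one negative curvature marks (0, 0) as a first-order saddle point.
hessian_diagonal_at_origin = (12.0 * 0.0 ** 2 - 4.0, 2.0)
n_negative = sum(1 for curvature in hessian_diagonal_at_origin if curvature < 0.0)
print(n_negative)
```

Locating the saddle point requires an algorithm that maximizes the energy along one direction while minimizing it along all others, which is why TS searches need a good starting structure.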
The optimization facility can be used to locate transition structures as well as ground state
structures since both correspond to stationary points on the potential energy surface PES. Gaussian
provides two methods for locating a transition structure:
1. By specifying a reasonable guess for the transition state geometry and directing the optimizer
to locate a first order saddle point. However, this can be challenging in many cases.
2. By automatically generating a starting structure for a transition state optimization based upon
the reactants and products that the transition structure connects. This technique is known as
a QST2 optimization.
QST2 uses the STQN method. STQN (Synchronous Transit-guided Quasi-Newton) employs a
quadratic synchronous transit approach to get closer to the quadratic region of the transition state
and then uses a quasi-Newton or eigenvector-following algorithm to complete the optimization.
Input files using QST2 must include two molecule specification sections. Gaussian has features
which automate this process. The facility generates a guess for the transition structure which is
midway between the reactants and products.
A variation of the QST2 approach allows you to specify a starting structure for the transition
state in addition to reactants and products. It is known as QST3.
Once the TS has been found, the whole reaction path may be located by tracing the intrinsic
reaction coordinate (IRC) (Experiment 8), which corresponds to a steepest descent path, from the
TS to the reactant and product.
6.2 Procedure & Examples
We will find and optimize the transition state of a simple 1,2 hydrogen shift (or rearrangement)
reaction: H3CO −→ H2COH.
As the first step, build the reactant and the product. Build the reactant H3CO by
removing an H atom from the hydroxyl group of a methanol molecule. This should produce a neutral
doublet. Do not build the product H2COH by removing an H atom from the carbon of a methanol
molecule! This will not work, because STQN (QST2 and QST3) requires that corresponding atoms
appear in the same order (have the same number) within the two molecule specifications. Of course,
the bonding in the two structures does not need to be the same.
The correct way to build the product is to copy/paste the reactant and then move one of the three
H atoms on the carbon so that it is attached to the oxygen atom. See the Figure below, in which H
atom number 2 in the reactant file (left) is transferred and bonded to the O atom in the product
input file (right). Notice that this particular H atom has the same atom number in both files.
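The atom-ordering requirement can be checked mechanically before a QST2 job is submitted. The Python sketch below compares the element symbols of two structures position by position. The helper name and the atom orderings in the example lists are assumptions made for illustration; the actual ordering depends on how the molecules were built.

```python
def same_atom_order(reactant_symbols, product_symbols):
    """Return True if both structures list the same elements in the same positions."""
    return list(reactant_symbols) == list(product_symbols)


# Reactant H3CO with the migrating hydrogen as atom number 2 (assumed ordering).
reactant = ["C", "H", "H", "H", "O"]

# Product made by editing a copy of the reactant: the ordering is preserved.
product_from_copy = ["C", "H", "H", "H", "O"]

# Product built independently from methanol: atoms may come out in another order.
product_rebuilt = ["C", "H", "H", "O", "H"]

print(same_atom_order(reactant, product_from_copy))
print(same_atom_order(reactant, product_rebuilt))
```

Matching symbols are a necessary condition only. Two hydrogen atoms can be interchanged without changing the symbol list, so the numbering of chemically distinct atoms of the same element still has to be inspected in the viewer.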
The second important step is to optimize both geometries: the reactant H3CO and the product
H2COH. You can clean the product H2COH before optimization. Cleaning a geometry can be done
by clicking on the Clean icon (see footnote 1). Optimize both geometries and make sure that both
structures are actually minima. Use the HF/6-31G(d) level of theory.
The third step is to build the input file for QST2 calculation. Open a new viewer (File > New
> Create Molecule Group) and copy/paste the optimized reactant H3 CO. Open the optimized
product, adding the product molecule to the molecule group containing the reactant by File > New
> Add to Molecule Group. The molecule group will now contain two structures.
1 This function adjusts the geometry of the molecule to more closely match chemical intuition. It is often helpful
when building ground state structures. However, it should not replace a geometry optimization.
Note that it does not matter whether the reactant is structure 1 or structure 2. Use the Gaussian
Calculation Setup dialog to specify the transition structure optimization. Set the job type to
Opt+Freq, then select Optimize to a TS(QST2). Note that the default is to optimize to a minimum.
Set the level of theory to HF/6-31G(d). The job is now ready for submission. If we examine
the input file for this QST2 optimization, we find it has the following format:
The input file contains two complete molecule specifications. Note how the QST2 keyword is
specified: OPT=QST2.
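The layout of such a file is a route section followed by two complete molecule specifications, each with its own title line and charge/multiplicity line, and with blank lines separating the sections. The Python sketch below assembles a file of this shape. The helper names are hypothetical, and the Cartesian coordinates are crude, unoptimized placeholders that only show the format; in practice the optimized reactant and product geometries go here.

```python
def format_molecule(title, charge, multiplicity, atoms):
    """Return one title / charge-multiplicity / coordinate section as text."""
    lines = [title, "", f"{charge} {multiplicity}"]
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol:<2s} {x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(lines)


def build_qst2_input(route, reactant, product):
    """Join the route section and two molecule specifications with blank lines."""
    sections = [route, format_molecule(*reactant), format_molecule(*product)]
    return "\n\n".join(sections) + "\n\n"


# Placeholder geometries (angstrom); the migrating hydrogen is atom 2 in both.
reactant = ("H3CO reactant", 0, 2, [
    ("C", 0.0, 0.0, 0.0), ("H", 1.0, 0.0, -0.4), ("H", -0.5, 0.9, -0.4),
    ("H", -0.5, -0.9, -0.4), ("O", 0.0, 0.0, 1.4)])
product = ("H2COH product", 0, 2, [
    ("C", 0.0, 0.0, 0.0), ("H", 0.9, 0.0, 1.7), ("H", -0.5, 0.9, -0.4),
    ("H", -0.5, -0.9, -0.4), ("O", 0.0, 0.0, 1.4)])

qst2_text = build_qst2_input("# HF/6-31G(d) Opt=QST2 Freq", reactant, product)
print(qst2_text)
```

Both specifications carry the same charge and multiplicity (neutral doublet), and the atoms appear in the same order in each, as the STQN method requires.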
Run the calculation. The optimized transition structure is shown below. In the predicted TS,
the hydrogen atom is weakly bound to both the carbon and the oxygen atoms. The following bond
lengths (in Å) are found: C–O: 1.37, C–H: 1.28, O–H: 1.19. Finally, it is very important to check the
number of imaginary frequencies in order to verify that we have obtained a TS. We do find exactly
one imaginary frequency (−2603 cm−1), indicating that we have a first-order saddle point (TS). The
normal mode corresponding to the imaginary frequency shows movement of the hydrogen atom
between the carbon and the oxygen atoms.
In the second example, we want to find the transition structure for the rearrangement of azirine
into acetonitrile: a 1,2 hydrogen shift reaction again. If we try to run a QST2 calculation for the
TS of this reaction, Gaussian will usually fail to produce a starting structure from which
to begin the optimization. This is not surprising, since the reactant and product differ in
several structural features. For such cases, a QST3 calculation is needed.
Accurate starting structures are needed for the QST procedures to function well. Accordingly,
the first step here is to build and optimize [WB97XD/6-311+G(2d,p), see below] the azirine reactant.
Verify that the resulting structure is a minimum with a frequency calculation.
Beginning from the optimized reactant geometry, transform the reactant (azirine) into the
acetonitrile product. This can be done as follows: Remove the double bond. Change the N–C–C
bond angle to 180°. Switch the bond for the single hydrogen atom to the other carbon atom.
Reposition the hydrogen atom so that it is near the carbon atom to which it is now bonded. Clean the
structure. Again, creating the product structure by starting with the reactant structure is important
because the STQN procedure in Gaussian requires that corresponding atoms appear in the same
order within the two molecule specifications. Optimize the product, and verify that it is a minimum.
Finally, suggest a guess for the TS. Of course, the guess should be well “educated”. The reaction
involves a transfer (or shift, or migration) of an H atom from the “middle carbon” to the “terminal”
one. At the same time, the N atom breaks its single bond with the terminal C, and will therefore
make an angle of ≈ 120° with the imaginary line connecting the two C atoms. See the Figure below
(right). The left and middle structures are the reactant and the product, respectively.
QST3 calculations require three molecular structures as input: the reactants, the products and a
starting guess for the transition structure (the order of the first two does not matter, but the starting
TS guess must be the third molecule specification). After the definition of the three structures in a
molecule group, set up a QST3 calculation.
Challenging geometry optimizations, of which TS searches are one example, can benefit from an
extra step prior to the optimization procedure. This consists of computing the force constants at the
initial geometry as an aid to the optimizer. It is necessary for optimizing this particular transition
structure but need not be specified for most optimizations. This technique is requested in GaussView
as shown in the Figure below (Calculate Force Constants > Once), or with the Opt=CalcFC option in
the text input file.
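In the text input file, several optimization options are combined in parentheses after the Opt keyword. The Python sketch below composes such a route line. The helper name `build_route` is hypothetical, and the level of theory is simply the one suggested for this example; the function only concatenates strings and does not check that a keyword combination is meaningful.

```python
def build_route(method, basis, opt_options=(), extra_keywords=()):
    """Compose a route line such as '# HF/6-31G(d) Opt=(QST3,CalcFC) Freq'."""
    parts = [f"# {method}/{basis}"]
    if len(opt_options) == 1:
        # A single option does not need parentheses.
        parts.append(f"Opt={opt_options[0]}")
    elif opt_options:
        parts.append("Opt=(" + ",".join(opt_options) + ")")
    parts.extend(extra_keywords)
    return " ".join(parts)


route_qst3 = build_route("WB97XD", "6-311+G(2d,p)", ("QST3", "CalcFC"), ("Freq",))
route_qst2 = build_route("HF", "6-31G(d)", ("QST2",))
print(route_qst3)
print(route_qst2)
```

Computing the force constants once at the initial geometry costs one extra frequency-like step, which is usually far cheaper than a TS search that wanders away from the saddle point.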
For this calculation, choose a robust DFT functional with a “large” triple zeta basis set, for
instance the WB97XD/6-311+G(2d,p) level. However, depending on the guess TS structure, this level
may not work. In that case, try another functional/basis set, or slightly change the geometry
of the TS guess. For instance, place the single hydrogen atom that is bonded to the carbon atom of
the CN group in the reactant at a larger distance from the methyl carbon atom (e.g. 1.5 Å) than a
normal C–H bond, and at e.g. 1.7 Å from the other carbon atom.
After the QST3 optimization is complete, the subsequent frequency calculation finds one imagi-
nary frequency, indicating that the structure is a transition state. The optimized transition structure
is illustrated below. The normal mode corresponding to the imaginary frequency indicates movement
of the hydrogen atom between the two carbon atoms.
The optimization begins from a structure which is a strongly deformed version of the reactant, with
a C–C–N bond angle of about 130°. The single hydrogen atom bonded to the carbon atom in the CN
group in the reactant is at a larger distance from the methyl carbon atom than a normal C–H
bond. The optimization ends with the C–C–N bond angle increased to about 170°.
There are two pieces of information that are critical to characterizing a stationary point: the
number of imaginary frequencies, and the normal mode corresponding to each imaginary frequency.
A structure which has n imaginary frequencies is an nth-order saddle point. Thus, ordinary transition
structures are characterized by one imaginary frequency since they are first-order saddle points.
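Counting imaginary frequencies is easy to automate once the frequencies have been read from the output, where imaginary modes are printed as negative wavenumbers. The Python sketch below classifies a stationary point from such a list. The function names are hypothetical, and apart from the −2603 cm−1 value quoted earlier, the frequencies in the usage lines are placeholders rather than computed data.

```python
def saddle_point_order(frequencies):
    """Count imaginary modes, which are reported as negative wavenumbers."""
    return sum(1 for freq in frequencies if freq < 0.0)


def classify_stationary_point(frequencies):
    """Name the type of stationary point implied by the frequency list."""
    order = saddle_point_order(frequencies)
    if order == 0:
        return "minimum"
    if order == 1:
        return "first-order saddle point (transition structure)"
    return f"saddle point of order {order}"


# Frequencies in cm-1; only the first value of the second list comes from the text.
print(classify_stationary_point([350.0, 900.0, 1500.0]))
print(classify_stationary_point([-2603.0, 800.0, 1200.0]))
print(classify_stationary_point([-400.0, -150.0, 1000.0]))
```

This check answers only the first question. Whether the single imaginary mode corresponds to the reaction of interest still has to be judged from the normal mode itself.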
It is important to keep in mind that finding one imaginary frequency does not guarantee that you
have found the TS in which you are interested. Whenever a structure yields an imaginary frequency,
it means that there is some geometric distortion for which the energy of the system is lower than
it is at the current structure (indicating a more stable structure). In order to fully understand the
nature of a saddle point, you must determine the nature of this deformation.
One way to do so is to examine the normal mode corresponding to the imaginary frequency and
determine whether the displacements that compose it tend to lead in the directions of the structures
that you think the TS connects. The symmetry of the normal mode is also relevant in some cases.
Animating the vibrations is often very useful. A more accurate way to determine what reactants and
products the TS connects is to perform an IRC calculation to follow the reaction path and thereby
determine the reactants and products explicitly (see Experiment 8).
Note that in both cis and trans conformers, the H atom at C2 and the H atoms at C3 are
staggered, as it should be for a minimum. Let’s find the TS for the reaction cis-1-fluoropropene −→
trans-1-fluoropropene. Of course, this is not a chemical reaction but merely a 180° rotation around
the C=C double bond. Optimize the reactant and the product at the WB97XD/6-311G(d,p) level,
and verify that both are minima. Both conformers produce no imaginary frequencies, and the cis
form is lower in energy than the trans form by about 3.9 kJ/mol (≈0.9 kcal/mol). The cis and the
trans conformers are the global minimum and the local minimum, respectively.
Set up and run a QST2 job at the same level. The output TS (see the Figure below) is an eclipsed
trans-conformer! This is a big surprise. Based on your chemical “intuition”, do you think that this
TS is the right one that really connects the two minima defined above? Let us check this TS.
The TS produces one imaginary frequency, indicating that this conformation is a transition
structure and not a minimum. But what two minima does it connect? Is it the TS for the cis-to-
trans conversion reaction (i.e. rotation about the C=C bond)?
We look first at the energies of the three compounds:
It is customary in computational chemistry to set the relative energy of the reactant(s) to zero,
and to report the energies of all other species (products, intermediates, and TSs) relative to the
reactant. The trans-eclipsed TS is only about 0.5 kJ/mol (≈0.1 kcal/mol) higher in energy than the
trans-staggered conformation, a barrier which is far lower than one would expect for rotation about
the double bond (see Exercise 5 in Experiment 3).
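Relative energies of this kind are obtained by subtracting the reference energy and converting from hartree to kJ/mol or kcal/mol. The Python sketch below shows the conversion. The function names and the dictionary keys are hypothetical, and the hartree values are arbitrary placeholders; only the 3.9 and 0.5 kJ/mol figures in the last lines come from the text.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996  # 1 hartree expressed in kJ/mol
KJ_PER_KCAL = 4.184                # 1 kcal expressed in kJ


def relative_energies(energies_hartree, reference):
    """Return energies in kJ/mol relative to the entry named by 'reference'."""
    ref = energies_hartree[reference]
    return {name: (value - ref) * HARTREE_TO_KJ_PER_MOL
            for name, value in energies_hartree.items()}


def kj_to_kcal(value_kj_per_mol):
    """Convert kJ/mol to kcal/mol."""
    return value_kj_per_mol / KJ_PER_KCAL


# Placeholder total energies in hartree (illustrative numbers only).
profile = relative_energies({"reactant": -1.000, "ts": -0.999}, "reactant")
print(profile)

# Conversions of the two energy differences quoted in the text.
print(round(kj_to_kcal(3.9), 1), round(kj_to_kcal(0.5), 1))
```

Working with differences rather than absolute energies also makes it obvious when two structures are nearly degenerate, as is the case for the staggered and eclipsed trans forms here.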
To investigate the TS further, we next examine the frequency data and the normal mode
corresponding to the imaginary frequency. Note that the magnitude of the imaginary frequency is not
very large (−183 cm−1), indicating that the geometric distortion of the molecule is modest. When we
animate this normal mode, we see that the largest motion is the rotation of the three hydrogen atoms
in the methyl group (C3).
From all of this, we can deduce that this TS connects two structurally equivalent minima, and
that the path between them corresponds to a methyl rotation. This is not the TS that we are looking
for! Of course, you should know what the TS for the desired rotation looks like. The objective of this example
was only to show that a TS found using QST2 (or any other method) may not be the desired TS
and a deeper check of the TS is necessary. After all, saddle points always connect two minima on
the PES, but these minima may not be the reactants and products of interest.
Our last example is a 1,3 hydrogen shift in 1-fluoropropene (left) resulting in 3-fluoropropene
(right):
It can be challenging to create the starting product structure by copying and modifying the
optimized reactant, and it may require many steps. An alternative is to build and optimize
the product separately, and then specify the corresponding atoms using GaussView’s Edit >
Connection Editor. In order to make this tool work optimally, copy/paste both molecules into one
molecule group, split the screen (in order to see both molecules in one viewer; this is done by
clicking behind the number counter), and then add bonds between the migrating hydrogen atom and
the target carbon atom in all structures before opening the Connection Editor. When you then open
the Connection Editor, you will see something like the illustration below:
Notice that the atom numbering is quite different. For example, the fluorine atom is atom 5 in
the first structure and atom 9 in the second. The STQN facility in Gaussian will not be able to
locate the proper transition state given these input structures. However, if we click on the Enable
Autofixing button, GaussView will successfully identify all corresponding atoms and modify the
atom ordering accordingly. If the Autofixing feature in the Connection Editor is not successful in
aligning the two structures, atom ordering can be done manually. Left click on an atom to select it,
and then right click on a second atom, and their two atom numbers will be swapped.
After autofixing the numbering of the reactant and the product, run a QST2 and frequency
calculation (you may train yourself by building a guess for the TS and running a QST3 job; verify
whether you obtain the same result!). We find one imaginary frequency for the TS. When we examine
the associated normal mode for this imaginary frequency using animation, we observe that the
majority of the motion in this mode involves the shifting hydrogen atom, so it appears that this is the
correct transition structure (see below where the normal mode motion is indicated by displacement
vectors). This could be confirmed with an IRC calculation, as we will discuss later. The large
magnitude of the frequency (about −1927 cm−1) also indicates a substantial change in structure.
Finally, the predicted energy barrier of ≈ 415 kJ/mol is of a reasonable order of magnitude for such
reactions (H shift reactions). This energy barrier is actually the activation energy Ea of the reaction.
6.3 Exercises
Exercise 1
Using the QST2 method, find, optimize, and verify the TS for the reactions: SiH2 + H2 −→ SiH4
and GeH4 −→ GeH2 + H2
Exercise 2
Perform geometry optimization and frequency calculation on the two vinyl alcohol isomers (see below)
at the HF/6-31G(d) level. Are both conformers minima? Which of them is the global minimum?
Guess the TS for the reaction between the two conformers and optimize it as a transition state. To
do this, choose optimize to TS(Berny) or simply add opt=ts in the route section of the input file.
Verify if this is the desired TS that connects the two minima of interest.
Exercise 3
Find the TS and the relative energies (in both kJ/mol and kcal/mol) for the reactant, TS and the
two products, for the conversion of isopropylazide to dimethylimine plus molecular nitrogen:
The reaction is a one-step (concerted) process that involves a hydrogen migration from the
central carbon atom to the adjacent nitrogen atom as well as N2 elimination.
Exercise 4
Find the TS and the relative energies (in both kJ/mol and kcal/mol) for the reactant, TS and
product, for the 1,3 fluorine shift reaction in 1-fluoropropene giving 3-fluoropropene.
Exercise 5
Consider the reaction
NH3 + CH3Cl −→ NH3CH3+ + Cl−    (6.2)
Predict the free energy change for the reaction, in both the gas phase and in aqueous solution. In
addition, compute the predicted activation free energy ∆Ga for the reaction in solution. The reaction
proceeds through an SN2 transition structure in solution. In the gas phase, the reaction is most likely
a two-step process. Use the SMD model in all solvent calculations. Compare with the experimental
values: ∆Gr = 110 kcal/mol (gas phase); ∆Gr = 34 kcal/mol (aq soln); ∆Ga = 23.5 kcal/mol (aq
soln).
Build the TS initial structure in order to be used as the initial structure for an Opt=(TS,CalcFC)
calculation or as the third structure for an Opt=QST3 calculation.
[Hint: Building the TS Initial Structure for an SN2 reaction:] Here is one method using GaussView
for building the starting structure for the TS optimization:
2. Open a new viewer to place the atom.
3. Change the axial hydrogens to nitrogen and chlorine and the silicon to carbon. Add three
hydrogens to the nitrogen atom.
4. Change both heavy atom bonds to the carbon to half bonds. Clean the structure.
6. Use the point group symmetry feature to impose C3v symmetry on the structure: Open Edit
> Point Group and activate the feature by clicking Enable Point Group Symmetry in the upper left.
The current molecular symmetry will then be displayed at the upper right. You can impose
symmetry on the molecule using the controls in this dialog. The Tolerance popup specifies how close
the structure must be to symmetric before a point group can be applied. As you loosen the
tolerance, additional point groups will appear in the popup menu to the left. You can select the
desired point group, and then click Symmetrize to modify the structure so that it attains that symmetry.
Experiment 7
Chemical Reaction Mechanisms: Reaction Coordinate Scans & Potential Energy Surfaces
7.1 Introduction
In this experiment, we consider an important technique for investigating chemical reactions. We will
study another technique in the next experiment.
Thus far in our examples, we have treated the study of reactivity by focusing on molecular
geometries such as reactants, products, intermediates, and transition structures. The relative energies
of these are all we need to predict thermochemical properties such as the enthalpies of reaction or
activation free energies. However, these are only stationary points on a much larger potential energy
surface (PES). The actual landscape of this surface can also be explored to see how the various
stationary points connect. Details about these pathways are important in validating mechanisms
especially when more than one pathway between reactants and products can exist.
In general, the PES for a reaction is a 3N -dimensional surface where N is the number of atoms. It
is impossible to calculate or visualize the entire PES for any but the simplest of reactions. A judicious
choice of internal coordinates to explore is the first step in any investigation. Theoretical predictions
of potential energy surfaces and reaction paths can sometimes yield quite surprising results.
Consider rotational isomerism in allyl cation. We are interested in knowing how difficult it would be
to twist this molecule given that it is held together by a double bond. One suggested path between
the two forms is via a perpendicular transition structure having Cs symmetry. A plausible way to
begin an investigation of this reaction is to attempt to locate a saddle point on the potential energy
surface corresponding to this hypothesized transition structure.
Our chemical reaction involves an allyl cation reactant and an allyl cation product in which
one set of terminal hydrogen atoms has been exchanged, as in the left two structures in the following
illustration (the same H atom is highlighted in blue):
The three structures shown above comprise the required input for a QST3 transition structure
search for this process. We run this calculation using the WB97XD/6-31G(d) level, calculating the
force constants at each step. We do the latter because this is a tricky optimization which failed to
converge with several other less costly approaches:
# WB97XD/6-31G(d) Opt(QST3,CalcAll)
When we animate the imaginary frequency from this calculation, we are disappointed to see that
the reaction coordinate is not what we sought, but instead involves the hydrogen atom of the middle
carbon migrating to the terminal carbon, suggesting that the exchange of the two terminal hydrogen
atoms does not occur by the mechanism we thought. Instead, it could involve a stepwise process:
1. An initial hydrogen shift from the central atom to the terminal carbon atom.
2. A rotation of the resulting methyl group.
3. A second hydrogen shift from the terminal carbon atom back to the center.
Each of these steps would involve its own TS with associated activation barriers. For steps 1
and 3, the TS is the same because of symmetry. For step 2, the methyl rotation passes through two
barriers, but these are much smaller than the migration barrier. The migration barrier (Ea for step
1) is found to be ≈ 130 kJ/mol (check if this value is right).
The failed transition structure search was due to our poor guess at how this process actually
occurs. This may not be a general result for C=C bond rotation since environmental effects (e.g.,
gas phase versus solution) and/or changes in the chemical structure (e.g., methyl groups instead of
hydrogen atoms) may change the landscape of the PES.
It is possible to simply sample the PES in a region that corresponds to the process in which you
are interested. This type of calculation is called a scan.
The scan is an automated (it can also be manual) calculation of energy at each point in the
reaction coordinate. There are two types of scan procedures:
1. A rigid scan takes a geometric structure and freezes all the coordinates in place except for the
particular coordinate being scanned. A single point energy calculation is performed for each
generated structure.
2. A relaxed scan does a partial optimization at each point of the scan, freezing the scan coordinate
and optimizing all others. In other words, each optimization locates the minimum energy
geometry with the scanned parameters set to specific values.
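The difference between the two scan types can be demonstrated on a toy energy function of two coordinates: a scanned angle and one further coordinate r that would like to adjust as the angle changes. The Python sketch below is only a model under stated assumptions (the functional form, the constants, and the helper names are all invented for illustration and are not a real molecular PES). It shows that the relaxed energy at each scan point can never lie above the rigid one, because the rigid geometry is one of the geometries available to the partial optimization.

```python
import math


def model_energy(phi_deg, r):
    """Toy two-coordinate surface in arbitrary units (not a real molecular PES)."""
    s2 = math.sin(math.radians(phi_deg)) ** 2
    r_best = 1.40 + 0.10 * s2  # the preferred r shifts as phi is twisted
    return 100.0 * s2 + 500.0 * (r - r_best) ** 2


def rigid_scan(phis, r_fixed):
    """Single-point energies with r frozen at its starting value."""
    return [model_energy(phi, r_fixed) for phi in phis]


def relaxed_scan(phis, r_grid):
    """Minimize over r (here by a simple grid search) at every value of phi."""
    return [min(model_energy(phi, r) for r in r_grid) for phi in phis]


phis = [10.0 * i for i in range(19)]             # 0, 10, ..., 180 degrees
r_grid = [1.30 + 0.001 * i for i in range(301)]  # 1.300 to 1.600
rigid = rigid_scan(phis, r_fixed=1.40)
relaxed = relaxed_scan(phis, r_grid)

for phi, e_rigid, e_relaxed in zip(phis, rigid, relaxed):
    print(f"{phi:6.1f} {e_rigid:10.4f} {e_relaxed:10.4f}")
```

In a real relaxed scan the grid search is replaced by a constrained geometry optimization over all remaining coordinates, which is why relaxed scans are much more expensive than rigid ones.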
In our example of the allyl cation, we want to find the TS for twisting about the C=C bond. We
will force the twist angle to take on a set of values in the range from 0° to 180°, evaluating the energy
or partially optimizing the geometry at each point.
GaussView includes features for setting up relaxed potential energy surface scan calculations. To
do so, build or import the desired molecule in the usual manner, and then click Edit
> Redundant Coordinate to specify which structural parameters should be scanned. This dialog is
illustrated below:
Click on Add, then select the atoms that define the desired scan coordinate (four atoms for a dihedral
angle, three atoms for an angle, and two atoms for a bond length). In our case, select the four atoms
shown in the figure above in order to scan the rotation around the C=C double bond. Open the
Unidentified drop-down list and select Dihedral, then open the Add drop-down list and select Scan
Coordinate. Now that you have defined the parameter to scan, you have to define the number of steps
and the step size. These are set by Take and Step(s) of size. Enter 18 for Take
and 10 for Step(s) of size. This requests 18 steps of 10 degrees each, i.e., a scan from 0 to 180
degrees in 10-degree intervals. Because the starting geometry is included, 18 steps produce 19 points.
Finally, confirm by clicking OK.
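The relation between the number of steps and the number of points can be checked with a few lines of Python. This is only an illustrative sketch: the function name `scan_values` is hypothetical and is not part of GaussView or Gaussian.

```python
def scan_values(start, n_steps, step_size):
    """Return every value visited by a scan: the start plus n_steps increments."""
    return [start + i * step_size for i in range(n_steps + 1)]


# 18 steps of 10 degrees starting at 0 degrees, as entered in the dialog
angles = scan_values(0.0, 18, 10.0)
print(len(angles), angles[0], angles[-1])
```

The list contains the starting value plus one entry per step, which is why a request for 18 steps yields 19 geometries.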
When you open the Gaussian Calculation Setup and choose Scan as Job Type, select the
Relaxed (Redundant Coordinate) subtype. This is the default when you have defined scan
variables with the Redundant Coordinate Editor. GaussView will add the appropriate Opt keyword
and option to the job's route section.
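For readers who prefer to edit the input file by hand, the scan request that GaussView writes is a single line in the redundant-coordinate section that follows the molecule specification of an Opt=ModRedundant job. The sketch below formats such a line; the helper name and the atom numbers 1 to 4 are placeholders chosen for illustration and must be replaced by the atom numbers of your own structure.

```python
def modredundant_dihedral_scan(atoms, n_steps, step_size):
    """Format a dihedral scan line for the ModRedundant section of a Gaussian input."""
    if len(atoms) != 4:
        raise ValueError("a dihedral angle is defined by exactly four atoms")
    a, b, c, d = atoms
    # D = dihedral, S = scan, followed by the number of steps and the step size
    return f"D {a} {b} {c} {d} S {n_steps} {step_size:.1f}"


# Placeholder atom numbers; use the four atoms selected in the editor
line = modredundant_dihedral_scan((1, 2, 3, 4), 18, 10.0)
print(line)
```

Checking this line in the generated input file is a quick way to confirm that the scan was defined over the intended atoms.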
It is also useful to know how to generate a rigid scan. See this video for an automated relaxed
and rigid scan procedure. Of course, rigid scans can also be done manually.
When the calculation is finished, the output file will contain 19 optimized geometries (check
them). In addition, you can obtain the PES by clicking on Results > Scan. You can even save the
data in a text file that can be used later on (right click on the graph, choose save data).
The following two figures show the results of our two scans (the relaxed scan and the rigid scan),
each of which incremented the scan variable (the H–C–C–H dihedral angle) by 10° at each step.
The two scans are quite different. A smooth PES is suggested by the rigid scan, but that is
misleading because we are only examining the effect of one degree of freedom. Real molecules
travel on relaxed potential energy surfaces with many degrees of freedom. The relaxed scan shows a
discontinuity for our hypothetical reaction process, suggesting that a more complicated set of atomic
motions is involved in the actual reaction coordinate.
While scan calculations provide considerable insight into the structure of the PES, they do not
define the lowest energy path between two structures. In order to plot the actual lowest energy
pathway, we need to follow our transition structure downhill to the reactants and to the products
rather than simply stepping across the PES. An intrinsic reaction coordinate (IRC) calculation does
precisely this; we will discuss this calculation type in the next experiment.
7.3 Exercises
Exercise 1
In order for a TS optimization job to work without problems, one needs to have previously
found a geometry close to the TS. Guessing a TS is usually too difficult. One method to obtain a
guess structure close to the “real” TS is to find an internal coordinate that resembles
the real reaction coordinate sufficiently well and run a relaxed surface scan along it.
Perform a relaxed scan to find the TS of the SN2 reaction: CH3Cl + F− → CH3F + Cl−. The
distance between F and C (or Cl and C) is the obvious reaction coordinate for this reaction. Starting
with the F− ion at a distance of 3.0 Å from the CH3Cl molecule, run a relaxed scan until the distance is
1.2 Å. Of course, the fluoride ion will attack the carbon from the side opposite the C–Cl bond,
as you know for a typical SN2 reaction. Use an appropriate level of theory and an appropriate
interval for the scan in order to obtain a smooth curve. Copy and paste the geometry of the TS guess (the
geometry with the highest energy) and optimize it as a TS (Opt=TS). Estimate ∆Hr° and ∆Ha° for
this reaction.
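Picking the highest-energy point of a saved scan as the TS guess can be sketched in Python as follows. The profile used here is a synthetic cosine-shaped curve that only stands in for real scan data; it is not a computed result, and the function name is hypothetical.

```python
import math


def highest_point(scan):
    """Return the (coordinate, energy) pair with the largest energy."""
    return max(scan, key=lambda point: point[1])


# Synthetic placeholder profile: one smooth barrier between 0 and 180 degrees
scan = [(a, 0.5 * (1.0 - math.cos(math.radians(2 * a)))) for a in range(0, 181, 10)]

coordinate, energy = highest_point(scan)
print(coordinate, energy)
```

The geometry at the returned coordinate is only a starting guess; it still has to be refined with a TS optimization and confirmed with a frequency calculation.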
Exercise 2
In this exercise, you will explore the PES associated with breaking a single C–H bond in methane:
CH4 → CH3· + H (notice the methyl radical formed in the reaction). Begin with an optimized
tetrahedral methane molecule with one C–H bond length set to 0.75 Å. Perform a relaxed scan over that
C–H bond by increasing it in steps of 0.1 Å for 26 steps. Use a 6-311G(d,p) basis set with each of
the following methods: UHF, UB3LYP, UAPFD, UMP2, UCCSD, UCCSD(T).
Use an unrestricted model (indicated by a U prefixed to the method keyword) and also use the
Guess(Mix,Always) keyword. The former removes the restriction that all electrons remain paired
(since the products include a radical species), while the latter is necessary to create an appropriate
wavefunction throughout the bond distance range covered in the scan. At distances beyond 1.5
Angstroms, the two electrons that had occupied the sigma bonding orbital are now in separate,
singly occupied orbitals each with opposite spin. We describe this as an open-shell singlet, since the
total spin is still zero, but not all orbitals are doubly occupied. When you finish all calculations and
plot all PESs from all methods in one graph, you should obtain a Figure such as this one:
Exercise 3
Perform a relaxed PES scan that varies the N–C–C–N dihedral angle in N-methyl-(2-nitrovinyl)amine
[IUPAC: 1-methoxy-N-methyl-2-nitroethenamine]. This process transforms the E form to the
Z form of this compound. Run the scan from 180° (the E form) to 0° (the Z form), in −5°
steps. Perform the scan both in the gas phase and in solution, with ortho-dichlorobenzene and
N,N-dimethylformamide as solvents, using the SMD solvation model. Select an appropriate level of theory.
You may observe a discontinuity at around 95° due to the sudden inversion of pyramidalization at
the carbon atom attached to the nitro group. Compare the rotational barrier in the gas
phase with those in solution (in both solvents).
Experiment 8
Chemical Reaction Mechanisms: Intrinsic Reaction Coordinate (IRC)
8.1 Introduction
Successfully completing a transition structure optimization does not guarantee that you have found
the right transition structure: the one that connects the reactants and products of interest. One
way to determine the minimum to which a TS structure connects is by examining the normal mode
corresponding to the imaginary frequency and determining whether or not the motion tends to
deform the transition structure as expected. However, it is often difficult to tell for certain. In this
experiment, we will discuss a more precise method for determining which points on a potential energy
surface are connected.
An intrinsic reaction coordinate (IRC) calculation examines the reaction path leading down from
a TS on a potential energy surface. Such a calculation starts at the saddle point and follows the
path in both directions from the transition state, optimizing the geometry of the molecular system
at each point along the path. In this way, an IRC calculation definitively connects two minima on
the potential energy surface by a path which passes through the TS between them.
Note that two minima on a potential energy surface may have more than one reaction path
connecting them, corresponding to different TSs through which the reaction passes. From this point
on, we will use the term reaction path to designate the intrinsic reaction path predicted by the
IRC procedure, which can be qualitatively thought of as the lowest energy path, in mass-weighted
coordinates, which passes through a given saddle point.
Reaction path computations allow you to verify that a given TS actually connects the starting
and ending structures that you think it does. Once this fact is confirmed, you can go on to compute
an activation energy for the reaction by comparing the appropriate energy values of the reactants
and the transition state.
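The comparison of energy values mentioned above amounts to a subtraction followed by a unit conversion. The sketch below assumes both energies are given in hartree and uses the conversion factor 1 hartree ≈ 2625.4996 kJ/mol; the two energies are placeholders, not computed results, and the function name is hypothetical.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996  # 1 hartree expressed in kJ/mol


def barrier_kj_per_mol(e_ts_hartree, e_reactant_hartree):
    """Activation energy as E(TS) - E(reactant), converted to kJ/mol."""
    return (e_ts_hartree - e_reactant_hartree) * HARTREE_TO_KJ_PER_MOL


# Placeholder energies chosen only to illustrate the arithmetic
barrier = barrier_kj_per_mol(-100.00, -100.05)
print(round(barrier, 1))
```

Both energies must be of the same kind (electronic energy, enthalpy, or free energy) and come from the same level of theory for the difference to be meaningful.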
In Gaussian, a reaction path calculation is requested with the IRC keyword in the route section.
Before you can run one, however, certain requirements must be met. An IRC calculation begins at
a transition structure and steps along the reaction path a fixed number of times (the default is 10) in
each direction, toward the two minima that it connects. However, in most cases, it will not step all
the way to the minimum on either side of the path.
To accurately predict the barrier for the reaction, you need to perform some additional compu-
tations in order to collect all required data. Optimization+frequency calculations for the reactants
and the products will predict the thermally-corrected energy, enthalpy or free energy (depending on
your specific requirements as well as any experimental data with which you plan to compare).
The entire process can be repeated for a different reaction path, starting from a different saddle
point, in order to explore other possible ways to move from the reactants to the products. In this
way, you can perform a comprehensive exploration of a potential energy surface.
Let us continue with the allyl cation example discussed in the last experiment. We will perform
an IRC calculation (Job Type > IRC, default variables) on the optimized TS that we obtained
previously, using the same level of theory. The plot on the left in the following figure is the IRC
computed for the transition structure we located in the last experiment (Results > IRC/Path):
Notice that the output file contains 21 geometries, which represent all the points calculated on
the reaction path. If you check these structures, in particular the first and the last one, you will
notice that they are still far from the two minima connected by the TS in question. To
reach the minima, you can either run Opt+Freq calculations on these two end geometries (one on each
side) or simply increase the number of steps in each direction to, e.g., 50 or even 100 if needed. Below is
the same calculation with 30 steps in each direction (Compute more
points, N=30, in the IRC menu).
It is clear that the IRC has reached the two minima. But are they really the true minima?
Take a look at the angle between the three carbon atoms in the final geometries (on both sides). Do
the reactant and the product optimized in the last experiment have the same angle? Those two
geometries still need optimization followed by frequency calculations!
1. Occasionally, you may need to increase the number of steps taken in the IRC in order to get
closer to the minimum (see above); the MaxPoints option in the input text file specifies the
number of steps to take in each direction as its argument (the default is 10). You can also resume
a completed IRC calculation from its checkpoint file by using the IRC=(Restart,MaxPoints=n)
keyword, setting n to some appropriate value. All this can also be done using Gaussview:
Compute more points, N=, in the IRC menu.
2. If you want to follow the IRC all the way to the reactants and products, you can include
the Recorrect=Never option in addition to increasing the value of MaxPoints; the former
option suppresses a corrective action which is taken when the computed value of the relevant
IRC parameter exceeds a threshold. Using this option results in a significantly less accurate
reaction path, and so using it is appropriate only when your goal for the IRC calculation is
verification of the endpoints rather than the reaction path itself.
3. By default, IRC calculations follow the reaction path in both directions from the TS. You can
limit the direction to the forward or reverse direction with the IRC=Forward and IRC=Reverse
options, respectively (or using Gaussview).
4. The endpoints of the IRC generally do not correspond to the minima as obtained via geometry
optimizations. They are likely to be close to the minima, but you’ll need to run an optimization
to obtain the true minimum structure.
5. Note that the final energy of the products in an IRC calculation may not equal the sum of the
energies of the isolated molecules. An IRC terminates when the energy reaches a minimum for
the molecular complex, a level which may be different than the sum of the isolated product
molecules.
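The options discussed in the notes above can be combined into a single IRC keyword. The following Python sketch assembles the keyword string and counts the geometries to expect; the helper names are hypothetical, and only the options named in the notes (MaxPoints, Restart, Forward, Reverse, Recorrect=Never) are used.

```python
def irc_keyword(max_points=10, direction=None, restart=False, recorrect_never=False):
    """Assemble a Gaussian IRC keyword from the options discussed in the notes."""
    options = []
    if restart:
        options.append("Restart")
    if direction is not None:
        if direction not in ("Forward", "Reverse"):
            raise ValueError("direction must be 'Forward', 'Reverse' or None")
        options.append(direction)
    options.append(f"MaxPoints={max_points}")
    if recorrect_never:
        options.append("Recorrect=Never")
    return "IRC=(" + ",".join(options) + ")"


def expected_geometries(max_points, both_directions=True):
    """Points on the path: the TS itself plus max_points per followed direction."""
    return (2 if both_directions else 1) * max_points + 1


print(irc_keyword(30))
print(expected_geometries(10))
```

With the default of 10 points in each direction, the count is 21, which matches the number of geometries found in the output file of the first IRC run.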
8.3 Exercises
Exercise 1
In this exercise, you will explore the formaldehyde (H2CO) PES. There are several minima on this
surface: formaldehyde, hydroxycarbene (HCOH: both cis and trans), and hydrogen molecule plus
carbon monoxide, each corresponding to a different reactant/product combination. We will consider
the following two reactions in this exercise:
H2CO → CO + H2 (8.1)
H2CO → trans-HCOH (8.2)
The first reaction is a dissociation reaction, the second is a 1,2 H migration (or shift). Run opti-
mization+frequency calculations for all reactants and products in these reactions. Then locate and
optimize the TSs using the Opt=QST2 procedure and run IRC calculations.
So far we have obtained five stationary points: three minima (formaldehyde, trans hydroxycar-
bene, and carbon monoxide plus hydrogen molecule) and the two transition structures connecting
formaldehyde with the two sets of products. One obvious remaining step is to find a path between
the two sets of products. Determine the reaction path connecting trans hydroxycarbene and H2 +CO,
and predict the activation energy. This reaction occurs via a two-step process, with trans HCOH
first converting to the cis form and then dissociating into carbon monoxide and hydrogen molecule:
Experiment 9
Excited Electronic States
9.1 Introduction
In this experiment, you will compute the electronic excited states (ES) of some chemical chromophores
in solvent.
Optimize the geometry of anthracene at the B3LYP/6-31G(d,p) level. Copy/paste the optimized
geometry into a new viewer and perform an excited state calculation for anthracene as explained in the
following steps:
2. On Method, change the default Ground State and select TD-SCF, then choose CAM-B3LYP/6-
311G(d,p) as the level of calculation.
3. Use the CPCM solvation model and select cyclohexane as the solvent.
4. Run the calculation using 2 processors (Link 0, Shared Processors, Specify. . . 2). When the job
is done, go to Results and select UV-Vis. A spectrum should be generated. Right click and
“Save data” in order to save the spectra into a text file.
TD-SCF is an abbreviation of Time Dependent Self Consistent Field. Actually, the form of the
Schrödinger equation you have learned in physical chemistry 3 is the time independent Schrödinger
equation. The TD-SCF method calculates the electronic excited states (ES) using the time-dependent
Schrödinger equation. In addition, CAM-B3LYP is a range separated hybrid (RSH) functional, de-
signed for excited states with charge transfer (CT) character. Of course, there is no CT in anthracene,
but it is fine to compute its ES using this functional.
The spectrum plotted using Results > UV-Vis for anthracene excited state calculation looks
like this:
Notice that λmax and the oscillator strength (the computational equivalent of the intensity of the
absorption, or the absorptivity coefficient ε) are reported on the plot. You can move the pointer
to any wavelength.
You can also open the ES calculation output file (*.log file) as a text file. Search for “Excitation
energies”. Here you'll find the calculated data for the lowest 3 excited states. For example, if you
open the anthracene output file and search for “Excitation energies”, you will find the following data
printed for the lowest three excited states:
Note that for excited state 1, in the first line it is printed: 348.56 nm f=0.0836, which are the
λmax and oscillator strength you have found by the generated UV-Vis spectrum using Gaussview!
Moreover, look at the second line, it is printed: 47 -> 48 0.6990. This important information
means the following: This excited state is a transition of the electron from orbital number 47 (the
HOMO) to orbital number 48 (the LUMO) with a coefficient of 0.6990 (or 0.7 to simplify). You may
also be interested in the 2nd or 3rd ESs. You can even ask more ESs in the input file to be calculated
(the default is 3 ESs).
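Reading these values from the output file can also be automated. The sketch below is a minimal Python parser under the assumption that each state is summarized on a line of the form "Excited State n: label E eV λ nm f=...". The sample line is an illustrative reconstruction built from the wavelength and oscillator strength quoted above (its symmetry label and exact spacing are assumptions), so check the pattern against your own *.log file.

```python
import re

# Assumed layout of the per-state summary line in the Gaussian output
PATTERN = re.compile(
    r"Excited State\s+(\d+):\s+(\S+)\s+([\d.]+) eV\s+([\d.]+) nm\s+f=([\d.]+)"
)


def parse_excited_states(text):
    """Return one dictionary per excited state found in the text."""
    states = []
    for match in PATTERN.finditer(text):
        states.append(
            {
                "state": int(match.group(1)),
                "label": match.group(2),
                "energy_ev": float(match.group(3)),
                "wavelength_nm": float(match.group(4)),
                "f": float(match.group(5)),
            }
        )
    return states


# Illustrative sample, not copied from a real output file
sample = (
    " Excited State   1:      Singlet-A      3.5570 eV  348.56 nm  f=0.0836  <S**2>=0.000\n"
    "      47 -> 48         0.6990\n"
)
states = parse_excited_states(sample)
print(states[0]["wavelength_nm"], states[0]["f"])
```

To use it on a real calculation, pass the contents of the output file, for example `parse_excited_states(open("anthracene.log").read())`, where the file name is a placeholder.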
Therefore, we have concluded that the main absorption band in the calculated UV-Vis spectra
of the previous calculation is an electronic transition mainly from the HOMO to the LUMO orbital.
Now, the question is: What is the nature of those orbitals (47 and 48, or HOMO and LUMO)?
Open the *.chk file (as usual from Gaussview) and look for these orbitals to see what nature they
possess. By considering the nature of the orbitals, we can decide if the transition is a charge transfer
(CT) transition, a π → π ∗ , an n → π ∗ transition, etc. Obviously, the main band in anthracene is a
π → π ∗ transition.
Predicting Fluorescence: Optimizing Excited State Geometries
Excited state calculations using the ground state geometry will predict the vertical excitation energies
of the molecule, which are necessary to understand the process of electronic absorption. However, if
we want to study an emission process like fluorescence, the potential energy surface of the excited state
must be explored. The lowest energy predicted for the optimized excited state structure corresponds
to the energy emitted when the molecule fluoresces as it returns back to its ground state.
DMABN (4-(N,N-dimethylamino)benzonitrile) is a simple push-pull chromophore whose fluorescence
spectrum contains emission from both a local excited state (LE) typical of a benzene ring and a second
peak whose origins are hotly debated. Many recent experimental investigations have confirmed that the
process leading to the anomalous peak most certainly involves competing geometric distortions that are
strongly influenced by solvent. In this example we will demonstrate how to find stationary points on
the excited state surface of this molecule using TD-DFT.
The observed UV-Vis spectrum of DMABN has a feature at 291 nm and can also exhibit shoulder
peaks on either side depending on the polarity of the solvent; these features correspond to the
absorption of a photon. Fluorescence occurs at ∼350 nm; again depending on the solvent, the
spectrum may include a second fluorescence peak at ∼475 nm. (In eV, these values are ∼4.26, ∼3.54
and ∼2.61, respectively.) The next figure shows the spectra in two different solvents:
In the non-polar solvent n-hexane, there is an absorption peak and a second fluorescence peak.
In contrast, in the polar solvent acetonitrile, the first peak is slightly red-shifted. It also contains a
small shoulder in the area outlined in red (it is obscured by the line for n-hexane). There are also two
fluorescence peaks in this environment: one with small intensity at ∼350 nm and the more obvious
second peak at the longer wavelength ∼475–485 nm.
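The wavelength and energy values quoted above are related by E = hc/λ. A short Python check, using hc ≈ 1239.84193 eV·nm, reproduces the conversions; the function name is chosen only for this sketch.

```python
HC_EV_NM = 1239.84193  # Planck constant times the speed of light, in eV*nm


def nm_to_ev(wavelength_nm):
    """Convert a photon wavelength in nm to an energy in eV."""
    return HC_EV_NM / wavelength_nm


for wavelength in (291, 350, 475):
    print(wavelength, round(nm_to_ev(wavelength), 2))
```

The same relation can be inverted (λ = hc/E) to turn a computed excitation energy in eV into the wavelength at which the band should appear.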
Our first step is to optimize the ground state of this molecule. Notice the position of the methyl
hydrogens in the illustration below.
Molecular builders (including GaussView and WebMO) may position the hydrogen atoms differ-
ently, and the resulting optimization may locate a transition structure with an imaginary frequency
corresponding to methyl rotation. We were careful to begin the optimization from a structure with
the C-N-C-H dihedral angles equal to 180◦ , -6◦ and +6◦ on both methyl groups. We also placed the
nitrogen atom in a slightly pyramidal position (not planar) and made sure that the point group was
Cs . Finally, we included Opt=CalcFC in this optimization because the force constant associated with
the out-of-plane wagging of the nitrogen atom is not well estimated.
Our optimization was successful, and it found a structure with Cs symmetry that has a very
slightly pyramidal nitrogen atom (the angle is ∼4 degrees), so the predicted geometry is nearly
planar except for two sets of methyl hydrogen atoms. The lengths of the middle C-C bonds within
the ring are increased by ∼0.4 Å in the optimized structure, and the C-C bond between the ring
and the amide group is longer by ∼0.2 Å. Our next step is to run a TD-DFT calculation on the
optimized ground state geometry, which predicted the following two lowest singlet excited states:
These calculated values are in reasonable agreement with gas phase experiments. Examining the
molecular orbitals allows us to identify the primary characters of these excited states as follows:
1. The lowest energy state is a local excitation (LE) involving the electrons in the π orbital of the
benzene ring being excited to a π ∗ orbital.
2. The second excited state is primarily an intramolecular charge transfer (ICT) state from the
donor amino group nitrogen atom to the cyano group (acceptor).
We continue this study by examining the excited state potential energy surface. We will begin
by optimizing the geometry of the excited state, which will lead us to an approximate value for the
emission wavelength coming from the local excited state. Later on, we will locate a second stationary
point on the excited state PES.
In order to optimize the structure of the first excited state, we begin with the optimized ground
state structure. We impose C2v symmetry on the molecule, and then perform an optimization
plus frequency calculation using the TD(Root=1) keyword and our standard model chemistry. The
optimization locates a C2v minimum.
Next, we will locate a stationary point on the excited state surface that is known to be important
in understanding the dual fluorescence phenomenon in this compound. The so-called twisted
intramolecular charge transfer (TICT) state is one in which electrons of the amino group have been
donated to the ring, allowing the hybridization at the nitrogen atom to be more purely sp2 and
allowing rotation about the N–C bond such that the methyl groups are now above and below the
plane of the ring:
We created the guess structure by taking the optimized ground state geometry and twisting the
CCNC dihedral angles to be 90◦ . The resulting structure can be fixed to Cs symmetry before running
the optimization.
The C–C bond changes with respect to the ground state are even more pronounced in the TICT
minimum than they were in the LE minimum. In the ring, the shortest bonds are the middle ones;
this is also the case for the ground state, although the length difference between these bonds and the
others is larger in the excited state. The lengthening of the C–N bond between the ring and the
amino group is also significant in the TICT minimum.
The transition wavelength reported in the output file from the TICT optimization, 436 nm (2.61
eV), is actually a vertical emission energy since we are now at a stationary point on the excited state
surface. This approximates the peak position for the anomalous peak in the fluorescence spectrum
but by no means completely explains it, as we have not investigated how the molecule would reach this
state after excitation. However, we can tell that it is a charge-transfer type state, clearly indicated
by the following difference density plot:
Electron density moves from the yellow to the blue areas as the molecule transitions from the
ground state to the first excited state.
These calculations have all been performed in the gas phase. In Exercise 2 at the end of this
experiment, you will continue this study by modeling this molecule in solution with a polar solvent (the
environment in which the dual fluorescence is observed), which will provide an interesting comparison.
It should also be noted that fluorescence can be modeled in a more sophisticated way using
Franck-Condon/Herzberg-Teller analysis.
9.3 Exercises
Exercise 1
Predict the lowest ES and plot UV-VIS spectra for boron-dipyrromethene (BODIPY) and zinc-
porphyrin in dichloromethane solvent, using an appropriate level of theory. Use many functionals
with many basis set combinations, and search in the literature for experimental data of these two
chemical chromophores. Compare your results using different levels of theory with the available
experimental data.
[Figure: structures of BODIPY and zinc-porphyrin]
Exercise 2
Repeat the study of DMABN, this time using acetonitrile as the solvent. Predict the vertical exci-
tation energies and optimized geometries of the LE and TICT states with our standard model and
the SCRF(IEFPCM) model. How does the solvent environment affect the results as compared to the
gas phase?
Note that Gaussian (as well as most other quantum chemistry programs) does not offer analytic
TD-DFT second derivatives (i.e., frequencies). Numerical TD-DFT frequency calculations can be
quite lengthy for all but the smallest molecular systems. Nevertheless, for original research destined
for publication, running the TD-DFT frequency calculation is not optional.
Part II
Experiment 10
Getting Started & Basic Calculations with ORCA
10.1 Introduction
When working with ORCA, and many other quantum chemistry programs, it is customary to
handle the input and output files using text editors (such as Notepad) under the Windows operating
system, or editors such as gedit or vi under the Linux or Mac operating systems.
However, molecular editors such as Avogadro (see next experiment) and Gabedit are still very useful
for building molecular systems and analyzing output files.
An example of a simple ORCA input file for the calculation of a geometry optimization of H2 O is
shown below. Just “copy/paste” the following into a new text file called inputfile.inp (in Linux
or Mac) or inputfile.txt (in Windows):
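A minimal sketch of such an input file, assembled from the keywords described in this section and the water geometry used in the later examples of this experiment, is given here as a short Python script that holds the file contents and can write them to disk. The script itself and the wording of the comment line are illustrative assumptions; only the text inside the string is ORCA input.

```python
from pathlib import Path

# Contents of the ORCA input file: keyword line, comment line, geometry block
ORCA_INPUT = """\
! B3LYP OPT def2-SVP
# Geometry optimization of the water molecule
* xyz 0 1
O   0.000000   0.000000   0.000000
H   0.000000   0.759337   0.596043
H   0.000000  -0.759337   0.596043
*
"""


def write_input(path="inputfile.inp"):
    """Write the input text to a file and return the path used."""
    Path(path).write_text(ORCA_INPUT)
    return path


print(ORCA_INPUT)
```

Calling `write_input()` creates inputfile.inp in the current directory; copying the printed text into a text editor by hand gives the same result.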
The “!” sign is needed to start a keyword line that contains one or more keywords; the keyword
B3LYP is the DFT functional with the same name, the OPT keyword requests a geometry optimization job (the
keyword for a single point calculation is SP), and def2-SVP is the keyword for the basis set that bears
the same name¹; the “#” symbol marks the beginning of a comment (all characters on the same line
after the # symbol are not read by the program); the block of lines between the two asterisks * . . . *
is the geometry block, which should always start with a coordinate definition (in this case xyz), the total
charge (0 indicating a neutral molecule) and the spin multiplicity (1 indicating a singlet state). Finally,
blank lines are allowed and text is not case-sensitive in ORCA input files.
¹ The def2 basis sets were developed by the Karlsruhe group and are recommended for DFT calculations. def2-SVP is a
double zeta basis with polarization and def2-TZVP is a triple zeta basis with polarization; see the ORCA manual for more details.
ORCA has both a Simple keyword syntax as well as a Block syntax. The Simple input is often
the only input line needed, in addition to the geometry block. In the Simple input syntax, keywords
are added in any order to the line beginning with “!”, e.g. ! Keyword1 Keyword2 ..., like the
above example (! B3LYP OPT def2-SVP). Multiple “!” lines are allowed in ORCA.
Job types are typically specified using a simple input keyword. The default job is a single-point
energy calculation (= SP). Some other jobs and their corresponding keywords: geometry optimization:
!Opt; energy+gradient: !EnGrad, vibrational frequencies: !Freq, molecular dynamics: !MD.
Advanced settings are often specified using the Block input for different modules. Note that
settings specified in the Block input always take precedence over the Simple input. Blocks start
with %nameofblock and end with end:
%block
block-specific keywords
end
For example, requesting tight convergence and changing the maximum number of SCF iterations to 100
requires creating an SCF block:
! HF SP def2-TZVP
%scf
maxiter 100
convergence tight
end
*xyz 0 1
O 0.000000 0.000000 0.000000
H 0.000000 0.759337 0.596043
H 0.000000 -0.759337 0.596043
*
In another example, we request 10 roots (10 excited electronic states) in a time-dependent DFT
(TDDFT) calculation, performed after a geometry optimization of the ground state:
! BP86 def2-TZVP
%tddft
nroots 10
end
*xyz 0 1
O 0.000000 0.000000 0.000000
H 0.000000 0.759337 0.596043
H 0.000000 -0.759337 0.596043
*
Coordinates can either be specified directly in the input file, as in the above examples, or
read from external files (e.g. a basename.xyz file), for example:
* xyzfile 0 1 basename.xyz
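An external coordinate file in the common XYZ format has the number of atoms on the first line, a free comment on the second line, and one "symbol x y z" line per atom (coordinates in Å). The Python sketch below produces such text from a list of atoms; the function name and the comment text are illustrative, and the coordinates are those of the water examples above.

```python
def to_xyz(atoms, comment=""):
    """Return XYZ-format text: atom count, comment line, then one line per atom."""
    lines = [str(len(atoms)), comment]
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol:<2s} {x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(lines) + "\n"


water = [
    ("O", 0.000000, 0.000000, 0.000000),
    ("H", 0.000000, 0.759337, 0.596043),
    ("H", 0.000000, -0.759337, 0.596043),
]
xyz_text = to_xyz(water, comment="water, coordinates in Angstrom")
print(xyz_text)
```

Saving the printed text as basename.xyz next to the input file makes it available to the xyzfile line shown above.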
Running ORCA
The program will produce a number of files, such as the output file “basename.out” and “base-
name.gbw”, which contains a binary summary of the calculation. GBW stands for “Geometry-Basis-
Wavefunction”. You need this file for restarting SCF calculations or starting other calculations with
the orbitals from this calculation as input.
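To run ORCA in parallel (many processors working together), call ORCA by its full path².
To request a parallel run, use the ! PALX keyword, where X is the number of processors (for instance,
! PAL4 starts a 4-processor job), or set the number of processes in a %pal block, for example 16: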
%pal
nprocs 16
end
MaxCore assigns a certain amount of memory to all modules. For example, %MaxCore 4000 sets
4000 MB (= 4 GB) as the limit. This limit applies per processing core. To check for an estimation
of memory requirements for a job, type the following in the input file:
ORCA will finish execution after having printed the estimated amount of memory needed per
processor. The total memory needed for a given parallel calculation = the estimated memory ×
number of parallel processors.
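The total memory requested by a parallel job follows directly from this rule. A small Python sketch of the arithmetic (the function name is hypothetical):

```python
def total_memory_mb(maxcore_mb, nprocs):
    """Total memory requested by a parallel ORCA job: MaxCore applies per core."""
    return maxcore_mb * nprocs


# %MaxCore 4000 with 4 parallel processes
print(total_memory_mb(4000, 4))
```

Before submitting a job, compare this total with the physical memory of the workstation, since the per-core limit alone can look deceptively small.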
The implicit (or continuum) solvent models implemented in ORCA are the Conductor-like Polarizable
Continuum Model (CPCM) and the Solvation Model based on Density (SMD). The CPCM model
can be used via:
! CPCM(solvent)
where solvent can be: water, acetone, acetonitrile, ammonia, benzene, CCl4, CH2Cl2, chloroform,
cyclohexane, DMF, DMSO, ethanol, hexane, methanol, octanol, pyridine, THF, and toluene.
² In the workstations in our computational lab, the full path is: /home/wissam/orca/orca file.inp > file.out.
Also, to run parallel calculations, MPI (Message Passing Interface, a library for parallel computing)
must be installed. MPI is properly installed in all workstations in our computational lab.
Many detailed parameters can be more accurately defined using the %cpcm block. For instance,
to use the Gaussian charge scheme (recommended with CPCM solvation models) with a scaled van
der Waals (vdW) cavity, the following tag in the %cpcm block in the input file is added:
%cpcm
surfacetype vdw_gaussian
end
SMD solvation models were developed and parameterized to work with the Minnesota family of
functionals, i.e., Truhlar’s M05, M06-2X, etc. To use SMD, the user simply specifies smd true in the
%cpcm block and provides the name of the solvent. This automatically sets a number of default SMD
parameters. If required, the user can also manually specify the solvent descriptors used in an SMD
calculation. In ORCA, there are more than 180 solvents in the SMD library:
%cpcm
smd true
SMDsolvent "DMF"
end
The Resolution of Identity (RI) approximation (also called Density Fitting) dramatically speeds up
calculations while introducing a very small error, and is generally highly recommended. The errors
introduced by the RI approximations are usually smaller than basis set and other errors.
The use of the RI approximation always requires an auxiliary basis set and its choice depends
on what basis set is being used and what integrals are being approximated. HF, DFT, and post-HF
methods always require the calculation of Coulomb (J) and exchange (K) integrals.
A common RI method for both J and K integrals is the “RIJK” approximation, which can be used
with the def2/JK auxiliary basis set when using the def2 family of basis sets: !RIJK def2/JK. The
RIJK approximation is suitable for small to moderate size systems (fewer than ≈ 1000 basis functions).
For larger systems another common approximation for both J and K integrals is the “RIJCOSX”
approximation: !RIJCOSX def2/J (see next experiment for more details).
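The rule of thumb above can be written as a small helper. This Python sketch simply encodes the ≈ 1000 basis function guideline from the text; the function name and the adjustable threshold argument are assumptions made for illustration, not an ORCA feature.

```python
def ri_keywords(n_basis_functions, threshold=1000):
    """Suggest RI keywords for the def2 basis set family from the system size."""
    if n_basis_functions < threshold:
        # Small to moderate systems: RIJK with the matching /JK auxiliary basis
        return "RIJK def2/JK"
    # Larger systems: RIJCOSX with the /J auxiliary basis
    return "RIJCOSX def2/J"


print(ri_keywords(350))
print(ri_keywords(2400))
```

The number of basis functions is printed near the top of every ORCA output file, so a quick single-point run is enough to decide which approximation to request.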
The following are the general printing keywords for controlling what is printed in the output files:
1. !MiniPrint for minimal printing. This will only print coordinates, grid information, SCF
iterations, orbital energies and property output.
2. !SmallPrint (default). This will in addition print some basis set information, SCF settings,
minimal Mulliken, Löwdin and Mayer population analysis.
3. !NormalPrint will in addition print a Löwdin orbital analysis and detailed SCF iterations.
4. !LargePrint will in addition to Normalprint print the full basis set, composition of the guess
orbitals, the final molecular orbitals and density.
A General Example
In this example, we ask ORCA to perform geometry optimization (OPT) and frequency calculation (FREQ) jobs, using
the PBE0 functional (PBE0), the polarized triple-zeta def2-TZVP basis set (def2-TZVP), the
RIJCOSX resolution of identity approximation (RIJCOSX), the def2/J auxiliary basis set (def2/J),
tight SCF convergence criteria (TightSCF), in water solvent using the CPCM implicit solvation
model (CPCM(water)), for pyridine in its neutral charge and singlet state. The calculation will be
performed with 4 parallel processes (PAL4) and 1 GB of RAM (%MaxCore 1000) for each process
(i.e. the calculation will use up to 4 × 1 = 4 GB of RAM).
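As a hedged illustration of how these keywords fit together, the Python sketch below assembles such an input and works out the total memory. The helper name build_orca_input and the file name pyridine.xyz are hypothetical; only the keywords listed in the paragraph above and the *xyzfile syntax used later in this section are taken from the text.

```python
def build_orca_input(keywords, n_procs, maxcore_mb, charge, multiplicity, xyz_file):
    """Assemble a simple ORCA input and report the total memory it may use."""
    lines = [
        "! " + " ".join(keywords) + f" PAL{n_procs}",
        f"%MaxCore {maxcore_mb}",  # memory per process, in MB
        f"*xyzfile {charge} {multiplicity} {xyz_file}",
    ]
    total_memory_mb = n_procs * maxcore_mb  # MaxCore applies to each process
    return "\n".join(lines) + "\n", total_memory_mb


keywords = ["OPT", "FREQ", "PBE0", "def2-TZVP", "RIJCOSX", "def2/J",
            "TightSCF", "CPCM(water)"]
# pyridine.xyz is a hypothetical file holding the starting geometry.
text, total_mb = build_orca_input(keywords, 4, 1000, 0, 1, "pyridine.xyz")
print(text)
print(f"Total memory: up to {total_mb} MB")
```

Keeping the memory arithmetic next to the input makes it easy to check that the request fits the machine before submitting the job.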
When the calculation is run and finished, the xyz coordinates of the optimized geometry will
be stored in the file: basename.xyz. This geometry can be used for subsequent calculations (for
example, UV-Vis excited state calculations) using the following input file:
%tddft
nroots 10
end
*xyzfile 0 1 basename.xyz
Experiment 11
Avogadro as a Molecular Editor & Some Other Advanced Issues
11.1 Introduction
The Avogadro software is a free graphical user interface (GUI) that can be used to build molecular
geometries and analyze many ORCA outputs.
In this experiment, we will also encounter more details about the important RI approximations
which are used to speed up large calculations, the corresponding auxiliary basis sets that should
accompany these RI approximations, and many other related technical issues.
Many GUIs/molecular editors can be used alongside ORCA, such as Chemcraft, Gabedit, VMD and
UCSF Chimera. OpenBabel is also very useful for file conversion to various chemical formats.
Moreover, Avogadro is an excellent tool to edit molecular geometries and analyze output results.
Avogadro is also able to generate ORCA input files.
Many of these programs can be installed and executed under different operating systems: Linux,
Mac, or Windows. The student/researcher is advised to work with the GUI that is convenient to
his/her preference, experience, etc. However, we find that Avogadro is very convenient for general
purpose applications, free, and easy to learn and use.
Avogadro can be downloaded from the official Avogadro website https://avogadro.cc/ (go to
Downloads and follow the instructions/links there). Moreover, there are many excellent tutorials
for Avogadro and for using Avogadro with ORCA calculations; see for instance the YouTube channel
“IaNiusha” https://www.youtube.com/c/IaNiusha/playlists and select the playlist entitled
“ORCA tutorials”, where you can see many tutorials explaining ORCA calculations with the aid of
Avogadro.
Building Molecules, Creating Inputs & Visualizing ORCA Output with Avogadro
After opening the Avogadro program, you will see the main screen with the main menu:
1. Using the pencil tool, you can build a molecule using the left click to create and right click to
delete atoms.
2. The atom type is selected under “Element” (on the left) and the “Adjust hydrogens” can
automatically fix the valence of the molecule.
3. The blue star can be used to navigate (rotate) the molecule and the glove to move atoms.
4. In the navigation mode, you can zoom in/out using the scroll wheel on the mouse. You can
rotate the molecule with left mouse button and translate with the right mouse button.
5. The selection tool (black arrow pointing up-left) allows the selection of atoms or fragments.
There are three selection modes (left menu): “Atom/Bond”, “Residue”, and “Molecule”.
6. After building the structure, click on the “E” tool (the one with the green arrow pointing down-
wards) and then click on “Start” to run a simple MM optimization to “clean” the molecule.
7. You can measure the bond lengths and angles using the measure tool (the ruler).
8. You can save the geometry as an xyz file or directly use the shortcut key Control+C to copy
the xyz data and paste it into a text editor.
We can also insert many ready molecular fragments: Build > Insert > Fragment, or build molecules
with the “simplified molecular-input line-entry system” (SMILES): Build > Insert > SMILES. More-
over, Avogadro can build peptides, DNA/RNA, and nanotubes (see the Build menu).
After building a molecule, you can use Avogadro to make simple ORCA inputs. Click on
Extensions > Orca > Generate Orca Inputs to open a new window, where you can choose the
options you need, then click on Generate to generate the input after saving it.
If you run a calculation using ! LargePrint in the input, you can visualize the MOs by opening
the basename.out file in Avogadro. Then select the orbital you want (e.g. HOMO) on the right
panel. Higher quality images can be generated by changing the Quality option on the Configure
button. If the orbital image is not immediately available, click on Render to produce the image.
You can also go to Extensions > Create Surfaces; a new window will appear where you can
select the Surface Type and Color by options to make electron density plots and electrostatic potentials:
Drawing Orbital Surfaces & Densities From the .gbw Files
The utility program orca_plot (a standalone program) can create three-dimensional graphics data
for visualization. It is also possible to run this program interactively: orca_plot basename.gbw -i.
You will then get a simple, self-explanatory menu that will allow you to generate a variety of files
(such as .plt and .cube) directly from the .gbw files without restarting or running a new job. The
menu will ask you to select a number from:
For instance, to draw the HOMO and LUMO isosurfaces (i.e. molecular orbitals), we
first need to check the numbering of these orbitals in the output file (here they are labeled with numbers
4 and 5, respectively), then type
This will generate two cube files, basename.mo4a.cube and basename.mo5a.cube, that can be visualized
with Avogadro: Extensions > Create Surfaces > Surface Type > Molecular Orbital.
If your output includes a frequency calculation, the vibrational spectrum can be plotted after
loading the file, and a vibrational mode can be animated by selecting the normal mode and then clicking
on Start Animation.
Note that functionalities similar to those of Avogadro are also available in the Chemcraft and Gabedit
programs, which are also free.
Energy and Geometry Convergence Criteria
The energy change for the SCF procedure is controlled by the keywords in Table 11.1. The TightSCF criteria
are generally recommended for calculations. Only rarely does one need to go beyond the TightSCF
setting, but for some sensitive molecular properties it may be necessary.
There are also keywords for the control of the convergence criteria for geometry optimizations; Ta-
ble 11.2. Rarely needed, but if a tight optimized geometry is desired, use TightOPT or VeryTightOPT.
Increasing the SCF convergence might then be also needed. The default is NormalOPT or simply OPT.
Table 11.2: Geometry optimization convergence criteria for energy (E), RMS gradient (RMSG),
maximum gradient (MaxG), RMS displacement (RMSD), and maximum displacement (MaxD).
Keyword         Tol. E       Tol. RMSG    Tol. MaxG    Tol. RMSD    Tol. MaxD
LooseOPT        3 × 10^-5    5 × 10^-4    2 × 10^-3    7 × 10^-3    1 × 10^-2
NormalOPT       5 × 10^-6    1 × 10^-4    3 × 10^-4    2 × 10^-3    4 × 10^-3
TightOPT        1 × 10^-6    3 × 10^-5    1 × 10^-4    6 × 10^-4    1 × 10^-3
VeryTightOPT    2 × 10^-7    8 × 10^-6    3 × 10^-5    1 × 10^-4    2 × 10^-4
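To make the meaning of these tolerances concrete, the following Python sketch checks the changes of one optimization step against the values of Table 11.2. It is a simplified illustration: the function is_converged and the sample step values are hypothetical, the tolerances are assumed to be in atomic units, and ORCA's own convergence logic may differ in detail.

```python
# Tolerances from Table 11.2 (energy, RMS gradient, max gradient,
# RMS displacement, max displacement); atomic units are assumed.
OPT_TOLERANCES = {
    "LooseOPT":     (3e-5, 5e-4, 2e-3, 7e-3, 1e-2),
    "NormalOPT":    (5e-6, 1e-4, 3e-4, 2e-3, 4e-3),
    "TightOPT":     (1e-6, 3e-5, 1e-4, 6e-4, 1e-3),
    "VeryTightOPT": (2e-7, 8e-6, 3e-5, 1e-4, 2e-4),
}


def is_converged(changes, keyword="NormalOPT"):
    """True if every absolute change is at or below its tolerance."""
    tolerances = OPT_TOLERANCES[keyword]
    return all(abs(c) <= t for c, t in zip(changes, tolerances))


# Hypothetical values from one optimization step.
step = (-2.1e-6, 6.0e-5, 2.0e-4, 9.0e-4, 2.5e-3)
print(is_converged(step, "NormalOPT"))
print(is_converged(step, "TightOPT"))
```

With these sample numbers the step satisfies NormalOPT but not TightOPT, because the energy change exceeds the tighter energy tolerance.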
As mentioned in the previous experiment, the Resolution of Identity (RI) approximation (or Density
Fitting) dramatically speeds up calculations while introducing only a very small error, and is generally
recommended. The errors introduced by RI approximations are usually smaller than basis set
errors and much smaller than electronic-structure-method errors.
Use of the RI approximation always requires an auxiliary basis set, and its choice depends on
what integrals are being approximated and what basis set is being used. HF, DFT, and post-HF
methods require the calculation of Coulomb (J) and exchange (K) integrals. While the Coulomb
integrals are usually done analytically, the exchange integrals can be evaluated semi-numerically on
a grid. Tables 11.3 and 11.4 summarize the RI approximations available in ORCA and the
auxiliary basis sets that correspond to them.
Table 11.3: Summary of RI approximations.
Table 11.4: Basis sets and auxiliary basis sets for use with RI approximations.
[1] Recommended for elements H–Kr. Scalar relativistic versions of the Ahlrichs def2 family are recommended for heavier elements.
[2] SARC/J: auxiliary basis set for scalar relativistic ZORA and DKH calculations.
[3] The automatic auxiliary basis set feature AutoAux will usually give an accurate but more expensive auxiliary basis set.
[4] Unlike the general def2/J and def2/JK sets, there are multiple RI-C auxiliary basis sets available; choose one according to the orbital basis set being used (or possibly use an even larger /C auxiliary basis set).
[5] cc-pVnZ/JK (n=T,Q,5); aug-cc-pVnZ/JK (n=T,Q,5); cc-pVnZ/C (n=T,Q,5); aug-cc-pVnZ/C (n=D,T,Q,5).
Grids & Numerical Integration
Starting with ORCA 5, there is a new scheme for the quadratures used in numerical integration that uses
machine learning. There are three new grid schemes named DEFGRID1, DEFGRID2, and DEFGRID3.
DEFGRID2 is the default, and is expected to yield sufficiently small errors for all kinds of applications.
DEFGRID3 is a heavier, higher-quality grid that is close to the limit if one considers an
enormous grid as a reference.
To change from the default DEFGRID2, one just needs to add !DefGrid1 or !DefGrid3 to
the main input.
It is important to note that, starting with ORCA 5, the COSX approximation is the default for DFT
whenever HF exchange is needed. This can be turned off by using !NOCOSX.
11.3 Exercises
Exercise 1
Using Avogadro (or any other molecular editor of your preference), build each of the following
molecules and perform a geometry optimization and frequency calculation at the B3LYP/def2-SVP
level of theory: phenol, aniline, anthracene, p-benzoquinone, p-methylphenol, benzothiazole, pyrrole,
Mn(H2O)6, and [Cr(NH3)6]3+.
For each of these molecular systems, use Avogadro to visualize the HOMO and LUMO, plot the
electron density, and animate the normal modes of vibration.
Repeat all the calculations above two times: 1) using the def2-TZVP basis set without RI ap-
proximation; 2) using the def2-TZVP basis set with the RIJK approximation and an appropriate
auxiliary basis set. Compare the energy and the time needed for both calculations for each molecule.
Experiment 12
Chemical Reactions & Transition States
12.1 Introduction
In ORCA, the following methods provide analytic first derivatives (analytic gradients): 1) Hartree-
Fock (HF) and DFT (including the RI, RIJK and RIJCOSX approximations), 2) MP2, RI-MP2
and DLPNO-MP2, 3) CAS-SCF, and 4) TD-DFT for excited states. Some methods for locating
transition states (TS) require second derivative matrices (Hessian), implemented analytically for HF,
DFT/TD-DFT and MP2 only. Additionally, several approaches to construct an initial approximate
Hessian for TS optimization are available.
A very useful feature for locating complicated TSs is the Nudged-Elastic Band (NEB) method
in combination with the TS finding algorithm (NEB-TS, ZOOM-NEB-TS). An essential feature for
chemical processes involving excited states is the conical intersection optimizer. Another feature is
Minimum Energy Crossing Point (MECP) optimization.
Locating minima for functions is fairly easy. If everything else fails, the steepest descent method
is guaranteed to lower the function value. Finding first-order saddle points, i.e. transition structures
or transition states (TS), is much more difficult. There are no general methods that are guaranteed
to work!
The optimization facility can be used to locate transition structures as well as ground state
structures since both correspond to stationary points on the potential energy surface (PES). ORCA
provides two methods for locating a transition structure:
1. By specifying a reasonable guess for the transition state geometry and directing the optimizer
to locate a first order saddle point. However, this can be challenging in many cases.
2. By automatically generating a starting structure for a transition state optimization based upon
the reactants and products that the transition structure connects: the Nudged-Elastic Band
(NEB) method.
Once the TS has been found, the whole reaction path may be located by tracing the intrinsic reaction
coordinate (IRC), which corresponds to a steepest descent path, from the TS to the reactant and
product.
12.2 Procedure & Examples
Relaxed surface scans are very handy: you can scan through one coordinate while all others are
relaxed. It works as shown in the following example:
In the example above the value of the bond length between C and O will be changed in 12
equidistant steps from 1.35 down to 1.10 Å and at each point a constrained geometry optimization
will be carried out.
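The scan grid described above can be reproduced with a few lines of Python. This is a sketch under the assumption that both end values are included in the 12 steps, so that the spacing is (1.10 − 1.35)/11 Å; the function name scan_values is hypothetical.

```python
def scan_values(start, end, n_steps):
    """Return n_steps equidistant values from start to end, both included."""
    if n_steps < 2:
        raise ValueError("a scan needs at least two points")
    step = (end - start) / (n_steps - 1)
    return [start + i * step for i in range(n_steps)]


# C-O distance scanned from 1.35 down to 1.10 Angstrom in 12 steps.
values = scan_values(1.35, 1.10, 12)
for v in values:
    print(f"{v:.4f}")
```

Each printed value corresponds to one constrained geometry optimization in the relaxed surface scan.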
If you want to perform a geometry optimization at a series of values with non-equidistant steps
you can give this series in square brackets, [ ]. The general syntax is as follows:
In addition to bond lengths you can also scan bond angles and dihedral angles:
In the following, the H-atom abstraction step from CH4 to OH radical is computed with a relaxed
surface scan:
It is obvious that the reaction is exothermic and passes through an early transition state in which
the hydrogen jumps from the carbon to the oxygen. The structure at the maximum of the curve is
probably a very good guess for the true transition state (TS) that might be located by a TS finder.
You will probably find that such relaxed surface scans are incredibly useful but also time consum-
ing. Even the simple job shown above required several hundred single point and gradient evaluations
(convergence problems appear for the SCF close to the TS and for the geometry once the reaction
partners actually dissociate – this is to be expected). Yet, when you search for a TS or you want
to get insight into the shapes of the potential energy surfaces involved in a reaction it might be a
good idea to use this feature. One possibility to ease the burden somewhat is to perform the relaxed
surface scan with a “fast” method and a smaller basis set and then do single point calculations on
all optimized geometries with a larger basis set and/or higher level of theory. At least you can hope
that this should give a reasonable approximation to the desired surface at the higher level of theory
(this is the case if the geometries at the lower level are reasonable).
It is possible to start the relaxed surface scan with a different scan parameter than the value
present in your molecule. But keep in mind that this value should not be too far away from your initial
structure. Moreover, ORCA allows up to three coordinates to be scanned within one calculation:
Finally, it is possible to perform multiple XYZ file scans. Such scans produce a series of structures
that are typically calculated using some ground state method. Afterwards one may want to do
additional or different calculations along the generated pathway such as excited state calculations
or special property calculations. In this instance, the multiple XYZ scan feature is useful. If you
request reading from an XYZ file via:
this file could contain a number of structures. The format of the file is:
Number of atoms M
Comment line
AtomName1 X Y Z
AtomName2 X Y Z
...
AtomNameM X Y Z
>
Number of atoms N
Comment line
AtomName1 X Y Z
...
Thus, the structures are simply of the standard XYZ format, separated by a “>” sign. After the
last structure no “>” should be given but a blank line instead. The program then automatically
recognizes that a multiple XYZ scan run is to be performed. Thus, single point calculations are
performed on each structure in sequence and the results are collected at the end of the run in the
same kind of trajectory.dat files as produced from trajectory calculations. In order to aid in using
this feature, the relaxed surface scans produce a file called MyJob.allxyz that is of the correct format
to be re-read in a subsequent run.
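The file layout described above is simple enough to read with a short script, for example to inspect the structures before or after a multiple XYZ scan. The parser below is a minimal sketch (the function name parse_multi_xyz and the H2 sample data are hypothetical); it only assumes the format given in the text, i.e. standard XYZ blocks separated by a line containing “>”.

```python
def parse_multi_xyz(text):
    """Split a multi-structure XYZ string (blocks separated by '>') into
    a list of (comment, atoms) tuples, atoms being (symbol, x, y, z)."""
    structures = []
    block = []
    for line in text.splitlines() + [">"]:  # sentinel closes the last block
        if line.strip() != ">":
            block.append(line)
            continue
        while block and not block[0].strip():  # drop leading blank lines
            block.pop(0)
        if block:
            n_atoms = int(block[0])
            comment = block[1].strip()
            atoms = []
            for row in block[2:2 + n_atoms]:
                symbol, x, y, z = row.split()[:4]
                atoms.append((symbol, float(x), float(y), float(z)))
            if len(atoms) != n_atoms:
                raise ValueError("atom count does not match header")
            structures.append((comment, atoms))
        block = []
    return structures


# Hypothetical two-structure file; note the blank line after the last block.
sample = """2
H2, short bond
H 0.0 0.0 0.0
H 0.0 0.0 0.7
>
2
H2, long bond
H 0.0 0.0 0.0
H 0.0 0.0 0.9

"""
structures = parse_multi_xyz(sample)
print(len(structures))
```

The same function can be pointed at the contents of a scan's .allxyz file, provided it follows the layout described in the text.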
A TS can be found and optimized if a good estimate for the TS structure is provided. In the following
example we take the structure with the highest energy from the above surface scan. The internal coordinates are
obtained from the Cartesian coordinates with Avogadro: Extensions > ORCA > Format > Internal.
! B3LYP SV(P) TightSCF SlowConv OptTS
# performs a TS optimization with the EF-algorithm of H abstraction from CH4 to OH
%geom
Calc_Hess true # calculation of the exact Hessian before the first optimization step
end
* int 0 2
C 0 0 0 0.000000 0.000 0.000
H 1 0 0 1.285714 0.000 0.000
H 1 2 0 1.100174 107.375 0.000
H 1 2 3 1.100975 103.353 119.612
H 1 2 3 1.100756 105.481 238.889
O 2 1 3 1.244156 169.257 17.024
H 6 2 1 0.980342 100.836 10.515
*
Again, you need a good guess of the TS structure. Relaxed surface scans can help in almost all
cases. Note also that for TS optimization (in contrast to geometry optimization) an exact Hessian,
a Hybrid Hessian or a modification of selected second derivatives is necessary. Analytic Hessian
evaluation is available for HF and DFT (SCF) methods, including the RI and RIJCOSX approximations,
and for canonical MP2. You should check the eigenmodes of the optimized structure for the eigenmode
with a single imaginary frequency. You can also visualize this eigenmode with orca_pltvib or any
other visualization program that reads ORCA output files. If the Hessian is calculated during the
TS optimization, it is stored as basename.001.hess; if it is recalculated several times, then the
subsequently calculated Hessians are stored as basename.002.hess, basename.003.hess, . . . If you
are using the Hybrid Hessian, then you have to check carefully at the beginning of the TS optimization
(after the first three to five cycles) whether the algorithm is following the correct mode (see TIP
below). If this is not the case you can use the same Hybrid Hessian again via the inhess read
keyword and try to target a different mode (via the TS_Mode keyword, see below).
The utility program orca_pltvib is used to animate vibrational modes and to create arrow pictures.
This program uses an ORCA output file and creates a series of files that can be used together with
any visualization program such as Avogadro, ChemCraft, and gOpenMol.
However, Avogadro can animate the vibrational modes and draw all force vectors without the
use of orca_pltvib. Just open the output file and all vibrational frequencies will be listed; click the
corresponding one and click on Start Animation. You can also click on Display Force Vectors.
Alternatively, orca_pltvib can produce a file for a given vibrational mode. For instance, in
the example above, open the output file, search for VIBRATIONAL FREQUENCIES and look for the
imaginary modes (there should be only one since it is a TS). In this example, the imaginary mode is
number 6 (-1398.73 cm**-1). Type the following
orca_pltvib filename.out 6
This will create the file filename.out.v006.xyz, which contains a series of xyz coordinates (like a trajectory
file) that can be viewed as an animation (again, using Avogadro or other molecular editors).
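The check for "exactly one imaginary mode" can also be automated once the frequencies have been copied from the output. The sketch below assumes that imaginary modes appear as negative numbers, as in the example above; the function name and the sample frequency list (apart from the -1398.73 value quoted in the text) are hypothetical.

```python
def classify_stationary_point(frequencies_cm1):
    """Classify a stationary point from its vibrational frequencies.

    Imaginary modes are assumed to be listed as negative numbers.
    """
    n_imaginary = sum(1 for f in frequencies_cm1 if f < 0.0)
    if n_imaginary == 0:
        return "minimum"
    if n_imaginary == 1:
        return "first-order saddle point (TS)"
    return f"higher-order saddle point ({n_imaginary} imaginary modes)"


# Hypothetical list: one imaginary mode plus a few real ones.
freqs = [-1398.73, 85.2, 310.4, 742.9, 1210.0, 3050.6]
print(classify_stationary_point(freqs))
```

A result other than a first-order saddle point suggests that the TS guess should be revised before going further.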
In the example above the TS mode is of local nature. In such a case you can directly combine
the relaxed surface scan with the TS optimization with the ! ScanTS keyword instead of OptTS, as
in the following example:
In the above example, the algorithm performs the relaxed surface scan, aborts the Scan after the
maximum is surmounted, chooses the optimized structure with highest energy, calculates the second
derivative of the scanned coordinate and finally performs a TS optimization. If you do not want the
scan to be aborted after the highest point has been reached but be carried out up to the last point,
then you have to type:
%geom
fullScan true # do not abort the scan with !ScanTS
end
The TS finder is implemented using the quasi-Newton like Hessian mode following algorithm. This
algorithm maximizes the energy with respect to one (usually the lowest) eigenmode and minimizes
with respect to the remaining 3N − 7(6) eigenmodes of the Hessian.
TIP: You can check at an early stage if the optimization will lead to the “correct” transition state.
After the first optimization step you find the output below for the redundant internal coordinates.
Every Hessian eigenmode can be represented by a linear combination of the redundant internal
coordinates. In the last column of this list, the internal coordinates that represent a big part of
the mode which is followed uphill are labelled. The numbers reflect their magnitude in the TS
eigenvector (fraction of this internal coordinate in the linear combination of the eigenvector of the
TS mode). Thus at this point you can already check whether your TS optimization is following the
right mode (which is the case in our example, since we are interested in the abstraction of H1 from
C0 by O5).
If you want the algorithm to follow a different mode than the one with lowest eigenvalue, you can
either choose the number of the mode:
%geom
TS_Mode {M 1} # {M 1}: mode with second lowest eigenvalue
end # (default: {M 0}, mode with lowest eigenvalue)
end
or you can give an internal coordinate that should be strongly involved in this mode:
%geom
TS_Mode {B 1 5} # bond between atoms 1 and 5,
end # you can also choose an angle: {A N1 N2 N3} or a dihedral: {D N1 N2 N3 N4}
end
Hessians for Transition State Calculations
For TS optimization a simple initial Hessian, which is used for minimization, is not sufficient. In a
TS optimization we are looking for a first order saddle point, and thus for a point on the PES where
the curvature is negative in the direction of the TS mode (the TS mode is also called transition
state vector, the only eigenvector of the Hessian at the TS geometry with a negative eigenvalue).
Starting from an initial guess structure the algorithm used in the ORCA TS optimization has to
climb uphill with respect to the TS mode, which means that the starting structure has to be near
the TS and the initial Hessian has to account for the negative curvature of the PES at that point.
The simple force-field Hessians cannot account for this, since they only know harmonic potentials
and thus positive curvature.
The most straightforward option in this case would be (after having looked for a promising initial
guess structure with the help of a relaxed surface scan) to calculate the exact Hessian before starting
the TS optimization. With this Hessian (depending on the quality of the initial guess structure) we
know the TS eigenvector with its negative eigenvalue and we have also calculated the exact force
constants for all other eigenmodes (which should have positive force constants). For the HF, DFT
methods and MP2, the analytic Hessian evaluation is available and is the best choice.
When only the gradients are available (most notably the CASSCF), the numerical calculation of
the exact Hessian is very time consuming, and one could ask if it is really necessary to calculate the
full exact Hessian since the only special thing (compared to the simple force-field Hessians) that we
need is the TS mode with a negative eigenvalue. Here ORCA provides two different possibilities to
speed up the Hessian calculation, depending on the nature of the TS mode: the Hybrid Hessian and
the calculation of the Hessian value of an internal coordinate (see ORCA manual for more details).
The Intrinsic Reaction Coordinate (IRC) is a special form of a minimum energy path, connecting a TS
with its downhill-nearest intermediates. A method determining the IRC is thus useful to determine
whether a transition state is directly connected to a given reactant and/or a product. ORCA features
its own implementation of the popular method of Morokuma and coworkers. The IRC method can be simply invoked by
adding the IRC keyword as in the following example.
Note that the same method and basis set as used for optimization and frequency calculation
should be used for the IRC run. Also, the IRC keyword can be requested without, but also together
with OptTS, ScanTS, NEB-TS, AnFreq and NumFreq keywords. Moreover, the IRC code checks
whether a Hessian was computed before the IRC run. If that is not the case, and if no Hessian is
defined via the %irc block, a new Hessian is computed at the beginning of the IRC run. A final
trajectory (_IRC_Full_trj.xyz) is generated which contains both directions, forward and backward,
by starting from one endpoint and going to the other endpoint, visualizing the entire IRC. Forward
(_IRC_F_trj.xyz and _IRC_F.xyz) and backward (_IRC_B_trj.xyz and _IRC_B.xyz) trajectories and
xyz files contain the IRC and the last geometry of the respective run.
12.2.4 Finding TSs with the Nudged Elastic Band (NEB) Method
The Nudged Elastic Band (NEB) method is used to find a minimum energy path (MEP) connecting
given reactant and product state minima on the energy surface. An initial path is generated and
represented by a discrete set of configurations of the atoms, referred to as images of the system. The
number of images is specified by the user and has to be large enough to obtain sufficient resolution
of the path.
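To illustrate what a "discrete set of images" means, the following Python sketch generates intermediate geometries between two endpoints. Note that it uses plain linear interpolation of Cartesian coordinates, which is simpler than the IDPP scheme that ORCA uses for its initial path (mentioned later in this section); the function name and the two-atom example are hypothetical.

```python
def interpolate_images(reactant, product, n_images):
    """Linearly interpolate Cartesian coordinates between two geometries.

    reactant, product: lists of (x, y, z) tuples with identical atom order.
    n_images: total number of images, endpoints included.
    """
    if len(reactant) != len(product):
        raise ValueError("both geometries need the same number of atoms")
    if n_images < 2:
        raise ValueError("need at least the two endpoint images")
    images = []
    for k in range(n_images):
        t = k / (n_images - 1)  # 0.0 at the reactant, 1.0 at the product
        image = [
            tuple(a + t * (b - a) for a, b in zip(atom_r, atom_p))
            for atom_r, atom_p in zip(reactant, product)
        ]
        images.append(image)
    return images


# Hypothetical two-atom example: the second atom moves from z = 1.0 to z = 2.0.
start = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
end = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
path = interpolate_images(start, end, 5)
print([image[1][2] for image in path])
```

More images give a finer resolution of the path at the cost of more energy and gradient evaluations per NEB iteration.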
The most common use of the NEB method is to find the highest energy saddle point on the
potential energy surface specifying the transition state for a given initial and final state. Rigorous
convergence to a first order saddle point can be obtained with the climbing image NEB (CI-NEB),
where the highest energy image is pushed uphill in energy along the tangent to the path while relaxing
downhill in orthogonal directions. Another method for finding a first order saddle point is the NEB-
TS which uses the CI-NEB method with a loose tolerance to begin with and then switches over to
the OptTS method to converge on the saddle point. This combination can be a good choice for
calculations of complex reactions where the ScanTS method fails or where 2D relaxed surface scans
are necessary to find a good initial guess structure for the OptTS method. The zoomNEB variants
are a good choice in case of very complex transition states with long tails. Here, we present a simple
example for the NEB-TS method, but the ORCA manual can be consulted for more examples and
for other CI-NEB and zoomNEB variants.
Suppose you wanted to predict the transition state (TS) for the hydrolysis of methyl-acetate into
acetic acid and methanol:
In ORCA a black box method NEB-TS (from NEB with TS optimization), can find the TS
structure only from the geometries of the reactants and products. So the first step is to optimize the
reactants and products.
It is very important to notice that the atom numbering in both reactant and product geometries
must be the same. This means that a given carbon atom in the reactant should appear at the same position
in the xyz table of the product. The best way to guarantee this is to start from one geometry,
make a copy and move the atoms to draw the next structure. To this end, note that when using
Avogadro, you have to disable the Adjust Hydrogens feature; otherwise atoms will be automatically
placed when a bond is broken, and not necessarily where you want them to be.
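A quick sanity check on the two xyz files can catch the most obvious ordering mistakes before the NEB run. The sketch below (hypothetical function name and water example) only compares the sequence of element symbols; it is a necessary but not sufficient test, since two atoms of the same element that have been swapped will not be detected.

```python
def same_atom_order(reactant_xyz, product_xyz):
    """True if both XYZ strings list the same elements in the same order."""
    def symbols(xyz_text):
        lines = xyz_text.strip().splitlines()
        n_atoms = int(lines[0])
        # Skip the count and comment lines, keep the element symbols.
        return [line.split()[0] for line in lines[2:2 + n_atoms]]
    return symbols(reactant_xyz) == symbols(product_xyz)


# Hypothetical example geometries.
reactant = """3
water
O 0.000 0.000 0.000
H 0.757 0.586 0.000
H -0.757 0.586 0.000
"""
shuffled = """3
water, atoms reordered
H 0.757 0.586 0.000
O 0.000 0.000 0.000
H -0.757 0.586 0.000
"""
print(same_atom_order(reactant, reactant))
print(same_atom_order(reactant, shuffled))
```

Swaps between atoms of the same element still have to be checked by eye, for instance by viewing both structures with atom labels switched on.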
First, for the reactants, after building initial guess structures and performing a geometry opti-
mization using:
one gets, for instance, the following geometry of the reactants:
with the nucleophilic water making a hydrogen bond with the carbonyl group. Rearranging the
atoms to make a guess product structure, and after optimizing with the same method, one gets:
Note that the idea is not to optimize all molecules separately and just join them in a single file,
but rather already build your “reactant” and “product” adducts in a certain orientation that makes
sense with respect to the expected reaction, like done in the example above. We are ready now to
find the TS.
The NEB-TS input is quite simple: add NEB-TS to the main input line, provide the name
of the product xyz file in the %neb block, and provide the reactant xyz file in the geometry block:
In case you have a guess structure for the TS as well, that can be included as:
Note that the reactants and products will not be optimized during the default run, so make sure
you are using the same method you chose for the previous optimization. If you want to reoptimize
them, set PREOPT_ENDS TRUE under %neb.
Once the calculation starts, the first output looks like:
which contains some details of the method. Then comes the construction of an initial path:
The idea here is to use a method called IDPP to create a series of images, from the reactants
to the products, that will be optimized together using the NEB formalism. The initial path is saved in
basename_initial_path_trj.xyz, and it is recommended to check whether this path makes sense. In our
case, we get a sequence of eight “images” (you can watch the video using Avogadro; see footnote 1). If everything
goes well, the trajectory file should be reasonable, with the proton being transferred from the water
to the methanol as it leaves. After that, the iterative process starts:
This is our first guess for the final TS structure. After some iterations, the NEB-CI converges:
1 Trajectory files are simply multi-xyz files containing several structures one after the other. Any “trajectory”, or
*trj* file, can be watched as a video using Avogadro or any other visualization program. In Avogadro, after opening
the program, open the *.trj.xyz file, then go to Extensions > Animations > Load File and load the trajectory file
*.trj.xyz again (it must be the same file as that opened in the first step, otherwise it will produce an error). When
the loading is done, click on the “Play” button.
A TS optimization then starts: a guess Hessian matrix is built using data from the previous NEB run, and a saddle
point search is initiated. When the TS is optimized, a summary of the whole NEB-TS run is given:
and the frequencies of the TS are computed to verify that it has only one imaginary (negative) frequency:
proving that the optimized TS structure is indeed a first-order saddle point. The final trajectory from the
reactants to the products can be read from basename_MEP_trj.xyz, and the final TS geometry, saved in
basename_NEB-TS_converged.xyz, looks like:
One can see that the TS has a tetrahedral carbon atom, as expected from the experimental
mechanism, and there is a concerted transfer of proton to the leaving methoxy group. Note that
this calculation was run in gas phase to simplify the explanation. If one includes the solvent, the
mechanism will be more complex and the proton transfer can occur, possibly, through an external
water molecule.
That was because a local minimum was detected along the reaction path. Actually, there could
be even more than one. The energies along the minimum energy path are printed in the
“interp” files and can be plotted to better visualize the possible intermediates:
Here, one clearly sees that there are two intermediates, and one should optimize these structures
as well in order to predict reaction energies correctly, and maybe rerun the NEB-TS with these new
reactant and product guesses.
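Besides plotting, possible intermediates can be spotted by looking for interior local minima in the list of energies along the path. The following sketch uses made-up relative energies (not the values of this calculation) and a hypothetical function name; it does not read the “interp” files themselves.

```python
def find_intermediates(energies):
    """Return indices of interior local minima along a reaction path."""
    minima = []
    for i in range(1, len(energies) - 1):
        if energies[i] < energies[i - 1] and energies[i] < energies[i + 1]:
            minima.append(i)
    return minima


# Hypothetical relative energies (kcal/mol) of the images along a path.
path_energies = [0.0, 6.1, 12.4, 9.8, 11.0, 15.2, 7.3, 8.9, -3.5]
print(find_intermediates(path_energies))
```

The images at the returned indices are natural starting geometries for the additional optimizations suggested above.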
The whole example of NEB-TS for this reaction, and other interesting examples on chemical
reactivity, such as calculating accurate energy barriers, kinetic isotope effects, and plotting Fukui
functions, can be found on the ORCA tutorials website.
Experiment 13
Excited Electronic States
13.1 Introduction
ORCA can calculate excited electronic states using many different theoretical models. They all are
capable of generating absorption and CD spectra at the various levels of theory. Only TD-DFT/CIS
features analytic gradients.
The CIS module is essentially the same as the TDDFT module. The main difference is that a CIS
calculation uses a restricted/unrestricted Hartree-Fock reference instead of a DFT reference.
Note that TD-HF (also called random phase approximation, RPA) is not available. A typical CIS
calculation asking for the lowest 20 excited electronic states is (note that the geometry block is not
shown in the following examples; try all of them on the water molecule):
CIS with a doubles correction, CIS(D), can also be performed. It can significantly improve the
results of a CIS calculation at an added computational cost (comparable to RI-MP2). CIS(D) requires
a special auxiliary basis set for correlation (“X/C” basis, e.g. def2-SVP/C). A typical CIS(D)
calculation asking for the lowest 10 excited electronic states is
The 4 algorithms in dcorr n are
1. algorithm 1: Is perhaps the best for small systems. May use a lot of disk space
3. algorithm 3: Is good if the system is large and only a few states are calculated. Saves disk and
main memory.
4. algorithm 4: Uses only transformed RI integrals. May be the fastest for large systems and a
larger number of states
Again, CIS(D) calculations in ORCA are only implemented together with the RI approximation
and therefore you need to supply an appropriate (“/C”) fitting basis. However, CIS(D) can be
performed with TDA approximation (default) or without TDA if more accurate results are desired
(see below).
Spin-component scaling (SCS) versions of CIS(D) can be invoked in the %cis block by setting
DOSCS TRUE and the four scaling parameters in the following order: same-spin indirect term
(CTss), opposite-spin indirect term (CTos), same-spin direct term (CUss), and opposite-spin direct
term (CUos). Note that this implementation only works for the version with the parameter λ = 1.
The example below shows how to apply the SCS-CIS(D) version with λ = 1, whose usage has been
advocated in the literature. The user is able to specify other scaling parameters.
By setting the SS parameters to zero, one reduces the SCS- to the SOS-CIS(D) approach, which
can benefit from a better formal scaling behavior:
An excited state geometry optimization can also be performed as analytical gradients are available
for CIS calculations. Using the IRoot keyword you select which excited state you want to optimize
the geometry for.
TD-DFT
TD-DFT is a practical and common approach to computing excited states in general:
Among the various approximate correlation methods available for excited states, one of the most
popular is the algebraic diagrammatic construction (ADC) method. ADC has its origin in
Green’s function theory. It expands the energy and wave function in orders of perturbation theory and can
directly calculate excitation energies, ionization potentials and electron affinities, similar to
the EOM-CCSD method. Because ADC leads to a symmetric eigenvalue problem, properties
are more straightforward to calculate than in EOM-CCSD. In ORCA, only the second-order
approximation to ADC (ADC2) is implemented. It scales as O(N^5) with the size N of the basis set.
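To get a feeling for what fifth-order scaling means in practice, the short Python sketch below estimates how much more expensive a formally O(N^5) step becomes when the basis set grows. The function name relative_cost and the basis-set sizes are illustrative assumptions (they do not come from ORCA), and prefactors and lower-order terms are ignored, so the numbers are rough guides only.

```python
def relative_cost(n_small, n_large, power=5):
    """Ratio of formal costs for a method scaling as O(N**power)."""
    return (n_large / n_small) ** power

# Hypothetical basis-set sizes, chosen for illustration only
print(f"doubling N:        factor {relative_cost(100, 200):.1f}")
print(f"N larger by 50 %:  factor {relative_cost(100, 150):.1f}")
```

Doubling the number of basis functions therefore increases the formal cost of such a step by a factor of 2^5 = 32.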
The simplest way to perform an ADC2 calculation is via the usage of the ADC2 keyword, together
with the specification of the desired number of roots in the mdci block:
The integral transformation in the ADC2 implementation of ORCA is done using the density-fitting
(RI) approximation. Therefore, one needs to specify an auxiliary basis.
EOM-CC calculations
Equation-of-motion coupled cluster (EOM-CC) supports both single and double excitations in
ORCA. The current version will only work with a closed-shell reference. After the convergence of the
ground-state coupled cluster calculation (i.e. standard CCSD), the EOM-CCSD routine is activated
and the EOM equations are solved. An EOM-CCSD calculation can be as easy as the input below:
Finally, the DLPNO-STEOM-CCSD method uses the full potential of DLPNO to reduce the
computational scaling while keeping the accuracy of STEOM-CCSD.
13.3 Exercises
Exercise 1
Repeat Exercise 1 in section §9.3 using ORCA with the following levels of theory: CIS/def2-SVP,
CIS(D)/def2-SVP, SCS-CIS(D)/def2-SVP, SOS-CIS(D)/def2-SVP, B2PLYP/def2-SVP, ADC(2)/def2-SVP
and STEOM-DLPNO-CCSD/def2-SVP, starting from geometries optimized at PBE0/def2-SVP.
Calculate the fluorescence of these two compounds by optimizing the geometry of the lowest
singlet excited state at the CAM-B3LYP/def2-SVP level of theory, then repeat all the above
levels on these excited-state (ES) optimized geometries.
UV-Vis spectra can be plotted with Avogadro (from the .out file: Extensions > Spectra) or with the
orca_mapspc utility (type in a Linux terminal):
orca_mapspc basename.out ABS -x011000 -x140000 -eV -n10000 -w2.0 -g
which means: plot an absorption (ABS) UV-Vis spectrum in the range from 11000 cm−1 (909 nm)
to 40000 cm−1 (250 nm), with eV units in the output (default cm−1), 10000 points, 2 cm−1 FWHM
(full width at half maximum), and a Gaussian lineshape.
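The wavenumber limits quoted above can be converted to wavelengths and photon energies with a few lines of Python. This is a minimal sketch: the function names are illustrative choices, not part of ORCA or orca_mapspc, and the constants are the exact SI values of h, c and e.

```python
# Exact SI values of the defining constants
PLANCK_H = 6.62607015e-34      # J s
LIGHT_C = 2.99792458e8         # m / s
ELEM_CHARGE = 1.602176634e-19  # J per eV

def wavenumber_to_nm(wavenumber_cm):
    """Convert a wavenumber in cm^-1 to a wavelength in nm."""
    return 1.0e7 / wavenumber_cm

def wavenumber_to_ev(wavenumber_cm):
    """Convert a wavenumber in cm^-1 to a photon energy in eV."""
    # E = h c nu~, with nu~ converted from cm^-1 to m^-1
    return PLANCK_H * LIGHT_C * (wavenumber_cm * 100.0) / ELEM_CHARGE

for nu in (11000.0, 40000.0):
    print(f"{nu:8.0f} cm^-1 = {wavenumber_to_nm(nu):6.1f} nm"
          f" = {wavenumber_to_ev(nu):5.3f} eV")
```

The two limits correspond to roughly 909 nm (about 1.36 eV) and 250 nm (about 4.96 eV), i.e. the window runs from the near infrared to the ultraviolet.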
Exercise 2
Repeat Exercise 2 in section §9.3 using ORCA with the following levels of theory: CIS/def2-SVP,
CIS(D)/def2-SVP, B3LYP/def2-SVP, B2PLYP/def2-SVP. Compare with experimental results.
Experiment 14
CASSCF and Multireference Methods
14.1 Introduction
CI, PT, and CC methods recover dynamic correlation (see box 4.2 Electron Correlation Methods), which is due
to the instantaneous correlation between the motions of electrons. In some situations, the
single-Slater-determinant HF wavefunction [or the few-determinant configuration state functions (CSFs)
required for some open-shell states] is a poor representation of the system’s state, thereby making EHF
deviate very considerably from the true nonrelativistic E, and making the total Ecorr substantial. This
contribution to Ecorr is called static (or nondynamic) correlation.
For example, the single-determinant wavefunction for H2 has equal probability for ionic terms
with both electrons close to one atom and covalent terms with one electron on each atom. Since
H2 dissociates to two H atoms and not to ions, the single-determinant wave function is qualitatively
wrong at large internuclear distances. At large internuclear distances in H2 , the static correlation
becomes substantial. In order to have a qualitatively correct wave function at large R, one must take
a linear combination of the 1σg² Slater determinant and the 1σu² Slater determinant; the contribution
of the 1σu² determinant increases as R increases. At large internuclear distances, the energy difference
between the highest occupied MO (HOMO) 1σg in H2 and the lowest unoccupied MO (LUMO) 1σu
becomes small and goes to zero as R → ∞. This near degeneracy of occupied and virtual orbitals is
characteristic of systems with substantial static correlation. For example, for the ground state of the
Be atom, the HF wave function is a single determinant corresponding to the electron configuration
1s² 2s². However, in Be the 2p AO is nearly degenerate with the 2s AO, and the static correlation is
substantial. To have a qualitatively correct wave function, one must include a contribution from the
singlet CSF with orbital occupancy 1s² 2p².
Other situations where static correlation is important include molecules with double or triple
bonds, transition-metal compounds, and many excited electronic states. To deal with static
correlation, one replaces the single determinant (or single CSF) used in the HF method with a
linear combination of CSFs. This gives the multiconfiguration SCF (MCSCF) wavefunction,
as implemented in the so-called complete active space SCF (CASSCF) method (see next box).
The CASSCF wave function corresponds to a FCI wave function (see footnote 1) in the active space, while the
occupied and virtual orbitals are optimized through an SCF-like procedure.
Footnote 1: For any given one-particle basis set, the most accurate treatment within the algebraic approximation
method is termed full CI (FCI), which implies solving the eigenvalue problem using all the CSFs (or SDs) that
can be constructed using the basis set at hand. For any but the very smallest basis sets, this is impossible.
Theoretical Background: Multi-Configuration Wavefunction Methods
In HF theory, we describe the wavefunction with a single Slater determinant (SD) or configuration
state function (CSF) Φ; each CSF is a spin-adapted linear combination of determinants.
Multiconfigurational (MC) wavefunctions, on the other hand, are constructed as a linear combination
of several determinants, or CSFs. A generic multideterminant trial wavefunction is written as
$$\Psi = a_0 \Phi_{HF} + \sum_{i=1} a_i \Phi_i \qquad (14.1)$$
A progression from AOs (basis functions), to MOs, to SDs and to an MC wavefunction is illustrated:
$$\chi \longrightarrow \varphi \longrightarrow \Phi \longrightarrow \Psi$$
$$\mathrm{AO} \longrightarrow \mathrm{MO} \longrightarrow \mathrm{SD} \longrightarrow \mathrm{MC}$$
$$\varphi = \sum_{\alpha=1} c_\alpha \chi_\alpha, \qquad \Psi = \sum_{i=0} a_i \Phi_i$$
In an MCSCF calculation, the coefficients of both the determinants and the basis functions in
the MOs are varied to obtain the total electronic wavefunction with the lowest possible energy.
The major problem with MCSCF methods is selecting which configurations are necessary to include
for the property of interest. One of the most popular approaches is the Complete Active Space
Self-Consistent Field (CASSCF) method. Here the selection of configurations is done by partitioning
the MOs into active and inactive spaces. The active MOs will typically be some of the highest
occupied and some of the lowest unoccupied MOs from an RHF calculation. The inactive MOs have
either 2 or 0 electrons (always either doubly occupied or empty). Overall, there are three classes:
1. Occupied orbitals, which are doubly occupied in all the reference determinants;
2. Active orbitals, which have a variable occupation number in the reference determinants;
3. Virtual orbitals, which are unoccupied in all reference determinants.
A common notation is (n,m)-CASSCF, which indicates that n active electrons are distributed in
all possible ways in m active orbitals.
Which MOs to include in the active space must be decided manually, by considering the problem at
hand and the computational expense. If several points on the potential energy surface are desired,
the MCSCF active space should include all those orbitals that change significantly, or for which the
electron correlation is expected to change.
The factorial increase in the number of CSFs effectively limits the active space for CASSCF
wavefunctions to fewer than 12-14 electrons/orbitals. Selecting the “important” orbitals to correlate
therefore becomes very important. Selecting the active space for an MCSCF calculation requires
some insight into the problem. There are a few rules of thumb that may be of help:
1. For each occupied orbital, there will typically be one corresponding virtual orbital.
This leads to (n,m)-CASSCF wavefunctions where n and m are identical or nearly so.
2. Including all the valence orbitals leads to a wavefunction that can correctly describe
all dissociation pathways. However, a full valence CASSCF wavefunction rapidly becomes
very large for large systems.
3. The orbital energies from an RHF calculation may be used for selecting the important
orbitals. The highest occupied and lowest unoccupied are usually the most important
orbitals to include in the active space: the smaller the orbital energy difference,
the larger the contribution to the correlation energy. The occupied and virtual orbitals
should occupy the same spatial region.
Multireference (MR) Methods: These add dynamic correlation on top of an MCSCF (CASSCF)
wavefunction. Thus, both dynamical and non-dynamical correlation energies are recovered with
multireference methods. Three families are possible: MR methods that use variational CI (MRCI);
MR methods that use perturbation theory (MRPT); and MR methods that use coupled cluster
methods (MRCC). In MRCI, a CI calculation is performed with an MCSCF wavefunction as the
reference function instead of a single reference CSF as in the CI method. The MRCI
method has been shown to reproduce near-FCI results for a wide range of spectroscopic problems.
As with single-reference CI, most MRCI calculations truncate the CI expansion to include only
singles and doubles (MRCISD). In multireference perturbation theory (MRPT), the generalization
of MPn theory to the multireference case involves using an MCSCF wave function for Ψ0 instead of a
single-determinant RHF or UHF one. Implementation of MP2-type expansions based on CASSCF
gives CASPT2. Another MRPT method is the n-electron valence perturbation theory (NEVPT).
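The rapid growth of the CI expansion with the size of the active space can be made concrete with Weyl's dimension formula, which counts the spin-adapted CSFs for n electrons in m orbitals with total spin S: (2S+1)/(m+1) · C(m+1, n/2 − S) · C(m+1, n/2 + S + 1). The Python sketch below is only an illustration; the function name n_csf is an arbitrary choice, and spatial symmetry (which would reduce the count per irrep) is ignored.

```python
from math import comb

def n_csf(n_electrons, n_orbitals, multiplicity=1):
    """Number of CSFs for CAS(n, m) with the given spin multiplicity 2S+1."""
    two_s = multiplicity - 1                      # 2S
    if n_electrons < two_s or (n_electrons - two_s) % 2 != 0:
        raise ValueError("electron count and multiplicity are incompatible")
    lower = (n_electrons - two_s) // 2            # n/2 - S
    upper = lower + two_s + 1                     # n/2 + S + 1
    m1 = n_orbitals + 1
    # The product is always divisible by (m + 1), so integer division is exact
    return multiplicity * comb(m1, lower) * comb(m1, upper) // m1

for n in (2, 6, 10, 12, 14, 16):
    print(f"CAS({n},{n}) singlet: {n_csf(n, n):>10d} CSFs")
```

Going from CAS(6,6) to CAS(14,14) raises the number of singlet CSFs from 175 to about 2.8 million, which illustrates why active spaces beyond roughly 14 orbitals quickly become impractical for conventional CASSCF.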
Chemistry Background: Guidelines of Active Space Selection
A major concern in multiconfiguration methods is to decide how many and which active orbitals
to have in each symmetry. The following guidelines were proposed by Björn Roos as a set of general
advice on what could be called prime candidates for active orbitals:
General Remarks
These rules are not comprehensive but can often serve as a guide. More serious is that they often cannot
be followed, because the resulting number of active orbitals would make the calculation impossible in
practice. Except in small cases, we need to cut down the number. A common choice is to more or
less explicitly order the orbitals by some perceived importance, keeping the most needed ones. These
are the ones that would have fractional occupation, due to quasidegeneracy. However, one should not
always use as many active orbitals as is affordable, on the principle that more is better.
The input to the computer code is not merely numbers like (10,10); the starting orbitals
must also be regarded as an important part of the input. As in many cases the shape and localization in
space of the orbitals can be used as a criterion for the selection, a graphical inspection of the orbitals can
be useful, and it makes it possible to use chemical intuition. Using a large basis set from the start
merely introduces unnecessary details in the shape of occupied orbitals and fills the lower part of the
virtual space with a large number of low-lying delocalized orbitals. A better approach is to start from
minimal basis calculations, where the calculation starts from a virtual space built from only the atomic
valence shells. The selected active orbitals are then used (by projection) to create orbitals with the
larger basis. Finally, natural orbitals, or approximate natural orbitals, from a correlated calculation
provide good starting points for CASSCF.
To conclude, the following general advice should always be observed:
1. First, try to understand the system, to know what to expect. Try a smaller system to check
your assumptions, read the literature.
2. Start exploring with a smaller basis set. This not only makes the calculations faster but also
makes the orbitals visually better defined.
3. Use an orbital viewer to pick out the starting orbitals.
4. If in doubt, run exploratory calculations with larger active space.
5. Ideally, orbitals with natural occupation number in the range of 0.02–1.98 should be active.
Note that the occupation numbers change with geometry or during reaction. The ideal active
space should contain enough orbitals to stay continuous during reaction.
Using RHF orbital energies for selecting the active space may be problematic in two situations. The
first is when extended basis sets are used, where there will be many virtual orbitals with low energies
and the exact order is more or less accidental. The second problem is more fundamental. If the
real wave function has significant multiconfigurational character, then RHF may be qualitatively
wrong, and selecting the active orbitals based on a qualitatively wrong wave function may lead to
erroneous results. The problem is that we wish to include the important orbitals for describing the
multideterminant nature, but these are not known until the final wave function is known.
An attempt to overcome this self-referencing problem is to use the concept of natural orbitals. The
natural orbitals are those that diagonalize the density matrix and the eigenvalues are the occupation
numbers. Orbitals with occupation numbers significantly different from 0 or 2 (for a closed-shell
system) are usually those that are the most important to include in the active space. An RHF wave
function will have occupation numbers of exactly 0 or 2, and some electron correlation must be included
to obtain orbitals with non-integer occupation numbers. This may, for example, be done by running
a preliminary MP2 or CISD calculation prior to the MCSCF. The procedure may still fail. If the
underlying RHF wave function is poor, the MP2 correction may also give poor results, and selecting
the active MCSCF orbitals based on the MP2 occupation number may again lead to erroneous results.
In practice, however, selecting active orbitals based on, for example, MP2 occupation numbers appears
to be quite efficient, and better than using RHF orbital energies.
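The occupation-number criterion can be turned into a small helper that proposes an active space from a list of natural occupation numbers. The sketch below is an illustration only: the function name, the default window (0.02 to 1.98, taken from the guideline above) and the occupation numbers are assumptions made for this example, not output from a real calculation, and the suggestion should always be checked by visual inspection of the orbitals.

```python
def select_active_orbitals(occupations, lower=0.02, upper=1.98):
    """Return (indices, n_electrons) for orbitals with lower <= occupation <= upper."""
    active = [i for i, occ in enumerate(occupations) if lower <= occ <= upper]
    # The number of active electrons is the rounded sum of the selected occupations
    n_electrons = round(sum(occupations[i] for i in active))
    return active, n_electrons

# Hypothetical MP2 natural occupation numbers, for illustration only
occ = [2.0000, 1.9991, 1.9935, 1.9712, 1.9420, 0.0611, 0.0285, 0.0093, 0.0041]
active, n_el = select_active_orbitals(occ)
print(f"suggested active space: CAS({n_el},{len(active)}), orbital indices {active}")
```

For these illustrative numbers the helper suggests CAS(4,4) built from orbitals 3 to 6 (counting from zero). Widening or narrowing the window changes the suggestion, so the thresholds should be treated as a starting point rather than a rule.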
Finally, in order to predict the reliability of a single-reference method, or the need for
a multi-reference method, many different diagnostic tests can be carried out by performing CCSD,
CCSD(T), and CASSCF computations. One of the best known is the T1 diagnostic
of Lee and Taylor, which is based on the norm of the vector of single-excitation
amplitudes from CCSD in a closed-shell system. The T1 diagnostic value is printed in all CCSD and
CCSD(T) output files. If the T1 value is smaller than 0.02, the system is considered to be dominated
by a single reference, but if it is larger than 0.02, the system is considered to have (most likely)
multireference character.
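In the definition of Lee and Taylor, the T1 diagnostic is the Euclidean norm of the CCSD single-excitation amplitude vector divided by the square root of the number of correlated electrons. The Python sketch below shows the arithmetic only; the amplitudes and the electron count are made-up illustrative values, and in practice one simply reads the T1 value from the output file.

```python
import math

def t1_diagnostic(t1_amplitudes, n_correlated_electrons):
    """T1 = ||t1|| / sqrt(N), with N the number of correlated electrons."""
    norm = math.sqrt(sum(t * t for t in t1_amplitudes))
    return norm / math.sqrt(n_correlated_electrons)

def looks_multireference(t1_value, threshold=0.02):
    """Apply the 0.02 rule of thumb quoted in the text."""
    return t1_value > threshold

# Hypothetical single-excitation amplitudes and electron count (illustration only)
amplitudes = [0.010, -0.020, 0.015, 0.005]
t1 = t1_diagnostic(amplitudes, n_correlated_electrons=8)
print(f"T1 = {t1:.4f}, multireference character suspected: {looks_multireference(t1)}")
```

The 0.02 threshold is a rule of thumb for closed-shell systems, so a value close to it calls for further checks rather than a definite verdict.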
14.2 Procedure & Examples
Minimal input: The basic input file requires the number of active electrons n and the number of active
orbitals m, denoted by CAS(n, m). Here’s an input file example for the case of CAS(4,3):
! RHF def2-SVP
%casscf
nel 4
norb 3
mult 1
end
The nel keyword in the %casscf block is used to specify the number of active electrons, while norb
specifies the number of active orbitals. The multiplicity is given by the mult keyword. Additional
roots have to be included for excited state calculations (see below).
Sometimes a CAS-CI calculation is wanted where the orbitals are not optimized. This is performed
by adding the NoIter keyword:
! RHF def2-SVP NoIter
%casscf
...
end
State-average CASSCF wavefunctions: In the example below two multiplicity blocks are
specified and two states are selected for each multiplicity. As a consequence, the CASSCF calculation
computes 4 states, and the orbitals are averaged over two singlet states and two triplet
states.
! RHF def2-SVP
%casscf
nel 4
norb 3
mult 1,3
nroots 2,2
end
When selecting multiple roots, the weight of each root (i.e. the contribution to the state-
averaging) should be specified. The default is to make equal weights for each root. Users can
define a custom weighting scheme for the multiplicity blocks and roots:
%casscf mult 1,3 # singlet and triplet multiplicities
nroots 4,2 # four singlets, two triplets
bweight 2,1 # singlets and triplets weighted 2:1
weights[0] = 0.5,0.2,0.2,0.1 # singlet weights
weights[1] = 0.7,0.3 # triplet weights
end
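To see what such a weighting scheme amounts to, the Python sketch below combines the block weights with the root weights into one overall weight per state. It assumes that the block weights and the root weights within each block are each normalized to one and then multiplied; this normalization is an assumption made for the illustration, and the weights actually used should be checked in the ORCA output. The function name is hypothetical.

```python
def state_average_weights(block_weights, root_weights):
    """Overall weight of each root: normalized block weight x normalized root weight."""
    total_block = sum(block_weights)
    overall = []
    for bw, roots in zip(block_weights, root_weights):
        total_roots = sum(roots)
        overall.append([(bw / total_block) * (w / total_roots) for w in roots])
    return overall

# Values from the input above: bweight 2,1 and the two weights[...] lines
weights = state_average_weights([2, 1], [[0.5, 0.2, 0.2, 0.1], [0.7, 0.3]])
for label, block in zip(("singlets", "triplets"), weights):
    print(label, [round(w, 4) for w in block])
print("sum of all weights:", round(sum(sum(b) for b in weights), 10))
```

Under this assumption the four singlets together carry two thirds of the total weight and the two triplets one third, and all weights add up to one.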
In ORCA, state-specific optimizations are realized by adjusting the weights for the desired state
(root):
%casscf mult 1
nel 4
norb 3
nroots 3
weights[0] = 0,0,1 # weights for the roots in the first mult block (singlet)
end
This optimizes the orbitals for a single state, here the third singlet root (the second singlet excited
state). This can be repeated for other states (roots) of the same or different multiplicities.
Note that state-specific orbital optimizations are challenging to converge and often prone to root
flipping. If several states cross during the orbital optimization, this will ultimately cause convergence
problems.
Symmetry: If the symmetry handling in ORCA is enabled (! UseSym) each multiplicity block
must have an irreducible representation assigned. Numbers corresponding to the “irrep” within a
given symmetry are printed in the output of ORCA.
%casscf mult 1,3 # multiplicities singlet and triplet
irrep 0,1 # irrep for each multiplicity block (mandatory!)
...
nroots 4,2 # four singlets, two triplets
end
ORCA detects the point group and produces symmetry-adapted orbitals for SCF/CASSCF, but
the calculation time will not be reduced. Only D2h and subgroups are supported.
In general, for most calculations the default PMODEL guess will be adequate. In some special situations
you may want to switch to a different choice. CASSCF calculations are one such case.
1. PATOM: Selects the polarized atoms guess: suitable for CASSCF transition metals.
2. PMODEL: Selects the model potential guess: suitable for DFT and HF.
3. HUECKEL: Selects the extended Hückel guess
4. HCORE: Selects the one-electron matrix guess
5. MOREAD: Read MOs from a previous calculation (use %moinp "myorbitals.gbw" in a separate
line to specify the GBW file that contains the MOs to be read)
6. AUTOSTART: Try to start from the existing GBW file of the same name as the present one
1. Canonical HF orbitals from an RHF calculation are not a good choice, as the identification
and selection of the active space orbitals is often difficult.
2. Usually DFT orbitals (quasi-restricted or RKS) perform better than HF orbitals (see below).
3. Alternatively, if CASSCF orbitals from a previous run or a close-by geometry are available this
is a good choice.
4. In many instances, e.g. transition metal complexes, the PATOM guess produces more reliable
start orbitals than the PMODEL guess.
5. For more challenging complexes, the guess generated with orca_mergefrag (see the CASSCF
tutorial of ORCA) is probably the best choice.
6. Natural orbitals from a simple correlation calculation like MP2 or a calculation with the MRCI
module are usually a good choice and easily generated. (see below):
QROs as starting orbitals for CASSCF: The quasi-restricted orbitals (QROs) from a DFT
calculation are not the only choice of starting orbitals, but they are convenient and
typically lead to fast convergence. ROHF orbitals, localized DFT orbitals, MP2 natural orbitals,
previous CASSCF-optimized natural orbitals etc. are other choices (sometimes better).
This procedure requires 4 different ORCA job steps. While some of these steps can be combined
with the $new_job keyword, the orbitals need to be carefully inspected and selected before starting the
CASSCF calculation.
1. Creating initial QRO orbitals from a DFT calculation: Do a single-point DFT cal-
culation on the molecule of interest, ask ORCA to create quasi-restricted orbitals (QRO) using
the UNO keyword [UNO creates both unrestricted natural orbitals (UNOs) and quasi-restricted
orbitals (QROs)].
job1.inp:
! BP86 def2-SVP def2/J UNO Normalprint
After job completion, the QRO orbitals are found in the file job1.qro.
2. Inspect QRO orbitals: Take a look at the QRO orbitals, see their composition or
visualize surface plots. The keyword NoIter is used to prevent the orbitals from being changed by
SCF iterations. Then, one can use orca_plot to create cube files from *.gbw for visualization.
job2.inp:
! BP86 def2-SVP def2/J Normalprint NoIter MOREAD
%moinp "job1.qro"
3. Rotate QRO orbitals if required: Now, one can use orca_plot to create cube files
from job2.gbw for visualization. The active occupied orbitals for the CAS space must be
ordered so that they appear last in the list of occupied orbitals, and the active unoccupied orbitals
should come directly after them. If this is not the case, rotation is needed. In the example below the
positions of orbitals 10 and 20 are swapped (rotated by 90°); orbital 10 is an orbital that we want in
the active space.
job3.inp:
! BP86 def2-SVP def2/J Normalprint NoIter MOREAD
%moinp "job1.qro"
%scf rotate {10,20,90} end end
4. Running the CASSCF calculation using rotated DFT-QRO orbitals: Now a
CASSCF calculation can be performed. It should be noted that the basis set used for the
CASSCF calculation should ideally be the same as that used in the DFT calculation. Otherwise,
ORCA will do a basis set projection step when reading in the DFT-QRO orbitals and the
orbitals may end up changing.
job4.inp:
! def2-SVP def2-SVP/C Normalprint MOREAD
%moinp "job3.gbw"
%casscf
...
end
Natural orbitals as starting orbitals for CASSCF: Natural orbitals from MP2 are usually
a good choice of starting orbitals and are easily generated. For example, the case of formaldehyde:
MACRO-ITERATION 10:
...
N(occ)= 1.99763 1.99696 1.98360 1.97923 1.94253 0.05958 0.02153 0.01894
...
From this we see that two orbitals have occupation numbers very close to two. The
presence of barely correlated orbitals (occupation close to 0.0 or 2.0) can cause convergence problems.
If CAS(6,6) is used instead of CAS(10,8), the result is:
MACRO-ITERATION 8:
...
N(occ)= 1.98134 1.97931 1.94184 0.05868 0.02101 0.01781
...
The convergence is faster (8 iterations) and the occupation numbers show that all of these orbitals
are actually needed in the active space. The omission of the two orbitals from the active space came
at an increase of the energy by ∼ 4 mEh which is almost negligible.
This CAS(6,6) has correlated the in-plane oxygen lone pair, the C–O σ and the C–O π bonds.
For each strongly occupied bonding orbital, there is an accompanying weakly occupied antibonding
orbital in the active space that is characterized by one more node.
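To judge the size of the roughly 4 mEh energy increase quoted above, it helps to express it in more familiar units. The following Python sketch performs the conversion; the helper name is an arbitrary choice, and the conversion factors are rounded literature values.

```python
# Conversion factors from hartree (Eh), rounded
HARTREE_TO_KCAL_PER_MOL = 627.5095
HARTREE_TO_KJ_PER_MOL = 2625.4996
HARTREE_TO_EV = 27.211386

def from_millihartree(value_mEh, factor):
    """Convert an energy in millihartree using the given hartree-based factor."""
    return value_mEh * 1.0e-3 * factor

delta_mEh = 4.0  # approximate CAS(10,8) vs CAS(6,6) energy difference from the text
print(f"{from_millihartree(delta_mEh, HARTREE_TO_KCAL_PER_MOL):.2f} kcal/mol")
print(f"{from_millihartree(delta_mEh, HARTREE_TO_KJ_PER_MOL):.2f} kJ/mol")
print(f"{from_millihartree(delta_mEh, HARTREE_TO_EV):.3f} eV")
```

An energy difference of 4 mEh thus corresponds to about 2.5 kcal/mol, or roughly 0.11 eV, in the total energy.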
Convergence Problems:
ORCA can use two different convergers, specified with orbstep and switchstep:
1. Far from convergence, orbstep is used. SuperCI is a good choice for large initial gradients.
2. ORCA changes the converger to switchstep when the calculation is close to convergence
(||g|| < 0.02).
When a calculation fails to converge, one may try the Newton-Raphson (NR) method (keyword
“switchstep nr”) to obtain convergence. The rate of convergence is higher with this method, but the
radius of convergence is smaller. The NR method is a safe pick for re-converging calculations that have
already been converged with a slightly different active space or basis set.
14.2.3 NEVPT2
NEVPT2 is an internally contracted multireference perturbation theory that is applied to CASSCF-type
wavefunctions. The NEVPT2 method, as described in the original papers of Angeli et al., comes
in two flavors:
1. the strongly contracted NEVPT2 (SC-NEVPT2);
2. the partially contracted NEVPT2 (PC-NEVPT2). In fact, PC-NEVPT2 employs a fully
internally contracted wavefunction and should more appropriately be called FIC-NEVPT2.
In conjunction with the RI approximation, systems with active space up to 16 active orbitals
and 2000 basis functions can be computed. With the newly developed DLPNO version of the FIC-
NEVPT2, the size of the molecules does not matter anymore.
As a simple example for NEVPT2, consider the ground state of the nitrogen molecule N2. After
defining the computational details of our CASSCF calculation, we insert !SC-NEVPT2 as simple
input or specify PTMethod SC_NEVPT2 in the %casscf block. Note the difference in the spelling of the
two keywords: the simple input uses a hyphen, the block input uses an underscore for technical reasons.
!def2-SVP nofrozencore PAtom
%casscf nel 6
norb 6
mult 1
PTMethod SC_NEVPT2 # SC_NEVPT2 for strongly contracted NEVPT2 (default)
# FIC_NEVPT2 for the fully internally contracted NEVPT2
# DLPNO_NEVPT2 for the FIC-NEVPT2 with DLPNO
# DLPNO requires: trafostep RI and an aux basis
end
* xyz 0 1
N 0.0 0.0 0.0
N 0.0 0.0 1.09768
*
For better control of the program flow it is advised to split the calculation into two parts. First
converge the CASSCF wave function, and then in a second step read the converged orbitals and
execute the actual NEVPT2. Furthermore, starting with ORCA 4.0, NEVPT2 calculations employ the
frozen core approximation by default. Results from previous versions can be obtained with the added
keyword !NoFrozenCore.
An interesting case is to dissociate the N-N bond of the N2 molecule correctly. Using CASSCF
with the six p-orbitals we get a nice potential energy curve (the depth of the minimum is still too
shallow compared to experiment by some 1 eV or so; a good dissociation energy requires a dynamic
correlation treatment on top of CASSCF, e.g. NEVPT2, and a larger basis set). Inserting PTMethod
SC_NEVPT2 into the %casscf block, we obtain the NEVPT2 correction as additional information.
! def2-svp nofrozencore
%casscf nel 6
norb 6
mult 1
PTMethod SC_NEVPT2
end
# scanning from the outside to the inside
%paras
R = 2.5, 0.7, 30
end
* xyz 0 1
N 0.0 0.0 0.0
N 0.0 0.0 {R}
*
In general, scanning from the outside to the inside is the recommended procedure.
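The scan definition R = 2.5, 0.7, 30 in the %paras block is read here as start value, end value and number of points. The Python sketch below lists the N-N distances such a scan would visit, under the assumption that the points are evenly spaced and that both end points are included; the function name is illustrative.

```python
def scan_points(start, end, n_points):
    """Evenly spaced values from start to end, both end points included."""
    if n_points < 2:
        raise ValueError("a scan needs at least two points")
    step = (end - start) / (n_points - 1)
    return [start + k * step for k in range(n_points)]

# R = 2.5, 0.7, 30: scanning from the outside (2.5 Angstrom) to the inside (0.7 Angstrom)
distances = scan_points(2.5, 0.7, 30)
print(f"step = {distances[1] - distances[0]:.4f} Angstrom")
print(", ".join(f"{r:.3f}" for r in distances[:3]), "...", f"{distances[-1]:.3f}")
```

With 30 points the step is about −0.062 Å, so the equilibrium region around 1.1 Å is sampled by several points before the repulsive wall is reached.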
All of the options available in CASSCF can in principle be applied to NEVPT2. Since NEVPT2
is implemented as a submodule of CASSCF, it will inherit all settings from CASSCF (!tightscf,
!UseSym, !RIJCOSX, . . . ).
1. In CASSCF/NEVPT2, the IROOT flag has a different meaning from all other modules. In this
case, the ground state is IROOT 1, the first excited state is IROOT 2, and so on. If you
are using a state-averaged calculation with more than one multiplicity, you also need to set
IMULT to define the right block, IMULT 1 being the first block, IMULT 2 the second, and so on.
Experiment 15
Molecular Mechanics: Proteins & Large Molecules
15.1 Introduction
Typically, ab-initio and DFT methods can deal with molecular systems of up to a few hundred atoms.
Because of the many approximations made, semiempirical (SE) methods are much faster than ab-initio
and DFT methods and can deal with molecules of up to a few thousand atoms. However, none of these
quantum chemical methods can treat very large molecular systems such as polymers, proteins, and DNA.
Numerous force fields (FFs) have been developed for MM, the most common ones for studies of
proteins being AMBER and CHARMM. Popular FFs for smaller organic molecules are GAFF and
Allinger’s MMx set. In materials science, DREIDING and MOF-FF are widely used. The limitations of
these “special-purpose” FFs are manifold: they are not suited for general use, because
parameters exist only for a limited number of elements and structural motifs. General-purpose FFs
with a full periodic-table parameterization include the universal force field (UFF) and GFN-FF.
The general procedure for finding parameter values in a FF is to use experimental or ab-initio
information to choose an initial set of parameters, then vary these parameters so as to minimize
the deviations of force-field predicted molecular properties from experimental or ab initio calculated
properties of a chosen set of molecules. An empirical FF may contain hundreds or thousands of
parameters. MMFF94 has roughly the following numbers of parameters: 500 stretching force constants,
500 reference bond lengths, 2300 bending force constants, 2300 reference bond angles, 600 stretch–bend
constants, 100 out-of-plane force constants, 1400 torsion parameters, 400 van der Waals parameters and
600 electrostatic charge increments, for a total of about 9000 parameters. In contrast, UFF, which
has 126 atom types, contains only about 800 parameters. Because of the relatively small number of
parameters, UFF cannot achieve the accuracy of highly parameterized FF such as MMFF94, but it
has the advantage of being broadly applicable.
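The parameter bookkeeping above can be checked with a few lines of Python. The dictionary keys are descriptive labels chosen for this sketch; the numbers are the rounded counts quoted in the text.

```python
# Approximate MMFF94 parameter counts as quoted in the text
mmff94_parameters = {
    "stretching force constants": 500,
    "reference bond lengths": 500,
    "bending force constants": 2300,
    "reference bond angles": 2300,
    "stretch-bend constants": 600,
    "out-of-plane force constants": 100,
    "torsion parameters": 1400,
    "van der Waals parameters": 400,
    "electrostatic charge increments": 600,
}
UFF_PARAMETERS = 800  # approximate value quoted in the text

total = sum(mmff94_parameters.values())
print(f"MMFF94 total: {total} parameters")
print(f"ratio MMFF94 / UFF: {total / UFF_PARAMETERS:.1f}")
```

The rounded counts add up to 8700, consistent with the quoted total of about 9000, so MMFF94 carries roughly ten times as many parameters as UFF.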
Theoretical Background: Molecular Mechanics Method
Molecular mechanics (MM) uses a model of a molecule as composed of atoms held together by bonds and constructs a potential-energy expression that is a function of the atomic positions. The MM potential energy $V$ is the sum of energies due to bond stretching ($V_{\mathrm{str}}$), bond-angle bending ($V_{\mathrm{bend}}$), out-of-plane bending ($V_{\mathrm{oop}}$), internal rotation (torsion) about bonds ($V_{\mathrm{tors}}$), interactions between these kinds of motion [cross terms ($V_{\mathrm{cross}}$)], and van der Waals ($V_{\mathrm{vdW}}$) and electrostatic ($V_{\mathrm{es}}$) interactions between nonbonded atoms:

\[ V = V_{\mathrm{str}} + V_{\mathrm{bend}} + V_{\mathrm{oop}} + V_{\mathrm{tors}} + V_{\mathrm{cross}} + V_{\mathrm{vdW}} + V_{\mathrm{es}} \]

The explicit expressions used for each of the terms in this equation and the values of all the parameters that occur in these formulas define what is called a molecular-mechanics force field (FF).

Stretching: $V_{\mathrm{str}}$ is the sum of harmonic-oscillator terms for each pair of bonded atoms:

\[ V_{\mathrm{str}} = \sum_{1,2} V_{\mathrm{str},ij}, \qquad V_{\mathrm{str},ij} = \tfrac{1}{2} k_{IJ} \left( l_{ij} - l^{0}_{IJ} \right)^{2} \tag{15.1} \]

Atoms separated by one, two and three bonds are called 1,2; 1,3; and 1,4 atoms. The sum in (15.1) is over all pairs of 1,2 atoms. The parameter $k_{IJ}$ is the force constant for stretching the bond between atoms $i$ and $j$, $l^{0}_{IJ}$ is a fixed reference length, and $l_{ij}$ is the distance between the bonded atoms $i$ and $j$ for a particular geometry. The harmonic-oscillator expression in (15.1) is not highly accurate, and some force fields use a more accurate expression for $V_{\mathrm{str},ij}$, for instance one with quadratic, cubic, and quartic terms. An MM force field classifies each atom in the molecule into an atom type, depending on its atomic number and on how it is bonded in the molecule, e.g. sp3 C, aromatic sp2 C, H bonded to C, etc. In (15.1), $I$ and $J$ stand for the atom types of atoms $i$ and $j$ in the molecule. The reference length $l^{0}_{IJ}$ is close to the typical bond length between atoms of types $I$ and $J$.

Bending: $V_{\mathrm{bend}}$ in the simplest FFs has the form:

\[ V_{\mathrm{bend}} = \sum_{1,3} V_{\mathrm{bend},ijk}, \qquad V_{\mathrm{bend},ijk} = \tfrac{1}{2} k_{IJK} \left( \theta_{ijk} - \theta^{0}_{IJK} \right)^{2} \]

The sum is over all bond angles; $\theta_{ijk}$ is the $ijk$ bond angle, $\theta^{0}_{IJK}$ is the reference value for the bond angle type $IJK$, and $k_{IJK}$ is a parameter.

Torsion: $V_{\mathrm{tors}} = \sum_{1,4} V_{\mathrm{tors},ijkl}$, where the sum is over all 1,4 atom pairs and $V_{\mathrm{tors},ijkl}$ is a function of the dihedral angle $\phi$ defined by atoms $ijkl$ ($V_1$, $V_2$, $V_3$ are parameters that depend on the atom types of $i$, $j$, $k$, $l$):

\[ V_{\mathrm{tors},ijkl} = \tfrac{1}{2} \left[ V_1 (1 + \cos\phi) + V_2 (1 - \cos 2\phi) + V_3 (1 + \cos 3\phi) \right] \]

Cross Term: $V_{\mathrm{cross}}$ contains cross terms for interactions between stretching, bending, and torsion.

Electrostatic Interactions: $V_{\mathrm{es}}$ allows for electrostatic interactions between nonbonded atoms:

\[ V_{\mathrm{es}} = \sum_{1,\geq 4} V_{\mathrm{es},ij}, \qquad V_{\mathrm{es},ij} = \frac{Q_i Q_j}{4\pi\varepsilon_0 R_{ij}} \]

where the sum goes over all 1,4, 1,5, . . . atom pairs. $R_{ij}$ is the distance between atoms $i$ and $j$, and $Q_i$ and $Q_j$ are the (partial) charges on atoms $i$ and $j$ in the molecule. A force field might use $Q_i$ values based on the atom type and on what atoms are bonded to atom $i$, or it might take $Q_i$ values found on similar atoms in ab initio calculations.

Van der Waals Interactions: $V_{\mathrm{vdW}} = \sum_{1,\geq 4} V_{\mathrm{vdW},ij}$ is the contribution of vdW nonbonded interactions involving all possible 1,4, 1,5, . . . atom pairs:

\[ V_{\mathrm{vdW},ij} = 4\varepsilon_{IJ} \left[ \left( \frac{\sigma_{IJ}}{R_{ij}} \right)^{12} - \left( \frac{\sigma_{IJ}}{R_{ij}} \right)^{6} \right] \]

The 1,2 and 1,3 van der Waals and electrostatic interactions are considered to be implicitly included in the bond-stretching and bond-bending parameters. Each van der Waals pair term $V_{\mathrm{vdW},ij}$ is a Lennard-Jones 6-12 potential. $R_{ij}$ is the distance between atoms $i$ and $j$, the well-depth parameter $\varepsilon_{IJ}$ is the depth of the minimum in the interaction curve (the value of $V_{\mathrm{vdW},ij}$ at the minimum is $-\varepsilon_{IJ}$), and the parameter $\sigma_{IJ}$ is the $R_{ij}$ value at which $V_{\mathrm{vdW},ij}$ is zero. For H-bonds, some FFs modify the vdW interaction to a form such as $A/R^{12} - C/R^{10}$, but many FFs contain no special terms and rely on the electrostatic and van der Waals terms to produce the H-bond.

The electrostatic and van der Waals interactions are called nonbonded interactions and consume the largest part of the time needed to calculate $V$ of a very large molecule. For a 3000-atom molecule, $V_{\mathrm{es}}$ and $V_{\mathrm{vdW}}$ are each the sum of about $\tfrac{1}{2}(3000)^2 \approx 4.5 \times 10^{6}$ terms. To speed up MM calculations on large molecules (and molecular dynamics calculations on systems containing many molecules), many programs use a cutoff; that is, $V_{\mathrm{es},ij}$ and $V_{\mathrm{vdW},ij}$ terms are omitted for atom pairs that are farther apart than some chosen distance. If a cutoff is applied abruptly at a particular interatomic distance, this discontinuity can cause problems in energy minimization and molecular-dynamics calculations. To avoid this, one can use a cutoff that makes the nonbonded interactions go to zero gradually over a distance of, say, 1 Å. This is done using a so-called switching function.
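The individual energy terms of this box translate almost directly into code. The sketch below is illustrative only: the function names are hypothetical, the Coulomb constant assumes charges in units of e, distances in Å and energies in kcal/mol, and the switching function is one commonly used polynomial form (the text above does not prescribe a particular one).

```python
import math

# e^2 / (4*pi*eps0) expressed in kcal*Angstrom/mol (unit-conversion constant, assumed units)
COULOMB_KCAL = 332.0637


def v_stretch(k, l, l0):
    """Harmonic bond-stretching term, Eq. (15.1)."""
    return 0.5 * k * (l - l0) ** 2


def v_bend(k, theta, theta0):
    """Harmonic angle-bending term (angles in radians)."""
    return 0.5 * k * (theta - theta0) ** 2


def v_torsion(v1, v2, v3, phi):
    """Three-term cosine torsion potential (phi in radians)."""
    return 0.5 * (v1 * (1 + math.cos(phi))
                  + v2 * (1 - math.cos(2 * phi))
                  + v3 * (1 + math.cos(3 * phi)))


def v_lennard_jones(eps, sigma, r):
    """Lennard-Jones 6-12 pair term."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def v_coulomb(qi, qj, r):
    """Coulomb pair term for partial charges qi, qj (in e) at distance r (in Angstrom)."""
    return COULOMB_KCAL * qi * qj / r


def switch(r, r_on, r_off):
    """Smoothly scale a pair term from 1 (r <= r_on) to 0 (r >= r_off)."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    num = (r_off**2 - r**2) ** 2 * (r_off**2 + 2 * r**2 - 3 * r_on**2)
    return num / (r_off**2 - r_on**2) ** 3


def n_pairs(n_atoms):
    """Number of distinct atom pairs, the upper bound on nonbonded terms."""
    return n_atoms * (n_atoms - 1) // 2


print(n_pairs(3000))  # number of atom pairs in a 3000-atom molecule
```

Multiplying each nonbonded pair term by `switch(r, r_on, r_off)` with, for example, `r_off - r_on = 1` Å gives the gradual cutoff described in the text.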
Chemistry Background: Protein Structures
Primary structure is the sequence of amino acids in a polypeptide chain. For example, the hormone insulin has two polypeptide chains, A and B, as shown below. Each chain has its own set of amino acids, assembled in a particular order. The sequence of the A chain starts with Gly at the N-terminus and ends with Asn at the C-terminus.

Changing one amino acid in a protein's sequence can affect its overall structure and function. The difference between a normal and a sickle cell hemoglobin is just 2 amino acids out of ≈ 600.

Secondary structure (α helix and β pleated sheet) refers to local folded structures that form within a polypeptide due to interactions between atoms of the backbone only. In an α helix, the carbonyl of one amino acid is H-bonded to the amino H of an amino acid that is four down the chain. This pattern pulls the polypeptide chain into a helical structure that resembles a curled ribbon, with each turn of the helix containing 3.6 amino acids. In a β pleated sheet, two or more segments of a polypeptide chain line up next to each other, forming a sheet-like structure held together by H-bonds, while the R groups extend out of the plane of the sheet. The strands of a β sheet may be parallel, pointing in the same direction (their N- and C-termini match up), or antiparallel, pointing in opposite directions (the N-terminus of one strand is positioned next to the C-terminus of the other).

Certain amino acids are more or less likely to be found in α helices or β sheets. For instance, proline is sometimes called a "helix breaker" because its unusual R group creates a bend in the chain and is not compatible with helix formation. Proline is typically found in bend regions between secondary structures. Similarly, amino acids such as tryptophan, tyrosine, and phenylalanine, which have large ring structures in their R groups, are often found in β sheets, since they provide plenty of space for the side chains. Many proteins contain both α helices and β sheets, some contain only one type, and others do not form either type.

The overall three-dimensional structure of a polypeptide is called its tertiary structure. The tertiary structure is primarily due to interactions between the R groups of the amino acids, including H-bonding, ionic bonding, dipole-dipole interactions, dispersion forces, and hydrophobic interactions in which amino acids with nonpolar hydrophobic R groups cluster together on the inside of the protein, leaving hydrophilic amino acids on the outside to interact with surrounding water. The strong disulfide covalent bonds (between the S atoms in cysteines) also contribute to tertiary structure. They act like molecular "safety pins".

Many proteins are made up of a single polypeptide chain and have only three levels of structure. However, some proteins are made up of multiple polypeptide chains, also known as subunits. When these subunits come together, they give the protein its quaternary structure. An example is hemoglobin, which is made up of four subunits, two each of the α and β types. The same types of interactions that contribute to tertiary structure (mostly H-bonding and London forces) also hold the subunits together to give quaternary structure.

Changes in temperature, pH, and certain chemicals will disrupt the weak interactions, causing the protein to lose its 3D structure: denaturation.
15.2 Procedure & Examples
In this experiment we will explore the MM method on a protein called ubiquitin. Its structure, like
that of all other published proteins, is found on the Protein Data Bank (PDB) website https://www.rcsb.org/.
Every protein structure is assigned a 4-character code in the PDB. For instance, ubiquitin is assigned
1UBQ. You can search the site by this code and download the pdb file that contains all structural
information about the corresponding protein.
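The download step can also be scripted. The sketch below assumes the RCSB file server address pattern `https://files.rcsb.org/download/<ID>.pdb`; the function names are hypothetical, and the download itself requires network access.

```python
import urllib.request


def rcsb_pdb_url(pdb_id):
    """Build the download URL for a 4-character PDB code (assumed RCSB URL pattern)."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a valid 4-character PDB code: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id}.pdb"


def download_pdb(pdb_id, filename=None):
    """Download a PDB entry and return the local file name (needs network access)."""
    url = rcsb_pdb_url(pdb_id)
    if filename is None:
        filename = pdb_id.strip().lower() + ".pdb"
    urllib.request.urlretrieve(url, filename)
    return filename


# Usage (not executed here because it needs a network connection):
#   download_pdb("1UBQ")   # would write 1ubq.pdb to the current directory
```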
For any MM calculation or for the MM part of the QM/MM calculation (see next experiment),
force-field parameters are necessary. ORCA has its own parameter file format (ORCA force field file:
basename.ORCAFF.prms), which includes the atom specific parameters for nonbonded interactions
(partial charges and Lennard-Jones (LJ) coefficients) and parameters for bonded interactions (bonds,
angles, Urey-Bradley terms, dihedrals, impropers, and CMAP cross-terms for backbone).
An ORCA force field file (basename.ORCAFF.prms) can be generated from a PDB file (basename.pdb)
via an AMBER parameter-topology file (basename.prmtop) by the following
procedure (with the 1UBQ protein as an example):
1. Download the protein pdb file 1ubq.pdb from the PDB website https://www.rcsb.org/.
2. Open the file in Chimera (see footnote 1) and add hydrogens: Tools > Structure Editing >
AddH, click OK, then File > Save PDB, enter a new file name (e.g. 1ubqaddh.pdb), and save.
3. In Chimera, convert the pdb file to prmtop as follows: Tools > Amber > Write
Prmtop, enter the file name to save (e.g. 1ubqaddh.prmtop), select a force field type (e.g.
AMBER ff99SB), then Save, click on Assign Charges, then OK. Two files will be generated: the
Amber parameter-topology (basename.prmtop) and coordinate (basename.inpcrd) files.
4. Convert basename.prmtop to basename.ORCAFF.prms using the -convff option of the orca_mm module:
orca_mm -convff -AMBER 1ubqaddh.prmtop
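If the conversion has to be repeated for several structures, the command of step 4 can be wrapped in a small script. This is a sketch under the assumption that the `orca_mm` executable is on the PATH; the helper names are hypothetical, and only the command-line form shown in step 4 is used.

```python
import shutil
import subprocess


def convff_command(prmtop, executable="orca_mm"):
    """Return the argument list for converting an AMBER prmtop file (as in step 4)."""
    return [executable, "-convff", "-AMBER", prmtop]


def expected_prms_name(prmtop):
    """Name of the ORCA force field file expected for a given prmtop file."""
    suffix = ".prmtop"
    base = prmtop[:-len(suffix)] if prmtop.endswith(suffix) else prmtop
    return base + ".ORCAFF.prms"


def convert_prmtop(prmtop):
    """Run the conversion; raises if orca_mm is not installed or the run fails."""
    exe = shutil.which("orca_mm")
    if exe is None:
        raise FileNotFoundError("orca_mm was not found on the PATH")
    subprocess.run(convff_command(prmtop, exe), check=True)
    return expected_prms_name(prmtop)


print(" ".join(convff_command("1ubqaddh.prmtop")))
```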
The orca_mm module can also create a simple approximate FF for a molecule from an xyz or pdb file
using -makeff and store it in ORCAFF.prms format (see the ORCA manual for details). The
resulting topology is approximate and not as accurate as an original CHARMM topology, but it can
still be used for an approximate handling of the molecule. In this case, the molecule can be part of the
QM region (having at least the necessary LJ coefficients), or part of the MM region as a non-active
spectator (being not too close to the region of interest). In the latter case it is important that the
molecule is not active, since bonded parameters are not available. However, it can still be optimized
as a rigid body, i.e. by optimizing its position and orientation with respect to the specific environment.
The minimum input necessary for an MM treatment (single-point energy calculation) is:
! MM
%mm
ORCAFFFilename "1ubqaddh.ORCAFF.prms"
end
*pdbfile 0 1 1ubqaddh.pdb
Footnote 1: You can use any suitable molecular editor/visualizer of your preference, such as UCSF
Chimera, VMD, PyMOL, etc. See the next box for getting started with Chimera.
Technical Background: Getting Started With Chimera
Geometry Optimization of Large Systems
Geometry optimization can also be performed on very large systems in ORCA using the L-Opt
optimizer (L-BFGS optimizer). For instance:
! MM L-Opt
%mm
ORCAFFFilename "1ubqaddh.ORCAFF.prms"
end
*pdbfile 0 1 1ubqaddh.pdb
With the L-Opt optimizer, systems with hundreds of thousands of atoms can be optimized! Of
course, MM or QM/MM methods should be used for such large systems.
The default maximum number of iterations is 200, and can be increased as follows:
! L-Opt
%geom
maxIter 500 # default 200
end
*pdbfile 0 1 1ubqaddh.pdb
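The MM inputs shown so far differ only in the keyword line and an optional %geom block, so they can be generated from one template. The helper below is a hypothetical convenience function (not part of ORCA); it only assembles the keywords that appear in the examples above.

```python
def mm_input(pdb_file, ff_file, keywords=("MM",), max_iter=None):
    """Assemble an ORCA MM input like the examples above and return it as text."""
    lines = [
        "! " + " ".join(keywords),
        "%mm",
        f'  ORCAFFFilename "{ff_file}"',
        "end",
    ]
    if max_iter is not None:
        # optional optimizer setting, as in the maxIter example
        lines += ["%geom", f"  maxIter {max_iter}", "end"]
    lines.append(f"*pdbfile 0 1 {pdb_file}")
    return "\n".join(lines) + "\n"


text = mm_input("1ubqaddh.pdb", "1ubqaddh.ORCAFF.prms",
                keywords=("MM", "L-Opt"), max_iter=500)
print(text)
```

Writing `text` to a file such as `1ubq_opt.inp` (a hypothetical name) gives an input equivalent to combining the two L-Opt examples above.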
Most protein structures found in the PDB are obtained from X-ray data. Since crystal structures
in general do not reliably resolve the hydrogen positions, it is sometimes useful to relax only the
hydrogen atom positions. Only the hydrogen positions are optimized with the following keyword:
! L-OptH. Other elements can also be optimized exclusively, as follows:
!L-OptH
%geom
OptElement F # Optimize F when L-OptH is invoked. Doesn’t work with reg opt
end
If you want to study systems that consist of several molecules (e.g. the active site of a protein)
with constraints, you can either use Cartesian constraints or use ORCA's fragment constraint
option. ORCA allows the user to define fragments in the system. For each fragment one can then
choose separately whether it should be optimized or constrained. Furthermore, it is possible to choose
fragment pairs whose distance and orientation with respect to each other should be constrained (see
section 8.3.7 in the ORCA 5.0.1 manual for more details).
Fragments can be defined in a !L-Opt job for any system, where each fragment can be optimized
differently. The following options are available:
2. RelaxHFrags: Relax the hydrogen atoms of the specified fragments. Default for all atoms if
L-OptH is defined.
3. RelaxFrags: Relax all atoms of the specified fragments. Default for all atoms if L-Opt is
defined.
4. RigidFrags: Treat each specified fragment as a rigid body, but relax the position and orienta-
tion of these rigid bodies.
The fragments have to be defined after the coordinate input (see details of *.pdb files in the next
experiment). An example is shown below:
!MM L-Opt
%mm
ORCAFFFilename "1ubqaddh.ORCAFF.prms"
end
*pdbfile 0 1 1ubqaddh.pdb
%geom
Frags
2 {19:35} end # First the fragments need to be defined
3 {1232:1234} end # Note that all other atoms belong to
4 {1235:1237} end # fragment 1 by default
5 {1238:1240} end
RelaxFrags {2} end # Fragment 2 is fully relaxed
RigidFrags {3 4 5} end # Fragments 3, 4 and 5 are treated as rigid bodies each.
end
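For systems with many fragments, the %geom block above can be generated rather than typed by hand. The following sketch uses a hypothetical helper, `frags_block`, that reproduces the layout of the example above and additionally checks that no two fragments share atoms (an assumption about what is sensible, not an ORCA rule quoted from the manual).

```python
def frags_block(fragments, relax=(), rigid=()):
    """Build a %geom fragment block.

    fragments: dict mapping fragment number -> (first_atom, last_atom)
    relax, rigid: fragment numbers for RelaxFrags and RigidFrags
    """
    # Reject overlapping atom ranges before writing anything
    ranges = sorted(fragments.values())
    for (_, last), (first, _) in zip(ranges, ranges[1:]):
        if first <= last:
            raise ValueError("fragment atom ranges overlap")

    lines = ["%geom", "  Frags"]
    for number in sorted(fragments):
        first, last = fragments[number]
        lines.append(f"    {number} {{{first}:{last}}} end")
    if relax:
        lines.append("  RelaxFrags {" + " ".join(str(n) for n in relax) + "} end")
    if rigid:
        lines.append("  RigidFrags {" + " ".join(str(n) for n in rigid) + "} end")
    lines.append("end")
    return "\n".join(lines)


block = frags_block(
    {2: (19, 35), 3: (1232, 1234), 4: (1235, 1237), 5: (1238, 1240)},
    relax=[2],
    rigid=[3, 4, 5],
)
print(block)
```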
Experiment 16
Multiscale (QM/MM) Calculations
16.1 Introduction
Molecular mechanics is not suitable for dealing with chemical reactions and many other important
chemical phenomena (H-bonding, excited states, etc.). For such problems, combined quantum-mechanical
and molecular-mechanics (QM/MM) methods have been proposed and developed. For instance, in the treatment
of an enzyme-catalyzed reaction by a QM/MM method, one uses a quantum-mechanical method to
treat the substrate and the active site of the enzyme and uses MM for the rest of the enzyme.
In ORCA, one can perform different multiscale methods for large systems (see next box):
1. Additive QM/MM
2. Subtractive QM1/QM2 methods (2-layered ONIOM)
3. Subtractive QM/MM methods (also 2-layered ONIOM)
4. Subtractive QM1/QM2/MM methods (3-layered ONIOM)
The following basic concepts are important in the multiscale methods in general:
1. QM atoms: The user can define the QM region either directly, or via a pdb file.
2. QM2 atoms: Only applicable for QM/QM2/MM. For the QM/QM2/MM method the low
level QM region (QM2) is defined via the input or via flags in a pdb file. For QM/QM2 the
low level region consists of all atoms but the QM atoms.
3. Active atoms: The user can choose an active region, e.g. for geometry optimizations the
atoms that are optimized, for a frequency calculation the atoms that are allowed to vibrate for
the PHVA, or for an MD run the atoms that are propagated.
4. Forcefield: ORCA has its own forcefield file format (basename.ORCAFF.prms). For a convenient
setup the orca_mm module offers the option to convert from other forcefield formats. The
following formats can be converted to the ORCA FF format (see the ORCA manual for details):
(a) CHARMM psf files: protein structure file from the CHARMM forcefield.
(b) AMBER prmtop files: topology files from the AMBER force field. see this tutorial.
(c) Open Force Field: xml files from the openforcefield initiative. For a tutorial see here.
Theoretical Background: Additive & Subtractive Multiscale QM/MM Methods
Combined quantum mechanics and molecular mechanics (QM/MM) is a popular method to investigate biological macromolecules, homogeneous catalysis and nanostructures. There are two variants of QM/MM methods, the additive and the subtractive schemes. In a subtractive scheme, three separate calculations are performed: one QM calculation with the QM region (system 1; $E^{\mathrm{QM}}_{1}$) and two MM calculations, one for the entire system (systems 1 and 2; $E^{\mathrm{MM}}_{12}$) and one for the QM region ($E^{\mathrm{MM}}_{1}$):

\[ E^{\mathrm{sub}}_{\mathrm{QM/MM}} = E^{\mathrm{QM}}_{1} + E^{\mathrm{MM}}_{12} - E^{\mathrm{MM}}_{1} \tag{16.1} \]

The advantage of this approach is its simplicity: it automatically ensures that no interactions are double-counted, and it can be set up for any QM and MM software (provided that they can write out energies and forces), without the need for any modification of the code. Thereby, the QM/MM software is updated every time the underlying QM or MM software is updated. Moreover, it can be easily extended to more than two computational methods and regions (e.g. QM/QM2/MM). Typically 2–3 layers are used. The typical example of a subtractive scheme is ONIOM (short for "Our own N-layered Integrated molecular Orbital and Molecular mechanics") and its variations IMOMM and SIMOMM.

In the additive scheme, only two calculations are performed: the same QM calculation for the QM region, but only a single MM calculation ($E^{\mathrm{MM}}_{2-1}$):

\[ E^{\mathrm{add}}_{\mathrm{QM/MM}} = E^{\mathrm{QM}}_{1} + E^{\mathrm{MM}}_{2-1} \tag{16.2} \]

although the latter is often formally divided into two terms, an MM energy of system 2 and a QM/MM interface energy:

\[ E^{\mathrm{MM}}_{2-1} = E^{\mathrm{MM}}_{2} + E^{\mathrm{QM/MM}}_{12} \tag{16.3} \]

In this case, it is up to the developer to ensure that no interactions are omitted or double-counted. Therefore, an additive scheme requires special MM software, in which the user or developer can select which MM terms to include. The advantage of the additive QM/MM scheme is that no MM parameters for the QM atoms are needed, because those energy terms are calculated by QM only.

Further differences may arise if the QM region is covalently connected to the MM region. Then, the QM system needs to be properly truncated. This can be done by special localized orbitals, but it is more common that the QM system is simply truncated by hydrogen atoms, the hydrogen link-atom approach. In the subtractive scheme, MM parameters for the link atoms are needed.

The interaction between the QM and MM regions is typically dominated by electrostatics. This interaction can also be treated at different levels of approximation:

1. In mechanical embedding, it is calculated at the MM level.

2. In electrostatic embedding, the electrostatic QM–MM interaction is instead treated at the QM level by including a point-charge model (i.e., atomic partial MM charges) of system 2 in the QM calculations. Thereby, system 1 is polarized by system 2, but not vice versa.

3. In polarized embedding, both systems are mutually and self-consistently polarized in the QM calculations. This requires a polarizable MM force field for system 2 and QM software that can treat polarizabilities, which are still rather unusual. Therefore, such calculations are less common and typically restricted to single-point calculations of accurate properties.

Mechanical embedding is normally considered to be less accurate than electrostatic embedding, and the latter has therefore been the most widely used approximation, although it involves polarization of only parts of the system and is more sensitive to the treatment of the link atoms. Strictly, Equations (16.1) and (16.2) apply only to mechanical embedding, but they can easily be adapted to electrostatic embedding by including a point-charge model of system 2 in the QM term ($E^{\mathrm{QM}}_{1+\mathrm{ptch}\,2}$) and setting the charges of system 1 to zero in the MM calculations.

Unfortunately, the distinction between the subtractive and additive schemes in the literature is often unclear and confused. In many cases, the subtractive scheme is equated with mechanical embedding and the additive scheme with electrostatic embedding. In fact, both additive and subtractive schemes may use either mechanical or electrostatic (or even polarized) embedding.
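The bookkeeping of Equations (16.1) to (16.3) can be checked with a few lines of code. The numbers below are arbitrary toy energies, and the sketch assumes mechanical embedding without link atoms, so that the MM energy of the full system splits exactly into the MM energy of system 1, the MM energy of system 2 and an MM interface term. Under that assumption both schemes give the same total.

```python
def qmmm_subtractive(e_qm_1, e_mm_12, e_mm_1):
    """Eq. (16.1): QM energy of system 1 plus MM(1+2) minus MM(1)."""
    return e_qm_1 + e_mm_12 - e_mm_1


def qmmm_additive(e_qm_1, e_mm_2, e_int_12):
    """Eqs. (16.2) and (16.3): QM energy of system 1 plus MM(2) plus interface term."""
    return e_qm_1 + e_mm_2 + e_int_12


# Toy energies in arbitrary units (hypothetical values, not from a real calculation)
e_qm_1 = -150.0     # QM energy of the QM region (system 1)
e_mm_1 = -2.0       # MM energy of system 1
e_mm_2 = -30.0      # MM energy of system 2
e_int_12 = -1.5     # MM-level interaction between systems 1 and 2

# Assumption: the MM energy of the whole system is the sum of the three parts
e_mm_12 = e_mm_1 + e_mm_2 + e_int_12

e_sub = qmmm_subtractive(e_qm_1, e_mm_12, e_mm_1)
e_add = qmmm_additive(e_qm_1, e_mm_2, e_int_12)
print(e_sub, e_add)
```

The MM energy of system 1 cancels in the subtractive expression, which is why MM parameters for the QM atoms are needed there but do not affect the result in this idealized case.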
ORCA automatically detects QM-MM boundaries, i.e. bonds that have to be cut between the QM
and MM regions. ORCA automatically generates the link atoms and keeps them at their relative
positions throughout the run, while still allowing the bond along the boundary to be optimized.
Treatment of overpolarization: ORCA also automatically adapts the charges at the QM-MM boundary in
order to properly treat overpolarization.
Before going through the details of the procedure, it is important to know more about the structure
of a pdb file (see next technical box).
A PDB is a text file where each line of information is called a record. A PDB file generally contains
several different types of records, arranged in a specific order to describe a structure.
Record Type   Data Provided by Record
ATOM          atomic x,y,z coordinates (Å) for atoms in standard residues (amino and nucleic acids).
HETATM        atomic x,y,z coordinates (Å) for atoms in nonstandard residues (inhibitors, cofactors,
              ions, and solvent including water), that are not connected to other residues.
TER           indicates the end of a chain of residues. For example, a hemoglobin molecule consists of
              four subunit chains that are not connected. TER indicates the end of a chain and
              prevents the display of a connection to the next chain.
HELIX         indicates the location and type (right-handed alpha, etc.) of helices. 1 record per helix.
SHEET         indicates the location, sense (anti-parallel, etc.) and registration with respect to the
              previous strand in the sheet (if any) of each strand in the model. 1 record per strand.
SSBOND        defines disulfide bond linkages between cysteine residues.
Each atomic-coordinate line for a standard residue begins with the record type ATOM. The atom serial
number is the next item. The atom name is the third item in the record. Notice that the first one or
two characters of the atom name consist of the chemical symbol for the atom type ("C" carbon, "N"
nitrogen, "O" oxygen). The next character is the remoteness indicator code (see the Figure below),
which is transliterated according to:
Code             α  β  γ  δ  ε  ζ  η
Transliteration  A  B  G  D  E  Z  H
[Figure: remoteness labels α, β, γ, δ, ε, ζ on the carbon chain of an amino acid, with the COOH and NH2 groups shown.]
The next data field is the residue type. Notice that each record contains the residue type, e.g. HIS
(histidine) and SER (serine). The next data field contains the chain identifier, e.g. A. The next data
field contains the residue sequence number. Two like residues may be adjacent to one another, so the
residue number is important for distinguishing between them.
The next three data fields contain the x,y,z coordinates. The last three fields shown are the occupancy,
temperature factor (B-factor), and element symbol. The spacing of the data fields is crucial. If a data
field does not apply, it should be left blank. The TER record terminates the amino acid chain. A more
complicated protein, hemoglobin (3hhb), consists of four amino acid chains, each with an associated
heme group. There are two alpha chains (identifiers A & C) and two beta chains (identifiers B & D).
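Because the data fields sit at fixed column positions, an ATOM or HETATM record is read by slicing the line rather than by splitting on whitespace. The sketch below assumes the standard PDB column layout (serial in columns 7–11, atom name in 13–16, residue name in 18–20, chain in 22, residue number in 23–26, coordinates in 31–54, occupancy in 55–60, B-factor in 61–66, element in 77–78). The sample record is assembled field by field for clarity and is meant as an illustration only.

```python
def parse_atom_record(line):
    """Parse one ATOM/HETATM record using fixed column positions."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM or HETATM record")
    return {
        "record": record,
        "serial": int(line[6:11]),          # atom serial number (starts at 1)
        "name": line[12:16].strip(),        # e.g. CA = carbon, remoteness alpha
        "res_name": line[17:20].strip(),    # residue type, e.g. MET
        "chain": line[21],                  # chain identifier
        "res_seq": int(line[22:26]),        # residue sequence number
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


# Sample record built field by field so the column widths are explicit
sample = ("ATOM  " + "    1" + " " + " N  " + " " + "MET" + " " + "A" + "   1"
          + " " + "   " + "  27.340" + "  24.430" + "   2.614"
          + "  1.00" + "  9.67" + " " * 10 + " N")

atom = parse_atom_record(sample)
print(atom["name"], atom["res_name"], atom["chain"], atom["xyz"])
```

Splitting on whitespace would fail for files in which adjacent fields touch (for example large coordinates or four-digit residue numbers), which is why the spacing of the fields is crucial.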
16.2 Procedure & Examples
QM Atoms: QM atoms can be defined by specifying their atom numbers (e.g. 0 4 7) or as a range
(e.g. 5:10). This can be done directly through the QMAtoms keyword in the %qmmm block:
! QMMM
%qmmm
ORCAFFFilename "1ubqaddh.ORCAFF.prms"
QMAtoms {315:325} end
end
*pdbfile 0 1 ubq.pdb
Alternatively, the QM atoms can be flagged in the pdb file. If Use_QM_InfoFromPDB is set to true,
a pdb file should be used for the structural input. In the occupancy column, QM atoms are defined
via 1, QM2 atoms via 2, and MM atoms via 0 (see §16.2.2), as shown in the following example (ASP21 of 1UBQ):
The Use_QM_InfoFromPDB keyword needs to be written before the coordinate section. Note also that
in pdb files counting starts from 1, while in ORCA counting starts from zero.
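Editing the occupancy column by hand is error-prone, so the flags can be set with a short script. The sketch below uses hypothetical helper names and assumes that the atom serial numbers in the pdb file are consecutive and start at 1, so that the ORCA index is simply the serial number minus one. The two sample records are synthetic.

```python
def pdb_serial_to_orca_index(serial):
    """PDB serial numbers start at 1, ORCA atom indices at 0 (assumes consecutive serials)."""
    return serial - 1


def flag_regions(pdb_lines, qm_serials, qm2_serials=()):
    """Write 1 (QM), 2 (QM2) or 0 (MM) into the occupancy column (columns 55-60)."""
    qm, qm2 = set(qm_serials), set(qm2_serials)
    flagged = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            serial = int(line[6:11])
            flag = 1.0 if serial in qm else 2.0 if serial in qm2 else 0.0
            line = line[:54].ljust(54) + f"{flag:6.2f}" + line[60:]
        flagged.append(line)
    return flagged


def _atom_line(serial, name, x):
    """Build a synthetic ATOM record for residue ASP 21 of chain A (illustration only)."""
    return (f"ATOM  {serial:5d} {name:4s} ASP A  21    "
            f"{x:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}")


example = [_atom_line(1, " CA ", 0.0), _atom_line(2, " CB ", 1.5), "TER"]
flagged = flag_regions(example, qm_serials=[2])
for row in flagged:
    print(row)
```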
The charge in the last example should be set to -1. The following amino acids are charged: histidine
(HIS), lysine (LYS), arginine (ARG), aspartic acid (ASP), and glutamic acid (GLU). If they are treated
as neutral singlets in any QM region (either QM or QM2), you will obtain an error such as:
"multiplicity (1) is odd and number of electrons (185) is odd". In fact, HIS, LYS, and ARG exist
in their protonated form (NHn+, where a proton has been added), while ASP and GLU exist in their
deprotonated form (COO−, where a proton has been lost). If one or more of these amino acids are
included in either of the two QM regions, the charge of the system must be set accordingly.
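A quick consistency check before submitting a job can catch this error. The sketch below follows the charge assignments stated above (including a +1 charge for HIS, which in practice depends on the protonation state chosen); it assumes that whole residues are placed in the QM region and ignores chain termini and link atoms. All names are hypothetical.

```python
# Formal side-chain charges as stated in the text (HIS taken as protonated: an assumption)
RESIDUE_CHARGE = {"ASP": -1, "GLU": -1, "LYS": +1, "ARG": +1, "HIS": +1}


def net_charge(residues):
    """Sum of formal charges for a list of three-letter residue names."""
    return sum(RESIDUE_CHARGE.get(name.upper(), 0) for name in residues)


def multiplicity_is_consistent(n_electrons, multiplicity):
    """A multiplicity 2S+1 is possible only if n_electrons and (multiplicity - 1) have the same parity."""
    return (n_electrons - (multiplicity - 1)) % 2 == 0


print(net_charge(["ASP"]))                  # charge to use for a QM region containing only ASP21
print(multiplicity_is_consistent(185, 1))   # the failing case from the error message
print(multiplicity_is_consistent(186, 1))   # one more electron (charge -1) fixes it
```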
Active and Non-Active Atoms: The systems of multiscale calculations can become quite
large with tens and hundreds of thousands of atoms. In multiscale calculations the region of interest
is often only a particular part of the system, and it is common practice to restrict the optimization
to a small part of the system, which we call the active part of the system. Usually this active part
consists of hundreds of atoms, and is defined as the QM region plus a layer around the QM region.
The same definition holds for frequency calculations. MD calculations on systems with hundreds of
thousands of atoms are not problematic, but there are applications where a separation in active and
non-active parts can be important.
If no active atoms are defined, the entire system is treated as active. The active region definitions
also apply to MM calculations, but have to be provided via the qmmm block. Active atoms can be
defined either directly or via the B-factor column of a pdb file.
%qmmm
QMAtoms {19:35} end
ActiveAtoms {0:35 55:74} end # 1- list of atoms (counting starts from 0) or
Use_Active_InfoFromPDB true # 2- definition from the file. Default false.
end # If (2) is set to true, (1) is ignored
*pdbfile 0 1 ubq.pdb
If Use_Active_InfoFromPDB is set to true, a pdb file should be used for the structural input.
Active atoms are defined via 1 in the B-factor column, non-active atoms via 0.
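The "QM region plus a surrounding layer" definition of the active region can be generated from the coordinates. The sketch below is a hypothetical helper, not ORCA functionality: it selects all atoms within a chosen distance of any QM atom and formats the result in the `{a:b c}` list style used for ActiveAtoms. The layer thickness is an arbitrary example value.

```python
import math


def select_active(coords, qm_indices, layer=5.0):
    """Return sorted indices of QM atoms plus all atoms within `layer` (Angstrom) of any QM atom."""
    qm_points = [coords[i] for i in qm_indices]
    active = set(qm_indices)
    for index, point in enumerate(coords):
        if index in active:
            continue
        if any(math.dist(point, q) <= layer for q in qm_points):
            active.add(index)
    return sorted(active)


def format_atom_list(indices):
    """Compress indices into ORCA-style list syntax, e.g. {0:2 5 7:8} end."""
    parts = []
    start = prev = None
    for i in sorted(set(indices)):
        if start is None:
            start = prev = i
        elif i == prev + 1:
            prev = i
        else:
            parts.append(f"{start}:{prev}" if prev > start else str(start))
            start = prev = i
    if start is not None:
        parts.append(f"{start}:{prev}" if prev > start else str(start))
    return "{" + " ".join(parts) + "} end"


# Four atoms on a line; atom 0 is the QM region (toy coordinates in Angstrom)
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
active = select_active(coords, qm_indices=[0], layer=3.5)
print("ActiveAtoms " + format_atom_list(active))
```

For a real protein the simple double loop is slow but adequate for a one-off setup; the indices must follow ORCA's zero-based counting.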
For very large active regions, the L-Opt or L-OptH feature can be used ("Geometry Optimization
of Large Systems" in Exp 15). For the L-Opt/L-OptH feature, there exist two ways to define the
active region: 1) via the ActiveAtoms keyword (or Use_Active_InfoFromPDB true) or 2) via
fragment definition and the different keywords for fragment optimization (see "Geometry
Optimization of Large Systems" in Exp 15).
QM-MM, QM-QM2 and QM2-MM Boundary & Embedding Types: There is one
boundary region in QM/MM or QM/QM2 methods, and two boundaries in QM/QM2/MM, all
of which can go through covalent bonds. In the following we will only discuss the boundary between
QM and MM, but the same holds for the other boundaries.
ORCA automatically generates link atoms based on the information of the QM region and on
the topology of the system (based on the ORCAFF.prms file). ORCA places link atoms on the bond
between QM and MM atoms. When defining the QM, QM2 and MM regions, make sure that you
only cut through single bonds, not aromatic, double, triple bonds, etc.
ORCA uses standard values for the most common atoms involved in boundary regions (C, N, O),
which can be modified if desired. Moreover, bonded interactions at the QM-MM boundary as well as
charge alteration can be specified by the user (see section 8.13.1.6 in ORCA 5.0.1 for more details).
Both mechanical and electrostatic embedding schemes are available in ORCA. With electrostatic
embedding, the electrostatic interaction between the QM and MM systems is computed at the QM level.
Thus, the charge distribution of the MM atoms can polarize the electron density of the QM region.
The LJ interaction between the QM and MM systems is computed at the MM level. Electrostatic
embedding is a more advanced and more accurate procedure than the mechanical scheme.
%qmmm
Embedding Electrostatic # two options: Electrostatic (Default); Mechanical
end
16.2.2 ONIOM (Subtractive) Methods
For large systems (up to 10000 atoms), or for large QM regions in biomolecules, ORCA provides the
subtractive QM/QM2 and QM/QM2/MM methods, which use a high level (QM) and a low level
(QM2) for different parts of the system. The advantages of this, in contrast to additive QM/MM
methods, are: 1) QM2 methods are polarizable, so the interaction with the high level region is more
accurate; 2) no MM parameters are needed for the atoms that are described at the QM2 level.
A simple QM/QM2/MM calculation looks like (where QM2 atoms are defined by QM2Atoms):
! QM/XTB/MM
%qmmm
ORCAFFFilename "1ubqaddh.ORCAFF.prms"
QMAtoms {338:370} end
QM2Atoms {326:337 371:437} end
end
*pdbfile -1 1 1ubqaddh.pdb
The QM method can be defined as usual (e.g. B3LYP def2-TZVP def2/J). The QM2 method can be one
of the built-in methods in ORCA (defined above) or can be extended to other methods defined
by the user using QM2CUSTOMMETHOD and QM2CUSTOMBASIS:
! QM/QM2/MM B3LYP def2-TZVP def2/J
!pal4
%qmmm
ORCAFFFilename "1ubqaddh.ORCAFF.prms"
QM2CUSTOMMETHOD "BP86"
QM2CUSTOMBASIS "def2-SVP def2/J"
QMAtoms {338:370} end
QM2Atoms {371:437} end
end
*pdbfile 0 1 1ubqaddh.pdb
Alternatively, a custom QM2 method and/or basis set can be provided using a file. By default,
ORCA uses electrostatic embedding, i.e. the high level system sees the atomic point charges of
the low level (QM2) system. These point charges are derived from the full-system low level (QM2)
calculation. The Hirshfeld method is the default for determining these charges. Moreover, the QM2 point
charges can be scaled (the default scaling factor is 1). By default, ORCA uses the XTB method (which
requires the otool_xtb binary in the ORCA path) for the preparation of the QM2 topology (for the
detection and realistic treatment of the covalent bonds between the high and low level parts of the
system). See section 8.13.3 in the ORCA 5.0.1 manual for more details on these points.
The two subsystems in QM/QM2 and the three subsystems in QM/QM2/MM can have different
charges and multiplicities. Defining the correct charges and multiplicities is very important. The
charge and multiplicity defined via the coordinate section defines the charge and multiplicity of the
high level region only (QMAtoms). The user still needs to define the charge and multiplicity of the
total system of QM/QM2 (corresponding to the sum of the charge of the high level and low level
parts, and corresponding to the overall multiplicity). The charge of the MM region is determined
based on the MM parameters provided by the forcefield.
In the case of QM/QM2, the charge and multiplicity of the total system are defined by Charge_Total
and Mult_Total:
!QM/QM2
%qmmm
QMAtoms {1:5} end
Charge_Total 0 # charge of the full system. Default 0.
Mult_Total 1 # multiplicity of the full system. Default 1.
end
*pdbfile 0 1 1ubqaddh.pdb
In the case of QM/QM2/MM, the charge and multiplicity of the QM2 system are defined by
Charge_Medium and Mult_Medium:
!QM/QM2/MM
%qmmm
QMAtoms {1:5} end
Charge_Medium 0 # charge of the medium system. Default 0.
Mult_Medium 1 # multiplicity of the medium system. Default 1.
end
*pdbfile 0 1 1ubqaddh.pdb
Solvation
Solvation in QM/QM2 is only included in the large low level calculation. The small (high level region)
calculations see only the (already solvated) point charges of the large system calculation.
Implicit solvation cannot be performed with the QM/QM2/MM method, since it cannot be implemented
in MM. Implicit solvation with semiempirical, ab initio and DFT methods can be performed with PCM-based
models and their variations (CPCM, SMD, etc.), e.g. ! QM/HF-3c CPCM(Water). DFTB methods
(such as XTB) and some force fields (such as GFN-FF) use only the analytical
linearized Poisson-Boltzmann (ALPB) solvation model, e.g. ! QM/XTB ALPB(water). See also
Experiment 17 for explicit solvation techniques.
Technical Background: Practical Tips on Setting Up QM/MM Calculations
1. For many cases, a crystal structure of an enzyme alone, with no ligands bound at the active site,
may be of little use, because it is difficult to predict binding modes and protein conformational changes
associated with binding. Often, the crystallographic structure of an enzyme-ligand complex is a good
choice.
2. The resolution of a protein structure determined by X-ray crystallography provides an indication of
the level of accuracy. A high-resolution structure (less than 2 Å) is likely to give the positions of most
heavy atoms very well, while at a very low resolution, it is probable that only the overall shape of the
protein can be inferred. The quoted resolution (and the crystallographic R-factor) is only a measure
of global model quality. Even in high-resolution structures, there can be considerable uncertainty in
atomic positions for part of the system due to protein dynamics and conformational variability.
3. Hydrogen atoms are not usually resolved in X-ray crystallography of proteins, because of their
low electron density. As a result, hydrogen atoms have to be added to a crystal structure prior to
simulation.
4. For titratable amino acid residues such as aspartic acid, glutamic acid and histidine (see Figure
below), the protonation states of the residues need to be specified, and might not be obvious by
inspection.
Unexpected protonation states of amino acid side chains can be favoured within proteins, and predicting
pKa values in proteins remains a challenging problem. One method to aid in the selection of protonation
states is based on their local environment (for example, using the PROPKA program). Care must
be taken to choose the most likely state for each titratable amino acid side chain in a protein model.
This can be achieved by inspection of the local hydrogen-bonding environment.
5. Crystal structures often contain alternative conformations of some side chains: it may be necessary
to investigate the various possibilities.
6. Crystal structures usually contain oxygen atoms corresponding to ordered water molecules that are
often involved in hydrogen bonding with the enzyme. To create a full model, it is necessary to solvate
the protein further, typically by placing the protein in a pre-equilibrated water box, and deleting any
water molecules close to other atoms, in order to reproduce the effects of bulk solvation.
7. Once a molecular model has been created, a series of MM and/or QM/MM energy minimizations
is usually carried out in order to optimize the geometries of both the added hydrogen atoms and
the protein heavy atoms, as well as the added water molecules. Usually, the hydrogen atoms are
minimized first, followed by an optimization of all atoms.
8. MD simulations are important to investigate protein/enzyme internal motions and conformational
changes, to generate structures for mechanistic modelling, and for (QM/MM) calculations of free
energy profiles for reactions. MD simulations can be performed with MM methods, or with low levels
of QM/MM.
9. It is increasingly common to use MD structural “snapshots” as a starting point for QM/MM
calculations, rather than the crystal structure directly.
10. For MD QM/MM simulations particularly, the whole protein/enzyme is often truncated where
only a part of the whole protein (for example, a rough sphere around the active site) might be
included in the simulation. When simulating a truncated protein, it is necessary to include restraints
or constraints in the boundary region to force the atoms belonging to it to remain close to their
positions in the crystal structure.
Experiment 17
Ab-initio Molecular Dynamics (AIMD) Simulations
17.1 Introduction
Molecular dynamics (MD) simulation is a technique to simulate the motion of atoms and molecules
under predefined conditions, such as temperature, pressure, external forces, etc. MD simulations
can therefore be used to study dynamical processes on the nanosecond (or sometimes microsecond)
timescale and to calculate a broad range of properties.
MD simulations in principle require only energies and gradients, which can come from an electronic
structure calculation such as DFT, ab-initio, semiempirical (SE), or density functional tight-binding
(DFTB), or from a molecular mechanics (MM) force field. Newton’s equations of motion are then solved
in order to obtain the evolution of the molecule with time.
Traditionally, most MD simulations are based on classical MM force fields. If energies and gradients
are calculated using a SE, DFT, or ab-initio SCF electronic structure method, the simulation
is then called an ab-initio MD (AIMD) simulation. Strictly speaking, these simulations are
Born-Oppenheimer molecular dynamics (BOMD) simulations, since they approximately solve the
time-independent Schrödinger equation to compute gradients and then move the atoms according to
these gradients. In addition, SE and DFTB methods are not ab-initio methods.
Any electronic structure method in ORCA with available gradients (ideally analytic) can be used
to perform AIMD/BOMD simulations. The velocity Verlet algorithm is used in ORCA for solving
Newton’s equations of motion.
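The velocity Verlet scheme mentioned above can be illustrated with a minimal sketch. It is not ORCA's implementation: it propagates a single particle in a one-dimensional harmonic potential, and the force constant, mass, time step and function names are all hypothetical choices in arbitrary units.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Propagate one particle in 1D with the velocity Verlet scheme."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # first half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory

K = 1.0  # hypothetical force constant (arbitrary units)
M = 1.0  # hypothetical mass (arbitrary units)

def harmonic_force(x):
    return -K * x

def total_energy(x, v):
    return 0.5 * M * v * v + 0.5 * K * x * x

# 1000 steps with dt = 0.01, starting at x = 1 with zero velocity
trajectory = velocity_verlet(1.0, 0.0, harmonic_force, M, 0.01, 1000)
energies = [total_energy(x, v) for x, v in trajectory]
max_drift = max(abs(e - energies[0]) for e in energies)
```

Monitoring `max_drift` for different values of `dt` is a toy version of the total-energy check that is used to validate the time step of a real simulation.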
Here, it is important to realize that an AIMD simulation is expensive and will not enable you
to access the nanosecond/microsecond timescales that are possible with classical MM MD. A few
picoseconds for a small molecule is achievable at high cost and it’s up to the user to carefully set
up such a simulation and determine whether a few ps MD simulation will result in useful data or
not. For instance, chemical reactions are rarely accessible at the picosecond timescales. Moreover,
in AIMD, trajectories are sensitive to numerical noise. SCF convergence criteria, the DFT numerical
integration grid, etc. need to be monitored to obtain stable trajectories. Doing an MD simulation is not as
straightforward as doing a geometry optimization.
Theoretical Background: Some Basic Concepts in MD Simulations
Overall, the energy of the system is conserved up to very small fluctuations with an order of
magnitude of less than 10^-5 relative to the average value. Importantly, the average value is
stationary and no drift can be observed. These findings indicate properly chosen MD settings. Now if
the above simulation is repeated but with an initial temperature of, say, 5000 K and a time step size
of 5 fs, you will find an increased motion of the atoms and will observe a pronounced drift of the
total energy. The energy is not conserved any more, as the timestep chosen is too large for this
temperature.
Thus, a larger time step increases the error in the numerical integration scheme of the equations
of motion. The underlying assumption that the atomic forces are approximately constant during
one integration step may not be valid at large time steps. Essentially, the chosen time step should
be small enough to resolve the highest vibrational frequencies of the atoms (i.e. it should be much
smaller than the smallest vibrational period), so if you have light atoms (e.g. hydrogen), you will
generally be required to use a smaller time step than if you have only heavy atoms (such as gold).
A smaller time step size may also be necessary if you have different elements in your calculation, if
the temperature is high, or if the atoms are far away from their equilibrium configuration, i.e. if
large forces act on the particles. For most systems a safe choice to start, if you do not know what
time step to use, is 1 fs. Larger timestep values can then be assessed by monitoring the conservation
of the total energy in an NVE simulation.
Another very important aspect of performing MD simulations is to be aware that it may take some
time before a system is equilibrated to the chosen external temperature, pressure, etc. In the above
example, it takes around 600 fs before the system temperature as well as the magnitude of the total
energy fluctuations have reached a stationary state. This is important because any measurement
of observables should be carried out after the system is equilibrated (unless we are interested in
non-equilibrium phenomena). In general, equilibration times of MD simulations can vary over a
wide range up to several hundreds of nanoseconds (e.g. in biological molecules or polymer systems)
and depend essentially on the longest relaxation time present in the system. We should therefore
always check that the observables of interest have reached a stationary state in our simulations.
Steps of Running MD Simulations
1. Setting up the system: Before setting up the simulation we should decide what type of calculation
we are interested in. Should the total energy be conserved, as in an isolated system? Should T be
kept constant to mimic the coupling of the system to a heat bath? Is the system exposed to any
external pressure or stress? Based on these considerations, a suitable set of simulation parameters
should be selected: Time step size, number of integration steps (duration of simulation), initial
temperature, etc.
2. Equilibration: In MD simulations, atoms of the system undergo a relaxation that usually lasts
for tens or hundreds of ps before the system reaches a stationary state. This initial nonstationary
segment of the simulation is typically discarded. Equilibration protocols are still a matter of
personal preference. Some protocols call for elaborate procedures involving gradually increasing T
in a step-wise fashion while other aggressive approaches simply use a linear T gradient and heat
the system up to the desired T.
3. Production: Here we need to run the MD simulation for a time much longer than the relaxation
time of the property of interest. If we study a large biomolecule that can adopt different
conformations, the simulation time should be enough to allow the macromolecule to explore all the
possible configurations. Otherwise the results obtained may correspond to only one of the global
conformations of the macromolecule.
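The guideline that the time step must resolve the fastest vibration can be turned into a rough estimate. The sketch below converts a vibrational wavenumber into its period; the choice of ten integration points per period is only a common rule of thumb assumed here, and the function names are hypothetical.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s

def vibrational_period_fs(wavenumber_cm1):
    """Period (in fs) of a vibration given its wavenumber in cm^-1."""
    frequency_hz = wavenumber_cm1 * C_CM_PER_S
    return 1.0e15 / frequency_hz

def suggested_timestep_fs(wavenumber_cm1, points_per_period=10):
    """Rule-of-thumb time step (assumption: about ten steps per fastest period)."""
    return vibrational_period_fs(wavenumber_cm1) / points_per_period
```

For an O-H stretch near 3700 cm^-1 the period is about 9 fs, so this estimate gives a time step slightly below 1 fs, in line with the 0.5 to 1 fs values used in this experiment.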
17.2 Procedure & Examples
The molecular dynamics module is activated by specifying “MD” in the simple input line. The actual
MD input, which describes the simulation variables, follows in the “%md” section (or block)
later in the input file.
Let us start with a simple AIMD simulation of a water dimer. The input below will perform a
simple MD simulation at the BLYP-D3/def2-SVP level of theory, using the default velocity Verlet
algorithm in the NVT ensemble (see footnote 1). The timestep is 0.5 fs and the initial velocities are
assigned according to a temperature of 300 K. The temperature is then maintained at 300 K using a
Berendsen thermostat with a coupling time constant of 10 fs. There will be 2000 steps, and therefore
the total simulation time is 1 ps.
! MD BLYP D3 def2-SVP
%md
timestep 0.5_fs
initvel 300_K
thermostat berendsen 300_K timecon 10.0_fs
dump position stride 1 filename "basename.trajectory.xyz"
run 2000
end
* xyz 0 1
O -2.03740 -1.21799 -0.08342
H -1.06493 -1.04408 -0.02285
H -2.37327 -1.07034 0.83692
O -1.65042 1.84243 0.07893
H -0.72656 1.49786 -0.01029
H -2.07086 1.65422 -0.79801
*
The dump command specifies how the output trajectory of the simulation is written. In the above
example, Position requests that the atomic positions be written to the trajectory file; the other
possible keywords are Velocity, Force, GBW, and EnGrad. The stride modifier specifies that only every
n-th time step is written to the output file (default is n = 1, i.e., every step). In addition, the
format modifier (not shown in the above example) sets the format of the output file. Currently, only
the XYZ, PDB, and DCD formats are implemented (for instance, Format PDB). If the format is not
specified, ORCA tries to deduce it from the extension of the specified file name. If no file name is
given either, trajectories are written in XYZ format by default.
The output file of such a calculation contains tabulated data of simulation time, temperature,
kinetic, potential, and total energies, and some other information, for each step. The same data are
also found in the file named basename-md-ener.csv, a text file containing a table with all
the previous information, which can be opened as a spreadsheet (e.g. in Excel) or used directly for
plotting with gnuplot. Plot T, Ekin, Epot, and Etot as a function of simulation time for the above MD
simulation and observe the behavior of these variables with time.
The geometry of the system at each step is stored in a trajectory file, the name of which was
defined in the input file: basename.trajectory.xyz. This file is needed for the animation of the
simulation trajectory using Avogadro, VMD, UCSF Chimera, or any other visualization program. In
Avogadro, after opening the program, open the *.trajectory.xyz file (this may take some time),
then choose Extensions > Animations > Load File and load the trajectory file *.trajectory.xyz again
(it must be the same file as that opened in the first step, otherwise an error is produced). Loading
the file may take some time again. When the loading is done, click on the “Play” button.
1
The NVT ensemble is appropriate for processes at thermal equilibrium, while the NVE ensemble is more
suitable for non-equilibrium situations.
To watch trajectory movies with the VMD program (Visual Molecular Dynamics) from a Linux
terminal (command line), simply type: vmd basename.trajectory.xyz, then click on the Play Forward
button (see next Technical Background: Getting Started With VMD).
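Besides visualization, the trajectory file can also be read programmatically. The following is a minimal sketch of a reader for the standard multi-frame XYZ layout (atom count, comment line, then one line per atom); the function name is hypothetical, and the content of ORCA's comment line is not interpreted.

```python
def read_xyz_frames(text):
    """Split the text of a multi-frame XYZ file into a list of frames.

    Each frame is returned as (comment, atoms), where atoms is a list of
    (symbol, x, y, z) tuples with coordinates as floats.
    """
    lines = text.splitlines()
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():      # skip blank lines between frames
            i += 1
            continue
        n_atoms = int(lines[i].split()[0])
        comment = lines[i + 1]
        atoms = []
        for line in lines[i + 2 : i + 2 + n_atoms]:
            symbol, x, y, z = line.split()[:4]
            atoms.append((symbol, float(x), float(y), float(z)))
        frames.append((comment, atoms))
        i += 2 + n_atoms
    return frames
```

Calling `read_xyz_frames(open("basename.trajectory.xyz").read())` would then give one entry per dumped MD step, which can be passed on to any analysis routine.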
You may have noticed in the above trajectory movie that the whole system drifts during the
simulation. In order to keep the system’s center of mass fixed during MD runs, we use the
following “constraint” on the center of the system: Constraint Add Center 0..5, where the list of
numbers (0..5 in our example) can be a combination of numbers and ranges, e.g., 0, 1, 3, 5..11,
14 (see footnote 2). The weighted average position of the listed atoms is then constrained to a fixed
position in Cartesian space. By default, the weights are the atomic masses, such that the center of
mass of the selected atoms is kept fixed. This makes it possible, for instance, to run an MD simulation
of two molecules whose common center of mass stays at a fixed position.
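The mass-weighted average position that this constraint keeps fixed can be computed by hand. The sketch below does so for the water dimer geometry used above; the function name and the small mass table (standard atomic weights in u) are assumptions made for illustration.

```python
ATOMIC_MASS = {"H": 1.008, "O": 15.999}  # standard atomic weights in u

def center_of_mass(atoms):
    """Mass-weighted average position of a list of (symbol, x, y, z) atoms."""
    total_mass = sum(ATOMIC_MASS[symbol] for symbol, _, _, _ in atoms)
    return tuple(
        sum(ATOMIC_MASS[atom[0]] * atom[axis] for atom in atoms) / total_mass
        for axis in (1, 2, 3)
    )

# Water dimer geometry from the input above (atoms 0..5, coordinates in Angstrom)
WATER_DIMER = [
    ("O", -2.03740, -1.21799, -0.08342),
    ("H", -1.06493, -1.04408, -0.02285),
    ("H", -2.37327, -1.07034, 0.83692),
    ("O", -1.65042, 1.84243, 0.07893),
    ("H", -0.72656, 1.49786, -0.01029),
    ("H", -2.07086, 1.65422, -0.79801),
]
com = center_of_mass(WATER_DIMER)
```

Evaluating `center_of_mass` for every frame of a trajectory is a simple way to verify that the constraint is doing what is expected.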
AIMD simulations are computationally expensive, and will typically run for a long time. Often,
it is desirable to perform such a simulation as a sequence of multiple shorter runs. The ORCA MD
module writes a restart file in each simulation step, which allows for the seamless continuation of
simulations. This restart file has the name basename.mdrestart. To load an existing restart file,
use the Restart command followed by the IfExists modifier. In the first run of a planned sequence
of runs, no restart file exists yet, so the restart is simply skipped without an error (by using
Restart IfExists, we do not need to modify the input after the first run has finished). It is also
important to know that trajectory files are appended (not overwritten) by default. To summarize,
if the following input is executed ten times in succession (without any modification):
%md
timestep 0.5_fs
initvel 300_K
thermostat berendsen 300_K timecon 10.0_fs
dump position stride 1 filename "basename.trajectory.xyz"
Restart IfExists
run 100
end
the resulting trajectory file will be identical (apart from numerical noise) to that obtained if the
following input is executed once:
%md
timestep 0.5_fs
initvel 300_K
thermostat berendsen 300_K timecon 10.0_fs
dump position stride 1 filename "basename.trajectory.xyz"
run 1000
end
2
Since ORCA is written in C++, the numbering of atoms, orbitals, etc. starts from 0.
Technical Background: Getting Started With VMD
Next click on the Molecule name in the VMD Main. This should highlight the line in green.
Then left-click File > Save Coordinates, a window should appear, select all in the Selected
atoms box. Next under File type select xyz (or pdb). Finally in the Frames box select the First
frame as 299 and the Last frame as 299, then click on Save.
Unlike the rest of the ORCA input, the MD input is based on commands, not keywords. Commands are
executed line by line, starting at the top. Thus, identical commands with different arguments
may be given, each coming into effect when the interpreter reaches the corresponding line. This makes
it possible to perform multiple simulations (e.g., pre-equilibration and production) within a single
input file:
%md
initvel 300_K
thermostat berendsen 300_K timecon 10.0_fs
dump position stride 1 filename "basename.trajectory.xyz"
timestep 1.0_fs
run 200
timestep 2.0_fs
run 500
end
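Because each run command uses the time step that is active when it is reached, the total simulated time of such an input is the sum over its stages. A minimal bookkeeping sketch follows (the function name is hypothetical):

```python
def total_time_fs(stages):
    """Total simulated time for a list of (timestep_fs, n_steps) stages."""
    return sum(timestep * n_steps for timestep, n_steps in stages)

# The two stages of the input above: 200 steps of 1.0 fs, then 500 steps of 2.0 fs
two_stage_total = total_time_fs([(1.0, 200), (2.0, 500)])
```

The same function shows why ten restarted runs of 100 steps at 0.5 fs cover the same simulated time as a single run of 1000 steps.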
Consider now the dynamics of a proton transfer in the reaction between NH3 and [Al(H2O)6]3+:
Note that the reaction is allowed to take place using implicit solvation (CPCM solvation model)
in water. Watch the trajectory movie when the calculation is finished, and observe how ammonia
“abstracts” a proton (H+) from one OH group in [Al(H2O)6]3+.
The command Cell Sphere 0, 0, 0, 15 simply builds a cell around the molecular system.
Cell defines a harmonic repulsive wall around the system (the wall is “soft”, with a spring constant,
so atoms can slightly penetrate it). This helps to keep the molecules inside a well-defined volume.
We can select cubic, spherical, etc. cell shapes. In our example, we have asked for a spherical
cell with a radius of 15 Å centered at the origin (0, 0, 0).
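To make the idea of a soft spherical wall concrete, the sketch below evaluates a generic harmonic restraint that is zero inside the sphere and grows quadratically outside it. The functional form V = k(r - R)^2/2 and the spring constant are assumptions for illustration; the exact expression and default constant used by ORCA are not reproduced here.

```python
import math

def wall_energy(position, center, radius, k):
    """Harmonic wall energy: zero inside the sphere, 0.5*k*(r - R)^2 outside."""
    r = math.dist(position, center)   # distance of the atom from the cell center
    excess = r - radius
    return 0.5 * k * excess ** 2 if excess > 0.0 else 0.0
```

An atom 2 Å outside a 15 Å sphere with k = 1 (arbitrary units) would feel an energy penalty of 2, while any atom inside the sphere is unaffected.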
MD trajectories must be analyzed to extract the desired properties. After a
successful MD job, we have access to the atomic positions, velocities and forces as a function of
time, so any statistical mechanical property that can be expressed in terms of those variables can be
computed. Moreover, some statistical analyses, such as the atomic mean square displacement (AMSD)
and the RMSD, are important in MD simulations, but we will not go through their details in
the present introductory experiment.
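As an example of such an analysis, the sketch below computes a plain RMSD between one frame and a reference structure. It assumes both structures list the same atoms in the same order and performs no translational or rotational alignment, which a production analysis would normally do first; the function name is hypothetical.

```python
import math

def rmsd(coords, reference):
    """Root-mean-square deviation between two lists of (x, y, z) positions."""
    if len(coords) != len(reference):
        raise ValueError("structures must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(coords))
```

Applying `rmsd` to every frame of a trajectory, with the first frame as reference, gives a quick measure of how far the structure moves during the run.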
In the above examples of the water dimer and of the NH3 reaction with [Al(H2O)6]3+ using implicit
solvation in water, and in many other AIMD simulations, we simulate a system in isolation, i.e.
surrounded by vacuum (see footnote 3). However, in a typical MD simulation, in particular when using
classical force fields and when we are interested in the bulk properties of a liquid or solid system,
we need to impose some boundary conditions. Note that ORCA does not feature periodic boundary
conditions, and therefore all cells that can be created around a system are non-periodic and are just
repulsive walls.
Multiscale QM/QM2 (see §16.2.2) calculations are done with the ! QM/QM2 keyword, with some
detailed information in the %qmmm block. QM2 can be DFTB (such as XTB), semiempirical (such
as AM1 and PM3), or a composite method (such as HF-3c, PBEh-3c, and r2SCAN-3c). The charge
and multiplicity given in the coordinate section define the charge and multiplicity of the high
level region only (QMAtoms). The user still needs to define the charge and multiplicity of the total
QM/QM2 system (corresponding to the sum of the charges of the high level and low level parts,
and to the overall multiplicity). The charge and multiplicity of the total system are
defined by Charge_Total and Mult_Total.
Consider again the H2O dimer. Let us assume that we want to treat one water molecule at the
DFT level and the other molecule at the DFTB (XTB) level. The input file will be something like:
3
Of course, we can solvate any system using implicit or explicit solvation models, but even then, the system itself
containing the solvent is still an isolated system surrounded by vacuum.
! MD QM/XTB wB97X-D3 def2-SVP RIJCOSX def2/J TightSCF ALPB(water) PAL4
%qmmm
QMAtoms {0:2} end
Charge_Total 0
Mult_Total 1
end
%md
timestep 0.5_fs
initvel 300_K
thermostat berendsen 300_K timecon 10.0_fs
dump position stride 1 filename "basename.trajectory.xyz"
Constraint Add Center 0..5
run 2000
end
* xyz 0 1
O -2.03740 -1.21799 -0.08342
H -1.06493 -1.04408 -0.02285
H -2.37327 -1.07034 0.83692
O -1.65042 1.84243 0.07893
H -0.72656 1.49786 -0.01029
H -2.07086 1.65422 -0.79801
*
In molecular dynamics, if there are active and non-active atoms in the multiscale system, only
the active atoms are propagated in the MD run. If all atoms are active, all atoms are
propagated. Moreover, as MM water molecules have to be treated as rigid bodies due to their
parametrization, ORCA offers a keyword for the automated rigid treatment of all active MM water
molecules. This keyword applies bond and angle constraints to active MM water molecules in
optimizations as well as in MD runs.
In ORCA, implicit solvation with semiempirical, ab-initio and DFT methods can be performed with
PCM-based models and their variations (CPCM, SMD, etc.). Implicit solvation with DFTB methods (such
as GFN2-xTB and GFN-xTB) and some force fields (such as GFN-FF) may use the analytical linearized
Poisson-Boltzmann (ALPB) solvation model, e.g. ALPB(water).
Explicit water (or another solvent) can be added to any system/solute as a box (for calculations
with periodic boundary conditions) or as a sphere (for calculations with non-periodic boundary
conditions). This can be done, for instance, using VMD or Chimera. In Chimera: Tools > Amber > Solvate,
then select box or cap (cap adds a sphere), then define the cap radius and cap center. Recently, Grimme
and coworkers proposed an automated solvation model based on molecular cluster growing for explicit
solvation using FF and DFTB methods; see the details on the website of the Quantum Cluster
Growth program (free and open source).
Chemistry Background: Averages in MD Simulations
17.3 Exercises
Exercise 1
In this exercise, you will apply AIMD to study the inclusion of a small molecule (N2) in a small
carbon nanotube (CNT).
18
Appendices
Gaussian is the most widely used program. Gaussian is commercial and exists in versions for UNIX
workstations and Windows and Macintosh personal computers. The first version of Gaussian was
released in 1970 and Gaussian 09 was released in 2009. Gaussian was developed by John Pople
and co-workers and has been a key force in the growing use of quantum-chemistry calculations by
chemists, since it is an easy-to-use program that allows a very wide variety of calculations to be done
by virtually every available quantum-mechanical method.
ORCA, NWChem, GAMESS, and Dalton are advanced programs that are free to academic
users. Other advanced commercial quantum chemistry programs include Molpro, Molcas, Q-Chem,
and Turbomole.
The atomic units have been chosen such that the fundamental electron properties are all equal to
one atomic unit (a.u.). See the Table below.
The resulting formulae then appear to be dimensionless. For instance, the Hamiltonian operator
for a single electron hydrogenlike atom becomes
$$-\frac{\hbar^2}{2m_e}\nabla^2 - \frac{e^2}{4\pi\varepsilon_0 r} = -\frac{1}{2}\nabla^2 - \frac{1}{r} \tag{18.1}$$
The main quantum chemical computational methods are (1) ab-initio, (2) density-functional theory
(DFT), (3) semiempirical, (4) quantum Monte Carlo (QMC), and (5) quantum molecular dynamics
(QMD). An ab-initio (Latin: from the beginning) calculation uses the correct Hamiltonian to calculate
the wavefunction Ψ using only values of the fundamental physical constants. The DFT method
calculates the electron probability density ρ, and calculates the electronic energy from ρ. Most of
these methods use at least two approximations: (I) the Born-Oppenheimer approximation,
and (II) the LCAO approximation: the basis set is composed of a finite number of orthogonal
functions ($\phi = \sum_i c_i \chi_i$).
In a Hartree-Fock (HF) ab-initio approach, a third approximation is made: (III) the orbital
approximation, in which the energy eigenfunctions are assumed to be products of one-electron
wavefunctions (Hartree product). Therefore, electrons are allowed to move independently and feel
the averaged field of all the other electrons in the system; their motion is uncorrelated (a neglect of
instantaneous e-e interactions). The set of MOs leading to the lowest energy using the variational
principle is obtained by a procedure referred to as a self-consistent field (SCF).
Post-HF ab-initio methods add electron correlation on top of an HF wavefunction. The three basic
Post-HF methods are:
Moreover, HF and post-HF methods use: (IV) the single reference approximation, in which
the wavefunction is described in terms of one Slater determinant or configuration state function
(CSF). A multi-configurational self-consistent field (MCSCF) method uses more than one
determinant or CSF as a reference. When electron correlation on top of a MCSCF wavefunction is
added we have the multi-reference regime: MRCI, MRPT, or MRCC.
Properties of Methods
A method is size consistent if the computed energy of a molecule dissociated into two or more
infinitely separated parts and treated as a single system equals the sum of the computed energies of
each part. Size consistency can be checked in the following way: If two noninteracting systems A
and B are calculated as a single system {A · · · B, 10} (e.g., with a separation of 10 Å), then the sum
of the energies E(A) + E(B) must fulfill the following condition:
E(A · · · B, 10) = E(A) + E(B) (18.2)
A more demanding definition requires that the method not only correctly describes the fragmentation
limit, but also correctly describes the entire potential energy curve mapped out when two
non-interacting molecules are brought close together. However, we will only consider the basic
definition in (18.2) in the following discussion.
The size extensivity of a given quantum chemical method guarantees that the energy calcu-
lated for an electronic system with this method scales linearly with the number n of electrons. An
important advantage of a size-extensive method is that it allows straightforward comparisons be-
tween calculations involving variable numbers of electrons, e.g. ionization processes or calculations
using different numbers of active electrons. Lack of size-extensivity implies that errors from the exact
energy increase as more electrons enter the calculation.
Size-extensivity and size-consistency are not mutually exclusive properties, by any means. At the
non-interacting limit, size-extensivity of a method is a necessary and sufficient condition to ensure
size-consistency, implying that the former is more general than the latter. However, size-extensivity
does not ensure correct fragmentation.
Notation
A notation having the form “Method/Basis” is used to specify the method and the basis set used in a
calculation. For example, HF/6-31G* denotes a HF SCF MO calculation using the 6-31G* basis set.
The notation CCSD(T)/CBS denotes a result found by extrapolation of CCSD(T) calculations to the
CBS limit. In the notation “Method2/Basis2//Method1/Basis1 ”, the notation after the “//” indicates
the level at which the geometry is optimized. For example, CCSD(T)/cc-pVQZ//HF/6-31G* denotes
a single-point energy calculation done with the CCSD(T)/cc-pVQZ level at the equilibrium geometry
found by an HF/6-31G* geometry optimization.
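The notation can be unpacked mechanically, as the following sketch shows. It assumes that the method name itself contains no slash (which is not true for every composite label), and the function name and the returned dictionary layout are hypothetical.

```python
def parse_level(notation):
    """Split 'Method2/Basis2//Method1/Basis1' into energy and geometry levels."""
    energy_part, separator, geometry_part = notation.partition("//")

    def split_level(level):
        method, _, basis = level.partition("/")
        return {"method": method, "basis": basis}

    energy = split_level(energy_part)
    # Without '//' the geometry is taken to be optimized at the same level
    geometry = split_level(geometry_part) if separator else energy
    return {"energy": energy, "geometry": geometry}
```

For instance, `parse_level("CCSD(T)/cc-pVQZ//HF/6-31G*")` separates the CCSD(T)/cc-pVQZ single-point level from the HF/6-31G* geometry level.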
The four sources of error in ab-initio molecular electronic calculations are (a) neglect of or incomplete
treatment of electron correlation, (b) incompleteness of the basis set, (c) relativistic effects, and (d)
deviations from the Born-Oppenheimer approximation. Deviations from the Born-Oppenheimer
approximation are usually negligible for ground-state molecules. Relativistic effects are important
for heavy atoms. In calculations on molecules without heavy atoms, (a) and (b) are the main sources
of error.
Ab initio methods scale very steeply with system size $N$. RHF formally scales as $N^4$ (the two-electron
integral step); however, it approaches $N^3$ for larger systems due to integral screening (Fock matrix
diagonalization scales as $N^3$). Another common notation for scaling behavior is $O(N^x)$, where $O()$
denotes the order of the asymptotic scaling behavior. MP2 scales as $N^5$, CISD and CCSD as $N^6$,
MP4 and CCSD(T) as $N^7$, and FCI combinatorially. On the other hand, the DFT method scales as $N^3$.
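The practical meaning of these exponents is easiest to see as a cost ratio. The sketch below uses only the formal exponents quoted above and ignores prefactors, integral screening and any linear-scaling technique; the names are hypothetical.

```python
# Formal scaling exponents quoted in the text (cost ~ N**x)
FORMAL_EXPONENT = {
    "DFT": 3, "RHF": 4, "MP2": 5, "CISD": 6, "CCSD": 6, "MP4": 7, "CCSD(T)": 7,
}

def cost_ratio(method, size_factor):
    """Factor by which the formal cost grows when the system size is multiplied."""
    return size_factor ** FORMAL_EXPONENT[method]
```

Doubling the system size therefore multiplies the formal cost by 16 for RHF but by 128 for CCSD(T).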
18.5 Energy Gaps & Reactivity Descriptors
In Koopmans’ theorem, the ionization potential (IP) and the electron affinity (EA) are given
by
$$\mathrm{IP} \approx -\varepsilon_{\mathrm{HOMO}}, \qquad \mathrm{EA} \approx -\varepsilon_{\mathrm{LUMO}} \tag{18.3}$$
The fundamental gap $E_{\mathrm{fund}}$ is the difference between the ionization potential and electron affinity:
$$E_{\mathrm{fund}} = \mathrm{IP} - \mathrm{EA} \tag{18.4}$$
The optical gap $E_{\mathrm{opt}}$ is the difference between the first singlet excited ($S_1$) and singlet ground ($S_0$)
states:
$$E_{\mathrm{opt}} = S_1 - S_0 \tag{18.5}$$
The difference between the fundamental gap and the optical gap is a measure of the electron-hole
pair binding energy, $E_B$:
$$E_B = E_{\mathrm{fund}} - E_{\mathrm{opt}} \tag{18.6}$$
[Figure: energy-level diagram showing IP, EA, E_fund, E_opt, E_B, and the S0 and S1 states]
The main global reactivity descriptors are the chemical potential (µ), absolute hardness (η),
absolute electronegativity (χ), softness (σ), electrophilicity (ω), and nucleophilicity (ε):
$$\mu = -\chi = \left(\frac{\partial E}{\partial N}\right)_V \approx -\frac{\mathrm{IP} + \mathrm{EA}}{2}, \quad \eta = \frac{1}{2}\left(\frac{\partial \mu}{\partial N}\right)_V \approx \frac{\mathrm{IP} - \mathrm{EA}}{2}, \quad \sigma = \frac{1}{\eta}, \quad \omega = \frac{\mu^2}{\eta}, \quad \varepsilon = \frac{1}{\omega}$$
where E is the energy, N is the number of electrons and V is the external potential. Fukui local
reactivity descriptors are defined as
for electrophilic $f_j^-$, nucleophilic $f_j^+$, or free radical $f_j^0$ attacks on the reference molecule, where $q_j$ is
the atomic charge (evaluated from Mulliken population analysis, electrostatic derived charge, etc.)
at the jth atomic site in the neutral (N), anionic (N + 1) or cationic (N − 1) chemical species. To
a good approximation, it also turns out that
Here $\rho_{\mathrm{HOMO}}$ and $\rho_{\mathrm{LUMO}}$ are the normalized electron densities of the frontier orbitals. If electron
transfer is important, then chemical reaction occurs at the site where ρ has its largest value.
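The global descriptors of this section can be evaluated directly from frontier orbital energies. The sketch below relies on the Koopmans-type estimates IP ≈ −ε(HOMO) and EA ≈ −ε(LUMO) and on the sign convention µ = −χ with χ = (IP + EA)/2; the function name and the dictionary keys are hypothetical, and the electrophilicity is deliberately left out because several conventions for its prefactor are in use.

```python
def global_descriptors(e_homo, e_lumo):
    """Koopmans-type global reactivity descriptors (same energy unit as the input)."""
    ip = -e_homo                 # ionization potential estimate
    ea = -e_lumo                 # electron affinity estimate
    mu = -(ip + ea) / 2.0        # chemical potential, mu = -chi
    eta = (ip - ea) / 2.0        # absolute hardness
    return {
        "IP": ip,
        "EA": ea,
        "E_fund": ip - ea,       # fundamental gap
        "mu": mu,
        "chi": -mu,              # absolute electronegativity
        "eta": eta,
        "sigma": 1.0 / eta,      # softness
    }
```

The orbital energies would normally be taken from the ORCA output of the neutral molecule, in eV or hartree, and all returned quantities are then in the same unit (its inverse for the softness).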
18.6 Excited Electronic States
The energy difference between the excited electronic state (ES) and the ground electronic state (GS)
determined at the GS geometry ($Q^{\mathrm{GS}}$) corresponds to an idealized vertical absorption $E^{\mathrm{vert\text{-}ab}}$,
whereas the same difference calculated at the ES geometry ($Q^{\mathrm{ES}}$) corresponds to the vertical
fluorescence $E^{\mathrm{vert\text{-}fl}}$ (if the ES is a singlet state),
or the vertical phosphorescence $E^{\mathrm{vert\text{-}ph}}$ (if the ES is a triplet state). The adiabatic energy
$E^{\mathrm{adia}}$ is the difference of total electronic energies computed for the ES and GS in their corresponding
optimal geometries:
$$E^{\mathrm{adia}} = E^{\mathrm{ES}}(Q^{\mathrm{ES}}) - E^{\mathrm{GS}}(Q^{\mathrm{GS}}) \tag{18.11}$$
In addition, there is a variation of the zero-point vibrational energy (ZPVE) between the GS
and the ES:
$$\Delta E^{\mathrm{ZPVE}} = E^{\mathrm{ZPVE}}(Q^{\mathrm{GS}}) - E^{\mathrm{ZPVE}}(Q^{\mathrm{ES}}) \tag{18.12}$$
The so-called $E^{0\text{-}0}$ energy is then defined as
The previous calculations should be performed in solution to be comparable with experimental
$E^{0\text{-}0}$ values. Experimental $E^{0\text{-}0}$ values are the absorption-fluorescence crossing points (AFCPs),
which are obtained from the intersection of the normalized absorption and fluorescence spectra:
$E^{0\text{-}0} = 1239.84/\lambda_{\mathrm{intersection}}$ (with $E^{0\text{-}0}$ in eV and $\lambda_{\mathrm{intersection}}$ in nm).
where E represents the excitation energy of the different excited states, in cm^-1, and f is the oscillator
strength of the corresponding excited state.
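The conversion of the absorption-fluorescence crossing wavelength into an experimental E(0-0) value is a one-line calculation. In the sketch below the constant 1239.84 is hc expressed in eV·nm, as in the expression above; the function name is hypothetical.

```python
HC_EV_NM = 1239.84  # h*c in eV*nm

def e00_from_crossing(wavelength_nm):
    """E(0-0) in eV from the absorption-fluorescence crossing wavelength in nm."""
    return HC_EV_NM / wavelength_nm
```

A crossing point at 500 nm, for example, corresponds to an E(0-0) of about 2.48 eV.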
18.7 Redox Potentials
The ionization potential (IP) and the electron affinity (EA) can be calculated using the direct
“vertical” scheme (VIP and VEA), or the more accurate “adiabatic” scheme (AIP and AEA) that allows
some realistic geometry relaxation:
The ground state oxidation potential (GSOP), or $E_{\mathrm{ox}}^{\mathrm{dye}}$, can be obtained at several levels of
approximation:
1. According to Koopmans’ theorem, the negative of the highest occupied molecular orbital
(HOMO) energy, i.e. $-\varepsilon_{\mathrm{HOMO}}$, is an approximate estimate of the vertical GSOP:
$\mathrm{GSOP} \approx -\varepsilon_{\mathrm{HOMO}}$.
2. In a more accurate approximation, one can estimate the vertical GSOP as the energy difference
in solution between the neutral species at its optimized geometry and the oxidized species at the
same (neutral) geometry:
$$\mathrm{GSOP} \approx \mathrm{VIP} = E^{+}_{\mathrm{sln}}(Q^{0}_{\mathrm{sln}}) - E^{0}_{\mathrm{sln}}(Q^{0}_{\mathrm{sln}}) \tag{18.17}$$
3. An even more accurate approximation is to consider the geometry relaxation after ionization
in solution, which gives rise to the estimation of the adiabatic GSOP as
$$\mathrm{GSOP} \approx \mathrm{AIP} = E^{+}_{\mathrm{sln}}(Q^{+}_{\mathrm{sln}}) - E^{0}_{\mathrm{sln}}(Q^{0}_{\mathrm{sln}}) \tag{18.18}$$
4. The rigorous way to obtain the GSOP is to compute the free energy difference between the
neutral and the oxidized GS species, $\Delta G_{\mathrm{ox}}$:
$$\mathrm{GSOP} \approx \Delta G_{\mathrm{ox}} = G^{+}_{\mathrm{sln}} - G^{0}_{\mathrm{sln}} \tag{18.19}$$
which can be calculated following a thermodynamic cycle. The Gibbs energy of a species i
($G^{i}_{\mathrm{sln}}$) is:
$$G^{i}_{\mathrm{sln}} = G^{i}_{\mathrm{vac}}(Q^{i}_{\mathrm{vac}}) + \Delta G^{i}_{\mathrm{solv}} \tag{18.20}$$
where $G^{i}_{\mathrm{vac}}(Q^{i}_{\mathrm{vac}})$ is the Gibbs free energy in the gas phase and $\Delta G^{i}_{\mathrm{solv}}$ is the Gibbs free energy of
solvation:
(a) $G^{i}_{\mathrm{vac}}(Q^{i}_{\mathrm{vac}})$ is obtained from a geometry optimization in vacuum, followed by frequency
calculations to take into account the vibrational contribution to the total partition function and
the correction to the Gibbs energy.
(b) $\Delta G^{i}_{\mathrm{solv}}$ is obtained from a single-point calculation in solution and a reference calculation in
the gas phase at the geometry optimized in solution:
The excited state oxidation potential (ESOP), sometimes denoted $E_{\mathrm{ox}}^{\mathrm{dye*}}$, is obtained by subtracting
the energy corresponding to the lowest transition ($E^{0\text{-}0}$) associated with $\lambda_{\mathrm{max}}$ from the GSOP.
The $E^{0\text{-}0}$ value can be simply approximated (with less accuracy) as the energy of the vertical absorption
$E^{\mathrm{vert\text{-}ab}}$:
$$\mathrm{ESOP} \approx \mathrm{GSOP} - E^{0\text{-}0} \approx \mathrm{GSOP} - E^{\mathrm{vert\text{-}ab}} \tag{18.22}$$
Light harvesting efficiencies (LHE) are calculated using the oscillator strengths f obtained by
electronic excited state calculations. LHE is expressed as:
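A minimal sketch of this calculation is given below. It assumes the widely used form LHE = 1 − 10^(−f), where f is the oscillator strength of the transition associated with λmax; this form and the function name are assumptions of the sketch.

```python
def light_harvesting_efficiency(oscillator_strength):
    """LHE under the assumed form LHE = 1 - 10**(-f)."""
    return 1.0 - 10.0 ** (-oscillator_strength)
```

With this form an oscillator strength of 1 gives an LHE of 0.9, and the efficiency approaches 1 as f grows.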
DSSC’s Efficiency: Driving Forces for Electron Injection & Dye Regen-
eration
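The free energy of electron injection $\Delta G_{\mathrm{inj}}$ is calculated as the difference between
the excited state oxidation potential of the dye and the ground state reduction potential
($E_{\mathrm{CB}}$) of the conduction band (CB) of the semiconductor ($E_{\mathrm{CB}} = 4.0$ eV for
TiO$_2$):
$$\Delta G_{\mathrm{inj}} = \mathrm{ESOP} - E_{\mathrm{CB}} \approx \mathrm{GSOP} - E^{0-0} - E_{\mathrm{CB}} \approx \mathrm{GSOP} - E^{\mathrm{vert\text{-}ab}} - E_{\mathrm{CB}} \tag{18.24}$$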
The free energy of dye regeneration $\Delta G_{\mathrm{reg}}$ is calculated as the difference between
the ground state oxidation potential and the redox potential of the electrolyte
$E^{\mathrm{redox}}_{\mathrm{electrolyte}}$ ($E^{\mathrm{redox}}_{\mathrm{electrolyte}} = 4.8$ eV for the
I$^-$/I$_3^-$ redox couple):
$$\Delta G_{\mathrm{reg}} = E^{\mathrm{redox}}_{\mathrm{electrolyte}} - \mathrm{GSOP} \tag{18.25}$$
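Equations 18.22, 18.24, and 18.25 chain together, so a dye can be screened with a few lines of arithmetic. The sketch below is illustrative only: the function names and the example GSOP and transition energy are hypothetical, all quantities are assumed to be in eV, and the defaults are the TiO2 and I−/I3− values quoted in the text.

```python
E_CB_TIO2 = 4.0        # eV, conduction band of TiO2 (value quoted in the text)
E_REDOX_IODIDE = 4.8   # eV, I-/I3- redox couple (value quoted in the text)

def esop(gsop, e_transition):
    """Eq. 18.22: e_transition is E(0-0) or, less accurately, E(vert-ab)."""
    return gsop - e_transition

def dg_injection(gsop, e_transition, e_cb=E_CB_TIO2):
    """Eq. 18.24: driving force for electron injection."""
    return esop(gsop, e_transition) - e_cb

def dg_regeneration(gsop, e_redox=E_REDOX_IODIDE):
    """Eq. 18.25: driving force for dye regeneration."""
    return e_redox - gsop

if __name__ == "__main__":
    gsop, e_vert = 5.5, 2.5  # hypothetical dye, values in eV
    print("ESOP  :", esop(gsop, e_vert))
    print("dG_inj:", dg_injection(gsop, e_vert))
    print("dG_reg:", dg_regeneration(gsop))
```

With these sign conventions, negative values of both driving forces indicate that injection and regeneration are thermodynamically favorable.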
The reorganization energy in intramolecular charge transfer consists of the inner reorganization
energy ($\lambda_{\mathrm{inn}}$) and the outer reorganization energy ($\lambda_{\mathrm{out}}$).
$\lambda_{\mathrm{inn}}$ corresponds to the energy cost of the geometry modifications needed to go
from a neutral to a charged geometry and vice versa, while $\lambda_{\mathrm{out}}$ comes from the
solvent response. $\lambda_{\mathrm{inn}}$ can be found using Nelsen’s four-point method. The inner
reorganization energies for hole transfer $\lambda_{\mathrm{hole}}$ and for electron transfer
$\lambda_{\mathrm{electron}}$ are calculated as:
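A minimal sketch of the four-point scheme in its commonly quoted form is given below, assuming four single-point energies per charge state: the ion and the neutral molecule, each evaluated at the ion-optimized and at the neutral-optimized geometry. The function and argument names are hypothetical, and the same function serves for hole transfer (cation) and electron transfer (anion).

```python
def inner_reorganization_energy(e_ion_at_neutral_geom, e_ion_at_ion_geom,
                                e_neutral_at_ion_geom, e_neutral_at_neutral_geom):
    """Four-point estimate: relaxation of the ion plus relaxation of the neutral.

    Pass cation energies for lambda_hole and anion energies for lambda_electron.
    All energies must share the same unit.
    """
    relax_ion = e_ion_at_neutral_geom - e_ion_at_ion_geom
    relax_neutral = e_neutral_at_ion_geom - e_neutral_at_neutral_geom
    return relax_ion + relax_neutral

if __name__ == "__main__":
    # Hypothetical energies in eV relative to the relaxed neutral molecule.
    lam_hole = inner_reorganization_energy(6.10, 6.00, 0.08, 0.00)
    print("lambda_hole / eV:", lam_hole)
```

Each bracketed difference is a relaxation energy on one potential energy surface, so both terms are non-negative when the geometries are true minima.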
where $E_{(\mathrm{TiO_2})_n+\mathrm{dye}}$, $E_{(\mathrm{TiO_2})_n}$, and $E_{\mathrm{dye}}$ are the
energies of the dye-TiO$_2$ complex, the semiconductor, and the free dye, and $n$ is the number of
TiO$_2$ units in a cluster. Negative values indicate stability upon adsorption.
The total static dipole moment ($\mu$), average linear polarizability ($\alpha$), and first-order
hyperpolarizabilities ($\beta$) can be calculated using the x, y, and z components of the
corresponding variables:
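The sketch below shows one common way of combining Cartesian components into these scalar quantities. It assumes the norm of the dipole vector, one third of the trace of the polarizability tensor, and a total first hyperpolarizability built from the vector components β_i = β_iii + β_ijj + β_ikk. Other conventions for β exist, and the function names are hypothetical.

```python
import math

def total_dipole(mu):
    """|mu| from the components (mu_x, mu_y, mu_z)."""
    return math.sqrt(sum(m * m for m in mu))

def mean_polarizability(alpha):
    """<alpha> = (alpha_xx + alpha_yy + alpha_zz) / 3 for a 3x3 tensor."""
    return (alpha[0][0] + alpha[1][1] + alpha[2][2]) / 3.0

def total_first_hyperpolarizability(beta):
    """beta_tot from a 3x3x3 tensor, with beta_i = sum over j of beta[i][j][j]."""
    beta_vec = [sum(beta[i][j][j] for j in range(3)) for i in range(3)]
    return math.sqrt(sum(b * b for b in beta_vec))

if __name__ == "__main__":
    print(total_dipole((3.0, 4.0, 0.0)))
    print(mean_polarizability([[1.0, 0, 0], [0, 2.0, 0], [0, 0, 3.0]]))
```

Program output usually lists the tensor components in a fixed order, so in practice the main work is mapping that order onto the indices used here.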
18.9 Notation in Quantum Chemistry (IUPAC Recom.)
18.9.1 Ab-initio HF-SCF & Hückel Molecular Orbital (HMO) Theory¹
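| Name | Symbol | Definition | Notes |
|---|---|---|---|
| molecular orbital | $\phi_i(\mu)$ | | 2 |
| molecular spin orbital | $\phi_i(\mu)\alpha(\mu)$; $\phi_i(\mu)\beta(\mu)$ | | 2 |
| total wavefunction | $\Psi$ | $\Psi = (n!)^{-1/2}\,\Vert\phi_i(\mu)\Vert$ | 2, 3 |
| core hamiltonian of a single electron | $\hat{H}^{\mathrm{core}}_{\mu}$ | $\hat{H}^{\mathrm{core}}_{\mu} = -(1/2)\nabla^2_{\mu} - \sum_A Z_A/r_{\mu A}$ | 4 |
| one-electron integrals: | | | |
| expectation value of the core hamiltonian | $H_{ii}$ | $H_{ii} = \int \phi_i^*(1)\,\hat{H}^{\mathrm{core}}_1\,\phi_i(1)\,d\tau_1$ | 2, 4, 5 |
| two-electron repulsion integrals: | | | |
| coulomb integral | $J_{ij}$ | $J_{ij} = \iint \phi_i^*(1)\phi_j^*(2)\,\frac{1}{r_{12}}\,\phi_i(1)\phi_j(2)\,d\tau_1 d\tau_2$ | 2, 6 |
| exchange integral | $K_{ij}$ | $K_{ij} = \iint \phi_i^*(1)\phi_j^*(2)\,\frac{1}{r_{12}}\,\phi_j(1)\phi_i(2)\,d\tau_1 d\tau_2$ | 2, 6 |
| one-electron orbital energy | $\varepsilon_i$ | $\varepsilon_i = H_{ii} + \sum_j (2J_{ij} - K_{ij})$ | 7 |
| total electronic energy | $E$ | $E = 2\sum_i H_{ii} + \sum_i\sum_j (2J_{ij} - K_{ij}) = \sum_i (\varepsilon_i + H_{ii})$ | 7, 8 |
| coulomb operator | $\hat{J}_i$ | $\hat{J}_i\phi_j(2) = \langle\phi_i(1)\vert\frac{1}{r_{12}}\vert\phi_i(1)\rangle\,\phi_j(2)$ | |
| exchange operator | $\hat{K}_i$ | $\hat{K}_i\phi_j(2) = \langle\phi_i(1)\vert\frac{1}{r_{12}}\vert\phi_j(1)\rangle\,\phi_i(2)$ | |
| Fock operator | $\hat{F}$ | $\hat{F} = \hat{H}^{\mathrm{core}} + \sum_i (2\hat{J}_i - \hat{K}_i)$ | 7, 9 |
| atomic-orbital basis function | $\chi_r$ | | |
| molecular orbital | $\phi_i$ | $\phi_i = \sum_r \chi_r c_{ri}$ | 10 |
| coulomb integral | $H_{rr}$, $\alpha_r$ | $H_{rr} = \int \chi_r^*\hat{H}\chi_r\,d\tau$ | 10, 11 |
| resonance integral | $H_{rs}$, $\beta_{rs}$ | $H_{rs} = \int \chi_r^*\hat{H}\chi_s\,d\tau$ | 10, 12 |
| energy parameter | $x$ | $-x = (\alpha - E)/\beta$ | 13 |
| overlap integral | $S_{rs}$, $S$ | $S_{rs} = \int \chi_r^*\chi_s\,d\tau$ | 10 |
| charge order | $q_r$ | $q_r = \sum_{i=1}^{n} b_i c_{ri}^2$ | 14, 15 |
| bond order | $p_{rs}$ | $p_{rs} = \sum_{i=1}^{n} b_i c_{ri} c_{si}$ | 15 |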
¹ Results in quantum chemistry are typically expressed in atomic units. All lengths, energies, masses, charges, and
angular momenta are expressed as dimensionless ratios to the corresponding atomic units $a_0$, $E_h$, $m_e$, $e$, and $\hbar$.
² The indices $i$ and $j$ label the molecular orbitals, and either $\mu$ or the numerals 1 and 2 label the electron coordinates.
³ The double vertical bars denote the anti-symmetrized product of the occupied molecular spin orbitals $\phi_i\alpha$ and
$\phi_i\beta$ (sometimes denoted $\phi_i$ and $\bar{\phi}_i$); for a closed-shell system $\Psi$ would be a normalized Slater
determinant. $(n!)^{-1/2}$ is the normalization factor and $n$ the number of electrons.
⁴ $Z_A$ is the proton number (charge number) of nucleus $A$, and $r_{\mu A}$ is the distance of electron $\mu$ from nucleus $A$.
⁵ $H_{ii}$ is the energy of an electron in orbital $\phi_i$ in the field of the core.
⁶ The inter-electron repulsion integral is written in various shorthand notations. In $J_{ij} = \langle ij\vert ij\rangle$ the
first and third indices refer to electron 1 and the second and fourth indices to electron 2. In $J_{ij} = (i^*i\vert j^*j)$,
the first two indices refer to electron 1 and the second two indices to electron 2. Usually the functions are real and
the stars are omitted. The same index convention is used for the exchange integral: $K_{ij} = \langle ij\vert ji\rangle$ or
$K_{ij} = (i^*j\vert j^*i)$.
⁷ These relations apply to closed-shell systems only, and the sums extend over the occupied molecular orbitals.
⁸ The sum over $j$ includes the term with $j = i$, for which $J_{ii} = K_{ii}$, so that $2J_{ii} - K_{ii} = J_{ii}$.
⁹ The HF equations read $(\hat{F} - \varepsilon_j)\phi_j = 0$. The Fock operator involves all of its eigenfunctions $\phi_i$
through $\hat{J}_i$ and $\hat{K}_i$.
¹⁰ $\hat{H}$ is an effective hamiltonian for a single electron, $i$ and $j$ label the molecular orbitals, and $r$ and $s$
label the atomic orbitals. In HMO theory $H_{rs} \neq 0$ only for bonded pairs of atoms $r$ and $s$, and all $S_{rs} = 0$
for $r \neq s$.
¹¹ Note that the name “coulomb integral” has a different meaning in HMO theory, where it refers to the energy of
the orbital $\chi_r$ in the field of the nuclei, from HF theory, where it refers to a two-electron repulsion integral.
¹² This expression describes a bonding interaction between atomic orbitals $r$ and $s$. For an anti-bonding interaction,
the corresponding resonance integral is the negative of the resonance integral for the bonding interaction.
¹³ In the simplest application of Hückel theory to the π electrons of planar conjugated hydrocarbons, $\alpha$ is taken
to be the same for all carbon atoms, and $\beta$ to be the same for all bonded pairs of carbon atoms.
¹⁴ $-eq_r$ is the electronic charge on atom $r$. $q_r$ is the contribution of all $n$ π electrons to the total charge at $r$,
with $\sum_r q_r = n$.
¹⁵ $b_i$ gives the number of electrons which occupy a given orbital of energy $\varepsilon_i$; for non-degenerate orbitals,
$b_i$ can be 0, 1 or 2.
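The HMO quantities in the table can be illustrated with a short NumPy sketch for the π system of butadiene. It is a minimal example under the simplifications stated in the notes: a single α, a single β for bonded pairs, and zero overlap between different atoms. With those assumptions the secular problem reduces to diagonalizing the connectivity matrix, whose eigenvalues are the energy parameters x in E = α + xβ. The function name and the simple two-electrons-per-orbital filling are assumptions of this sketch, and degenerate open shells are not treated.

```python
import numpy as np

def huckel(adjacency, n_electrons):
    """Return (x, b, q, p): energy parameters, occupations, charge and bond orders."""
    a = np.asarray(adjacency, dtype=float)
    x, c = np.linalg.eigh(a)            # columns of c are the MO coefficients c_ri
    order = np.argsort(-x)              # beta < 0, so the largest x is the lowest energy
    x, c = x[order], c[:, order]
    b = np.zeros(len(x))                # occupation numbers b_i
    remaining = n_electrons
    for i in range(len(x)):
        b[i] = min(2, remaining)
        remaining -= b[i]
    q = (c ** 2) @ b                    # q_r = sum_i b_i c_ri^2
    p = (c * b) @ c.T                   # p_rs = sum_i b_i c_ri c_si
    return x, b, q, p

if __name__ == "__main__":
    # Butadiene: four carbon atoms in a chain, four pi electrons.
    chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    x, b, q, p = huckel(chain, 4)
    print("x    :", np.round(x, 3))
    print("q_r  :", np.round(q, 3))
    print("p_12, p_23:", round(p[0, 1], 3), round(p[1, 2], 3))
```

The terminal bond order comes out larger than the central one, which matches the alternating double and single bond picture of butadiene.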
18.9.2 HF-Roothaan SCF using LCAO-MO¹
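| Name | Symbol | Definition | Notes |
|---|---|---|---|
| atomic-orbital basis function | $\chi_r$ | | 2 |
| molecular orbital | $\phi_i$ | $\phi_i = \sum_r \chi_r c_{ri}$ | |
| overlap matrix element | $S_{rs}$ | $S_{rs} = \int \chi_r^*\chi_s\,d\tau$, $\sum_{r,s} c_{ri}^* S_{rs} c_{sj} = \delta_{ij}$ | |
| density matrix element | $P_{rs}$ | $P_{rs} = 2\sum_i^{\mathrm{occ}} c_{ri}^* c_{si}$ | 3 |
| integrals over the basis functions: | | | |
| one-electron integrals | $H_{rs}$ | $H_{rs} = \int \chi_r^*(1)\,\hat{H}^{\mathrm{core}}_1\,\chi_s(1)\,d\tau_1$ | |
| two-electron integrals | $(rs\vert tu)$ | $(rs\vert tu) = \iint \chi_r^*(1)\chi_s(1)\,\frac{1}{r_{12}}\,\chi_t^*(2)\chi_u(2)\,d\tau_1 d\tau_2$ | 4, 5 |
| total electronic energy | $E$ | $E = \sum_r\sum_s P_{rs}H_{rs} + \frac{1}{2}\sum_r\sum_s\sum_t\sum_u P_{rs}P_{tu}\left[(rs\vert tu) - \frac{1}{2}(ru\vert ts)\right]$ | 3, 5 |
| matrix element of Fock operator | $F_{rs}$ | $F_{rs} = H_{rs} + \sum_t\sum_u P_{tu}\left[(rs\vert tu) - \frac{1}{2}(ru\vert ts)\right]$ | 3, 6 |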
¹ Results in quantum chemistry are typically expressed in atomic units. All lengths, energies, masses, charges, and
angular momenta are expressed as dimensionless ratios to the corresponding atomic units $a_0$, $E_h$, $m_e$, $e$, and $\hbar$.
² The indices $r$ and $s$ label the basis functions. In numerical computations the basis functions are taken either as
Slater-type orbitals (STO) or as Gaussian-type orbitals (GTO). An STO basis function in spherical polar coordinates has
the general form $\chi(r, \theta, \phi) = N r^{n-1} e^{-\zeta_{nl} r}\,Y_{lm}(\theta, \phi)$, where $\zeta_{nl}$ is a shielding
parameter representing the effective charge in the state with quantum numbers $n$ and $l$. GTO functions are typically
expressed in cartesian space coordinates, in the form $\chi(x, y, z) = N x^a y^b z^c e^{-\alpha r^2}$. Commonly, a linear
combination of such functions with varying exponents $\alpha$ is used in such a way as to model an STO. $N$ denotes a
normalization factor.
³ For closed-shell species with two electrons per occupied orbital. The sum extends over all occupied molecular orbitals.
$P_{rs}$ may also be called the bond order between atoms $r$ and $s$.
⁴ The contracted notation for two-electron integrals over the basis functions $(rs\vert tu)$ is based on the same convention
outlined in note 6 in the previous subsection (Ab-initio HF-SCF).
⁵ Here the two-electron integral is expressed in terms of integrals over the spatial atomic-orbital basis functions. The
matrix elements $H_{ii}$, $J_{ij}$, and $K_{ij}$ may be similarly expressed in terms of integrals over the spatial
atomic-orbital basis functions according to the following equations:
$$H_{ii} = \sum_r\sum_s c_{ri}^* c_{si} H_{rs}$$
$$J_{ij} = (i^*i\vert j^*j) = \sum_r\sum_s\sum_t\sum_u c_{ri}^* c_{si} c_{tj}^* c_{uj}\,(r^*s\vert t^*u)$$
$$K_{ij} = (i^*j\vert j^*i) = \sum_r\sum_s\sum_t\sum_u c_{ri}^* c_{si} c_{tj}^* c_{uj}\,(r^*u\vert t^*s)$$
⁶ The Hartree-Fock-Roothaan SCF equations, expressed in terms of the matrix elements of the Fock operator $F_{rs}$ and
the overlap matrix elements $S_{rs}$, take the form $\sum_s (F_{rs} - \varepsilon_i S_{rs})\,c_{si} = 0$.
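The closed-shell expressions for the density matrix, the Fock matrix, and the total electronic energy map directly onto array contractions. The NumPy sketch below is illustrative only. It assumes real basis functions and coefficients, a density matrix that carries the factor of 2 for doubly occupied orbitals, and a four-index array `eri[r, s, t, u]` holding $(rs\vert tu)$. The function names and the one-function test system are hypothetical.

```python
import numpy as np

def density_matrix(c, n_occ):
    """P_rs = 2 * sum over occupied i of c_ri c_si (real coefficients assumed)."""
    c_occ = np.asarray(c, dtype=float)[:, :n_occ]
    return 2.0 * c_occ @ c_occ.T

def fock_matrix(h, eri, p):
    """F_rs = H_rs + sum_tu P_tu [ (rs|tu) - 1/2 (ru|ts) ]."""
    coulomb = np.einsum("rstu,tu->rs", eri, p)
    exchange = np.einsum("ruts,tu->rs", eri, p)
    return h + coulomb - 0.5 * exchange

def electronic_energy(h, eri, p):
    """E = sum_rs P_rs H_rs + 1/2 sum_rstu P_rs P_tu [ (rs|tu) - 1/2 (ru|ts) ]."""
    g = fock_matrix(h, eri, p) - h      # two-electron part of the Fock matrix
    return float(np.sum(p * h) + 0.5 * np.sum(p * g))

if __name__ == "__main__":
    # Hypothetical one-function, two-electron model: H_11 = -1.0, (11|11) = 0.5.
    h = np.array([[-1.0]])
    eri = np.full((1, 1, 1, 1), 0.5)
    p = density_matrix([[1.0]], n_occ=1)
    print("F_11:", fock_matrix(h, eri, p)[0, 0])
    print("E   :", electronic_energy(h, eri, p))
```

For this model the result can be checked against the molecular-orbital expressions of the previous subsection: $E = 2H_{11} + J_{11}$ and $E = \varepsilon_1 + H_{11}$ with $\varepsilon_1 = F_{11}$. A full SCF procedure would repeat these steps, solving the Roothaan equations for new coefficients until the density matrix stops changing.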
18.10 Other Useful Data
18.10.3 Periodic Table of the Elements
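Group numbers follow the IUPAC 1-18 scheme; the older labels are 1 = 1A, 2 = 2A, 3 = 3B, 4 = 4B, 5 = 5B, 6 = 6B,
7 = 7B, 8-10 = 8B, 11 = 1B, 12 = 2B, 13 = 3A, 14 = 4A, 15 = 5A, 16 = 6A, 17 = 7A, 18 = 8A.

| Atomic number | Symbol | Name | Group | Atomic mass |
|---|---|---|---|---|
| 1 | H | hydrogen | 1 | 1.008 |
| 2 | He | helium | 18 | 4.0026 |
| 3 | Li | lithium | 1 | 6.941 |
| 4 | Be | beryllium | 2 | 9.0122 |
| 5 | B | boron | 13 | 10.811 |
| 6 | C | carbon | 14 | 12.011 |
| 7 | N | nitrogen | 15 | 14.007 |
| 8 | O | oxygen | 16 | 15.999 |
| 9 | F | fluorine | 17 | 18.998 |
| 10 | Ne | neon | 18 | 20.180 |
| 11 | Na | sodium | 1 | 22.990 |
| 12 | Mg | magnesium | 2 | 24.305 |
| 13 | Al | aluminum | 13 | 26.982 |
| 14 | Si | silicon | 14 | 28.085 |
| 15 | P | phosphorus | 15 | 30.974 |
| 16 | S | sulfur | 16 | 32.065 |
| 17 | Cl | chlorine | 17 | 35.453 |
| 18 | Ar | argon | 18 | 39.948 |
| 19 | K | potassium | 1 | 39.098 |
| 20 | Ca | calcium | 2 | 40.078 |
| 21 | Sc | scandium | 3 | 44.956 |
| 22 | Ti | titanium | 4 | 47.867 |
| 23 | V | vanadium | 5 | 50.942 |
| 24 | Cr | chromium | 6 | 51.996 |
| 25 | Mn | manganese | 7 | 54.938 |
| 26 | Fe | iron | 8 | 55.845 |
| 27 | Co | cobalt | 9 | 58.933 |
| 28 | Ni | nickel | 10 | 58.693 |
| 29 | Cu | copper | 11 | 63.546 |
| 30 | Zn | zinc | 12 | 65.382 |
| 31 | Ga | gallium | 13 | 69.723 |
| 32 | Ge | germanium | 14 | 72.630 |
| 33 | As | arsenic | 15 | 74.922 |
| 34 | Se | selenium | 16 | 78.971 |
| 35 | Br | bromine | 17 | 79.904 |
| 36 | Kr | krypton | 18 | 83.798 |
| 37 | Rb | rubidium | 1 | 85.468 |
| 38 | Sr | strontium | 2 | 87.62 |
| 39 | Y | yttrium | 3 | 88.906 |
| 40 | Zr | zirconium | 4 | 91.224 |
| 41 | Nb | niobium | 5 | 92.906 |
| 42 | Mo | molybdenum | 6 | 95.95 |
| 43 | Tc | technetium | 7 | (98) |
| 44 | Ru | ruthenium | 8 | 101.07 |
| 45 | Rh | rhodium | 9 | 102.91 |
| 46 | Pd | palladium | 10 | 106.42 |
| 47 | Ag | silver | 11 | 107.87 |
| 48 | Cd | cadmium | 12 | 112.41 |
| 49 | In | indium | 13 | 114.82 |
| 50 | Sn | tin | 14 | 118.71 |
| 51 | Sb | antimony | 15 | 121.76 |
| 52 | Te | tellurium | 16 | 127.60 |
| 53 | I | iodine | 17 | 126.90 |
| 54 | Xe | xenon | 18 | 131.29 |
| 55 | Cs | caesium | 1 | 132.91 |
| 56 | Ba | barium | 2 | 137.33 |
| 57 | La | lanthanum | lanthanide | 138.91 |
| 58 | Ce | cerium | lanthanide | 140.12 |
| 59 | Pr | praseodymium | lanthanide | 140.91 |
| 60 | Nd | neodymium | lanthanide | 144.24 |
| 61 | Pm | promethium | lanthanide | (145) |
| 62 | Sm | samarium | lanthanide | 150.36 |
| 63 | Eu | europium | lanthanide | 151.96 |
| 64 | Gd | gadolinium | lanthanide | 157.25 |
| 65 | Tb | terbium | lanthanide | 158.93 |
| 66 | Dy | dysprosium | lanthanide | 162.50 |
| 67 | Ho | holmium | lanthanide | 164.93 |
| 68 | Er | erbium | lanthanide | 167.26 |
| 69 | Tm | thulium | lanthanide | 168.93 |
| 70 | Yb | ytterbium | lanthanide | 173.05 |
| 71 | Lu | lutetium | lanthanide | 174.97 |
| 72 | Hf | hafnium | 4 | 178.49 |
| 73 | Ta | tantalum | 5 | 180.95 |
| 74 | W | tungsten | 6 | 183.84 |
| 75 | Re | rhenium | 7 | 186.21 |
| 76 | Os | osmium | 8 | 190.23 |
| 77 | Ir | iridium | 9 | 192.22 |
| 78 | Pt | platinum | 10 | 195.08 |
| 79 | Au | gold | 11 | 196.97 |
| 80 | Hg | mercury | 12 | 200.59 |
| 81 | Tl | thallium | 13 | 204.38 |
| 82 | Pb | lead | 14 | 207.2 |
| 83 | Bi | bismuth | 15 | 208.98 |
| 84 | Po | polonium | 16 | (209) |
| 85 | At | astatine | 17 | (210) |
| 86 | Rn | radon | 18 | (222) |
| 87 | Fr | francium | 1 | (223) |
| 88 | Ra | radium | 2 | (226) |
| 89 | Ac | actinium | actinide | (227) |
| 90 | Th | thorium | actinide | 232.04 |
| 91 | Pa | protactinium | actinide | 231.04 |
| 92 | U | uranium | actinide | 238.03 |
| 93 | Np | neptunium | actinide | (237) |
| 94 | Pu | plutonium | actinide | (244) |
| 95 | Am | americium | actinide | (243) |
| 96 | Cm | curium | actinide | (247) |
| 97 | Bk | berkelium | actinide | (247) |
| 98 | Cf | californium | actinide | (251) |
| 99 | Es | einsteinium | actinide | (252) |
| 100 | Fm | fermium | actinide | (257) |
| 101 | Md | mendelevium | actinide | (258) |
| 102 | No | nobelium | actinide | (259) |
| 103 | Lr | lawrencium | actinide | (266) |
| 104 | Rf | rutherfordium | 4 | (267) |
| 105 | Db | dubnium | 5 | (268) |
| 106 | Sg | seaborgium | 6 | (269) |
| 107 | Bh | bohrium | 7 | (270) |
| 108 | Hs | hassium | 8 | (277) |
| 109 | Mt | meitnerium | 9 | (278) |
| 110 | Ds | darmstadtium | 10 | (281) |
| 111 | Rg | roentgenium | 11 | (282) |
| 112 | Cn | copernicium | 12 | (285) |
| 113 | Nh | nihonium | 13 | (286) |
| 114 | Fl | flerovium | 14 | (289) |
| 115 | Mc | moscovium | 15 | (290) |
| 116 | Lv | livermorium | 16 | (293) |
| 117 | Ts | tennessine | 17 | (294) |
| 118 | Og | oganesson | 18 | (294) |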
Notes:
1. This table is based on “IUPAC Periodic Table of the Elements”, dated November 28, 2016. www.iupac.org.
2. Each entry gives the atomic number and the conventional atomic mass of the element.
3. For elements with no stable isotopes, the mass number of the isotope with the longest half-life is given in parentheses.
4. Color code: Blue: gas, Red: liquid, Yellow: artificially prepared, Gray: Metalloids.
Bibliography
[1] Foresman, J. B.; Frisch, Æ., Exploring Chemistry with Electronic Structure Methods, 3rd ed.,
Gaussian Inc., 2015.
[2] Jensen, F., Introduction to Computational Chemistry, 3rd ed., Wiley, 2017
[4] Atkins, P. W.; de Paula, J., Atkins’ Physical Chemistry, 11th ed., OUP, 2018.
[5] Engel, T., Quantum Chemistry & Spectroscopy, 4th ed., Pearson Education, Inc., 2019.
[6] Lowe, J.; Peterson, K., Quantum Chemistry, 3rd ed., Academic Press, 2006.
[7] Atkins, P. W.; Friedman, R. S., Molecular Quantum Mechanics, 5th ed., Oxford University
Press, 2011.
[9] McQuarrie, D. A., Quantum Chemistry, 2nd ed., University Science Books, 2008.
[10] Piela, J., Ideas of Quantum Chemistry, 2nd ed., Elsevier, 2014.
[11] Roos, B.; Lindh, R.; Malmqvist, P.; Veryazov, V.; Widmark, P., Multiconfigurational Quantum
Chemistry, Wiley, 2016.
[12] Schatz, G. C.; Ratner, M. A., Quantum Mechanics in Chemistry, Dover, 2002 Cambridge Uni-
versity Press, 2009.
[13] Simons, J.; Nichols, A., Quantum Mechanics in Chemistry, Oxford University Press, 1997.
155