arXiv:1401.7441v1 [cond-mat.mtrl-sci] 29 Jan 2014
Daubechies Wavelets for Linear Scaling Density Functional Theory

Stephan Mohr,1,2 Laura E. Ratcliff,2 Paul Boulanger,2,3 Luigi Genovese,2 Damien Caliste,2 Thierry Deutsch,2 and Stefan Goedecker1

1Institut für Physik, Universität Basel, Klingelbergstr. 82, 4056 Basel, Switzerland
2Laboratoire de simulation atomistique (L_Sim), SP2M, UMR-E CEA / UJF-Grenoble 1, INAC, Grenoble, F-38054, France
3Institut Néel, CNRS and Université Joseph Fourier, B.P. 166, 38042 Grenoble Cedex 09, France

(Dated: January 30, 2014)
We demonstrate that Daubechies wavelets can be used to construct a minimal set of optimized localized contracted basis functions in which the Kohn-Sham orbitals can be represented with an arbitrarily high, controllable precision. Ground state energies and the forces acting on the ions can be calculated in this basis with the same accuracy as if they were calculated directly in a Daubechies wavelets basis, provided that the amplitude of these contracted basis functions is sufficiently small on the surface of the localization region, which is guaranteed by the optimization procedure described in this work. This approach reduces the computational costs of DFT calculations, and can be combined with sparse matrix algebra to obtain linear scaling with respect to the number of electrons in the system. Calculations on systems of 10,000 atoms or more thus become feasible in a systematic basis set with moderate computational resources. Further computational savings can be achieved by exploiting the similarity of the contracted basis functions for closely related environments, e.g. in geometry optimizations or combined calculations of neutral and charged systems.
I. INTRODUCTION
The Kohn-Sham (KS) formalism of density functional theory (DFT)1,2 is one of the most popular electronic structure methods due to its good balance between accuracy and speed. Thanks to the development of new approximations to the exchange-correlation functional, this approach now allows many quantities (bond lengths, vibration frequencies, elastic constants, etc.) to be calculated with errors of less than a few percent, which is sufficient for many applications in solid state physics, chemistry, materials science, biology, geology and many other fields. Although the KS approach has some shortcomings – e.g. its inability to accurately describe the HOMO-LUMO separation or many-body (e.g. excitonic) effects, thus reducing its predictive power in the field of optics – it has become the standard for the quantum simulation of matter and also provides a well defined starting point for more accurate methods, such as the GW approximation3.
Despite the efforts put forth to increase the efficiency of DFT calculations and the increasing computing power of modern supercomputers, the applicability of standard calculations is limited to systems containing about a thousand atoms, which is small compared to the size of systems of interest in nanoscience. The reason for this is that standard electronic structure programs using systematic basis sets such as plane waves4–6, finite elements7 or wavelets8 need a number of operations that scales as the number of orbitals, N_orb, squared times the number of basis functions, N_basis, used to represent them. Since both the number of orbitals and the number of basis functions scale as the number of atoms, the overall cost scales as O(N²_orb N_basis) = O(N³_atom). Electronic structure programs that use Gaussians9 or atomic orbitals10 require in a standard implementation a matrix diagonalization which scales as O(N³_basis).
To circumvent this problem, one can exploit Kohn's nearsightedness principle11–13, which states that, for systems with a finite gap or for metals at finite temperature, all physical quantities are determined by the local environment. This is a consequence of the exponentially fast decay of the density matrix14–20. Therefore, it is theoretically possible to express the KS wavefunctions of a given system in terms of a minimal, localized basis set. In order to get highly accurate results while still keeping the size of the basis relatively small, such a basis has to depend on the local chemical environment. If this basis set were known or could be approximated beforehand, it would lead to a computationally cheap tight-binding like approach21,22. Of course, in practice it is not possible to determine this optimal localized basis set beforehand; instead it has to be built up iteratively during the calculation. This would result in O(N³_orb) scaling, which is still equivalent to O(N³_atom), but with a much smaller prefactor than systematic approaches (e.g. plane waves) where the number of basis functions is far greater than the number of orbitals (N_basis ≫ N_orb).
However, the use of a strictly localized basis offers yet another possibility. As has been demonstrated during the past twenty years23,24, it is possible to truncate the density matrix and thus transform it into a sparse form by neglecting elements either when they are below a certain threshold, or when they correspond to localized orbitals which are too distant from each other. This reduces the complexity of the algorithm to O(N_orb) = O(N_atom) and leads to so-called linear scaling (LS) DFT methods. Methods of this type have been implemented in numerous codes such as onetep25, Conquest26, CP2K27 and siesta28. Note, however, that the extent of the truncation impacts the accuracy due to the imposition of an additional constraint on the system, and is therefore left as a freely selectable parameter for the user. This additional constraint also comes at the cost of extra computational steps, so that the prefactor is greater than for standard DFT codes, even for a single iteration in the self-consistency cycle. Furthermore, there can be problems with ill-conditioning when using strictly localized basis sets, which further increases the prefactor. The combination of these two problems means that for small systems the total calculation time is actually greater when one imposes locality, but thanks to the better scaling, there is a crossover point where the new algorithms become more efficient.
Our minimal set of localized contracted basis functions, called support functions in the following, is obtained by an environment dependent optimization where the support functions are represented in terms of a fixed underlying wavelet basis set. In the language of quantum chemistry, these support functions could be denoted as environment dependent contracted wavelets. Because of this environment dependency, the size of this basis set is, for a given accuracy, much smaller than that of typical contracted Gaussian basis sets; we therefore also refer to it as a minimal basis set.
The choice of the underlying basis set is one of the most important aspects impacting the accuracy and efficiency of a linear scaling DFT code. Ideally, it should feature compact support while still being orthogonal, thus allowing for a systematic convergence – properties which are all offered by Daubechies wavelets basis sets29. Furthermore, wavelets have built-in multiresolution properties, enabling an adaptive mesh with finer sampling close to the atoms where the most significant part of the orbitals is located; this can be particularly beneficial for inhomogeneous systems. Wavelets also have the distinct advantage that calculations can be performed with all the standard boundary conditions – free, wire, surface or periodic. This also means we can perform calculations on charged and polarized systems using free boundary conditions without the need for a compensating background charge. It is therefore evident that the combination of the above features makes wavelets ideal for an LS DFT code.
This paper is organized as follows. We first give an overview of the method, focussing in particular on the imposition of the localization constraint in Daubechies wavelets. We then discuss the details, highlighting the novel features, following which we consider the calculation of atomic forces. For this latter point, we demonstrate the remarkable result that, thanks to the compact support of Daubechies wavelets, the contribution of the Pulay-like forces, arising from the introduction of the localization regions, can be safely neglected in a typical calculation. We then present results for a number of systems, illustrating the accuracy of the method for ground state energies and atomic forces. We also demonstrate the improved scaling compared with standard BigDFT, showing that we are able to achieve linear scaling. Finally, we highlight two cases where the minimal basis functions can be reused, resulting in further significant computational savings.
II. MINIMAL CONTRACTED BASIS

A. Kohn-Sham formalism in a minimal basis set
The standard approach for performing Kohn-Sham DFT calculations is to calculate the Kohn-Sham orbitals |Ψ_i⟩ which satisfy the equation

H_KS |Ψ_i⟩ = ε_i |Ψ_i⟩ ,   (1)

with

H_KS = −(1/2)∇² + V_KS[ρ] + V_PSP ,   (2)

where V_KS[ρ] contains the Hartree potential – solution to the Poisson equation – and the exchange-correlation potential, while V_PSP contains the potential arising from the pseudopotential and the external potential created by the ions. In the case of BigDFT, these are norm-conserving GTH-HGH30 pseudopotentials and their Krack variants31, possibly with a nonlinear core correction32. In our approach the KS orbitals are in turn expressed as a linear combination of support functions |φ_α⟩:

|Ψ_i(r)⟩ = Σ_α c^α_i |φ_α(r)⟩ .   (3)
The density – which can be obtained from the one-electron orbitals via ρ(r) = Σ_i f_i |Ψ_i(r)|², where f_i is the occupation number of orbital i – is given by

ρ(r) = Σ_{α,β} φ*_α(r) K^{αβ} φ_β(r) ,   (4)

where K^{αβ} = Σ_i f_i c^α_i c^β_i is the density kernel. The latter is related to the density matrix formulation of Hernandez and Gillan33, since – as follows from Eq. (3) –

F(r, r′) = Σ_i f_i |Ψ_i(r)⟩⟨Ψ_i(r′)| = Σ_{α,β} |φ_α(r)⟩ K^{αβ} ⟨φ_β(r′)| .   (5)

Thus the density kernel is the representation of the density matrix in the support function basis. We choose to have real support functions and thus from now on we will neglect the complex notation for this quantity.
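As a concrete illustration of Eqs. (3) and (4), the following minimal Python/NumPy sketch builds the density kernel from toy expansion coefficients and occupation numbers and verifies that the kernel form of the density agrees with the orbital form. The Gaussian "support functions", the grid and all numerical values are illustrative placeholders, not the optimized functions of this work.

```python
import numpy as np

rng = np.random.default_rng(0)
nbasis, norb, npts = 6, 3, 200
x = np.linspace(-5.0, 5.0, npts)

# Placeholder support functions phi_alpha(x): Gaussians on a line of "atoms".
centers = np.linspace(-3.0, 3.0, nbasis)
phi = np.exp(-(x[None, :] - centers[:, None]) ** 2)   # shape (nbasis, npts)

c = rng.standard_normal((nbasis, norb))               # coefficients c^alpha_i
f = np.array([2.0, 2.0, 2.0])                         # occupation numbers f_i

K = (c * f) @ c.T                                     # K^{alpha beta} = sum_i f_i c^a_i c^b_i
rho = np.einsum('ap,ab,bp->p', phi, K, phi)           # density, Eq. (4)

# Same density obtained directly from the orbitals, Psi_i = sum_a c^a_i phi_a:
psi = c.T @ phi                                       # shape (norb, npts)
rho_ref = (f[:, None] * psi ** 2).sum(axis=0)
print(np.allclose(rho, rho_ref))                      # True
```

The kernel contraction and the orbital sum give the same density by construction, which is the content of Eqs. (4) and (5).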
The density matrix decays exponentially with respect to the distance |r − r′| for systems with a finite gap or for metals at finite temperature14–20. In these cases it can therefore be represented by strictly localized basis functions. A natural and exact choice for these would be the maximally localized Wannier functions, which have the same exponential decay34. Of course, these Wannier functions are not known beforehand. Therefore, in our case, the contracted basis functions are constructed in situ during the self-consistency cycle and are expected to reach a quality similar to that of the exact Wannier functions.
In the formalism we have presented so far, the KS orbitals have to be optimized by minimizing the total energy with respect to the support functions and density kernel. For a self-consistent calculation this is equivalent to minimizing the band structure energy, i.e.

E_BS = Σ_{α,β} K^{αβ} H_{αβ} ,   (6)

subject to the orthonormality condition of the KS orbitals,

⟨Ψ_i|Ψ_j⟩ = Σ_{α,β} c^{α*}_i S_{αβ} c^β_j = δ_{ij} ,   (7)

where H_{αβ} = ⟨φ_α| H_KS |φ_β⟩ and S_{αβ} = ⟨φ_α|φ_β⟩ are the Hamiltonian and overlap matrices of the support functions, respectively. This is equivalent to imposing the idempotency condition on the density kernel K^{αβ},

Σ_{γ,δ} K^{αγ} S_{γδ} K^{δβ} = K^{αβ} ,   (8)

which can be achieved using the McWeeny purification scheme35 or by directly imposing the orthogonality constraint on the coefficients c^α_i.
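A minimal sketch of how McWeeny purification restores the idempotency condition of Eq. (8) in the presence of an overlap matrix: the iteration K ← 3KSK − 2KSKSK drives the occupations of KS towards 0 and 1. All matrices below are random toy data, and the number of iterations is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8

A = 0.1 * rng.standard_normal((n, n))
S = np.eye(n) + 0.5 * (A + A.T)              # toy overlap matrix, close to identity

# Columns of S^{-1/2} form an S-orthonormal set (Loewdin construction).
w, U = np.linalg.eigh(S)
X = U @ np.diag(w ** -0.5) @ U.T             # X satisfies X^T S X = identity

# Density kernel with occupations slightly perturbed away from 0 and 1.
f = np.array([1.0] * 4 + [0.0] * 4)
f_noisy = np.clip(f + 0.05 * rng.standard_normal(n), 0.0, 1.0)
K = X @ np.diag(f_noisy) @ X.T               # nearly idempotent kernel

for _ in range(8):
    KS = K @ S
    K = 3.0 * KS @ K - 2.0 * KS @ KS @ K     # McWeeny step: 3KSK - 2KSKSK

err = np.linalg.norm(K @ S @ K - K)          # residual of Eq. (8)
print(err)
```

Each step maps an occupation f to 3f² − 2f³, whose fixed points are 0 and 1, so occupations starting near those values converge rapidly and the idempotency residual falls to machine precision.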
The algorithm therefore consists of two key components: support function and density kernel optimization. The workflow is illustrated in Fig. 1; it consists of a flexible double loop structure, with the outer loop controlling the overall convergence, and two inner loops which optimize the support functions and density kernel, respectively. The first of these inner loops is done non-self-consistently (i.e. with a fixed potential), whereas the second one is done self-consistently.
B. Daubechies wavelets in BigDFT
BigDFT8 uses the orthogonal least asymmetric Daubechies29 family of order 2m = 16, illustrated in Fig. 2. These functions have a compact support and are smooth, which means that they are also localized in Fourier space. This wavelet family is able to exactly represent polynomials up to 8th order. Such a basis is therefore an optimal choice given that we desire at the same time locality and interpolating power. An exhaustive presentation of the use of wavelets in numerical simulations can be found in Ref. 36. A wavelet basis set is generated by the integer translates of the scaling functions and wavelets, with arguments measured in units of the grid spacing h. In three dimensions, a wavelet basis set can easily be obtained as the tensor product of one-dimensional basis functions, combining wavelets and scaling functions along each coordinate of the Cartesian grid (see e.g. Ref. 8).
FIG. 1. Structure of the minimal basis approach: for the basis optimization loop the hybrid scheme can be used instead of trace or energy minimization, and for the kernel optimization loop either direct minimization or the Fermi operator expansion method can be used in place of diagonalization.
FIG. 2. Least asymmetric Daubechies wavelet family of order 2m = 16; both the scaling function φ(x) and wavelet ψ(x) differ from zero only within the interval [1 − m, m].
In a simulation domain, we have three categories of grid points: those which are closest to the atoms ("fine region") carry one (three-dimensional) scaling function and seven (three-dimensional) wavelets; those which are further away from the atoms ("coarse region") carry only one scaling function, corresponding to a resolution which is half that of the fine region; and those which are even further away ("empty region") carry neither scaling functions nor wavelets. The fine region is typically the region where chemical bonding takes place, whereas the coarse region covers the region where the tails of the wavefunctions decay smoothly to zero. We therefore have two resolution levels whilst maintaining a regularly spaced grid.
A support function φ_α(r) can be expanded in this wavelet basis as follows:

φ_α(r) = Σ_{i1,i2,i3} s_{i1,i2,i3} ϕ_{i1,i2,i3}(r) + Σ_{j1,j2,j3} Σ_{ℓ=1}^{7} d^{(ℓ)}_{j1,j2,j3} ψ^{(ℓ)}_{j1,j2,j3}(r) ,   (9)

where ϕ_{i1,i2,i3}(r) = ϕ(x − i1) ϕ(y − i2) ϕ(z − i3) is the tensor product of three one-dimensional scaling functions centered at the grid point (i1, i2, i3), and the ψ^{(ℓ)}_{j1,j2,j3}(r) are the seven tensor products containing at least one one-dimensional wavelet, centered on the grid point (j1, j2, j3). The sums over i1, i2, i3 (j1, j2, j3) run over all grid points where scaling functions (wavelets) are centered, i.e. all the points of the coarse (fine) grid.
To determine these regions of different resolution, we construct two spheres around each atom a: a small one with radius R^f_a = λ^f · r^f_a and a large one with radius R^c_a = λ^c · r^c_a (R^c_a > R^f_a). The values of r^f_a and r^c_a are characteristic for each atom type and are related to the covalent and van der Waals radii, whereas λ^f and λ^c can be specified by the user in order to control the accuracy of the calculation. The fine (coarse) region is then given by the union of all the small (large) spheres, as shown in Fig. 3. Hence in BigDFT the basis set is controlled by these three user-specified parameters. By reducing h and/or increasing λ^c and λ^f the number of computational degrees of freedom is increased, leading to a systematic convergence of the energy.
III. LOCALIZATION REGIONS
Thanks to the nearsightedness principle it is possible to define a basis of strictly localized support functions such that the KS orbitals given in terms of this contracted basis are exactly equivalent to the expression based solely on the underlying Daubechies basis. However, as presented so far, the support functions φ_α(r) of Eq. (9) are expanded over the entire simulation domain. We want them to be strictly localized while still containing various resolution levels, as illustrated by Fig. 3, and so we set to zero all scaling function and wavelet coefficients which lie outside a sphere of radius R_cut around the point R_α on which the support function is centered. In general, these centers R_α could be anywhere, but we choose them to be centered on an atom a and we thus assume from now on that R_α = R_a. Consequently we define a localization projector L^{(α)}, which is written in the Daubechies basis as

L^{(α)}_{i1,i2,i3; j1,j2,j3} = δ_{i1 j1} δ_{i2 j2} δ_{i3 j3} θ(R_cut − |R_{(i1,i2,i3)} − R_α|) ,   (10)

where θ is the Heaviside function. We use this projector to constrain the function |φ_α⟩ to be localized throughout the calculation, i.e.

|φ_α⟩ = L^{(α)} |φ_α⟩ .   (11)
FIG. 3. A two-level adaptive grid for an alkane: the high resolution grid points are shown with bold points while the low resolution grid points are shown with smaller points. Also visible are three localization regions (red, blue and green) with radii of 3.7 Å centered on different atoms in which certain support functions will reside. The coarse grid points shown in yellow do not belong to any of the three localization regions.
Clearly, if |φ_α⟩ is localized around R_α and R_cut is large enough, L^{(α)} leaves |φ_α⟩ unchanged and no approximation is introduced to the KS equation.
It is important to note that the localization constraint of Eq. (11) determines the expression of dφ_α/dR_β outside the localization region of φ_α. Indeed, as the Daubechies basis set is independent of R_α, differentiating Eq. (11) with respect to R_β leads to

(1 − L^{(α)}) |dφ_α/dR_β⟩ = δ_{αβ} (∂L^{(α)}/∂R_α) |φ_α⟩ .   (12)

This result will be used in Appendix C 2 to demonstrate that the Pulay-like forces are negligible for a typical calculation with our approach.
A. Imposing the localization constraint
In what follows, we demonstrate that choosing the support functions to be orthogonal allows for a more straightforward application of the localization constraint. Due to the orthonormality of the KS orbitals we cannot directly minimize the band structure energy (Eq. (6)) with respect to the support functions; rather we have to minimize the following functional:

Ω = Σ_{α,β} K^{αβ} H_{αβ} − Σ_{i,j} Σ_{α,β} Λ_{ij} (c^{α*}_i c^β_j S_{αβ} − δ_{ij}) ,   (13)

with the Lagrange multiplier coefficients Λ_{ij} determined by the relation

Σ_{i,j} c^{α*}_i c^β_j Λ_{ij} = Σ_{ρ,σ} K^{αρ} H_{ρσ} (S^{−1})^{σβ} ,   (14)
where (S^{−1})^{αβ} is the inverse overlap matrix. The gradient |g_α⟩ = |δΩ/δ⟨φ_α|⟩ is therefore

|g_α⟩ = Σ_β K^{αβ} H_KS |φ_β⟩ − Σ_{β,ρ,σ} K^{αρ} H_{σρ} (S^{−1})^{σβ} |φ_β⟩ .   (15)
However, we wish to impose the localization condition |φ_α⟩ = L^{(α)}|φ_α⟩ on the support functions and therefore the functional to be minimized becomes

Ω′ = Ω − Σ_α ⟨φ_α| 1 − L^{(α)} |ℓ_α⟩ ,   (16)

where the components of the vector |ℓ_α⟩ are the Lagrange multipliers of this locality constraint. The gradient for Ω′, |δΩ′/δ⟨φ_α|⟩, can therefore be written as

|g′_α⟩ = |g_α⟩ − (1 − L^{(α)}) |ℓ_α⟩ .   (17)

Using the stationarity condition 0 = |g′_α⟩ and combining with the fact that 1 − L^{(α)} is a projection operator, i.e. (1 − L^{(α)})² = 1 − L^{(α)}, we have

(1 − L^{(α)}) |ℓ_α⟩ = (1 − L^{(α)}) |g_α⟩ .   (18)

Therefore, using Eq. (17),

|g′_α⟩ = L^{(α)} |g_α⟩ ,   (19)

i.e. the gradient is explicitly localized. This yields the following result for the gradient:
|g′_α⟩ = Σ_{β,ρ} K^{αρ} (S^{1/2})_ρ^β [ L^{(α)} H_KS |φ̄_β⟩ − Σ_σ ⟨φ̄_σ| H_KS |φ̄_ρ⟩ L^{(α)} |φ̄_σ⟩ ] .   (20)

Here the localized gradient is expressed in terms of the orthogonalized support functions |φ̄_α⟩ = Σ_β (S^{−1/2})_α^β |φ_β⟩. Requiring the support functions to be orthogonal, i.e. S_{αβ} = δ_{αβ}, therefore further simplifies the evaluation of the gradient, as it is no longer necessary to calculate S^{−1} or S^{1/2}. Moreover, it avoids the need to distinguish between covariant and contravariant indices37.
B. Localization of the Hamiltonian application
As shown in Ref. 8, the Hamiltonian operator in a Daubechies wavelets basis set is defined by a set of convolution operations, combined with the application of nonlocal pseudopotential projectors. The nature of these operations is such that H_KS |φ_β⟩ will have a greater extent than |φ_β⟩. We therefore define a second localization operator, L′^{(β)}, with a corresponding cutoff radius R′_cut, such that R′_cut is equal to R_cut plus half of the convolution filter length times the grid spacing, which in our case corresponds to an additional eight grid points. When applying the Hamiltonian, we impose

H_KS |φ_β⟩ = L′^{(β)} H_KS L^{(β)} |φ_β⟩ .   (21)
For both the convolution operations and the nonlocal pseudopotential applications, this procedure guarantees that the Hamiltonian application is exact within the localization region of L^{(β)}. However, the values of H_KS L^{(β)}|φ_β⟩ are approximated outside this region due to the semilocal nature of the convolutions and the pseudopotential projectors. This impacts the evaluation of the Hamiltonian matrix H_{αβ} for all elements whose localization regions do not coincide, and thus also affects the gradient |g_α⟩. Nonetheless, we have verified that further enlargement of R′_cut has negligible impact on the accuracy, while adding additional overhead; see Sec. VI.

Apart from these technical details, most of the basic operations are identical to their implementation in the standard BigDFT code8 and are therefore not repeated here. The only difference is that these operations are now done only in the localization regions (corresponding to either L^{(β)} or L′^{(β)}) and not in the entire computational volume.
IV. SELF-CONSISTENT CYCLE

A. Support function optimization
As an initial guess for the support functions we use atomic orbitals, which are generated by solving the atomic Schrödinger equation and therefore possess long tails which need to be truncated at the borders of the localization regions. If the values at the borders are not negligible, the resulting kink will cause the kinetic energy to become very large due to the definition of the Laplacian operator in a wavelet basis set, and so to assure stability during the optimization procedure, the localization regions would need to be further enlarged. To overcome this problem, even for small localization regions, it is advantageous to decrease the extent of the atomic orbitals before the initial truncation by adding a confining quartic potential centered on each atom, a_α(r − R_α)⁴, to the atomic Schrödinger equation.
For the first few iterations of the outer loop (Fig. 1) we maintain the quartic confining potential of the atomic input guess. This implies that the total Hamiltonian becomes dependent on the support function, H_α = H_KS + a_α(r − R_α)⁴, and we can no longer minimize the band structure energy (Eq. (6)) to obtain the support functions. Instead, we choose to minimize the functional

min_{φ_α} Σ_α ⟨φ_α| H_α |φ_α⟩ ,   (22)

while applying both orthogonality and localization constraints on the support functions, as discussed in Section III A.
Apart from the improved localization, the use of the confining potential has yet another advantage. The band structure energy (Eq. (6)) is invariant under unitary transformations among the support functions if there are no localization constraints. This corresponds to some zero eigenvalues in the Hessian characterizing the optimization of the support functions. The introduction of a localization constraint violates this invariance and leads to small but non-zero eigenvalues. The condition number, defined as the ratio of the largest and smallest (nonzero) eigenvalues of the Hessian, can thus become very large, potentially turning the optimization into a strongly ill-conditioned problem. On the other hand, if the unitary invariance is heavily violated in Eq. (22) by the introduction of a strong localization potential, the small eigenvalues grow and the condition number improves as a consequence.
After a few iterations of the outer loop, the support functions are sufficiently localized to continue the optimization without a confining potential, i.e. by minimizing the band structure energy. This procedure will lead to highly accurate support functions while still preserving locality. As an alternative it is also possible to define a so-called "hybrid mode" which combines the two categories of support function optimization and thus provides a smoother transition between the two. In this case the target function is given by

Ω_hy = Σ_α K^{αα} ⟨φ_α| H_α |φ_α⟩ + Σ_{β≠α} K^{αβ} ⟨φ_α| H_KS |φ_β⟩ .   (23)

In the beginning a strong confinement is used, making this expression similar to the functional of Eq. (22); however the confining potential is reduced throughout the calculation so that towards the end the strength of the confinement is negligible and Eq. (23) reverts to the full energy expression. A prescription for reducing the confinement is presented in Appendix A.
1. Orthogonalization
The Lagrange multiplier formalism conserves the orthonormality of the contracted basis only to first order. An additional explicit orthogonalization has to be performed after each update of the contracted basis to restore exact orthogonality. This is done using the Löwdin procedure. The calculation of S^{−1/2}, which is required in this context, can pose a bottleneck. However, as our basis functions are close to orthonormality, the exact calculation can safely be replaced by a first order Taylor approximation.
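The first order Taylor replacement mentioned above amounts to S^{−1/2} ≈ (3·1 − S)/2 for S close to the identity. The sketch below compares this approximation with the exact inverse square root on a random, nearly orthonormal toy overlap matrix; the matrix size and perturbation strength are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10

A = rng.standard_normal((n, n))
S = np.eye(n) + 0.01 * (A + A.T)          # overlap of a nearly orthonormal basis

# Exact S^{-1/2} via the eigendecomposition of the symmetric matrix S.
w, U = np.linalg.eigh(S)
S_invsqrt_exact = U @ np.diag(w ** -0.5) @ U.T

# First order Taylor expansion around the identity: S^{-1/2} ~ (3 I - S) / 2.
S_invsqrt_taylor = 0.5 * (3.0 * np.eye(n) - S)

err = np.linalg.norm(S_invsqrt_exact - S_invsqrt_taylor)
print(err)                                 # small when S is close to the identity
```

Writing S = 1 + E, the exact expansion is S^{−1/2} = 1 − E/2 + 3E²/8 − …, so truncating at first order leaves an error of order ‖E‖², which is negligible for a nearly orthonormal basis.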
It is important to note that the support functions will only be nearly orthogonal rather than exactly orthogonal, as exact orthogonality is in general not possible for functions exhibiting compact support in a discretized space. This near-orthogonality is in contrast with most other minimal basis implementations, which use fully non-orthogonal support functions25,26. The asymptotic decay behavior of the orthogonal and non-orthogonal support functions is identical; however, the prefactor differs and leads to a better localization of the non-orthogonal functions38. Nevertheless, in practice we have found that the introduction of the orthogonality constraint does not significantly increase the size required for the localization regions, provided that a sufficiently strong confining potential is applied to localize the support functions at the start of the calculation.
In order to counteract the small deviations from idempotency caused by the changing overlap matrix, we purify the density kernel during the support function optimization, either by directly orthonormalizing the expansion coefficients of the KS orbitals or by using the McWeeny purification transformation.
2. Gradient and preconditioning
The optimization is done via a direct minimization scheme or with direct inversion of the iterative subspace (DIIS)39, both combined with an efficient preconditioning scheme. The derivation of the gradient of the target function Ω with respect to the support functions involves some subtleties for the trace and hybrid modes, since in these cases the Hamiltonian depends explicitly on the support function, leading to an asymmetry of the Lagrange multiplier matrix Λ_{αβ} = ⟨φ_α| δΩ/δ⟨φ_β| ⟩. In order to correctly derive the gradient expression we follow the same guidelines as Ref. 40; assuming nearly orthogonal orbitals the final result is given by

|g_α⟩ = δΩ/δ⟨φ_α| − (1/2) Σ_β (Λ_{αβ} + Λ_{βα}) |φ_β⟩ .   (24)

This is a generalization of the ordinary expression and thus also valid if the Hamiltonian does not explicitly depend on the support function, i.e. for the energy mode. As discussed in Section III A the gradient is suitably localized once derived, i.e. |g_α⟩ ← L^{(α)}|g_α⟩.
To precondition the gradient |g_α⟩ we use the standard kinetic preconditioning scheme8. To ensure that the preconditioning does not negatively impact the localization of the gradient, we have found that it is important to add an extra term to account for the confining potential if present. In this case, the expression becomes

(−(1/2)∇² + a_α(r − R_α)⁴ − ε_α) |g^prec_α⟩ = − |g_α⟩ ,   (25)

where ε_α is an approximate eigenvalue. This also has the effect of improving the convergence. The inclusion of the confining potential adds only a small overhead as it can be evaluated via convolutions in the same manner as the kinetic energy. Furthermore, the preconditioning equations do not need to be solved with high accuracy, only approximately.
B. Density kernel optimization
For the optimization of the density kernel we have implemented three schemes: diagonalization, direct minimization and the Fermi operator expansion method (FOE)41,42. Once the kernel has been updated, we recalculate the charge density via Eq. (4); the new density is then used to update the potential, with an optional step wherein the density is mixed with the previous one in order to improve convergence. This procedure is repeated until the kernel is converged; in practice, we consider this convergence to have been reached once the mean difference of the density of two consecutive iterations is below a given threshold, i.e. ∆ρ < c.
The direct diagonalization method consists of finding the solution of the generalized eigenproblem for a given Hamiltonian and overlap matrix. Its implementation therefore relies straightforwardly on linear algebra solvers and will not be detailed here.
In the direct minimization approach the band-structure energy is minimized subject to the orthogonality of the support functions. To this end, we express the gradient of the Kohn-Sham orbitals, |g_i⟩ = H_KS |Ψ_i⟩ − Σ_j Λ_{ij} |Ψ_j⟩, in terms of the support functions, i.e. |g_i⟩ = Σ_α d^α_i |φ_α⟩. The d^α_i are obtained by solving

Σ_α S_{βα} d^α_i = Σ_α H_{βα} c^α_i − Σ_j Σ_α S_{βα} c^α_j Λ_{ji} ,
Λ_{ji} = Σ_{γ,δ} c^{γ*}_j c^δ_i H_{γδ} .   (26)

The coefficients are optimized using this gradient via steepest descents or DIIS. Once the gradient has converged to the required threshold, the density kernel is calculated from the coefficients and occupancies.
In the Fermi operator expansion method, the density matrix may be defined as a function of the Hamiltonian as F = f(H), where f is the Fermi function. In terms of the support functions, this corresponds to an expression for the density kernel in terms of the Hamiltonian matrix, i.e. K = f(H). The central idea of the FOE24,41,42 is to find an expression for f(H) which can be efficiently evaluated numerically. One particularly simple possibility is a polynomial expansion; for numerical stability we use Chebyshev polynomials43. As will be shown in detail in Appendix B, the density kernel can be constructed using only matrix vector multiplications thanks to the recursion formulae for the Chebyshev matrix polynomials.
1. Suitability of the methods
All three methods for calculating the density kernel (direct diagonalization, direct minimization and FOE) yield the same final result; thus the main differences lie in their performance, where one of the most important points is the performance of the linear algebra. Due to the localized nature of the support functions, the overlap, Hamiltonian and density kernel matrices are in general sparse, with the level and pattern of sparsity depending on the localization radii of the support functions and the dimensionality of the system in question. We can take advantage of this sparsity by storing and using these matrices in compressed form, and indeed this is necessary to achieve a fully linear scaling algorithm.
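To illustrate what a compressed form buys, here is a minimal sparse matrix-vector product in the generic CSR layout; this is an illustrative sketch, not the compression scheme actually used in the code:

```python
import numpy as np

def csr_matvec(data, indices, indptr, x):
    """Minimal compressed sparse row (CSR) matrix-vector product.  Only the
    stored non-zero entries are touched, so the cost scales with the number of
    non-zeros rather than with the full matrix dimension."""
    y = np.zeros(len(indptr) - 1)
    for row in range(len(y)):
        # indptr[row]:indptr[row+1] delimits the non-zeros of this row
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y
```

For matrices whose number of non-zeros per row is bounded (as for localized support functions), the product costs O(N) instead of O(N²).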
For diagonalization, exploiting the sparsity is very hard due to the lack of efficient parallel solvers for sparse matrices. The method therefore performs badly for large systems due to its cubic scaling, but thanks to its small prefactor it can still be useful for smaller systems consisting of a few hundred atoms.
For direct minimization the situation is better, since both the solution of the linear system of Eq. (26) and the orthonormalization can be approximated using Taylor expansions for S^{-1} and S^{-1/2}, respectively. It can also be easily parallelized, so that the cubic scaling terms only become problematic for systems containing more than a few thousand atoms, as demonstrated in Section VII (see Fig. 7). Indeed, for moderate system sizes, the extra overhead associated with the manipulation of sparse matrices makes dense matrix algebra cheaper, and so many physically interesting systems are already considerably accelerated.
Even though the minimization method does not scale linearly in its current implementation, there are some situations where its use is advantageous. For example, the unoccupied states are generally not well represented by the contracted basis set following the optimization procedure, and for cases where we have only the density kernel and not the coefficients, it becomes necessary to use another approach to calculate them, such as optimizing a second set of minimal basis functions44, which can be expensive. However, as with the diagonalization approach, the ability to work directly with the coefficients makes it possible to optimize the support functions and coefficients to accurately represent a few states above the Fermi level at the same time as the occupied states, without significantly impacting the cost.
The FOE approach leads to linear scaling if the sparsity of the Hamiltonian is exploited and if the build-up of the density matrix is done within localization regions. These localization regions do not need to be identical to the localization regions employed for the calculation of the contracted basis set – in general they will actually be larger. The final result turns out to be relatively insensitive to the choice of size for the density matrix localization regions, above a sensible minimum value. By exploiting this sparsity, we can access system sizes of around ten thousand atoms with a moderate use of parallel resources.
C. Parallelization
Like standard BigDFT, we have a multi-level MPI/OpenMP parallelization scheme45. The details of
FIG. 4. Parallel scaling for a water droplet containing 960 atoms for between 160 and 3840 cores, with the effective and ideal speedup with respect to 160 cores, including a fit of Amdahl's law [below], and a breakdown of the calculation time for selected numbers of processors [above]. The categories presented are for communications, linear algebra, convolutions, calculation of the potential and density, and miscellaneous.
the MPI parallelization are presented in Appendix D. In Fig. 4 we show the effective speedup as a function of the number of cores for a large water droplet, keeping the number of OpenMP threads at four. We measured the parallelization up to 3840 cores, taking a 160 core run as reference. As can be seen, the effective speedup reaches about 92% of the ideal value at 480 cores, decreasing to 62% for 3840 cores. Fitting the data to Amdahl's law46 shows that at least 97% of the code has been parallelized; the true value is higher, as the times shown include the communications and are relative to 160 cores rather than a serial run. Fig. 4 also shows a breakdown of the total calculation time into different categories, where we see that the communications start to become limiting for the highest number of cores, demonstrating that this is the upper bound appropriate for this system size. For a larger system, this limit will of course be higher.
V. CALCULATION OF IONIC FORCES
In a self-consistent KS calculation (i.e. when the charge density is derived from the numerical set of wavefunctions), the forces acting on atom a are given by the negative gradient of the band structure energy with respect to the atomic positions R_a. The Hellmann-Feynman force, given by the expression

\[
\mathbf{F}^{(HF)}_a = -\sum_i f_i \left\langle \Psi_i \left| \frac{\partial H}{\partial \mathbf{R}_a} \right| \Psi_i \right\rangle, \tag{27}
\]
involves only the functional derivative of the Hamiltonian operator. This term is evaluated numerically in the computational setup used to express the ground state energy. As explained in more detail in Appendix C 1, with the cubic version of BigDFT only the Hellmann-Feynman term contributes to the forces, as the remaining part tends to zero in the limit of small grid spacings.
However, when the KS orbitals are expressed in terms of the support functions, there is an additional contribution which is not captured by the computational setup. As demonstrated in Appendix C 2, it is given by

\[
\mathbf{F}^{(P)}_a = -2 \sum_{\alpha,\beta} \operatorname{Re}\left( K^{\alpha\beta} \left\langle \chi_\beta \left| \frac{\partial \mathcal{L}^{(\alpha)}}{\partial \mathbf{R}_a} \right| \phi_\alpha \right\rangle \right), \tag{28}
\]
where

\[
|\chi_\alpha\rangle = H_{KS} |\phi_\alpha\rangle - \sum_j \sum_{\rho,\sigma} c_{\rho j}\, \varepsilon_j\, c^*_{\sigma j}\, S_{\sigma\alpha}\, |\phi_\rho\rangle \tag{29}
\]

is the residual vector of the support function |φ_α⟩, which is related to the support function gradient |g_α⟩ (see Eq. (15)). This term can be considered as the equivalent of a Pulay contribution to the ionic forces, arising from the explicit dependence of the localization operators on the atomic positions.
The vector ∂L^{(α)}/∂R_a |φ_α⟩ only depends on the value of the support function on the borders of the localization regions (Eq. (C12)). Therefore, if the scalar product between the residues |χ_β⟩ and the values of the support functions |φ_α⟩ at the boundaries of their localization regions is smaller than the norm of the residue itself (quantifying the accuracy of the results), the Pulay term can be safely neglected.
As mentioned in Section IV A, the Laplacian operator in the wavelet basis causes the kinetic energy to be high if the values at the edges are non-negligible. Such a situation is therefore penalized by the energy minimization, and so the values at the borders are guaranteed to remain low. Indeed, we have seen excellent agreement between the Hellmann-Feynman term alone and the forces calculated using standard BigDFT, as will be demonstrated in Sections VI and VIII A.
The Hellmann-Feynman force is thus the only relevant term even in the contracted basis approach and is given by

\[
\mathbf{F}^{(HF)}_a = -\sum_{\alpha,\beta} K^{\alpha\beta} \left\langle \phi_\alpha \left| \frac{\partial H}{\partial \mathbf{R}_a} \right| \phi_\beta \right\rangle. \tag{30}
\]

It is identical to the implementation in standard BigDFT8 and so the different terms are not repeated here. The only difference is that instead of applying the operator to the wavefunctions, we now apply it to all overlapping support functions. This can be done efficiently since each support function overlaps with only a few neighbors.
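Eq. (30) amounts to a small tensor contraction. In this sketch the array `dH`, holding the derivative matrix elements for each atom and direction, is a hypothetical precomputed input; in the real code these matrices are sparse:

```python
import numpy as np

def hellmann_feynman_forces(K, dH):
    """Eq. (30) as a dense contraction: F_a = -sum_{alpha,beta} K^{alpha beta}
    <phi_alpha| dH/dR_a |phi_beta>.  `dH` is a hypothetical precomputed array of
    shape (natoms, ndim, nbasis, nbasis) with the derivative matrix elements;
    in the real code these matrices are sparse, since each support function
    overlaps with only a few neighbors."""
    return -np.einsum('ab,nxab->nx', K, dH)
```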
VI. ACCURACY
We have applied our minimal basis approach to a number of systems, depicted in Fig. 5, in order to demonstrate both its accuracy and its applicability. All calculations have been done using the local density approximation (LDA) exchange-correlation functional47 and HGH pseudopotentials30. In addition, we have used free boundary conditions, avoiding the need for the supercell approximation. The values of the wavelet basis parameters for the different systems, as well as the localization radii, were selected in order to achieve accuracies better than 1 meV/atom. This corresponded to values between 0.13 Å and 0.20 Å for h, between 5.0 and 7.0 for λc, and between 7.0 and 8.0 for λf. Unless otherwise stated, we have used the direct minimization scheme for the density kernel optimization. For hydrogen atoms one basis function was used per atom, whereas for all other elements four basis functions were used per atom, except where otherwise stated.
A. Benchmark systems
We demonstrate excellent agreement with the traditional cubic scaling method for both energy and forces – of the order of 1 meV/atom for the energy and a few meV/Å for the forces, as shown in Tab. I. We also demonstrate systematic convergence of the total energy and forces with respect to the localization radius for a molecule of C60, where the largest localization regions are close to the total system size, as depicted in Fig. 6. For all these systems the level of accuracy achieved for the forces is of the same order as that of the cubic code using the Hellmann-Feynman term only.
B. Silicon defect energy
In order to demonstrate the accuracy of our method for a practical application, we calculated the energy of a vacancy defect in a hydrogen terminated silicon cluster containing 291 atoms, shown in Fig. 5(f). As shown in Tab. II, the difference in defect energy between the cubic reference calculation and the linear version is 129 meV using 4 support functions per Si atom and 1 support function per hydrogen atom. Even more accurate results can be achieved by increasing the number of support functions per Si atom to 9, which reduces the error to 12 meV. Increasing the localization radii does not further improve the accuracy, as the result is already within the noise level. To achieve these results, support functions were optimized using the hybrid mode and the density kernel was optimized using the FOE approach with a cutoff of 7.94 Å for the kernel construction.
C. Consistency of energies and forces
Following the discussion in Section V, we have calculated the average value of the support functions on the borders of their localization regions for various systems and found this to be at least three orders of magnitude smaller than the norm of the support function residue (defined in Eq. (C1)). This is in line with our expectations, as discussed in Section V, and implies that the Pulay terms should be negligible compared to the error introduced to the Hellmann-Feynman term due to the localization constraint. Indeed, this agrees with the calculated forces for the systems presented thus far. To further verify that the Pulay term can be neglected and to quantify the different sources of errors, we have also checked that the calculated forces are consistent with the energy, i.e. that they correspond to its negative derivative. To this end, initial and final configurations R(a) and R(b) of a given system were chosen, where R represents the atomic positions. Small steps ∆R were then taken between R(a) and R(b). If the forces F are correctly evaluated we should have

\[
\Delta E = \int_a^b \mathbf{F}(\mathbf{r}) \cdot d\mathbf{r} \approx \sum_{\mu=a}^{b} \mathbf{F}(\mathbf{R}_\mu) \cdot \Delta\mathbf{R}_\mu, \tag{31}
\]

where µ labels the intermediate steps between configurations R(a) and R(b). This approximation can be compared with the exact value obtained by directly calculating the energy difference, i.e. E(R(b)) − E(R(a)). These two values should agree with each other up to the noise level of the calculation.
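The discretized integral of Eq. (31) can be sketched as follows; the array shapes and the segment-averaged (trapezoidal) force are assumptions of this sketch:

```python
import numpy as np

def force_path_integral(positions, forces):
    """Discretized line integral of Eq. (31), sum_mu F(R_mu) . Delta R_mu, here
    with the force averaged over each segment (trapezoidal rule) to reduce the
    step error.  Both arrays are assumed to have shape (nsteps, natoms, 3)."""
    dR = np.diff(positions, axis=0)            # Delta R_mu for each segment
    F_mid = 0.5 * (forces[:-1] + forces[1:])   # segment-averaged force
    return float(np.sum(F_mid * dR))
```

For a toy harmonic energy E(x) = x² with force F = −2x, moving one atom from x = 0 to x = 1 gives an integral of −1, matching the energy change of +1 up to sign convention.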
To analyze the different terms contributing to the noise in the forces, we use a combination of the hybrid and FOE methods with five progressive setups which give an estimate of the magnitude of the various error sources:

1. Using the cubic scaling scheme, where all orbitals can extend over the full simulation cell.

2. Using the minimal basis approach, but without localization constraints or confining potential.

3. Applying a confining potential, but no localization constraints.

4. Using a finite localization radius of 7.94 Å for the density kernel, but not for the support functions.

5. Applying in addition strong localization radii of 4.76 Å to the support functions.
This test was done for a 92 atom alkane – despite the relatively small system size, the introduction of finite cutoff radii for the support functions and the density kernel construction has a strong effect, since for chain-like structures the volume of the localization region is only a small fraction of the total computational volume. The results are shown in Tab. III, for a step size of ∆R_µ = 0.003 Å. As expected, without the application of the localization constraint, the errors for the minimal basis calculations are of the same order of magnitude as the reference cubic calculation. This is also the case when a finite cutoff radius for the construction of the density kernel is introduced. Once a finite localization is imposed on the
FIG. 5. The different systems studied, where gray denotes carbon, white hydrogen, red oxygen, gold nitrogen, bronze boron, green silicon and blue sulfur. The values used for the support function localization radii are also given: (a) cinchonidine (C19H22N2O), Rcut = 5.29 Å; (b) boron cluster48, Rcut = 5.82 Å; (c) alkane, Rcut = 5.29 Å; (d) water droplet, Rcut = 5.29 Å; (e) C60; (f) silicon cluster, Rcut = 4.76 Å; (g) SiC49, Rcut = 5.82 Å; (h) ladder polythiophene, Rcut = 5.29 Å.
                Num.    Energy (eV)                                  Forces (eV/Å)
                atoms   Min. Basis   Cubic        (Min.−Cub.)/atom   Min. Basis   Cubic        Av. (Min.−Cub.)
Cinchonidine    44      −4.273·10³   −4.273·10³   2.0·10⁻³           2.734·10⁻²   2.772·10⁻²   4.3·10⁻³
Boron cluster   80      −6.141·10³   −6.141·10³   2.4·10⁻³           1.305·10⁻²   1.316·10⁻²   4.2·10⁻³
Alkane          257     −1.592·10⁴   −1.592·10⁴   3.7·10⁻⁴           3.760·10⁻²   3.767·10⁻²   1.0·10⁻³
Water           450     −7.011·10⁴   −7.012·10⁴   1.7·10⁻³           3.730·10⁻²   3.730·10⁻²   2.5·10⁻³

TABLE I. Energy and force differences between the minimal basis approach and the standard cubic version of the code. For the forces, the root mean squared force is given for each approach, as well as the average difference between each component.
support functions the discrepancy between the energy difference and the force integral increases by an order of magnitude; however, it remains small, agreeing with our previous observations about the Pulay forces for large enough localization radii.
VII. SCALING AND CROSSOVER POINT
We have applied the minimal basis method to alkanes, applying both the direct minimization and FOE approaches for the kernel optimization. The time taken per iteration is compared with the traditional cubic scaling version in Fig. 7. The number of iterations required to reach convergence is similar for the cubic and minimal basis approaches and is approximately constant across system sizes, so that the total time taken shows similar behaviour. The results clearly demonstrate the improved scaling of the method, with a crossover point for the total time at around 150 atoms. This will of course be system dependent – the chain-like nature of the alkanes makes them a particularly favorable system for the minimal basis approach. We also plot cubic polynomials for the timing data; whilst the cubic scaling approach only has a very small cubic term, both this and the quadratic term are noticeably reduced for both the FOE and direct minimization approaches. Indeed, the FOE method
is predominantly linear scaling, compared to direct minimization, which has larger quadratic and cubic terms, mainly due to the linear algebra, as expected.

FIG. 6. Convergence of the total energy and forces with respect to the localization radius as compared to the cubic values for a molecule of C60. For large localization regions the accuracy of the forces is sufficient for high accuracy geometry optimizations.

        pristine (eV)   vacancy (eV)   ∆ (eV)    ∆ − ∆cubic (meV)
cubic   −20674.223      −20563.056     111.167   –
4/1     −20667.556      −20556.518     111.038   129
9/1     −20672.856      −20561.701     111.155   12

TABLE II. Total energies and energy differences for a silicon cluster (Fig. 5(f)) with and without a vacancy defect. The first column shows the number of support functions per Si and H atom respectively and the last column shows the difference in defect energy between the cubic and linear versions.

Setup   ∆E         ∫F(r)·dr   diff.
1       5.666961   5.667190   −2.3·10⁻⁴
2       5.666966   5.666999   −3.3·10⁻⁵
3       5.666958   5.667024   −6.5·10⁻⁵
4       5.667239   5.667024   2.1·10⁻⁴
5       5.669992   5.673043   −3.1·10⁻³

TABLE III. Force calculations for the five setups described in the text, with all quantities given in eV.
The minimal basis approach also gives considerable savings in memory; for the above example, the memory requirements of the cubic version still prohibit calculations on systems bigger than around 2000 atoms for the chosen number of processors, whereas for the minimal basis method the memory requirements allow calculations of up to 4000 atoms using direct minimization, and nearly 8000 atoms with FOE.
To take full advantage of the improvements made to BigDFT, it is not enough for the time taken per iteration to scale favorably with respect to system size; it is also necessary for the number of iterations needed to reach convergence not to increase with system size.
FIG. 7. Comparison between the time taken per iteration and memory usage for the cubic scaling and minimal basis approaches using both direct minimization (Dmin) and FOE for increasing length alkanes, where the time is for the wavefunction optimization, neglecting the input guess. The coefficients are shown for the corresponding cubic polynomials ax³ + bx² + cx (cubic: 2.8·10⁻⁹, 8.4·10⁻⁶, 4.1·10⁻³; Dmin: 1.3·10⁻¹⁰, 6.1·10⁻⁷, 3.9·10⁻³; FOE: 2.1·10⁻¹¹, 1.2·10⁻¹¹, 3.8·10⁻³). A fixed number of 301 MPI tasks and 8 OpenMP threads was used.
We have demonstrated such behavior for randomly generated non-equilibrium water droplets of increasing size, as shown in Fig. 8. The number of iterations required to reach a good level of agreement with the cubic scaling version of the code remains approximately constant, with the fluctuations due to the random noise in the bond lengths of the water molecules. Furthermore, the energy converges rapidly to a value very close to that obtained with the cubic code, as illustrated by the upper panel. We have also observed similar convergence behavior for other systems, including alkanes, as mentioned above.
VIII. FLEXIBILITY OF THE MINIMAL BASIS FORMALISM

A. Geometry optimization
As a further test of the quality of the forces and as a demonstration of the flexibility of the minimal basis formalism, we have performed a geometry optimization for a segment of a SiC nanotube containing 288 atoms, depicted in Fig. 5(g). Here we can take advantage of the minimal basis formalism by reusing the optimized support functions from the previous geometry step as an improved input guess, moving them with the atoms using an interpolation scheme to account for atomic displacements which are not multiples of the grid spacing h. This has the effect of reducing the number of iterations required to converge the support functions for each new geometry. In fact, for cases where the atoms have only moved a small amount, they will hardly need optimizing
FIG. 8. Convergence behavior for water droplets, showing the convergence with respect to outer loop iteration number for the system containing 60 atoms [above] and the energy difference with the converged cubic value after certain numbers of iterations [below]. The number of iterations is indicated by the color used; circles denote iterations with a confining potential and squares without.
at all, and so substantial savings can be made. A similar procedure also exists for the cubic version, but the minimal basis approach can profit much more because of the direct relation between the support function centers and the atomic positions.
We compared the convergence behavior and time taken for the minimal basis approach, both with and without reusing the support functions at each geometry step, with that for the standard cubic approach; the results are shown in Fig. 9. It is clear that the Hellmann-Feynman forces are sufficiently accurate to optimize the structure to the required level – in this case forces of below 10⁻² eV/Å are readily achieved. For this system size we are below the crossover point, such that when the support functions are reset at each geometry step the time taken per step is greater than that for the cubic approach. However, the reuse of support functions results in a significant reduction in the number of steps required to fully converge the support functions, and so the total time is less than that required for the cubic approach. This means that the crossover point will be reduced for geometry optimizations or molecular dynamics calculations, opening up further possibilities for the highly accurate study of the dynamics of large systems.
B. Charged systems
As previously mentioned, the ability to use free boundary conditions is essential for charged systems. This has enabled us to perform calculations on isolated segments of ladder polythiophene (LPT) (Fig. 5(h)), initially in a neutral state and then adding a charge of plus or minus
FIG. 9. Geometry optimization for a segment of a SiC nanotube for the cubic and minimal basis approaches, where the support functions are regenerated from atomic orbitals at each geometry step ("reset") and where they are reformatted for reuse at the next iteration ("reformat"). The time taken for each step, cumulative time, force convergence and average distance from the final structure optimized using standard BigDFT are plotted for each step of the geometry optimization.
Q             ∆EQ   |∆(EQ − E0)|
0    opt.     176   –
−2   opt.     147   28
     unopt.   292   117
+2   opt.     128   47
     unopt.   200   24

TABLE IV. Energy differences between the standard cubic version and the minimal basis approach for absolute energies (∆EQ) and energy differences with the neutral system (|∆(EQ − E0)|) for 63 atom segments of LPT with a net charge Q. Results are shown both for fully optimized support functions ("opt.") and for support functions reused from the neutral calculation ("unopt."). All results are in meV.
two electrons. The support functions from the neutral case are also well suited to the charged system, so that only kernel optimizations are required, which can reduce the computational cost by an order of magnitude.
In Tab. IV we compare the agreement between the minimal basis approach and the standard cubic approach for a system containing 63 atoms. We demonstrate agreement of the order of 100 meV for the energy differences, both for the fully optimized set of support functions and for the reuse of the support functions from the neutral system. For the negatively charged calculations we have also confirmed that this level of accuracy is maintained up to 300 atoms, beyond which size the cost of calculations with the cubic version of BigDFT increases significantly.
In order to converge the results obtained with the minimal basis to a good level of accuracy, we used 9 support functions for carbon and sulfur and 1 per hydrogen atom. For charged systems we have found that the direct minimization method is more stable, as it allows us to update the coefficients in smaller steps before updating the kernel and therefore the density, rather than fully converging them before each update.
We expect such support function reuse to be generally applicable for systems where the addition of a charge only results in a perturbation of the electronic structure. However, it may be necessary to optimize a few unoccupied states (using direct minimization or diagonalization) in order to ensure that the contracted basis is sufficiently accurate for negatively charged systems.
IX. CONCLUSION
We have presented a self-consistent minimal basis approach within BigDFT which leads to a reduced scaling behavior with system size and allows the treatment of larger systems than can be handled with the cubic version; for very large systems linear scaling is clearly visible. The use of a small set of nearly orthogonal contracted basis functions which are optimized in situ in the underlying wavelet basis set gives rise to sparse matrices of relatively small size. For the optimization of these so-called support functions we use a confining potential which on the one hand helps to keep the support functions strictly localized, and on the other hand helps to alleviate the notorious ill-conditioning which is typical of linear scaling approaches.
The standard cubic scaling version of BigDFT has previously been demonstrated to give highly accurate results, and so we use it as the standard of comparison for our method. We have demonstrated excellent agreement with the cubic version for both energy and forces for a number of different systems. In particular, we have shown that it is not necessary to include Pulay-like correction terms to the atomic forces, thanks to the nature of the Laplacian operator in the wavelet basis, which ensures that the support functions remain negligible on the borders of the localization regions. In addition, we have shown consistent convergence behavior across a range of system sizes. From the viewpoint of scaling with the number of atoms, we have demonstrated linear scaling for the FOE method, where the linear algebra has been written to exploit the sparsity of the matrices.
Finally, we have highlighted some of the advantages of using localized support functions expressed in a wavelet basis set. These include the ability to further accelerate geometry optimizations by reusing the support functions from the previous geometry, and the possibility of achieving a good level of accuracy for a charged calculation by reusing the support functions from a neutral calculation. By working directly in the basis of the support functions, we can therefore reduce the number of degrees of freedom needed to express the KS operators for a targeted accuracy. Aside from reducing the computational overhead, this flexible approach paves the way for future developments, where the contracted basis functions can be reused in other situations, including for example constrained DFT calculations of large systems. Work is ongoing in this direction.
The authors would like to acknowledge funding from the European project MMM@HPC (RI-261594), the CEA-NANOSCIENCE BigPOL project, the ANR projects SAMSON (ANR-AA08-COSI-015) and NEWCASTLE, and the Swiss CSCS grants s142 and h01. CPU time and assistance were provided by CSCS, IDRIS, Oak Ridge National Laboratory and Argonne National Laboratory.
Appendix A: Prescription for reducing the confinement
To derive a prescription for reducing the confinement, it is assumed that the change in the target function between successive iterations of the minimization procedure can be approximated to first order by

\[
\Delta\Omega'_{(n)} = \sum_\alpha \langle g^\alpha_{(n)} | \Delta\phi^\alpha_{(n)} \rangle, \tag{A1}
\]
where |∆φ^α_{(n)}⟩ is the change in support function between iterations n and n+1, i.e. |∆φ^α_{(n)}⟩ = |φ^α_{(n+1)}⟩ − |φ^α_{(n)}⟩, and |g^α_{(n)}⟩ is the gradient of the target function with respect to the support function at iteration n. Due to the influence of the confinement and the localization regions, the gradient of the support functions and thus ∆Ω′_{(n)} will not go down to zero. However, the actual change in the target function, ∆Ω_{(n)}, will at some point go to zero, meaning that further optimization becomes impossible for the localization region and confining potential currently used. In this case, the only way to further minimize the target function is to decrease the confining potential. Therefore at each step of the minimization the ratio between the actual and estimated decreases in the target function is determined:

\[
\kappa = \frac{\Delta\Omega_{(n)}}{\Delta\Omega'_{(n)}}. \tag{A2}
\]
This value is then used to update the confinement prefactor, a_α, at the start of the following support function optimization loop, via

\[
a^{\mathrm{new}}_\alpha = \kappa\, a^{\mathrm{old}}_\alpha. \tag{A3}
\]
If κ is of the order of one, this implies that there is still some scope for optimizing the support functions using the current confining potential, and it should not be updated. If, on the other hand, κ is much smaller, it will hardly be possible to further improve the support functions, and so the magnitude of the confining potential should be decreased. In this way one obtains a smooth transformation from the hybrid expression to the energy expression.
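The update of Eqs. (A2) and (A3) reduces to a one-line rescaling; the function and argument names below are illustrative, not taken from the code:

```python
def update_confinement_prefactor(a_old, d_omega_actual, d_omega_estimated):
    """Eqs. (A2)-(A3): rescale the confinement prefactor a_alpha by the ratio
    kappa between the actual and the first-order estimated decrease of the
    target function.  kappa ~ 1 leaves the confinement essentially unchanged,
    while a small kappa weakens it, giving the smooth hybrid-to-energy
    transition described in the text."""
    kappa = d_omega_actual / d_omega_estimated   # Eq. (A2)
    return kappa * a_old                         # Eq. (A3)
```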
Appendix B: Fermi operator expansion
In the FOE method the density kernel is given as a sum of Chebyshev polynomials. Since these polynomials are only defined in the interval [−1, 1], it is necessary to shift and scale the Hamiltonian such that its eigenvalue spectrum lies within this interval. If ε_min and ε_max are the smallest and largest eigenvalues that would result from diagonalizing the Hamiltonian matrix according to Hc_i = ε_i Sc_i, then the scaled Hamiltonian, H̄, has to be built using

\[
\bar{\mathbf{H}} = \sigma(\mathbf{H} - \tau \mathbf{S}), \tag{B1}
\]

with

\[
\sigma = \frac{2}{\varepsilon_{\mathrm{max}} - \varepsilon_{\mathrm{min}}}, \qquad \tau = \frac{\varepsilon_{\mathrm{min}} + \varepsilon_{\mathrm{max}}}{2}. \tag{B2}
\]
Now the density kernel can be calculated according to

\[
\mathbf{K} \approx p(\mathbf{H}') = \frac{c_0}{2}\mathbf{I} + \sum_{i=1}^{n_{pl}} c_i\, T_i(\mathbf{H}') \tag{B3}
\]

with

\[
\mathbf{H}' = \mathbf{S}^{-1/2}\, \bar{\mathbf{H}}\, \mathbf{S}^{-1/2}, \qquad \bar{\mathbf{H}} = \sigma(\mathbf{H} - \tau\mathbf{S}), \qquad \sigma = \frac{2}{\varepsilon_{\mathrm{max}} - \varepsilon_{\mathrm{min}}}, \qquad \tau = \frac{\varepsilon_{\mathrm{min}} + \varepsilon_{\mathrm{max}}}{2}, \tag{B4}
\]

where I is the identity matrix, T_i the Chebyshev polynomial of order i, S the support function overlap matrix, and S^{-1/2} is calculated using a first order Taylor expansion.
To determine the expansion coefficients c_i, one has to recall that the density matrix of Eq. (5) is a projection operator onto the occupied subspace of the KS orbitals:

\[
\langle \psi_i | F | \psi_j \rangle = f(\varepsilon_j)\, \delta_{ij}. \tag{B5}
\]

Since F and H have the same eigenfunctions, one can express the polynomial p(H) in the same way, leading to

\[
\langle \psi_i | p(H) | \psi_j \rangle = p(\varepsilon_j)\, \delta_{ij} \tag{B6}
\]

with

\[
p(\varepsilon) = \frac{c_0}{2} + \sum_{i=1}^{n_{pl}} c_i\, T_i(\varepsilon). \tag{B7}
\]
By comparing Eqs. (B5) and (B6) it becomes clear that the polynomial expansion p(\varepsilon) has to approximate the Fermi function f(\varepsilon) in the interval [−1, 1]. Thus the coefficients c_i are simply given by the expansion of the Fermi function in terms of the Chebyshev polynomials. The time for this step is negligible compared to the other operations related to the FOE. In practice, however, it turns out to be advantageous to replace the Fermi function by

f(\varepsilon) = \frac{1}{2} \left[ 1 - \mathrm{erf}\left( \frac{\varepsilon - \mu}{\Delta\varepsilon} \right) \right], \qquad (B8)
since it approaches the limits 0 and 1 faster as one moves away from the chemical potential. \Delta\varepsilon is typically a fraction of the band gap.
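As an illustration of this expansion step (a Python sketch; the polynomial degree and the smearing \Delta\varepsilon = 0.1 are made-up values), the coefficients c_i can be obtained by Chebyshev–Gauss quadrature, after which the polynomial of Eq. (B7) reproduces the smoothed Fermi function of Eq. (B8) on [−1, 1]:

```python
import math
import numpy as np

def fermi_erf(eps, mu=0.0, delta_eps=0.1):
    # Smoothed Fermi function of Eq. (B8)
    return 0.5 * (1.0 - math.erf((eps - mu) / delta_eps))

def chebyshev_coefficients(f, npl):
    # c_i = (2/npl) * sum_k f(x_k) T_i(x_k) at the Chebyshev-Gauss nodes
    theta = (np.arange(npl) + 0.5) * np.pi / npl
    fx = np.array([f(x) for x in np.cos(theta)])
    return np.array([2.0 / npl * np.dot(fx, np.cos(i * theta))
                     for i in range(npl)])

def p(eps, c):
    # Eq. (B7): p(eps) = c_0/2 + sum_{i>=1} c_i T_i(eps)
    acos = np.arccos(eps)
    return c[0] / 2.0 + sum(c[i] * np.cos(i * acos) for i in range(1, len(c)))

c = chebyshev_coefficients(fermi_erf, 200)
# Maximum deviation of the polynomial from the Fermi function on [-1, 1]
err = max(abs(p(x, c) - fermi_erf(x)) for x in np.linspace(-0.99, 0.99, 201))
```

Since the erf-smoothed Fermi function is analytic, the Chebyshev coefficients decay super-geometrically once the degree resolves the transition region of width \Delta\varepsilon, so a modest number of polynomials suffices.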
The last step is to evaluate the Chebyshev polynomials and to build the density kernel. If the lth column of the Chebyshev matrix T is denoted by t_l, then these vectors fulfill the recursion relation

t_l^0 = e_l, \quad t_l^1 = H' e_l, \quad t_l^{j+1} = 2 H' t_l^j - t_l^{j-1}, \qquad (B9)
where e_l is the lth column of the identity matrix. The lth column of the density kernel, denoted by k_l, is then given by the linear combination of all the columns t_l according to Eq. (B3), i.e.

k_l = \frac{c_0}{2} t_l^0 + \sum_{i=1}^{n_{pl}} c_i t_l^i. \qquad (B10)
This demonstrates that the density kernel can be constructed using only matrix-vector multiplications.
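A self-contained sketch of Eqs. (B9) and (B10) (Python with NumPy; the test matrix, \mu = 0 and \Delta\varepsilon = 0.1 are illustrative choices, and the overlap matrix is taken as S = I so that H' coincides with the scaled Hamiltonian): the kernel built purely from matrix-vector products agrees with the kernel obtained by explicit diagonalization.

```python
import math
import numpy as np

def fermi_erf(eps, mu=0.0, delta_eps=0.1):
    return 0.5 * (1.0 - math.erf((eps - mu) / delta_eps))  # Eq. (B8)

def chebyshev_coefficients(f, npl):
    # Chebyshev-Gauss quadrature for the expansion coefficients c_i
    theta = (np.arange(npl) + 0.5) * np.pi / npl
    fx = np.array([f(x) for x in np.cos(theta)])
    return np.array([2.0 / npl * np.dot(fx, np.cos(i * theta))
                     for i in range(npl)])

def foe_kernel(Hp, c):
    # Columns of K via the recursion of Eq. (B9) and the sum of Eq. (B10);
    # only matrix-vector products with H' are required.
    n = Hp.shape[0]
    K = np.empty((n, n))
    for l in range(n):
        t_old = np.zeros(n); t_old[l] = 1.0   # t_l^0 = e_l
        t_new = Hp @ t_old                    # t_l^1 = H' e_l
        k_l = 0.5 * c[0] * t_old + c[1] * t_new
        for i in range(2, len(c)):
            t_old, t_new = t_new, 2.0 * (Hp @ t_new) - t_old
            k_l += c[i] * t_new
        K[:, l] = k_l
    return K

# Synthetic scaled Hamiltonian with spectrum in [-1, 1] and a gap at 0
rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((6, 6)))
eigs = np.array([-0.9, -0.7, -0.5, 0.5, 0.7, 0.9])
Hp = V @ np.diag(eigs) @ V.T

c = chebyshev_coefficients(fermi_erf, 200)
K = foe_kernel(Hp, c)
# Reference kernel from explicit diagonalization
K_exact = V @ np.diag([fermi_erf(e) for e in eigs]) @ V.T
```

With the chemical potential in the gap, the trace of K equals the number of occupied states (three here), which is the quantity used below to locate the Fermi level.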
Since the correct value of the Fermi energy is initially unknown, this procedure has to be repeated until the value for which Tr(KS) equals the number of electrons in the system has been found. The band-structure energy can then be calculated by reversing the scaling and shifting operations:

E_{BS} = \frac{\mathrm{Tr}(K \tilde{H})}{\sigma} + \tau \, \mathrm{Tr}(K S). \qquad (B11)
Appendix C: Pulay forces

1. The traditional cubic approach
Numerically, the set of |\Psi_i\rangle is expressed in a finite basis set. This means that the action of H_{KS} can in principle lie outside the span of the |\Psi_i\rangle. Let us suppose that the KS Hamiltonian and orbitals are expressed in a basis set which is complete enough to describe them within a targeted accuracy. For the Daubechies basis in the traditional BigDFT approach, this is the case when the grid spacing h is small enough to describe the PSP and orbital oscillations, and the radii \lambda_{c,f} are large enough to contain the decaying tails of the wavefunctions. This situation indeed corresponds to the traditional setup of a BigDFT run. We can therefore define a residual function

|\chi_i\rangle = H_{KS} |\Psi_i\rangle - \varepsilon_i |\Psi_i\rangle, \qquad (C1)

which is of course zero when the numerical KS orbital is the exact KS orbital. By definition \langle \Psi_j | \chi_i \rangle = 0 \; \forall i, j. The norm of this vector, once projected in the basis set used to express |\Psi_i\rangle, is often used as a convergence criterion for the ground state energy.
Even though the basis set is finite, the orthogonality of the KS orbitals holds exactly, implying \mathrm{Re}\left( \langle \Psi_i | \frac{d\Psi_i}{dR_a} \rangle \right) = 0. It is thus easy to show that the numerical atomic forces are given by

-\frac{dE_{BS}}{dR_a} = -\sum_i \langle \Psi_i | \frac{\partial H_{KS}}{\partial R_a} | \Psi_i \rangle - 2 \sum_i \mathrm{Re}\left( \langle \chi_i | \frac{d\Psi_i}{dR_a} \rangle \right), \qquad (C2)
where the first term on the right-hand side is the Hellmann-Feynman contribution to the forces. The norm of |\chi_i\rangle (Eq. (C1)) can be reduced within the same basis set to meet the targeted accuracy. Therefore the projection of |\frac{d\Psi_i}{dR_a}\rangle onto the basis set used for the calculation can be safely neglected, as it is associated with the same numerical precision. Consequently, the atomic forces can be evaluated from the Hellmann-Feynman term only, as the remaining part is proportional to the desired accuracy.
2. The minimal basis approach
As mentioned in the main text, when the KS orbitals are expressed in terms of the support functions, an additional Pulay-like term should in principle be taken into account. To demonstrate this, we define – in analogy to Eq. (C1) – the support function residue |\chi_\alpha\rangle, which becomes, using the identity H^{\rho\sigma} = \sum_j c_j^\rho \varepsilon_j c_j^{\sigma*},

|\chi_\alpha\rangle = H_{KS} |\phi_\alpha\rangle - \left( \sum_{\rho,\sigma} |\phi_\rho\rangle H^{\rho\sigma} \langle \phi_\sigma| \right) |\phi_\alpha\rangle
= H_{KS} |\phi_\alpha\rangle - \sum_j \sum_{\rho,\sigma} c_j^\rho \varepsilon_j c_j^{\sigma*} S_{\sigma\alpha} |\phi_\rho\rangle. \qquad (C3)
Next, inserting the definition of \chi_i (Eq. (C1)) into the non-Hellmann-Feynman contribution of Eq. (C2) and using the relation \mathrm{Re}\left( \langle \Psi_i | \frac{d\Psi_i}{dR_a} \rangle \right) = 0, one obtains

F_a - F_a^{(HF)} = -2 \sum_i \mathrm{Re}\left( \langle \Psi_i | H_{KS} | \frac{d\Psi_i}{dR_a} \rangle \right). \qquad (C4)
Expanding the KS orbitals in terms of the support functions, using the relation H_{\alpha\beta} = \sum_j \sum_{\rho,\sigma} \varepsilon_j c_j^\rho c_j^{\sigma*} S_{\alpha\rho} S_{\sigma\beta} and the orthonormality of the KS orbitals, we can write

F_a - F_a^{(HF)} = -2 \sum_i \sum_{\alpha,\beta} \mathrm{Re}\left( c_i^{\alpha*} c_i^\beta \langle \phi_\alpha | H_{KS} | \frac{d\phi_\beta}{dR_a} \rangle \right) - 2 \sum_i \sum_{\alpha,\beta} \mathrm{Re}\left( c_i^{\alpha*} \frac{dc_i^\beta}{dR_a} \langle \phi_\alpha | H_{KS} | \phi_\beta \rangle \right)
= -2 \sum_i \sum_{\alpha,\beta} \mathrm{Re}\left( c_i^{\alpha*} c_i^\beta \langle \phi_\alpha | H_{KS} | \frac{d\phi_\beta}{dR_a} \rangle \right) - 2 \sum_i \sum_{\beta,\sigma} \mathrm{Re}\left( \frac{dc_i^\beta}{dR_a} \varepsilon_i c_i^{\sigma*} S_{\sigma\beta} \right). \qquad (C5)

From the orthonormality of the KS orbitals one can derive the relation

2 \sum_{\alpha,\beta} \mathrm{Re}\left( \frac{dc_i^\alpha}{dR_a} c_i^{\beta*} S_{\alpha\beta} \right) = - \sum_{\alpha,\beta} c_i^{\alpha*} c_i^\beta \frac{dS_{\alpha\beta}}{dR_a}. \qquad (C6)
Inserting this into Eq. (C5) yields

F_a - F_a^{(HF)} = -2 \sum_i \sum_{\alpha,\beta} \mathrm{Re}\left( c_i^{\alpha*} c_i^\beta \langle \phi_\alpha | H_{KS} | \frac{d\phi_\beta}{dR_a} \rangle \right) + \sum_i \sum_{\beta,\sigma} \mathrm{Re}\left( c_i^\beta c_i^{\sigma*} \frac{dS_{\sigma\beta}}{dR_a} \varepsilon_i \right). \qquad (C7)

Again using the KS orthonormality condition, we can write

F_a - F_a^{(HF)} = -2 \sum_i \sum_{\alpha,\beta} \mathrm{Re}\left( c_i^{\alpha*} c_i^\beta \langle \phi_\alpha | H_{KS} | \frac{d\phi_\beta}{dR_a} \rangle \right) + \sum_{i,j} \sum_{\alpha,\beta,\rho,\sigma} \mathrm{Re}\left( c_i^{\alpha*} c_i^\beta c_j^{\sigma*} \varepsilon_j c_j^\rho S_{\alpha\rho} \frac{dS_{\sigma\beta}}{dR_a} \right)
= -2 \sum_{\alpha,\beta} \mathrm{Re}\left( K^{\beta\alpha} \langle \phi_\alpha | H_{KS} | \frac{d\phi_\beta}{dR_a} \rangle \right) + 2 \sum_j \sum_{\alpha,\beta,\rho,\sigma} \mathrm{Re}\left( K^{\beta\alpha} c_j^{\sigma*} \varepsilon_j c_j^\rho S_{\alpha\rho} \langle \phi_\sigma | \frac{d\phi_\beta}{dR_a} \rangle \right), \qquad (C8)
which, in terms of the support function residue of Eq. (C3), becomes

F_a - F_a^{(HF)} = -2 \sum_{\alpha,\beta} \mathrm{Re}\left( K^{\beta\alpha} \langle \chi_\alpha | \frac{d\phi_\beta}{dR_a} \rangle \right). \qquad (C9)
This result reduces to Eq. (C2) when no localization projectors are applied to the support functions. Therefore the only part of the forces which cannot be captured within the localization regions is the part which is projected outside them. The extra Pulay term due to the localization constraint is therefore

F_a^{(P)} = -2 \sum_{\alpha,\beta} \mathrm{Re}\left( K^{\beta\alpha} \langle \chi_\alpha | (1 - L^{(\beta)}) | \frac{d\phi_\beta}{dR_a} \rangle \right). \qquad (C10)
Using Eq. (12), we can show

F_a^{(P)} = -2 \sum_{\alpha,\beta} \mathrm{Re}\left( K^{\beta\alpha} \langle \chi_\alpha | \frac{\partial L^{(\beta)}}{\partial R_a} | \phi_\beta \rangle \right). \qquad (C11)
When the localization regions are atom-centered, the derivative of the projector L^{(\alpha)} (as defined in Eq. (11)) can be evaluated analytically in the underlying basis set and is given by

\frac{\partial L^{(\alpha)}}{\partial R_\beta}\bigg|_{i_1,i_2,i_3;j_1,j_2,j_3} = \delta_{\alpha\beta} \, \delta_{i_1 j_1} \delta_{i_2 j_2} \delta_{i_3 j_3} \, \frac{R_{(i_1,i_2,i_3)} - R_\alpha}{R_{cut}} \, \delta(R_{cut} - |R_{(i_1,i_2,i_3)} - R_\alpha|). \qquad (C12)

This demonstrates that the Pulay term is only associated with the value of the support functions at the border of the localization regions.
Appendix D: Parallelization
It is a natural choice to divide the support functions between MPI tasks so that each one handles only a subset of support functions. For some operations these can be treated independently, but for others, such as the calculation of scalar products between overlapping support functions needed to build the overlap and Hamiltonian matrices, communication of support functions between MPI tasks is required. One could directly exchange, in a point-to-point fashion, the parts of the support functions which overlap with each other, so that the scalar products can be calculated locally on each task. Although conceptually straightforward, this has severe drawbacks. Firstly, the amount of data being communicated is tremendous, since the support functions generally have quite a notable overlap. This also results in a very poor ratio between computation and communication – in the extreme case where each task handles only one support function, each communicated element is only used for one operation. Secondly, there can be an enormous load imbalance for free boundary conditions, as support functions in the center of the system usually have more neighboring support functions than those near the edges. Finally, the data is split into a large number of small messages, which could result in a large overhead due to the latency of the network.
We therefore use a different approach, which requires a so-called "transposed" rather than "direct" arrangement of data. In this layout the simulation cell is partitioned among the MPI tasks, and the support functions are distributed to the various tasks such that each one can calculate a partial overlap matrix for a given region of the cell. Each task therefore has to receive those parts of all support functions which extend into its region. The partial matrices are then summed to build the full overlap matrix using MPI_Allreduce. This partitioning of the cell is done such that the load balancing among the MPI tasks is optimal, which in general does not correspond to a naive uniform distribution of the simulation cell.

FIG. 10. An example depicting four support functions (Roman numerals) in a system consisting of four grid points (Arabic numerals) [above]. The support functions are constructed such that each extends over two grid points. The various data layouts are also illustrated [below]: the direct layout, where each MPI task has all the data for certain support functions [left]; the naive transposed layout, where each MPI task has all the data for some grid points [centre]; and the optimized transposed layout, which is similar to the previous case, but with optimal load balancing [right].

FIG. 11. Illustration of the transposition process for the system shown in Fig. 10. In step a, the data is rearranged locally on each MPI task, after which it is communicated using a single collective call (MPI_Alltoallv), as shown in step b. Finally, in step c, it is again rearranged locally to reach the final layout.

To determine the optimal layout, a weight is assigned to each grid point, given by m^2, where m is the number of support functions touching it (if symmetry can be exploited the weight should rather be m(m+1)/2); the total weight (i.e. the sum of all partial weights) is then divided among all MPI tasks as evenly as possible. In Fig. 10 this procedure is illustrated with a toy example, where the upper part shows the support functions and their overlaps, and the lower part shows the resulting direct and transposed (both naive and optimal) data layouts.
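The weighting scheme can be sketched as follows (a Python sketch; a greedy one-dimensional prefix partition over a flattened list of grid points, which is a simplification of the actual three-dimensional partitioning; the touch counts correspond to the toy example of Fig. 10, where every grid point is covered by m = 2 support functions):

```python
def grid_weights(touch_counts, symmetric=False):
    # Weight per grid point: m^2, or m(m+1)/2 if symmetry is exploited
    return [m * (m + 1) // 2 if symmetric else m * m for m in touch_counts]

def partition(weights, ntasks):
    # Cut the (flattened) grid into contiguous chunks whose cumulative
    # weight is as close as possible to an even share of the total.
    total = float(sum(weights))
    cuts, acc = [], 0.0
    for i, w in enumerate(weights):
        acc += w
        if len(cuts) < ntasks - 1 and acc >= total * (len(cuts) + 1) / ntasks:
            cuts.append(i + 1)
    edges = [0] + cuts + [len(weights)]
    return [(edges[k], edges[k + 1]) for k in range(len(edges) - 1)]

# Toy example of Fig. 10: four grid points, each touched by two
# support functions, distributed over two MPI tasks.
chunks = partition(grid_weights([2, 2, 2, 2]), 2)
```

The quadratic weight m^2 reflects that the work at a grid point grows with the number of overlapping support function pairs, so a uniform split of grid points would overload tasks owning densely covered regions.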
In addition to the better load balancing, this approach has the advantage that considerably less data has to be communicated – since the transposed layout is just a redistribution of the standard layout, the total amount of data that is communicated is equal to the total size of all support functions, whereas in the point-to-point approach the same data is often sent to multiple processes. Furthermore, the communication can be done more efficiently: after some local rearrangement of the data on each MPI task, it can be communicated with a single MPI call (MPI_Alltoallv) – in practice there are two calls, since the coarse and fine parts are handled separately. After the data has been received, some local rearrangement is again required to reach the correct layout. These three steps – local rearrangement, communication, and further local rearrangement – are illustrated in Fig. 11. Due to the latency of the network, two MPI calls will likely be more efficient than the very large number of small messages that would have to be sent in the point-to-point approach.
For the calculation of the charge density, which is formally identical to the calculation of scalar products, a similar approach is used. Since these two operations are the most important ones from the viewpoint of communication and parallelization, this results in excellent scaling with respect to the number of cores.
1 P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964)
2 W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965)
3 L. Hedin and S. Lundqvist (Academic Press, 1970) pp. 1–181
4 X. Gonze, B. Amadon, P.-M. Anglade, J.-M. Beuken, F. Bottin, P. Boulanger, F. Bruneval, D. Caliste, R. Caracas, M. Cote, T. Deutsch, L. Genovese, P. Ghosez, M. Giantomassi, S. Goedecker, D. Hamann, P. Hermet, F. Jollet, G. Jomard, S. Leroux, M. Mancini, S. Mazevet, M. Oliveira, G. Onida, Y. Pouillon, T. Rangel, G.-M. Rignanese, D. Sangalli, R. Shaltaf, M. Torrent, M. Verstraete, G. Zerah, and J. Zwanziger, Comput. Phys. Commun. 180, 2582 (2009)
5 P. Giannozzi, S. Baroni, N. Bonini, M. Calandra, R. Car, C. Cavazzoni, D. Ceresoli, G. L. Chiarotti, M. Cococcioni, I. Dabo, A. D. Corso, S. de Gironcoli, S. Fabris, G. Fratesi, R. Gebauer, U. Gerstmann, C. Gougoussis, A. Kokalj, M. Lazzeri, L. Martin-Samos, N. Marzari, F. Mauri, R. Mazzarello, S. Paolini, A. Pasquarello, L. Paulatto, C. Sbraccia, S. Scandolo, G. Sclauzero, A. P. Seitsonen, A. Smogunov, P. Umari, and R. M. Wentzcovitch, J. Phys.: Condens. Matter 21, 395502 (2009)
6 M. D. Segall, P. J. D. Lindan, M. J. Probert, C. J. Pickard, P. J. Hasnip, S. J. Clark, and M. C. Payne, J. Phys.: Condens. Matter 14, 2717 (2002)
7 J. E. Pask and P. A. Sterne, Modell. Simul. Mater. Sci. Eng. 13, R71 (2005)
8 L. Genovese, A. Neelov, S. Goedecker, T. Deutsch, S. A. Ghasemi, A. Willand, D. Caliste, O. Zilberberg, M. Rayson, A. Bergman, and R. Schneider, J. Chem. Phys. 129, 014109 (2008)
9 M. Valiev, E. Bylaska, N. Govind, K. Kowalski, T. Straatsma, H. V. Dam, D. Wang, J. Nieplocha, E. Apra, T. Windus, and W. de Jong, Comput. Phys. Commun. 181, 1477 (2010)
10 V. Blum, R. Gehrke, F. Hanke, P. Havu, V. Havu, X. Ren, K. Reuter, and M. Scheffler, Comput. Phys. Commun. 180, 2175 (2009)
11 W. Kohn, Phys. Rev. 133, A171 (1964)
12 W. Kohn, Phys. Rev. Lett. 76, 3168 (1996)
13 S. Goedecker, Phys. Rev. B 58, 3501 (1998)
14 J. D. Cloizeaux, Phys. Rev. 135, A685 (1964)
15 J. D. Cloizeaux, Phys. Rev. 135, A698 (1964)
16 W. Kohn, Phys. Rev. 115, 809 (1959)
17 R. Baer and M. Head-Gordon, Phys. Rev. Lett. 79, 3962 (1997)
18 S. Ismail-Beigi and T. A. Arias, Phys. Rev. Lett. 82, 2127 (1999)
19 S. Goedecker, Phys. Rev. B 58, 3501 (1998)
20 L. He and D. Vanderbilt, Phys. Rev. Lett. 86, 5341 (2001)
21 M. Elstner, D. Porezag, G. Jungnickel, J. Elsner, M. Haugk, T. Frauenheim, S. Suhai, and G. Seifert, Phys. Rev. B 58, 7260 (1998)
22 B. Aradi, B. Hourahine, and T. Frauenheim, J. Phys. Chem. A 111, 5678 (2007)
23 D. R. Bowler and T. Miyazaki, Rep. Prog. Phys. 75, 036503 (2012)
24 S. Goedecker, Rev. Mod. Phys. 71, 1085 (1999)
25 C.-K. Skylaris, P. D. Haynes, A. A. Mostofi, and M. C. Payne, J. Chem. Phys. 122, 084119 (2005)
26 D. R. Bowler and T. Miyazaki, J. Phys.: Condens. Matter 22, 074207 (2010)
27 J. VandeVondele, M. Krack, F. Mohamed, M. Parrinello, T. Chassaing, and J. Hutter, Comput. Phys. Commun. 167, 103 (2005)
28 J. M. Soler, E. Artacho, J. D. Gale, A. García, J. Junquera, P. Ordejón, and D. Sánchez-Portal, J. Phys.: Condens. Matter 14, 2745 (2002)
29 I. Daubechies, Ten Lectures on Wavelets (SIAM, 1992)
30 C. Hartwigsen, S. Goedecker, and J. Hutter, Phys. Rev. B 58, 3641 (1998)
31 M. Krack, Theor. Chem. Acc. 114, 145 (2005)
32 A. Willand, Y. O. Kvashnin, L. Genovese, A. Vazquez-Mayagoitia, A. K. Deb, A. Sadeghi, T. Deutsch, and S. Goedecker, J. Chem. Phys. 138, 104109 (2013)
33 E. Hernández and M. J. Gillan, Phys. Rev. B 51, 10157 (1995)
34 N. Marzari and D. Vanderbilt, Phys. Rev. B 56, 12847 (1997)
35 R. McWeeny, Rev. Mod. Phys. 32, 335 (1960)
36 S. Goedecker, Wavelets and Their Application: For the Solution of Partial Differential Equations in Physics (Presses polytechniques et universitaires romandes, 1998)
37 A. Ruiz-Serrano and C.-K. Skylaris, J. Chem. Phys. 139, 164110 (2013)
38 P. W. Anderson, Phys. Rev. Lett. 21, 13 (1968)
39 P. Pulay, Chem. Phys. Lett. 73, 393 (1980)
40 S. Goedecker and C. J. Umrigar, Phys. Rev. A 55, 1765 (1997)
41 S. Goedecker and L. Colombo, Phys. Rev. Lett. 73, 122 (1994)
42 S. Goedecker and M. Teter, Phys. Rev. B 51, 9455 (1995)
43 W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes 3rd Edition: The Art of Scientific Computing, 3rd ed. (Cambridge University Press, New York, NY, USA, 2007)
44 L. E. Ratcliff, N. D. M. Hine, and P. D. Haynes, Phys. Rev. B 84, 165131 (2011)
45 L. Genovese, B. Videau, M. Ospici, T. Deutsch, S. Goedecker, and J.-F. Mehaut, C. R. Mec. 339, 149 (2011)
46 G. M. Amdahl, IEEE Solid-State Circuits Newsletter 12, 19 (2007)
47 D. M. Ceperley and B. J. Alder, Phys. Rev. Lett. 45, 566 (1980)
48 S. De, A. Willand, M. Amsler, P. Pochet, L. Genovese, and S. Goedecker, Phys. Rev. Lett. 106, 225502 (2011)
49 P. Pochet, L. Genovese, D. Caliste, I. Rousseau, S. Goedecker, and T. Deutsch, Phys. Rev. B 82, 035431 (2010)