SCF method for metals
Minimization
In
the Mauri and Grassmann algorithm, one is finding all of the bands (occupied
subspace) at once by minimizing
Tr[Y†HY].
(1)

This
is fine for insulators and semiconductors but runs into problems for metals as
shown in Fig 1. For insulators and semiconductors there are no unoccupied
bands. All bands contribute equally to the energy. For metals all bands do not
contribute equally. The reason is that metals there is no gap (or at least a
very small one) in the bandstructure. So at 0 temperature, H is ill-conditioned
because there is no gap. This solution is partially occupy the states near the
Fermi level in such a fashion as to keep the total number of electrons
constant. This does remove the ill-conditioning, but only if the occupations
are implemented properly.
In (1), each band is equally weighted;
therefore, the attempt at minimizing (1) will spend effort updating bands that
contribute very little to the overall energy and potential. So if (1) is
implemented for metals, we have problems. In the first few SCF cycles the
minimization is fine as all bands need improvement and any updates help. After
a few SCF cycles though the lower energy bands are nearly converged. At this
point the partially occupied bands are not converged though. The algorithm then
in an attempt to minimize the expectation value of H with the bands can cause
the lower bands to become worse. If all bands were equally occupied this would
still bring a lower energy. Since these bands contribute very little to the
energy and potential, they corrupt the lower bands without bringing any
significant benefit in improving themselves.
The solution (Grassmann_metal algorithm)
is to minimize
Tr[FY†HY]. (2)
Here F is the matrix describing how the bands are occupied. If Yis the eigenvectors than F is diagonal and corresponds to the values derived from for example Fermi-Dirac or Gaussian smearing. Now the corresponding importance of each band is taken into account. An attempt is made to update the partially occupied bands, but only in so much as they help to lower the energy.
This
is simply accomplished by scaling the gradient by F. So we have
G=(HY - YY†HY)F.
(3)
We also
have to change the definition of our inner products. These formulas can be seen
as a preconditioning of G.
The
inner product is now
(G•G)
= Tr[ (HY - YY†HY)† (HY - YY†HY)F ]
More
comparisons of these methods can be seen here

One can see the poor performance starting around 1e-07. A simple fix of giving an occupation of one for every occupation of .75 and above. This helps some of the trouble near the end and gives more monotonic behavior.

Looking at several values and comparing to .75, shows that .75 seems the best value.
A small improvement can still be made for the flattening out still visible at the end.
The solution is to slowly give a full occupation to all bands within the minimization process. If we include the occupations too fast, then the minimization suffers by emphasizing these upper bands too much. At the final stages if too few of the bands are fully occupied then the convergence flattens out as these upper bands are being updated extremely slow. We use several stages based on the energy difference between different SCF stages to increment the # of bands to full occupation.
An important element in the algorithm is a check that Tr[FH] decreases. If this does not decrease with the initial trial step (done by calculating the Hessian matrix element in the search direction) then a quadratic approximation is used. This becomes useful in the last few steps where the trial step suffers from the upper bands being fully occupied.
