, which contains
55 atoms in a triclinic unit cell, is already highly complex, and represents a formidable
computational challenge. Calculations have also begun for the monoclinic
, which has a unit
cell twice as large. The size of the (electronic orbital) data for the latter system
has driven the calculation from the Cray C-90 to the Thinking Machines CM-5 even more
than the CPU-time requirements.
Instead, I will discuss here work done within our group by R. Benedek, L. H. Yang, A. P. Smith, and M. Minkoff on organic superconductors. The science is exciting but requires that resources be pushed hard, especially in the area of memory availability. These calculations deal with very large internal datasets, which modify the traditional expectations that focus on the power of the processor. Consider that a MIPS R4000-based machine runs at about 6% of the speed of a Cray 2 for these problems. For this problem, one can exploit virtual memory without undue disk thrashing if about a quarter of the problem can be kept in real memory. Thus, one achieves about 1/30 of a Cray 2 AT FULL MEMORY if the workstation is configured with 200 Mbytes of memory and 600 Mbytes of swap space. Note that the slower access to paged "memory" also slows the calculation down. Nonetheless, on the Cray 2 one has to bid with a very expensive (low) nice value to get large memory, and certainly not the whole memory, and then gets access to it only about 1/30th of the time.
The Cray C-90 greatly tips the balance, not only because of its faster processor but also because of its faster memory. The C-90 made the results to be discussed here feasible. On the other hand, it has only twice the memory of the Cray 2. To proceed with the next (kappa) configuration, one needs at least five times the memory. That memory is available only on a parallel machine; neither the workstation nor the Crays can offer it. The kappa configuration requires about 1.5 Gigawords (6 Gbytes) of memory which, under today's standard configurations, implies committing to 128-256 nodes on the basis of memory considerations alone.
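The memory-to-node arithmetic can be made concrete with a small sketch. The 1.5 Gw figure and the 4 bytes per word implied by "1.5 Gigawords (6 Gbytes)" come from the text; the per-node memory sizes, the power-of-two rounding of the partition size, and the `overhead` parameter are assumptions introduced only for illustration, not properties of any particular machine.

```python
import math

def nodes_needed(total_words, bytes_per_word, node_bytes, overhead=1.0):
    """Smallest node count whose combined memory holds the problem.

    overhead is a hypothetical multiplier for any extra memory a
    parallel implementation needs beyond the raw dataset.
    """
    total_bytes = total_words * bytes_per_word * overhead
    return math.ceil(total_bytes / node_bytes)

def next_power_of_two(n):
    """Round n up to a power of two (assumed partition granularity)."""
    return 1 << (n - 1).bit_length()

GIGAWORD = 2**30
MIB = 2**20

problem_words = int(1.5 * GIGAWORD)  # 1.5 Gw, from the text
bytes_per_word = 4                   # implied by "1.5 Gw (6 Gbytes)" in the text

for node_mib in (32, 64):            # hypothetical per-node memory sizes
    n = nodes_needed(problem_words, bytes_per_word, node_mib * MIB)
    print(node_mib, n, next_power_of_two(n))
# prints:
# 32 192 256
# 64 96 128
```

Under these assumed node sizes the raw dataset alone already lands in the 128-256 node range; any value of `overhead` above 1 pushes the count higher.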
material and many more for
. The optimization techniques employed require frequent evaluations of the
potential multiplying the wavefunction, so these evaluations must be done quickly. Fast Fourier Transforms are used to this end and
are thus the major critical kernel of the calculation. For
,
a 32x45x72 grid was used while for
, the grid had to be
extended to a 64x64x128 grid to accommodate the greater structure in the unit cell. Note that the large number of basis functions and the large grid size imply very large amounts of data that must be kept available. The Brillouin zone (reciprocal space unit cell) was sampled at only 4 points to achieve the self-consistent results. While this is an extremely small number for a metal, it is hoped that the large number of atoms, and thus the small Brillouin zone, makes it adequate. A standard technique of Gaussian broadening of the eigenvalues by 10 mRy was used. Test calculations using a single point were found to yield a density and potential that gave rise to similar band dispersion. Another consequence of the materials being metallic is that one must reorthonormalize and reoccupy the states at each step.
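The FFT kernel described above can be illustrated with a minimal NumPy sketch of applying a local potential to one state: transform the coefficients to the real-space grid, multiply pointwise by the potential, and transform back. This is the textbook form of the operation, not the code used in the calculations; the function name, the random test data, and the omission of any plane-wave cutoff or grid padding are simplifying assumptions. Only the 32x45x72 grid size is taken from the text.

```python
import numpy as np

def apply_local_potential(psi_g, v_r):
    """Return the Fourier coefficients of V(r) * psi(r)."""
    psi_r = np.fft.ifftn(psi_g)   # reciprocal space -> real-space grid
    vpsi_r = v_r * psi_r          # pointwise product on the grid
    return np.fft.fftn(vpsi_r)    # back to reciprocal space

grid = (32, 45, 72)               # smaller grid quoted in the text
rng = np.random.default_rng(0)

# Made-up data: one complex state and a real potential on the grid.
psi_g = rng.standard_normal(grid) + 1j * rng.standard_normal(grid)
v_r = rng.standard_normal(grid)

vpsi_g = apply_local_potential(psi_g, v_r)
print(vpsi_g.shape, 32 * 45 * 72)   # grid shape and number of grid points
```

Each application costs two three-dimensional FFTs per state, which is why the operation dominates when it must be repeated for every band at every optimization step.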
. In this material, superconductivity is
significantly enhanced when a pressure of 1 kbar is applied to suppress the competing charge density wave. This
geometrically simplest of the ET-based superconductors has "only" 55 atoms in the (triclinic) unit cell. Its 213
valence electrons arrange themselves into 106 filled bands plus the half-filled band 107. At least, that is what
is found by both the EHT and pseudopotential calculations. (The ASW results do not agree, but recall that additional
severe approximations were applied in that calculation.) Shubnikov-de Haas[6] and tilted-field magnetoresistance[7] data
strongly support the simpler picture of the EHT and pseudopotential calculations. Although the measurements were actually
taken on a material with a different anion, the anion has relatively little effect on the Fermi surface. Of course, the next
question is how the EHT and pseudopotential calculations compare.
The answer is seen in Fig. 2.
The current SCF calculation
and the EHT calculation agree fairly well, although the SCF calculation
has a slightly wider bandwidth and accordingly a smaller electronic
mass. Also, the band of the SCF calculation more closely approaches
the Fermi energy near the M point. This will be of some interest as
photoemission results become available[8]. Fig. 3 shows the charge
density. The ET molecule (Fig. 1) can be clearly discerned along with
the charge of the anions in the plane below.
The analysis of these results continues. The first issue is to get a more
numerically rigorous alignment of the Fermi energy --- an
"engineering detail" that is important so that reliable numerical
masses can be determined and enhancement factors estimated. Next
is to dissect the wavefunction character associated with the band that
crosses the Fermi energy. Conventional wisdom is that the in-plane
conduction proceeds as sulfur to sulfur hopping. It will be useful to
take a close look at that in this model as well as the distribution of the
state over the ET molecule. One would like to know how strongly
polarization effects influence the behavior of the state (the
influence is expected to be small, given the close agreement of the EHT and SCF
calculations) and whether it can be represented by simple models
amenable to further manipulation. As photoemission measurements[9]
become available, we will not only learn more about the anisotropy of
the Fermi surface but also gain information about relaxation and correlation
in these systems through comparison to calculations. There is a rich
field of questions that can be explored by performing further
calculations within this structure but varying the anions since,
although the Fermi surface will probably not be found to vary greatly,
the superconducting properties do vary.
However, the real touchstone will be to exploit a very different
structure. Since one is working at the limits of computational ability,
the choice is dictated: the next simplest structure is the κ phase.
This variation drastically changes the positioning of the ET molecules,
so the similarities and differences can be insightful. The calculations
are well underway, but it would be premature to discuss them at this time
other than in terms of their resource requirements.
The situation is quite different for the κ-phase materials. Such calculations require 1.5 Gw of memory (the largest memory available on a C-90 is 1 Gw, at a doubling of the cost of the machine). It is this requirement, even more than the processor time, that drives the use of a parallel machine. To that end, the code has been converted to CM-Fortran and parallelized as guided by CMAX. It has run on several Thinking Machines computers, but the effort is now focussed on the CM-5.
The first major task in accommodating the parallel architecture is reorganizing the data distribution. As mentioned, the memory associated with between 128 and 256 nodes is needed to run the job. The number is larger than a simple division of the memory between nodes would suggest. This is, in part, because one needs to expand memory usage by about a factor of three to operate efficiently as a parallel application: so much for seamless computing! On the bright side, this node count is not a bad match to the processing effort required, which gives some level of balance.
The next step is efficient implementation of the two critical kernels, FFTs and eigensystem analysis, both of which one expects to get from program libraries on any production-level machine. A scheme more efficient in both memory and computation could be achieved with differently organized FFT routines. These will come with greater maturity of the libraries but are inappropriate efforts for the application programmers (i.e., physicists). Thereafter, one has to deal cleverly with features of the problem that differ from those of a serial implementation, global masked sums and global dot products being the natural examples.
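As an illustration of why global masked sums and global dot products need different handling on a distributed-memory machine, the following sketch mimics the pattern in plain Python: each "node" reduces its own block of the data, and the partial results are then combined in a final reduction. The helper names and the simple block distribution are hypothetical and stand in for whatever the CM-Fortran code and its run-time library actually do.

```python
def split(data, n_nodes):
    """Deal a list out into contiguous blocks, one per node."""
    size = -(-len(data) // n_nodes)   # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def global_masked_sum(local_values, local_masks):
    """Sum only the masked-in elements, node by node, then reduce."""
    partials = [sum(v for v, m in zip(vals, mask) if m)
                for vals, mask in zip(local_values, local_masks)]
    return sum(partials)              # the global reduction step

def global_dot(local_a, local_b):
    """Dot product as per-node partial products plus one reduction."""
    partials = [sum(x * y for x, y in zip(a, b))
                for a, b in zip(local_a, local_b)]
    return sum(partials)

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
mask = [x % 2 == 0 for x in a]        # keep only the even entries of a

n_nodes = 4                           # hypothetical node count
print(global_masked_sum(split(a, n_nodes), split(mask, n_nodes)))  # 20
print(global_dot(split(a, n_nodes), split(b, n_nodes)))            # 120
```

The local work scales down with the number of nodes, but the final reduction requires communication among all of them, which is where the serial and parallel implementations part ways. For complex wavefunction coefficients, the dot product would also need a complex conjugate on one argument.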
Another aspect that sneaks into the picture is I/O, which is significantly slower on the parallel machines. Since the restart file is about 0.5 Gw in size, this is a serious issue. On the C-90, a restart file is written after each iteration. On the parallel machines, that is too expensive, and one chooses to write the file only every nth iteration. That leaves a lot of good work exposed and unprotected for much longer times.
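The checkpointing trade-off can be sketched as a loop that writes the restart file only every nth iteration. The function and parameter names are hypothetical, and the `step` and `write_restart` callables are placeholders for the real iteration and I/O routines.

```python
def run_with_checkpoints(n_iterations, every, step, write_restart):
    """Iterate, writing a restart file every `every` iterations.

    A final restart file is always written after the last iteration.
    """
    state = None
    for it in range(1, n_iterations + 1):
        state = step(it, state)
        if it % every == 0 or it == n_iterations:
            write_restart(it, state)
    return state

written = []                                   # iterations at which a file was written
run_with_checkpoints(
    n_iterations=10,
    every=4,
    step=lambda it, state: it,                 # dummy iteration
    write_restart=lambda it, state: written.append(it),
)
print(written)   # [4, 8, 10]
```

With `every=1` this reduces to the C-90 practice of writing after each iteration. With a larger interval, a failure can cost up to `every - 1` completed iterations plus the one in progress, which is the exposure the text describes.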
______________________________________________________
[1] M. H. Whangbo et al., in Organic Superconductivity, ed. V. Z. Kresin and W. A. Little, Plenum, New York, 1990.
[2] J. Kubler and C. B. Sommers, in The Physics and Chemistry of Organic Superconductors, ed. G. Saito and S. Kagoshima, Springer, Berlin, 1990.
[3] W. Y. Ching et al., Bull. Am. Phys. Soc. 39, 880 (1994).
[4] N. Troullier and J. L. Martins, Phys. Rev. B 43, 1993 (1991).
[5] M. Teter, M. Payne, and D. G. Allan, Phys. Rev. B 40, 12255 (1989).
[6] R. Benedek, L. H. Yang, C. Woodward, and B. I. Min, Phys. Rev. B 45, 2607 (1992).
[7] W. Kang et al., Phys. Rev. Lett. 62, 2559 (1989).
[8] M. V. Kartsovnik et al., J. Phys. (France) 2, 89 (1992).
[9] R. Liu et al. (unpublished).