Example: barber

PWscf User’s Guide (v.6.2) - Quantum ESPRESSO

1 Introduction This guide covers the usage of the PWscf (Plane-Wave Self-Consistent Field) package, a core component of the Quantum ESPRESSO distribution.


  Guide, User, Repossess, Pwscf user s guide, Pwscf




Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Text of PWscf User’s Guide (v.6.2) - Quantum ESPRESSO

PWscfUser s Guide ( )(only partially updated)Contents1 What canPWscfdo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . People . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Terms of use . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .32 Compilation43 Input data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Data files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Electronic structure calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . Optimization and dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Direct interface withCASINO. . . . . . . . . . . . . . . . . . . . . . . . . . . . . Socket interface with i-PI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114 Execution time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Memory requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . File space requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parallelization issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Understanding the time report . . . . . . . . . . . . . . . . . . . . . . . . . . . . execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Restarting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . trapping (experimental!) . . . . . . . . . . . . . . . . . . . . . . . 185 Compilation problems withPLUMED. . . . . . . . . . . . . . . . . . . . . . . . . 2511 IntroductionThis guide covers the usage of thePWscf(Plane-Wave Self-Consistent Field) package, a corecomponent of theQuantum ESPRESSOdistribution. Further documentation, beyond whatis provided in this guide, can be found in the directoryPW/Doc/, containing a copy of this notice: due to the lack of time and of manpower, this manual is only partiallyupdated and may contain outdated guide assumes that you know the physics thatPWscfdescribes and the methods itimplements. It also assumes that you have already installed, or know how to install,QuantumESPRESSO. If not, please read the general User s Guide forQuantum ESPRESSO, foundin directoryDoc/two levels above the one containing this guide; or consult the web site: who want to modify or contribute toPWscfshould read the Developer Manual: What canPWscfdoPWscfperforms many different kinds of self-consistent calculations of electronic-structure prop-erties within Density-Functional Theory (DFT), using a Plane-Wave (PW) basis set and pseu-dopotentials (PP). In particular: ground-state energy and one-electron (Kohn-Sham) orbitals, atomic forces, stresses; structural optimization, also with variable cell; molecular dynamics on the Born-Oppenheimer surface, also with variable cell; macroscopic polarization and finite electric fields via the modern theory of polarization(Berry Phases); modern theory of orbital magnetization; free-energy surface calculation at fixed cell through meta-dynamics, if patched of the above works for both insulators and metals, in any crystal structure, for manyexchange-correlation (XC) functionals (including spin polarization, DFT+U, meta-GGA, non-local and hybrid functionals), for norm-conserving (Hamann-Schluter-Chiang) PPs (NCPPs) inseparable form or Ultrasoft (Vanderbilt) PPs (USPPs) or Projector Augmented Waves (PAW)method. Noncollinear magnetism and spin-orbit interactions are also implemented. An im-plementation of finite electric fields with a sawtooth potential in a supercell is also note that NEB calculations are no longer performed , but are instead carried (see main user guide), a dedicated code for path optimization which can usePWscfas computational PeopleThePWscfpackage (which in earlier releases includedPHononandPostProc) was originallydeveloped by Stefano Baroni, Stefano de Gironcoli, Andrea Dal Corso (SISSA), Paolo Giannozzi(Univ. Udine), and many others. We quote in particular:2 David Vanderbilt s group at Rutgers for Berry s phase calculations; Paolo Umari (Univ. Padua) for finite electric fields; Ralph Gebauer (ICTP, Trieste) and Adriano Mosca Conte (SISSA, Trieste) for non-collinear magnetism; Andrea Dal Corso for spin-orbit interactions; Carlo Sbraccia (Princeton) for improvements to structural optimization and to manyother parts; Dario Alf`e (University College London) for implementation of Born-Oppenheimer molec-ular dynamics; Renata Wentzcovitch and collaborators (Univ. Minnesota) for variable-cell moleculardynamics; Lorenzo Paulatto ( VI) for PAW implementation, built upon previous work byGuido Fratesi ( Bicocca) and Riccardo Mazzarello (ETHZ-USI Lugano); Matteo Cococcioni (Univ. Minnesota) for DFT+U implementation; Timo Thonhauser (WFU) for vdW-DF, svdW-DF, and variants;and the more recent contributors: Pietro Bonf`a (CINECA) for memory estimator; Michele Ceriotti and Riccardo Petraglia (EPFL Lausanne) for interfacing with i-PI; Taylor Barnes (LBL) and Nicola Varini (CSCS) for improvements to hybrid functionals; Robert DiStasio (Cornell), Biswajit Santra (Princeton), Hsin-Yu Ko (Princeton), ThomasMarkovich (Harvard) for Tkatchenko-Scheffler stress in PWscf; Jong-Won Song (RIKEN) for Gau-PBE functional; Alberto Otero de la Roza (Merced Univ.) for XDM (exchange-hole dipole moment) modelof dispersions, PW86 (unrevised) and B86B functionals; Hannu-Pekka Komsa (CSEA/Lausanne) for the HSE functional; Gabriele Sclauzero (IRRMA Lausanne) for DFT+U with on-site occupations obtainedfrom pseudopotential projectors; Alexander Smogunov (CEA) for DFT+U with noncollinear magnetization and for calcu-lation of Magnetic Anisotropy Energy using the Force Theorem; Burak Himmetoglou (UCSB) for DFT+U+J; Xiaochuan Ge (SISSA) for Smart MonteCarlo Langevin dynamics; Andrei Malashevich (Univ. Berkeley) for calculation of orbital magnetization;3 Minoru Otani (AIST), Yoshio Miura (Tohoku U.), Nicephore Bonet (MIT), Nicola Marzari(Univ. Oxford), Brandon Wood (LLNL), Tadashi Ogitsu (LLNL), for ESM, EffectiveScreening Method (PRB 73, 115407 [2006]); Dario Alf`e, Mike Towler (University College London), Norbert Nemec ( ) forthe interface guide was mostly written by Paolo Giannozzi. Mike Towler wrote thePWscftoCASINOsubsection. Michele Ceriotti and Riccardo Petraglia wrote the subsection on i-PI Terms of useQuantum ESPRESSOis free software, released under the GNU General Public , or the file License in thedistribution).We shall greatly appreciate if scientific work done usingQuantum ESPRESSOdistribu-tion will contain an explicit acknowledgment and the following reference:P. Giannozzi, S. Baroni, N. Bonini, M. Calandra, R. Car, C. Cavazzoni, D. Ceresoli,G. L. Chiarotti, M. Cococcioni, I. Dabo, A. Dal Corso, S. Fabris, G. Fratesi, S. deGironcoli, R. Gebauer, U. Gerstmann, C. Gougoussis, A. Kokalj, M. Lazzeri, , N. Marzari, F. Mauri, R. Mazzarello, S. Paolini, A. Pasquarello,L. Paulatto, C. Sbraccia, S. Scandolo, G. Sclauzero, A. P. Seitsonen, A. Smo-gunov, P. Umari, R. M. Wentzcovitch, 21, 395502 (2009), the formQuantum ESPRESSOfor textual citations of the code. Please also seepackage-specific documentation for further recommended citations. Pseudopotentials shouldbe cited as (for instance)[ ] We used the pseudopotentials and for all exchange-correlation functionals can be found in the header of CompilationPWscfis included in the coreQuantum ESPRESSOdistribution. Instruction on how to in-stall it can be found in the general documentation (User s Guide) forQuantum pwfrom the mainQuantum ESPRESSOdirectory ormakefrom thePW/subdirectory produces inPW/srcand a link to thebin/directory. Inaddition, several utility programs, and related links inbin/, are produced inPW/tools: PW/ input data forPWscf, calculates distances and angles betweenatoms in a cell, taking into account periodicity PW/ energy-vs-volume data to an equation of state PW/ lists of k-points4 PW/ , respectively input and output files (not datafiles!) (the latter, courtesy of Pietro Bonf`a) and produce an XSF-formatted file suitable for plotting with XCrySDen: , a pow-erful crystalline and molecular structure visualization program. BEWARE: script requires to be located somewhere in your PATH. PW/ ,PW/ scripts that process the output (notdata files!). Usage:awk -f < my-pw-file > -f < my-pw-file > files so produced are suitable for use withxbs, a very simple X-windows utility todisplay molecules, available at: PW/ : script converting from CIF (Crystallographic Information File) toa format suitable forQuantum ESPRESSO. Courtesy of Carlo Nervi (Univ. Torino,Italy).Documentation for the auxiliary codes can be found in the codes themselves, in the headerof UsingPWscfInput files be either written by hand or produced via thePWguigraphical interfaceby Anton Kokalj, included in theQuantum ESPRESSOdistribution. (where is the version number) for more info onPWgui, orGUI/READMEif you are usingSVN may take the tests and examples distributed withQuantum ESPRESSOas templatesfor writing your own input files. In the following, whenever we mention Example N , we referto those. Input files are those in theresults/subdirectories, with names ending (they will appear after you have run the examples). Input dataInput data is organized as several namelists, followed by other fields ( cards ) introduced bykeywords. The namelists are&CONTROL:general variables controlling the run&SYSTEM:structural information on the system under investigation&ELECTRONS:electronic variables: self-consistency, smearing&IONS (optional):ionic variables: relaxation, dynamics&CELL (optional): variable-cell optimization or dynamicsOptional namelist may be omitted if the calculation to be performed does not require depends on the value of variablecalculationin namelist &CONTROL. Most variablesin namelists have default values. Only the following variables in &SYSTEM must always bespecified:ibrav(integer)Bravais-lattic e indexcelldm(real, dimension 6) crystallographic constantsnat(integer)number of atoms in the unit cellntyp(integer)number of types of atoms in the unit cellecutwfc(real)kinetic energy cutoff (Ry) for metallic systems, you have to specify how metallicity is treated in you chooseoccupations= smearing , you have to specify the smearing typesmearingandthe smearing widthdegauss. Spin-polarized systems are as a rule treated as metallic system,unless the total magnetization,totmagnetizationis set to a fixed value, or if occupationnumbers are fixed (occupations= from input and card OCCUPATIONS).Explanations for the meaning of variablesibravandcelldm, as well as on alternative waysto input structural data, are in filesPW/ Thesefiles are the reference for input data and describe a large number of other variables as all variables have default values, which may or may not fit your lines in namelists can be introduced by a ! , exactly as in fortran the namelists, you have several fields ( cards ) introduced by keywords with self-explanatory names:ATOMICSPECIESATOMICPOSITIONSKPOINT SCELLPARAMETERS (optional)OCCUPATIONS (optional)6The keywords may be followed on the same line by an option. Unknown fields are ignored. Seethe files mentioned above for details on the available cards .Comments lines in cards can be introduced by either a ! or a # character in the firstposition of a about k points: The k-point grid can be either automatically generated or manuallyprovided as a list of k-points and a weight in the Irreducible Brillouin Zone only of the Bravaislattice of the crystal. The code will generate (unless instructed not to do so: see variablenosym)all required k-points and weights if the symmetry of the system is lower than the symmetry ofthe Bravais lattice. The automatic generation of k-points follows the convention of Monkhorstand Data filesThe output data files are written in the , as specified by vari-ablesoutdirandprefix(a string that is prepended to all file names, whose default value is:prefix= pwscf ).outdircan be specified as well in environment variable is used to write the file in a XML format, whose definition can be found inthe Developer Manual. In order to use the data directory on a different machine, you shouldconvert the binary files to formatted, transfer them, convert them back to unformatted as inthe following example:$bindir/iotk convert -t [ transfer file to the other machine ]$bindir/iotk convert -b you need wavefunctions as well, you must to set Insuch case, wavefunction files are found in *, one per should do the conversion for all of NOTICE: starting with , a new I/O system will replace the current execution stops if you create a in the working directory ( the program is executed), or in theoutdirdirectory. Note that with some versions ofMPI, the working directory is the directory where the executable is! The advantage of thisprocedure is that all files are properly closed, whereas just killing the process may leave dataand output files in an unusable Electronic structure calculationsSingle-point (fixed-ion) SCF calculationSetcalculation= scf (this is actually thedefault). Namelists &IONS and &CELL will be ignored. For LSDA spin-polarized calculations(that is: with a fixed quantization axis for magnetization), setnspin=2. Note that the numberof k-points will be internally doubled (one set of k-points for spin-up, one set for spin-down).See Example structure calculationFirst perform a SCF calculation as above; then do a non-SCFcalculation with the desired k-point grid and numbernbndof bands. Usecalculation= bands if you are interested in calculating only the Kohn-Sham states for the given set of k-points ( symmetry lines: see for ).Specify insteadcalculation= nscf if you are interested in further processing of the results of7non-SCF calculations (for instance, in DOS calculations). In the latter case, you should specifya uniform grid of points. For DOS calculations you should chooseoccupations= tetrahedra ,together with an automatically generated uniform k-point grid (card KPOINTS with option automatic ). Specifynosym=. avoid generation of additional k-points in low symme-try cases. Variablesprefixandoutdir, which determine the names of input or output files,should be the same in the two runs. See Examples 01, 06, 07,NOTA BENE: Since , both atomic positions and the scf potential are read from thedata file so that consistency is magnetization, spin-orbit interactionsThe following input variables arerelevant for noncollinear and spin-orbit calculations:noncolinlspinorbstartingmag netization(one for each type of atoms)To make a spin-orbit calculationnoncolinmust be true. Ifstartingmagnetizationis setto zero (or not given) the code makes a spin-orbit calculation without spin magnetization (itassumes that time reversal symmetry holds and it does not calculate the magnetization). Thestates are still two-component spinors but the total magnetization is different from zero, it makes a noncollinear spin polarizedcalculation with spin-orbit interaction. The final spin magnetization might be zero or differentfrom zero depending on the system. Note that the code will look only for symmetries that leavethe starting magnetization to make a spin-orbit calculation you must use fully relativistic pseudopoten-tials at least for the atoms in which you think that spin-orbit interaction is large. If all thepseudopotentials are scalar relativistic the calculation becomes equivalent to a noncollinear cal-culation without spin orbit. (Andrea Dal Corso, 2007-07-27) See Example 06 for noncollinearmagnetism, Example 07 (and references quoted therein) for spin-orbit +UDFT+U (formerly known as LDA+U) calculation can be performed within a sim-plified rotationally invariant form of theUHubbard correction. Note that for all atoms havingaUvalue there should be an item in one in sub-routinePW/ , defining respectively the angular momentum and the occupancy ofthe orbitals with the Hubbard correction. If your Hubbard-corrected atoms are not there, youneed to edit these files and to Example 08 and its Interactions (DFT-D)For DFT-D (DFT + semiempirical dispersion interac-tions), see the description of input variableslondon*, sample filesPW/tests/vdw.*, and thecomments in source and Hybrid functionalsSince , calculations in the Hartree-Fock ap-proximation, or using hybrid XC functionals that include some Hartree-Fock exchange, nolonger require a special preprocessing before compilation. SeeEXXexample/and its interaction with non-local functional (vdW-DF)See examplevdwDFexampleand references quoted in via Berry PhaseSee Example 04, its file README, the documentation inthe header ofPW/ electric fieldsThere are two different implementations of macroscopic electric : via an external sawtooth potential (input variabletefield=.true.) and via themodern theory of polarizability (lelfield=.true.). The former is useful for surfaces, especiallyin conjunction with dipolar corrections (dipfield=.true.): seeexamples/dipoleexampleforan example of application. Electric fields via modern theory of polarization are documentedin example 10. The exact meaning of the related variables, for both cases, is explained in thegeneral input magnetizationModern theory of orbital magnetization [Phys. Rev. Lett. 95,137205 (2005)] for insulators. The calculation is performed by setting input variablelorbm=. nscf run. If finite electric field is present (lelfield=.true.) only Kubo terms are computed[see New J. Phys. 12, 053032 (2010) for details]. Optimization and dynamicsStructural optimizationFor fixed-cell optimization, specifycalculation= relax andadd namelist &IONS. All options for a single SCF calculation apply, plus a few others. You mayfollow a structural optimization with a non-SCF band-structure calculation (since , youdo not need any longer to update the atomic positions in the input file for non scf calculation).See Example DynamicsSpecifycalculation= md , the time stepdt, and possibly the num-ber of MD stopsnstep. Use variableiondynamicsin namelist &IONS for a fine-grainedcontrol of the kind of dynamics. Other options for setting the initial temperature and forthermalization using velocity rescaling are available. Remember: this is MD on the electronicground state, not Car-Parrinello MD. See Example surface calculationsOncePWscfis patched with thePLUMEDplug-in, it ispossible to use most PLUMED functionalities by runningPWscfas:. -plumedplus theother usualPWscfarguments. The input file forPLUMEDmust be found in the specifiedoutdirwith fixed optimizationSince the newer BFGS algorithm covers the case of variable-cell optimization as well. Note however that variable-cell calculations (both optimization anddynamics) are performed with plane waves and G-vectorscalculated for the starting cell. Thismeans that if you re-run a self-consistent calculation for the final cell and atomic positionsusing the same cutoffecutwfc(and/orecutrhoif applicable), you may not find exactly thesame results, unless your final and initial cells are very similar, or unless your cutoff(s) are veryhigh. In order to provide a further check, a last step is performed in which a scf calculation isperformed for the converged structure, with plane waves and G-vectorscalculated for the final9cell. Small differences between the two last steps are thus to be expected and give an estimateof the reliability of the variable-cell optimization. If you get a large difference, you are likelyquite far from convergence in the plane-wave basis set and you need to increase the cutoff(s).Variable-cell molecular dynamics A common mistake many new users make is to set thetime stepdtimproperly to the same order of magnitude as for CP algorithm, or not settingdtat all. This will produce a not evolving dynamics . Good values for the original RMW (RMWentzcovitch) dynamics aredt= 50 70. The choice of the cell mass is a delicate off-optimal mass will make convergence slower. Too small masses, as well as too long timesteps, can make the algorithm unstable. A good cell mass will make the oscillation times forinternal degrees of freedom comparable to cell degrees of freedom in non-damped Variable-CellMD. Test calculations are advisable before extensive calculation. I have tested the dampingalgorithm that I have developed and it has worked well so far. It allows for a much longertime step (dt=100 150) than the RMW one and is much more stable with very small cellmasses, which is useful when the cell shape, not the internal degrees of freedom, is far out ofequilibrium. It also converges in a smaller number of steps than RMW. (Info from Cesar DaSilva: the new damping algorithm is the default since v. ). Direct interface withCASINOPWscfnow supports the Cambridge quantum Monte Carlo program CASINO directly. For moreinformation on theCASINOcode ~mdt26 take the output ofPWSCFand improve it giving considerably more accurate totalenergies and other quantities than DFT is capable wishing to learn how to use CASINO may like to attend one of the annualCASINOsummer schools in Mike Towler s Apuan Alps Centre for Physics in Tuscany, information can be found interface betweenPWscfandCASINOis provided through a file with astandard format containing geometry, basis set, and orbital coefficients, whichPWscfwill pro-duce on demand. For SCF calculations, the name of this file may , on user requests (see below). If the files are produced from an MDrun, the files have a ,.0002,.0003etc. corresponding to the sequence of is implemented by three routines in thePWdirectory of the espresso distri-bution: : the main routine : writes theCASINO in various formats : does the plane-wave to blip conversion, if requestedRelevant behavior ofPWscfmay be modified through an optional auxiliary input file, (see below).Note that in versions prior to , this functionality was provided through separate post-processing utilities available in the PP directory: these are no longer supported. For QMC-MDruns, PWSCF etc previously needed to be patched using the patch script - this is no longer to withPWscfUse the -pw2casino option when , -pw2casino < input_file > will then be generated capable of doing the plane wave to blip conversion directly (the blip utilityprovided in theCASINOdistribution is not required) and so by default,PWscfproduces the binary blip wave function options may be modified by providing a thefollowing format:&inputppblip_convert=. or all of the 5 keywords may be provided, in any order. The default values are as givenabove (and these are used if is not meanings of the keywords are as follows:blipconvert: reexpand the converged plane-wave orbitals in localized blip functions prior towriting theCASINOwave function file. This is almost always done, since wave functionsexpanded in blips are considerably more efficient in quantum Monte Carlo calculations. Ifblipconvert=. file is produced (orbitals expanded in plane waves);ifblipconvert=.true., either fileor is produced,depending on the value ofblipbinary(see below).blipbinary: if true, and ifblipconvertis also true, write the blip wave function as an un-formatted This is much smaller than the , but is not generally portable across all : orbital coefficients (.b1)are written out indouble precision; if the user runs into hardware limitsblipsinglepreccan be set which case the coefficients are written in single precision, reducing the memoryand disk requirements at the cost of a small amount of : the quality of the blip expansion ( , the fineness of the blip grid) can beimproved by increasing the grid multiplicity parameter given by this keyword. Increasingthe grid multiplicity results in a greater number of blip coefficients and therefore largermemory requirements and file size, but the CPU time should be unchanged. For veryaccurate work, one may want to experiment with grid multiplicity larger that Note,however, that it might be more efficient to keep the grid multiplicity to and increasethe plane wave cutoff : if this is set to a positive integer greater than zero,PWscfwill sample thewave function, the Laplacian and the gradient at a large number of random points in the11simulation cell and compute the overlap of the blip orbitals with the original plane-waveorbitals: =< BW|PW > < BW|BW > < PW|PW >The closer is to 1, the better the blip representation. By increasingblipmultiplicity,or by increasing the plane-wave cutoff, one ought to be able to make as close to 1 asdesired. The number of random points used is given , note that DFT trial wave functions produced byPWSCFmust be generated usingthe same pseudopotential as in the subsequent QMC calculation. This requires the use of toolsto switch between the different file formats used by the two the CASINOtabulated format ,PWSCFofficially supports the UPFv2 format(though it will read other deprecated formats). This can be done through the casino2upf and upf2casino tools included in the upftools directory (see the upftools/README file forinstructions). An alternative converter casinogon is included in theCASINOdistribution whichproduces the deprecated GON format but which can be useful when using non-standard Socket interface with i-PIThe i-PI universal force engine performs advanced Molecular Dynamics (MD) (such as PathIntegral Molecular Dynamics, Thermodynamic Integration, Suzuki-Chin path integral, MultipleTime Step molecular dynamics) and other force related computations ( information about i-PI).PWscfusers wishing to learn how to use i-PI should refer to the i-PI communication betweenPWscfand i-PI relies on a socket interface. Thisallows running i-PI andPWscfon different computers provided that the two computers havean Internet connection. Basically, i-PI works as a server waiting for a connection of a suit-able software (for examplePWscf). When this happens, i-PI injects atomic positions and cellparameters into the software, that will return forces and stress tensor to file containing the interface The the necessary infrastructure to the socket to use the i-PI intefaceSince the communication goes through the Internet, thePWscfinstance needs to know the address of the i-PI server that can be specified with thecommand line option--ipi(or-ipi) followed by the address of the computer running i-PIand the port number where i-PI is listening, --ipi localhost:3142 -in > i-PI andPWscfare running on the same machine, a UNIX socket is preferable since allowsfaster communications, --ipi socketname:UNIX -in > the last case,UNIXis a keyword that tells toPWscfto look for an UNIX socket connectioninstead of an INET one. More extensive examples and tutorials can be found file must contain all the information to perform a single point calculation(calculation = "scf") which are also used to initialize thePWscfrun. Thus, it is importantthat thePWscfinput contains atomic positions and cell parameters which are as close as possibleto those specified in the i-PI Execution timeThe following is a rough estimate of the complexity of a plain scf calculation , forNCPP. USPP and PAW give raise additional terms to be calculated, that may add from afew percent up to 30-40% to execution time. For phonon calculations, each of the 3Natmodesrequires a time of the same order of magnitude of self-consistent calculation in the same system(possibly times a small multiple). , each time step takes something in the order ofTh+Torth+Tsubdefined time required for the self-consistent solution at fixed ionic positions,Tscf, is:Tscf=NiterTiter+TinitwhereNiter= number of self-consistency iterations (niter),Titer= time for a single iteration,Tinit= initialization time (usually much smaller than the first term).The time required for a single self-consistency iterationTiteris:Titer=NkTdiag+Trho+Tscf whereNk= number of k-points,Tdiag= time per Hamiltonian iterative diagonalization,Trho= time for charge density calculation,Tscf= time for Hartree and XC potential time for a Hamiltonian iterative diagonalizationTdiagis:Tdiag=NhTh+Torth+ TsubwhereNh= number ofH products needed by iterative diagonalization,Th= time perH product,Torth= CPU time for orthonormalization,Tsub= CPU time for subspace timeThrequired for aH product isTh=a1MN+a2MN1N2N3log(N1N2N3) + first term comes from the kinetic term and is usually much smaller than the others. Thesecond and third terms come respectively from local and nonlocal ,a2,a3areprefactors ( small numbersO(1)),M= number of valence bands (nbnd),N= number ofPW (basis set dimension:npw),N1,N2,N3= dimensions of the FFT grid for wavefunctions(nr1s,nr2s,nr3s;N1N2N3 8N), P = number of pseudopotential projectors, summed onall atoms, on all values of the angular momentuml, andm= 1,...,2l+ timeTorthrequired by orthonormalization isTorth=b1NM2xand the timeTsubrequired by subspace diagonalization isTsub=b2M3x13whereb1andb2are prefactors,Mx= number of trial wavefunctions (this will vary betweenMand 2 4M, depending on the algorithm).The timeTrhofor the calculation of charge density from wavefunctions isTrho=c1MNr1Nr2Nr3log(Nr1Nr2Nr3) +c2MNr1Nr2Nr3+Tuswherec1,c2,c3are prefactors,Nr1,Nr2,Nr3= dimensions of the FFT grid for charge density(nr1,nr2,nr3;Nr1Nr2Nr3 8Ng, whereNg= number of G-vectors for the charge density,ngm), andTus= time required by PAW/USPPs contribution (if any). Note that for NCPPsthe FFT grids for charge and wavefunctions are the timeTscffor calculation of potential from charge density isTscf=d2Nr1Nr2Nr3+d3Nr1Nr2Nr3log(Nr1Nr2 Nr3)whered1,d2are hybrid DFTs, the dominant term is by far the calculation of the nonlocal (Vx ) product,taking as much asTexx=eNkNqM2N1N2N3log(N1N2N3)whereNqis the number of points in thek+qgrid, determined by optionsnqx1,nqx2,nqx3,eis a above estimates are for serial execution. In parallel execution, each contribution mayscale in a different manner with the number of processors (see below). Memory requirementsA typical self-consistency or molecular-dynamics run requires a maximum memory in the orderofOdouble precision complex numbers, whereO=mMN+PN+pN1N2N3+qNr1Nr2Nr3withm,p, q= small factors; all other variables have the same meaning as above. Note that ifthe point only (k= 0) is used to sample the Brillouin Zone, the value of N will be cut hybrid DFTs, additional storage ofOxdouble precision complex numbers is needed (forFourier-transformed Kohn-Sham states), withOx=xNqMN1N2N3andx= for onlycalculations,x= 1 memory required by the phonon code follows the same patterns, with somewhat largerfactorsm,p, File space requirementsA will require an amount of temporary disk space in the order of O doubleprecision complex numbers:O=NkMN+qNr1Nr2Nr3whereq= 2 mixingndim(number of iterations used in self-consistency, default value = 8) ifdiskiois set to high ; q = 0 Parallelization run in principle on any number of processors. The effectiveness of parallelization isultimately judged by the scaling , how the time needed to perform a job scales with thenumber of processors, and depends upon: the size and type of the system under study; the judicious choice of the various levels of parallelization (detailed in ); the availability of fast interprocess communications (or lack of it).Ideally one would like to have linear scaling, T0/NpforNpprocessors, whereT0isthe estimated time for serial execution. In addition, one would like to have linear scaling ofthe RAM per processor:ON O0/Np, so that large-memory systems fit into the RAM of on k-points: guarantees (almost) linear scaling if the number of k-points is a multiple of the numberof pools; requires little communications (suitable for ethernet communications); reduces the required memory per processor by distributing wavefunctions (but not otherquantities like the charge density), unless you setdiskio= high .Parallelization on PWs: yields good to very good scaling, especially if the number of processors in a pool is adivisor ofN3andNr3(the dimensions along the z-axis of the FFT grids,nr3andnr3s,which coincide for NCPPs); requires heavy communications (suitable for Gigabit ethernet up to 4, 8 CPUs at most,specialized communication hardware needed for 8 or more processors ); yields almost linear reduction of memory per processor with the number of processors inthe note on scaling: optimal serial performances are achieved when the data are as much aspossible kept into the cache. As a side effect, PW parallelization may yield superlinear (betterthan linear) scaling, thanks to the increase in serial speed coming from the reduction of datasize (making it easier for the machine to keep data in the cache).VERY IMPORTANT: For each system there is an optimal range of number of processors onwhich to run the job. A too large number of processors will yield performance degradation. Ifthe size of pools is especially delicate:Npshould not exceedN3andNr3, and should ideally beno larger than 1/2 1/4N3and/orNr3. In order to increase scalability, it is often convenientto further subdivide a pool of processors into task groups . When the number of processorsexceeds the number of FFT planes, data can be redistributed to task groups so that eachgroup can process several wavefunctions at the same optimal number of processors for linear-algebra parallelization, taking care of mul-tiplication and diagonalization ofM Mmatrices, should be determined by observing the15performances ofcdiagh/rdiagh( ) orortho( ) for different numbers of processors inthe linear-algebra group (must be a square integer).Actual parallel performances will also depend on the available software (MPI libraries) andon the available communication hardware. For PC clusters, OpenMPI ( )seems to yield better performances than other implementations (info by Kostantin Kudin). Notehowever that you need a decent communication hardware (at least Gigabit ethernet) in orderto have acceptable performances with PW parallelization. Do not expect good scaling withcheap hardware: PW calculations are by no means an embarrassing parallel note that multiprocessor motherboards for Intel Pentium CPUs typically have just onememory bus for all processors. This dramatically slows down any code doing massive access tomemory (as most codes in theQuantum ESPRESSOdistribution do) that runs on processorsof the same Understanding the time reportThe time report printed at the end of contains a lot of useful information that canbe used to understand bottlenecks and improve Serial executionThe following applies to calculations taking a sizable amount of time (at least minutes): for shortcalculations (seconds), the time spent in the various initializations dominates. Any discrepancywith the following picture signals some anomaly. For a typical job with norm-conserving PPs, the total (wall) time is mostly spent inroutine electrons , calculating the self-consistent solution. Most of the time spent in electrons is used by routine cbands , calculating Kohn-Sham states. sumband (calculating the charge density), vofrho (calculating thepotential), mixrho (charge density mixing) should take a small fraction of the time. Most of the time spent in cbands is used by routines cegterg (k-points) or regterg (Gamma-point only), performing iterative diagonalization of the Kohn-Sham Hamiltonianin the PW basis set. Most of the time spent in *egterg is used by routine hpsi , calculatingH products. cdiaghg (k-points) or rdiaghg (Gamma-only), performing subspace diagonalization,should take only a small fraction. Among the general routines , most of the time is spent in FFT on Kohn-Sham states: fftw , and to a smaller extent in other FFTs, fft and ffts , and in calbec , calculating | products. Forces and stresses typically take a fraction of the order of 10 to 20% of the total PAW and Ultrasoft PP, you will see a larger contribution by sumband and a nonnegligible newd contribution to the time spent in electrons , but the overall picture is unchanged. Youmay drastically reduce the overhead of Ultrasoft PPs by using input option tqr=.true. . Parallel executionThe various parallelization levels should be used wisely in order to achieve good results. Letus summarize the effects of them on CPU: Parallelization on FFT speeds up (with varying efficiency) almost all routines, with thenotable exception of cdiaghg and rdiaghg . Parallelization on k-points speeds up (almost linearly) cbands and called routines;speeds up partially sumband ; does not speed up at all vofrho , newd , mixrho . Linear-algebra parallelization speeds up (not always) cdiaghg and rdiaghg task-group parallelization speeds up fftw OpenMP parallelization speeds up fftw , plus selected parts of the calculation, plus(depending on the availability of OpenMP-aware libraries) some linear algebra operationsand on RAM: Parallelization on FFT distributes most arrays across processors ( all G-space and R-spaces arrays) but not all of them (in particular, not subspace Hamiltonian and overlapmatrices) Linear-algebra parallelization also distributes subspace Hamiltonian and overlap matrices. All other parallelization levels do not distribute any memoryIn an ideally parallelized run, you should observe the following: CPU and wall time do not differ by much Time usage is still dominated by the same routines as for the serial run Routine fftscatter (called by parallel FFT) takes a sizable part of the time spent inFFTs but does not dominate estimate of parallelization parametersYou need to know the number of k-points,Nk the third dimension of the (smooth) FFT grid,N3 the number of Kohn-Sham states,MThese data allow to set bounds on parallelization: k-point parallelization is limited toNkprocessor pools:-nk Nk FFT parallelization shouldn t exceedN3processors, if you run with-nk Nk, useN=Nk N3MPI processes at most (mpirun -np N ...) UnlessMis a few hundreds or more, don t bother using linear-algebra parallelizationYou will need to experiment a bit to find the best compromise. In order to have good loadbalancing among MPI processes, the number of k-point pools should be an integer divisor ofNk; the number of processors for FFT parallelization should be an integer divisor symptoms of bad/inadequate parallelization a large fraction of time is spent in vofrho , newd , mixrho , orthe time doesn t scale well or doesn t scale at all by increasing the number of processorsfor k-point : use (also) FFT parallelization if possible a disproportionate time is spent in cdiaghg / rdiaghg .Solutions: use (also) k-point parallelization if possible use linear-algebra parallelization, with scalapack if possible. a disproportionate time is spent in fftscatter , orin fftscatter the difference betweenCPU and wall time is : if you do not have fast (better than Gigabit ethernet) communication hardware, donot try FFT parallelization on more than 4 or 8 procs. use (also) k-point parallelization if possible the time doesn t scale well or doesn t scale at all by increasing the number of processorsfor FFT : use task groups : try command-line option-ntg 4or-ntg 8. This may improveyour RestartingSince QE restarting from an arbitrary point of the code is no more code must terminate properly in order for restart to be possible. A clean stop can betriggered by one the following three conditions:1. The amount of time specified by the input variable maxseconds is reached2. The user creates a file named $ either in the working directory or in outputdirectory $outdir (variables $outdir and $prefix as specified in the control namelist)3. (experimental) The code is compiled with signal-trapping support and one of the trappedsignals is received (see the next section for details).After the condition is met, the code will try to stop cleanly as soon as possible, which cantake a while for large calculation. Writing the files to disk can also be a long process. In orderto be safe you need to reserve sufficient time for the stop process to the previous execution of the code has stopped properly, restarting is possible settingrestartmode= restart in the control Signal trapping (experimental!)In order to compile signal-trapping add -DTERMINATEGRACEFULLY to MANUALDFLAGSin the file. Currently the code intercepts SIGINT, SIGTERM, SIGUSR1, SIGUSR2,SIGXCPU; signals can be added or removed editing the queue systems will send a signal some time before killing a job. The exact be-haviour depends on the queue systems and could be configured. Some examples:With PBS: send the default signal (SIGTERM) 120 seconds before the end:#PBS -l signal=@120 send signal SIGUSR1 10 minutes before the end:#PBS -l signal=SIGUSR1@600 you cand also send a signal manually with qsig or send a signal and then stop:qdel -W 120 jobidwill send SIGTERM, wait 2 minutes than force LoadLeveler (untested): the SIGXCPU signal will be sent when wallsoftlimitisreached, it will then stop the job whenhardlimitis reached. You can specify both limits as:# @ wallclocklimit = hardlimit, you can give thirty minutes to stop using:# @ wallclocklimit = 5:00,4:305 says error while loading shared libraries or cannot open shared object file and does not startPossible reasons: If you are running on the same machines on which the code was compiled, this is a libraryconfiguration problem. The solution is machine-dependent. On Linux, find the path tothe missing libraries; then either add it to file/ runldconfig(mustbe done as root), or add it to variable LDLIBRARYPATH and export it. Anotherpossibility is to load non-shared version of libraries (ending with .a) instead of sharedones (ending with .so). If you arenotrunning on the same machines on which the code was compiled: you needeither to have the same shared libraries installed on both machines, or to load statically alllibraries (using appropriateconfigureor loader options). The same applies to Beowulf-style parallel machines: the needed shared libraries must be present on all in examples with parallel executionIf you get error messages in the exam-ple scripts not errors in the codes on a parallel machine, such as :run exam-ple: -n: command not foundyou may have forgotten to properly define PARAPREFIX prints the first few lines and then nothing happens (parallel execution)Ifthe code looks like it is not reading from input, maybe it isn t: the MPI libraries need to beproperly configured to accept input redirection. -iand the input file name ( ), or inquire with your local computer wizard (if any). Since , this is for sure thereason if the code stops atWaiting for stops with error while reading dataThere is an error in the input data, typicallya misspelled namelist variable, or an empty input file. Unfortunately with most compilers thecode just reportsError while reading XXX namelistand no further useful information. Hereare some more subtle sources of trouble: Out-of-bound indices in dimensioned variables read in the namelists; Input data files containing M (Control-M) characters at the end of lines, or non-ASCIIcharacters ( non-ASCII quotation marks, that at a first glance may look the sameas the ASCII character). Typically, this happens with files coming from Windows orproduced with smart may cause the code to crash with rather mysterious error messages. If none of the aboveapplies and the code stops at the first namelist (&CONTROL) and you are running in parallel,see the previous mumbles something likecannot recoverorerror reading recover fileYou aretrying to restart from a previous job that either produced corrupted files, or did not do whatyou think it did. No luck: you have to restart from stops withinconsistent DFTerrorAs a rule, the flavor of DFT used in thecalculation should be the same as the one used in the generation of pseudopotentials, whichshould all be generated using the same flavor of DFT. This is actually enforced: the type ofDFT is read from pseudopotential files and it is checked that the same DFT is read from allPPs. If this does not hold, the code stops with the above error message. Use at your ownrisk input variableinputdftto force the usage of the DFT you stops with error in cdiaghg or rdiaghgPossible reasons for such behavior are notalways clear, but they typically fall into one of the following cases: serious error in data, such as bad atomic positions or bad crystal structure/supercell; a bad pseudopotential, typically with a ghost, or a USPP giving non-positive chargedensity, leading to a violation of positiveness of the S matrix appearing in the USPPformalism; a failure of the algorithm performing subspace diagonalization. The LAPACK algorithmsused bycdiaghg(for generic k-points) orrdiaghg(for only case) are very robust andextensively tested. Still, it may seldom happen that such algorithms fail. Try to useconjugate-gradient diagonalization (diagonalization= cg ), a slower but very robustalgorithm, and see what buggy libraries. Machine-optimized mathematical libraries are very fast but sometimesnot so robust from a numerical point of view. Suspicious behavior: you get an errorthat is not reproducible on other architectures or that disappears if the calculation isrepeated with even minimal changes in parameters. Known cases: HP-Compaq alphaswith cxml libraries, Mac OS-X with system BLAS/LAPACK. Try to use compiled BLASand LAPACK (or better, ATLAS) instead of machine-optimized crashes with no error message at allThis happens quite often in parallel execu-tion, or under a batch queue, or if you are writing the output to a file. When the programcrashes, part of the output, including the error message, may be lost, or hidden into error fileswhere nobody looks into. It is the fault of the operating system, not of the code. Try to runinteractively and to write to the screen. If this doesn t help, move to next crashes withsegmentation faultor similarly obscure messagesPossible rea-sons: too much RAM memory or stack requested (see next item). if you are using highly optimized mathematical libraries, verify that they are designed foryour hardware. If you are using aggressive optimization in compilation, verify that you are using theappropriate options for your machine The executable was not properly compiled, or was compiled on a different and incompat-ible environment. buggy compiler or libraries: this is the default explanation if you have problems with theprovided tests and works for simple systems, but not for large systems or whenever more RAMis neededPossible solutions: Increase the amount of RAM you are authorized to use (which may be much smaller thanthe available RAM). Ask your system administrator if you don t know what to do. Insome cases the stack size can be a source of problems: if so, increase it with commandlimitsorulimit). Reducenbndto the strict minimum (for insulators, the default is already the minimum,though). Reduce the work space for Davidson diagonalization to the minimum by settingdiagodavidndim=2;also consider using conjugate gradient diagonalization (diagonalization= cg ), slow butvery robust, which requires almost no work space. If the charge density takes a significant amount of RAM, reducemixingndimfrom itsdefault value (8) to 4 or so. In parallel execution, use more processors, or use the same number of processors with lesspools. Remember that parallelization with respect to k-points (pools) does not distributememory: only parallelization with respect to R- (and G-) space If none of the above is sufficient or feasible, you have to either reduce the cutoffs and/orthe cell size, or to use a machine with more crashes witherror in davciodavciois the routine that performs most of the I/Ooperations (read from disk and write to disk) ;error in davciomeans a failure of anI/O operation. If the error is reproducible and happens at the beginning of a calculation: check if youhave read/write permission to the scratch directory specified in variableoutdir. Also:check if there is enough free space available on the disk you are writing to, and check yourdisk quota (if any). If the error is irreproducible: your might have flaky disks; if you are writing via thenetwork using NFS (which you shouldn t do anyway), your network connection might benot so stable, or your NFS implementation is unable to work under heavy load If it happens while restarting from a previous calculation: you might be restarting fromthe wrong place, or from wrong data, or the files might be corrupted. Note that, sinceQE , restarting from arbitrary places is no more supported: the code must terminatecleanly. If you are running two or more instances the same time, check if you are usingthe same file names in the same temporary directory. For instance, if you submit a seriesof jobs to a batch queue, do not use the sameoutdirand the sameprefix, unless youare sure that one job doesn t start before a preceding one has crashes in parallel execution with an obscure message related to MPI errorsRandom crashes due to MPI errors have often been reported, typically in Linux PC cannot rule out the possibility that bugs inQuantum ESPRESSOcause such behavior,but we are quite confident that the most likely explanation is a hardware problem (defectiveRAM for instance) or a software bug (in MPI libraries, compiler, operating system).Debugging a parallel code may be difficult, but you should at least verify if your problem isreproducible on different architectures/software configurations/input data sets, and if there issome particular condition that activates the bug. If this doesn t seem to happen, the odds arethat the problem is not inQuantum ESPRESSO. You may still report your problem, butconsider that reports likeit crashes (obscure MPI error)contain 0 bits of informationand are likely to get 0 bits of stops with error messagethe system is metallic, specify occupationsYou didnot specify state occupations, but you need to, since your system appears to have an odd numberof electrons. The variable controlling how metallicity is treated isoccupationsin namelist&SYSTEM. The default,occupations= fixed , occupies the lowest (N electrons)/2 statesand works only for insulators with a gap. In all other cases, use smearing ( tetrahedra for DOS calculations). See input reference documentation for more stops withinternal error: cannot bracket EfPossible reasons: serious error in data, such as bad number of electrons, insufficient number of bands,absurd value of broadening;22 the Fermi energy is found by bisection assuming that the integrated DOS N(E ) is an in-creasing function of the energy. This is not guaranteed for Methfessel-Paxton smearing oforder 1 and can give problems when very few k-points are used. Use some other smearingfunction: simple Gaussian broadening or, better, Marzari-Vanderbilt cold smearing . yieldsinternal error: cannot bracket Efmessage but does not stopThis mayhappen under special circumstances when you are calculating the band structure for selectedhigh-symmetry lines. The message signals that occupations and Fermi energy are not correct(but eigenvalues and eigenvectors are). Removeoccupations= tetrahedra in the input datato get rid of the runs but nothing happensPossible reasons: in parallel execution, the code died on just one processor. Unpredictable behavior mayfollow. in serial execution, the code encountered a floating-point error and goes on producingNaNs (Not a Number) forever unless exception handling is on (and usually it isn t). Inboth cases, look for one of the reasons given above. maybe your calculation will take more time than you yields weird resultsIf results are really weird (as opposed to misinterpreted): if this happens after a change in the code or in compilation or preprocessing options, trymake clean, recompile. Themakecommand should take care of all dependencies, but donot rely too heavily on it. If the problem persists, recompile with reduced optimizationlevel. maybe your input data are grid is machine-dependentYes, they are! The code automatically chooses the small-est grid that is compatible with the specified cutoff in the specified cell, and is an allowed valuefor the FFT library used. Most FFT libraries are implemented, or perform well, only withdimensions that factors into products of small numbers (2, 3, 5 typically, sometimes 7 and 11).Different FFT libraries follow different rules and thus different dimensions can result for thesame system on different machines (or even on the same machine, with a different FFT). Seefunction allowed a consequence, the energy may be slightly different on different machines. The onlypiece that explicitly depends on the grid parameters is the XC part of the energy that iscomputed numerically on the grid. The differences should be small, though, especially for setting the FFT grids to a desired value is possible, but slightly tricky, usinginput variablesnr1,nr2,nr3andnr1s,nr2s,nr3s. The code will still increase them if notacceptable. Automatic FFT grid dimensions are slightly overestimated, so one may tryverycarefullyto reduce them a little bit. The code will stop if too small values are required, it willwaste CPU time and memory for too large that in parallel execution, it is very convenient to have FFT grid dimensions alongzthat are a multiple of the number of does not find all the symmetries you first the symmetryoperations (rotations) of the Bravais lattice; then checks which of these are symmetry operationsof the system (including if needed fractional translations). This is done by rotating (andtranslating if needed) the atoms in the unit cell and verifying if the rotated unit cell coincideswith the original that your coordinates are correct (please carefully check!), you may not find allthe symmetries you expect because: the number of significant figures in the atomic positions is not large enough. In , the variableaccepis used to decide whether a rotation is a symmetryoperation. Its current value (10 5) is quite strict: a rotated atom must coincide withanother atom to 5 significant digits. You may change the value of accep and recompile. they are not acceptable symmetry operations of the Bravais lattice. This is the casefor C60, for instance: theIhicosahedral group of C60contains 5-fold rotations that areincompatible with translation symmetry. the system is rotated with respect to symmetry axis. For instance: a C60molecule in thefcc lattice will have 24 symmetry operations (Thgroup) only if the double bond is alignedalong one of the crystal axis; if C60is rotated in some arbitrary way, not findany symmetry, apart from inversion. they contain a fractional translation that is incompatible with the FFT grid (see nextparagraph). Note that if you change cutoff or unit cell volume, the automatically com-puted FFT grid changes, and this may explain changes in symmetry (and in the numberof k-points as a consequence) for no apparent good reason (only if you have fractionaltranslations in the system, though). a fractional translation, without rotation, is a symmetry operation of the system. Thismeans that the cell is actually a supercell. In this case, all symmetry operations containingfractional translations are disabled. The reason is that in this rather exotic case there is nosimple way to select those symmetry operations forming a true group, in the mathematicalsense of the : symmetry operation # N not allowedThis is not an error. If a symmetryoperation contains a fractional translation that is incompatible with the FFT grid, it is discardedin order to prevent problems with symmetrization. Typical fractional translations are 1/2or 1/3 of a lattice vector. If the FFT grid dimension along that direction is not divisiblerespectively by 2 or by 3, the symmetry operation will not transform the FFT grid into : you can either force your FFT grid to be commensurate with fractional translation(set variablesnr1,nr2,nr3to suitable values), or set , innamelist &SYSTEM. Note however that the latter is incompatible with hybrid functionals andwith phonon is slow or does not converge at allBad input data will often result inbad scf convergence. Please carefully check your structure first, using that your input data is sensible :241. Verify if your system is metallic or is close to a metallic state, especially if you have fewk-points. If the highest occupied and lowest unoccupied state(s) keep exchanging placeduring self-consistency, forget about reaching convergence. A typical sign of such behavioris that the self-consistency error goes down, down, down, than all of a sudden up again,and so on. Usually one can solve the problem by adding a few empty bands and a Reducemixingbetato or smaller. Try themixingmodevalue that is moreappropriate for your problem. For slab geometries used in surface problems or for elon-gated cells,mixingmode= local-TF should be the better choice, dampening chargesloshing . You may also try to increasemixingndimto more than 8 (default value).Beware: this will increase the amount of memory you Specific to USPP: the presence of negative charge density regions due to either thepseudization procedure of the augmentation part or to truncation at finite cutoff maygive convergence problems. Raising theecutrhocutoff for charge density will do not get the same results in different machines!If the difference is small, do notpanic. It is quite normal for iterative methods to reach convergence through different pathsas soon as anything changes. In particular, between serial and parallel execution there areoperations that are not performed in the same order. As the numerical accuracy of computernumbers is finite, this can yield slightly different is also normal that the total energy converges to a better accuracy than its terms, sinceonly the sum is variational, has a minimum in correspondence to ground-state chargedensity. Thus if the convergence threshold is for instance 10 8, you get 8-digit accuracy onthe total energy, but one or two less on other terms ( XC and Hartree energy). It thisis a problem for you, reduce the convergence threshold for instance to 10 10or 10 12. Thedifferences should go away (but it will probably take a few more iterations to converge).Execution time is time-dependent!Yes it is! On most machines and on most operatingsystems, depending on machine load, on communication load (for parallel machines), on variousother factors (including maybe the phase of the moon), reported execution times may vary quitea lot for the same : N eigenvectors not convergedThis is a warning message that can be safelyignored if it is not present in the last steps of self-consistency. If it is still present in the laststeps of self-consistency, and if the number of unconverged eigenvector is a significant part ofthe total, it may signal serious trouble in self-consistency (see next point) or something badlywrong in input : negative or imaginary , charge ..., ornpt withrhoup< dw< are warning messages that can be safely ignored unlessthe negative or imaginary charge is sizable, let us say of the order of If it is, somethingseriously wrong is going on. Otherwise, the origin of the negative charge is the following. Whenone transforms a positive function in real space to Fourier space and truncates at some finitecutoff, the positive function is no longer guaranteed to be positive when transformed back to25real space. This happens only with core corrections and with USPPs. In some cases it maybe a source of trouble (see next point) but it is usually solved by increasing the cutoff for thecharge optimization is slow or does not converge or ends with a mysteriousbfgs errorTypical structural optimizations, based on the BFGS algorithm, converge to thedefault thresholds ( etotconvthr and forcconvthr ) in 15-25 BFGS steps (depending on thestarting configuration). This may not happen when your system is characterized by floppy low-energy modes, that make very difficult (and of little use anyway) to reach a well convergedstructure, no matter what. Other possible reasons for a problematic convergence are to convergence the self-consistency error in forces may become large with respect tothe value of forces. The resulting mismatch between forces and energies may confuse the lineminimization algorithm, which assumes consistency between the two. The code reduces thestarting self-consistency threshold conv thr when approaching the minimum energy configura-tion, up to a factor defined byupscale. Reducingconvthr(or increasingupscale) yields asmoother structural optimization, but ifconvthrbecomes too small, electronic self-consistencymay not converge. You may also increase variablesetotconvthrandforcconvthrthatde termine the threshold for convergence (the default values are quite strict).A limitation to the accuracy of forces comes from the absence of perfect translational in-variance. If we had only the Hartree potential, our PW calculation would be translationallyinvariant to machine precision. The presence of an XC potential introduces Fourier componentsin the potential that are not in our basis set. This loss of precision (more serious for gradient-corrected functionals) translates into a slight but detectable loss of translational invariance (theenergy changes if all atoms are displaced by the same quantity, not commensurate with theFFT grid). This sets a limit to the accuracy of forces. The situation improves somewhat byincreasing stops during variable-cell optimization in checkallsym withnon orthogonaloperationerrorVariable-cell optimization may occasionally break the starting symmetry ofthe cell. When this happens, the run is stopped because the number of k-points calculated forthe starting configuration may no longer be suitable. Possible solutions: start with a nonsymmetric cell; use a symmetry-conserving algorithm: the Wentzcovitch algorithm (cell dynamics= damp-w )should not break the Compilation problems withPLUMEDxlc compilerIf you get an error message like:Operation between types "char**" and "int" is not in #define snew(ptr,nelem) (ptr)= (nelem==0 ? NULL : (typeof(ptr)) calloc(nelem, sizeof(*(ptr))))#define srenew(ptr,nelem) (ptr)= (typeof(ptr)) realloc(ptr,(nelem)*sizeof(*(ptr)))26wit h#define snew(ptr,nelem) (ptr)= (nelem==0 ? NULL : (void*) calloc(nelem, sizeof(*(ptr))))#define srenew(ptr,nelem) (ptr)= (void*) realloc(ptr,(nelem)*sizeof(*(ptr)))Calli ng C from fortranPLUMED assumes that fortran compilers add a singleat the endof C routines. You may get an error message as :ERROR: Undefined symbol: .init_metadynERROR: Undefined symbol: .meta_force_calculationeliminate thefrom the definition of initmetadyn and metaforcecalculation, i. e. change atline 529void meta_force_calculation_(real *cell, int *istep, real *xxx, real *yyy, real *zzz,withvoid meta_force_calculation(real *cell, int *istep, real *xxx, real *yyy, real *zzz,, and at line 961void init_metadyn_(int *atoms, real *ddt, real *mass,void init_metadyn_(int *atoms, real *ddt, real *mass,27

Related search queries