Manipulate BigDFT input files

This notebook presents how the MyBigDFT package lets you manipulate BigDFT input files. Running a BigDFT calculation requires an initial geometry and, generally, a set of input parameters (default parameters are used if none are given).

The Posinp and Atom classes are used to create input geometries, while the InputParams class specifies the input parameters of a BigDFT calculation. All of them are presented in this notebook.

[1]:
from mybigdft import Posinp, Atom, InputParams
import numpy as np

The Posinp class

This class represents BigDFT input geometries in the xyz format:

2   angstroem  # Number of atoms, units
free  # Boundary conditions
N   0.0   0.0   0.0  # Atom type and cartesian coordinates of each atom
N   0.0   0.0   1.1
[2]:
atoms = [Atom('N', [0.0, 0.0, 0.0]), Atom('N', [0.0, 0.0, 1.1])]
pos = Posinp(atoms, units="angstroem", boundary_conditions="free")

The Atom class mostly stores the data related to a given atom. It requires an atom type and the cartesian coordinates, but it is not aware of the units used. It also offers a few extra functionalities, such as a translate method: it takes a vector (three components) as argument and returns another Atom instance, whose position is that of the original atom translated by the vector:

[3]:
Atom('N', [0, 0, 0]).translate([0, 0, 1])
[3]:
Atom('N', [0.0, 0.0, 1.0])
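
The fact that translate returns a new instance instead of modifying the atom in place can be pictured with a small stand-in class. This is a minimal sketch, not the MyBigDFT implementation: SimpleAtom is a hypothetical name, and it stores the position as a plain tuple.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SimpleAtom:
    """Hypothetical stand-in for Atom: a type and three cartesian coordinates."""

    type: str
    position: Tuple[float, float, float]

    def translate(self, vector):
        """Return a new atom shifted by `vector`; the original is left untouched."""
        if len(vector) != 3:
            raise ValueError("the translation vector must have three components")
        new_position = tuple(float(x + dx) for x, dx in zip(self.position, vector))
        return SimpleAtom(self.type, new_position)


origin = SimpleAtom("N", (0.0, 0.0, 0.0))
moved = origin.translate([0, 0, 1])
print(moved)   # the translated copy
print(origin)  # the original atom, still at the origin
```

Returning a new object keeps existing geometries valid even when atoms are shared between several lists.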

Main attributes

A Posinp instance has some attributes:

[4]:
assert pos.atoms == atoms
assert pos.units == "angstroem"
assert pos.boundary_conditions == "free"
assert pos.cell is None

They cannot be set afterwards:

[5]:
try:
    pos.atoms = [Atom('C', [0, 0, 0])]
except AttributeError as e:
    print(repr(e))
AttributeError("can't set attribute")
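
An AttributeError of this kind is what Python raises for a property that has no setter. The sketch below shows that general mechanism; that Posinp relies on it is an assumption, and ReadOnlyGeometry is a hypothetical name. The exact wording of the error message depends on the Python version.

```python
class ReadOnlyGeometry:
    """Hypothetical class exposing `atoms` as a read-only property."""

    def __init__(self, atoms):
        self._atoms = list(atoms)

    @property
    def atoms(self):
        # No setter is defined, so assigning to `atoms` raises AttributeError.
        return self._atoms


geom = ReadOnlyGeometry(["N", "N"])
try:
    geom.atoms = ["C"]
    assignment_failed = False
except AttributeError as e:
    assignment_failed = True
    print(type(e).__name__)
```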

Representation of a Posinp instance

Printing a Posinp instance gives its string representation in the xyz format:

[6]:
print(pos)
2   angstroem
free
N   0.0   0.0   0.0
N   0.0   0.0   1.1

The actual representation of a Posinp instance is the following:

[7]:
print(repr(pos))
Posinp([Atom('N', [0.0, 0.0, 0.0]), Atom('N', [0.0, 0.0, 1.1])], 'angstroem', 'free', cell=None)

Note that the cell optional argument is set to None here: this is because there is no need to define a cell for free boundary conditions.

Equality of Posinp instances

The order of the atoms is not relevant: changing the order of the atoms in the list does not make the Posinp instances different:

[8]:
shuffled_atoms = [Atom('N', [0.0, 0.0, 1.1]), Atom('N', [0.0, 0.0, 0.0])]
shuffled_pos = Posinp(shuffled_atoms, units="angstroem", boundary_conditions="free")
print(shuffled_pos)
assert shuffled_pos == pos  # An AssertionError would be raised if the two were different
2   angstroem
free
N   0.0   0.0   1.1
N   0.0   0.0   0.0

The comparison behaves as expected if, for instance, the number of atoms or the atomic types differ:

[9]:
# One atom is missing
assert pos != Posinp([Atom('N', [0.0, 0.0, 1.1])],
                     units="angstroem", boundary_conditions="free")
# One atom has a different type
assert pos != Posinp([Atom('C', [0.0, 0.0, 0.0]), Atom('N', [0.0, 0.0, 1.1])],
                     units="angstroem", boundary_conditions="free")
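
An order-independent comparison of this kind can be sketched by putting both atom lists in a canonical order before comparing them. This is only an illustration with plain tuples: same_geometry is a hypothetical helper, it compares coordinates exactly, and the actual Posinp comparison may differ (for instance by also checking units and boundary conditions).

```python
def same_geometry(atoms_a, atoms_b):
    """Compare two lists of (type, position) pairs, ignoring their order."""
    if len(atoms_a) != len(atoms_b):
        return False
    # Sorting gives both lists the same canonical order before the comparison.
    return sorted(atoms_a) == sorted(atoms_b)


n2 = [("N", (0.0, 0.0, 0.0)), ("N", (0.0, 0.0, 1.1))]
shuffled = [("N", (0.0, 0.0, 1.1)), ("N", (0.0, 0.0, 0.0))]
one_atom = [("N", (0.0, 0.0, 1.1))]
other_type = [("C", (0.0, 0.0, 0.0)), ("N", (0.0, 0.0, 1.1))]

print(same_geometry(n2, shuffled))    # True: only the order differs
print(same_geometry(n2, one_atom))    # False: one atom is missing
print(same_geometry(n2, other_type))  # False: one atom has a different type
```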

Iterating over a Posinp instance

You can easily iterate over the atoms of a Posinp instance:

[10]:
for atom in pos:
    print(f"'{atom.type}': {atom.position}")
'N': [0. 0. 0.]
'N': [0.  0.  1.1]

Class methods to initialize a Posinp instance

Other ways of initializing a Posinp instance are provided:

The from_file class method

It reads an xyz file written on disk, given the path to this input file:

[11]:
pos = Posinp.from_file("../../../tests/free.xyz")
print(pos)
4   atomic
free
C   0.6661284109   0.0   1.153768252
C   3.330642055   0.0   1.153768252
C   4.662898877   0.0   3.461304757
C   7.327412521   0.0   3.461304757

The from_string class method

This method creates a Posinp instance from a string in the xyz format. It is mostly useful when that string is a template to be formatted:

[12]:
pos_str = """\
4   reduced
surface   {x}   inf   {z}
C   0.08333333333   0.5   0.25
C   0.41666666666   0.5   0.25
C   0.58333333333   0.5   0.75
C   0.91666666666   0.5   0.75"""
for aCC in [2.65, 2.7]:
    new_str = pos_str.format(x=3*aCC, z=np.sqrt(3)*aCC)
    pos = Posinp.from_string(new_str)
    print(f"cell size for aCC={aCC:.2f}: {pos.cell}")
cell size for aCC=2.65: [7.949999999999999, 'inf', 4.589934640057525]
cell size for aCC=2.70: [8.100000000000001, 'inf', 4.676537180435969]

It would actually be possible to achieve the same thing without having to go through the string formatting. The following example should be the preferred way:

[13]:
atoms = [
    Atom('C', [0.08333333333, 0.5, 0.25]),
    Atom('C', [0.41666666666, 0.5, 0.25]),
    Atom('C', [0.58333333333, 0.5, 0.75]),
    Atom('C', [0.91666666666, 0.5, 0.75]),
]
for aCC in [2.65, 2.7]:
    cell = [3*aCC, 'inf', np.sqrt(3)*aCC]
    pos = Posinp(atoms, "reduced", "surface", cell=cell)
    print(f"cell size for aCC={aCC:.2f}: {pos.cell}")
cell size for aCC=2.65: [7.949999999999999, 'inf', 4.589934640057525]
cell size for aCC=2.70: [8.100000000000001, 'inf', 4.676537180435969]

The from_dict class method

This last class method initializes a Posinp instance from a dictionary. You can use it to initialize your input files, but be aware that it leads to more verbose code than the usual initialization (presented at the beginning of the notebook). It was actually implemented to retrieve the posinp from a valid input parameters file (when the posinp is defined in it) or from a valid logfile (output file of a BigDFT calculation).

Also, there is no key to specify the boundary conditions: they are inferred from the value of the "cell" key. If there is no such key, free boundary conditions are used. When the key exists, however, you must be careful with its values. For instance, if you want to use surface boundary conditions, you must set the second element of the "cell" key to "inf" or ".inf".

[14]:
pos_dict = {
    "units": "reduced",
    "cell": [8.07007483423, 'inf', 4.65925987792],
    "positions": [
        {'C': [0.08333333333, 0.5, 0.25]},
        {'C': [0.41666666666, 0.5, 0.25]},
        {'C': [0.58333333333, 0.5, 0.75]},
        {'C': [0.91666666666, 0.5, 0.75]},
    ]
}
pos = Posinp.from_dict(pos_dict)
assert pos.boundary_conditions == "surface"
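
The inference rule described above can be summarized by the following sketch. It is not the MyBigDFT code: infer_boundary_conditions is a hypothetical helper that only encodes the two cases stated in this notebook (no cell means free, an infinite second element means surface), and treating every other cell as periodic is an assumption.

```python
def infer_boundary_conditions(posinp_dict):
    """Guess the boundary conditions from the optional "cell" key."""
    cell = posinp_dict.get("cell")
    if cell is None:
        # No cell at all: free boundary conditions.
        return "free"
    if str(cell[1]) in ("inf", ".inf"):
        # Infinite second cell dimension: surface boundary conditions.
        return "surface"
    # Assumption: any other cell is treated as fully periodic.
    return "periodic"


print(infer_boundary_conditions({"units": "angstroem"}))
print(infer_boundary_conditions({"cell": [8.07007483423, "inf", 4.65925987792]}))
print(infer_boundary_conditions({"cell": [8.07007483423, ".inf", 4.65925987792]}))
```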

See the documentation to check the extra possibilities offered by the Posinp class.

The InputParams class

This class manages the BigDFT input parameters, written in the yaml format:

dft:
    hgrids: [0.35, 0.35, 0.35]

It is therefore convenient to initialize this class via a dictionary representing the input parameters:

[15]:
inp = InputParams({"dft": {"hgrids": [0.35]*3}})
print(inp)
{'dft': {'hgrids': [0.35, 0.35, 0.35]}}

If the given value of a parameter corresponds to its default value, it is as if nothing was given:

[16]:
InputParams({"dft": {"hgrids": [0.45]*3}})
[16]:
{}
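
Dropping values that are equal to their defaults can be sketched as below. This is an illustration only: strip_defaults and DEFAULTS are hypothetical names, and the only default used is the hgrids value of 0.45 implied by the cell above.

```python
# Hypothetical table of defaults, restricted to the one value shown in this notebook.
DEFAULTS = {"dft": {"hgrids": [0.45, 0.45, 0.45]}}

_MISSING = object()  # sentinel for options that have no known default


def strip_defaults(params, defaults=DEFAULTS):
    """Return a copy of `params` without the options equal to their default."""
    cleaned = {}
    for section, options in params.items():
        section_defaults = defaults.get(section, {})
        kept = {
            key: value
            for key, value in options.items()
            if section_defaults.get(key, _MISSING) != value
        }
        if kept:  # a section left empty is dropped as well
            cleaned[section] = kept
    return cleaned


print(strip_defaults({"dft": {"hgrids": [0.45] * 3}}))  # {}
print(strip_defaults({"dft": {"hgrids": [0.35] * 3}}))  # the non-default value is kept
```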

The validity of the input dictionary is also checked:

[17]:
try:
    InputParams({'dfpt': {'hgrids': [0.35]*3}})
except KeyError as e:
    print(repr(e))
KeyError("Unknown key 'dfpt'")
[18]:
try:
    InputParams({'dft': {'hgrid': [0.35]*3}})
except KeyError as e:
    print(repr(e))
KeyError("Unknown key 'hgrid' in 'dft'")

Main attributes

The input parameters may contain the input positions under the "posinp" key (whose content must be a dictionary from which a Posinp can be created via the from_dict method, see the example above). Here, no input positions were given:

[19]:
assert inp.posinp is None

The dictionary of parameters is actually stored in the params attribute:

[20]:
inp.params
[20]:
{'dft': {'hgrids': [0.35, 0.35, 0.35]}}

An InputParams instance behaves like a dictionary

[21]:
inp["dft"]
[21]:
{'hgrids': [0.35, 0.35, 0.35]}
[22]:
inp['dft']['hgrids']
[22]:
[0.35, 0.35, 0.35]

You can modify the content of a key afterwards. The validity of the keys is checked in that case as well:

[23]:
# This modification is valid, and therefore taken into account
inp["dft"] = {"rmult": [6, 8]}
inp
[23]:
{'dft': {'rmult': [6, 8]}}
[24]:
try:
    # hgrid is not a valid key: an error is raised!
    inp['dft'] = {'hgrid': [0.35]*3}
except KeyError as e:
    print(repr(e))
KeyError("Unknown key 'hgrid' in 'dft'")

However, modifying a nested dictionary directly, as in the next cell, is not checked, so you must be careful when doing so:

[25]:
inp['dft']["hgrid"] = [0.45]*3

One way of making sure that the modified input parameters are still valid is to clean them:

[26]:
from mybigdft.iofiles.inputparams import clean
try:
    inp = clean(inp)
except KeyError as e:
    print(repr(e))
    del inp["dft"]["hgrid"]  # Delete the bad key
KeyError("Unknown key 'hgrid' in 'dft'")

This is what is actually done when initializing or updating the input parameters. It is also performed before writing the input parameters to a file on disk, so that bad keys added on the fly will still be caught before running a BigDFT calculation.
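
The reason why a nested modification escapes the check can be illustrated with a small dictionary-like class. This is a sketch under assumptions, not the InputParams implementation: CheckedParams, check and KNOWN_KEYS are hypothetical names, and the list of known keys is a tiny illustrative subset.

```python
from collections.abc import MutableMapping

# Illustrative subset of valid keys, not the full list of BigDFT parameters.
KNOWN_KEYS = {"dft": {"hgrids", "rmult"}}


def check(params):
    """Raise a KeyError for the first unknown section or option."""
    for section, options in params.items():
        if section not in KNOWN_KEYS:
            raise KeyError(f"Unknown key '{section}'")
        for key in options:
            if key not in KNOWN_KEYS[section]:
                raise KeyError(f"Unknown key '{key}' in '{section}'")


class CheckedParams(MutableMapping):
    """Dictionary-like object validating what is assigned to its top-level keys."""

    def __init__(self, params=None):
        self._params = {}
        for key, value in (params or {}).items():
            self[key] = value

    def __getitem__(self, key):
        return self._params[key]

    def __setitem__(self, key, value):
        check({key: value})  # only top-level assignments pass through here
        self._params[key] = value

    def __delitem__(self, key):
        del self._params[key]

    def __iter__(self):
        return iter(self._params)

    def __len__(self):
        return len(self._params)


inp = CheckedParams({"dft": {"rmult": [6, 8]}})
# inp["dft"] returns a plain dict, so this assignment is never validated.
inp["dft"]["hgrid"] = [0.45] * 3
try:
    check(inp)  # an explicit check over the whole content finds the bad key
    error = None
except KeyError as e:
    error = e
    print(repr(e))
```

Running an explicit check before the parameters are used, as clean does, closes this gap without having to wrap every nested dictionary.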

You can also add initial positions to the input parameters by using its dict representation:

[27]:
inp["posinp"] = {
    "units": "angstroem",
    "positions": [
        {'N': [0.0, 0.0, 0.0]},
        {'N': [0.0, 0.0, 1.1]},
    ],
}

This is not reflected in the content of the input parameters:

[28]:
inp
[28]:
{'dft': {'rmult': [6, 8]}}

However, the posinp attribute is not None anymore:

[29]:
print(inp.posinp)
2   angstroem
free
N   0.0   0.0   0.0
N   0.0   0.0   1.1

A much simpler way is to set the posinp attribute directly:

[30]:
inp.posinp = pos
print(inp.posinp)
4   reduced
surface   8.07007483423   inf   4.65925987792
C   0.08333333333   0.5   0.25
C   0.41666666666   0.5   0.25
C   0.58333333333   0.5   0.75
C   0.91666666666   0.5   0.75

Class methods to initialize an InputParams instance

Other ways of initializing an InputParams instance are provided. They are very similar to the ones of the Posinp class. The from_dict method is however missing: it would be redundant with the basic way of initializing an InputParams instance.

The from_file method

[31]:
inp = InputParams.from_file("../../../tests/test.yaml")
print(inp)
{}

The from_string method

This method initializes an InputParams instance from a string written in the yaml format:

[32]:
# You can even format that string to modify it according to your needs
base_inp = """\
dft:
    rmult: {}
    hgrids: [0.35, 0.35, 0.35]"""
for i, rm in enumerate([[5, 7], [6, 8], [7, 9]]):
    inp = InputParams.from_string(base_inp.format(rm))
    print(f"Input parameters n°{i+1}: {inp}")
Input parameters n°1: {'dft': {'rmult': [5, 7], 'hgrids': [0.35, 0.35, 0.35]}}
Input parameters n°2: {'dft': {'rmult': [6, 8], 'hgrids': [0.35, 0.35, 0.35]}}
Input parameters n°3: {'dft': {'rmult': [7, 9], 'hgrids': [0.35, 0.35, 0.35]}}
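
The formatting above works because the Python representation of a list of numbers happens to be a valid yaml flow sequence. The quick check below only shows the text that would be parsed (no yaml parser is involved) and points out one value for which this coincidence does not hold.

```python
base_inp = """\
dft:
    rmult: {}
    hgrids: [0.35, 0.35, 0.35]"""

# A list of integers is formatted as "[5, 7]", which yaml reads as a sequence.
filled = base_inp.format([5, 7])
print(filled)

# Not every Python value formats to the yaml one might expect: None becomes
# the text "None", which yaml reads as a string and not as a null value.
none_line = "rmult: {}".format(None)
print(none_line)
```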

However, the same result can be achieved by using the basic initialization procedure. The following code should be preferred:

[33]:
for i, rm in enumerate([[5, 7], [6, 8], [7, 9]]):
    inp = InputParams({"dft": {"rmult": rm, "hgrids": [0.35]*3}})
    print(f"Input parameters n°{i+1}: {inp}")
Input parameters n°1: {'dft': {'rmult': [5, 7], 'hgrids': [0.35, 0.35, 0.35]}}
Input parameters n°2: {'dft': {'rmult': [6, 8], 'hgrids': [0.35, 0.35, 0.35]}}
Input parameters n°3: {'dft': {'rmult': [7, 9], 'hgrids': [0.35, 0.35, 0.35]}}