The Root Mean Square Displacement (RMSD) from a reference position is a useful physical quantity often calculated in MD simulations. However, computing it in LAMMPS is not an immediate nor (apparently) easy task. Existing libraries like COLVARS can help us, but sometimes the documentation is not immediate and relevant examples are missing.
To compute the RMSD from a reference position we should firstly look at the documentation page. It seems a bit obscure, but it is easier than it looks. Few ingredients are required, namely the set of atoms of we want to compute the RMSD (in most cases, all the atoms in the simulation) and the reference configuration/frame/set of atoms with respect to which the RMSD must be computed.
The list of atoms of interest is a list of indexes that must be passed to the atoms
argument of the collective variable.
The reference position can be passed either as a space-separated list of (x, y, z) triplets using the refPositions
argument, or as an external .xyz
file with the coordinates of all atoms to the refPositionsFile
argument.
Let’s dive into both these two arguments, starting from the latter.
The reference frame (which we call henceforth reference.xyz
) in the XYZ form must be formatted as follows
N_atoms
#atom_type x y z
atom_1_type x1 y1 z1
atom_2_type x2 y2 z2
atom_3_type x3 y3 z3
...
atom_Natoms_type xN yN zN
with the comment line starting with the pound #
sign.
To create this kind of file, e.g. from a .pdb
file containing only our reference structure, we may use a bash script like
egrep "^ATOM" infile.pdb | wc -l > reference.xyz
echo "#atom_type x y z" >> reference.xyz
cat input_file.pdb | awk '{ print $3, $7, $8, $9}' >> reference.xyz
For example, looking at the .pdb
file of the crystallographic structure of Murray Valley Encephalitis virus xrRNA, we should obtain
1466
#atom_type x y z
P 3.326 -1.887 19.303
OP1 2.590 -0.731 18.718
OP2 2.577 -3.106 19.724
O5' 4.453 -2.331 18.261
C5' 5.689 -1.630 18.160
...
C6 35.003 12.071 -6.588
Obviously, in the MD simulation you must have N atoms, no more no less — I do not know what could happen if N_atoms
does not coincide the simulated ones.
My guess is that the RMSD would be calculated only between the N_atoms
atoms, but do not take these words for granted.
Then, we must create a COLVARS input file that tells LAMMPS to compute the RMSD.
For example, in this case we create a meta.dat
file:
colvarsTrajFrequency 1000
colvarsRestartFrequency 100000
colvarsTrajAppend on
colvar {
name RMSD
width 0.01 # Angstrom
rmsd {
atoms { atomNumbersRange 1-N_atoms }
refPositionsFile reference.xyz
}
}
A few notes.
colvarsTrajFrequency 1000
is required to print the RMSD every 1000 steps, colvarsRestartFrequency 100000
to save a restart file for COLVARS every 100000 steps, and colvarsTrajAppend on
is to append the data after every restart.
If the last command is omitted, after the restart the whole output file (usually out.colvars.traj
) would be overwritten, whiile we want to append the data.
In the LAMMPS input file we must specify the ONE AND ONLY input file with the COLVARS module definitions, like
fix meta all colvars meta.dat
where meta
is just the name of the LAMMPS fix.
And voila 🪄 if we launch the simulation with this commands, a meta.dat
output file should be generated, containing a column identified with #rmsd
that prints the RMSD of the set of atoms indexed from 1 to N_atoms
!
🖥️ Coding