Try   HackMD

Preparation of the Inputs Files for Advanced Sampling Simulations - Part 1

tags: MDTutorials

In this tutorial, we focus on a detailed procedure for preparing input files for advanced sampling simulation. There are 2 parts in this tutorial:

  • Preparation of the Input Files for Advanced Sampling Simulations - Part 1
    This part of the tutorial focuses on the workflow starting from downloading the PDB file from Protein Data Bank to prepare input files for advanced sampling methods in general, including a coordinate file, a topology file, and an index file. We choose PLCC/PLCG1 protein-ligand binding complex as our system of interest in this part of the tutorial. Sections in this part of the tutorial include:

    • Section 1. Introduction
    • Section 2. Modeling the missing residues in a protein
    • Section 3. Topology preparation of PLCC/PLCG1-pep
    • Section 4. Solvation and neutralization of the system
    • Section 5. Energy minimization, equilibration and a short molecular dynamics (MD) simulation
    • Section 6. The preparation process in brief
  • Preparation of the Input Files for Advanced Sampling Simulations - Part 2 (unpublished for now)
    In this part of the tutorial, we will first introduce methods specifically for preparing the input files for Hamiltonian replica exchange and expanded ensemble simulations, given the files prepared in the first part of the tutorial using PLCC/PLCG1 protein-ligand binding complex as the system of interest. Then, we will utilize the workflows in both parts of the tutorial to different but similar systems, including the protein-ligand binding complexes like PLCC/ErbB2-pep and PLCC/IR-pep in a more brief way compared to PLCC/PLCG1-pep.
    Sections in this part of the tutorial include:

    • Section 1. Preparation of the input files for Hamiltonian replica exchange molecular dynamics (HREMD)
    • Section 2. Preparation of the input files for expanded ensemble simulations
    • Section 3. Other similar protein-ligand binding complexes

Section 1. Introduction

In this part of tutorial, our system of interest, PLCC/PLCG1-pep is the binding complex between the C-terminal SH2 domain of Phosphoplipase C,

Ξ³ 1 (PLCC) and its own intraprotein recognition site (PLCG1-pep). The X-ray crystallography structure of this binding complex could be downloaded from Protein Data Bank (PDB) with PDB ID as 4k45. The target of this tutorial is to provide a detailed method to prepare the GROMACS input files for further advanced sampling simulations, in our case, Hailtonian replica exchange and expanded ensemble simulations. These input files include a topology file (.top), a configuration file (.gro) file and an index file (.ndx) file. If you are already familiar with the reasons for each step in the whole process and just want a quick look at all the command used, please directly jump to Section 6. For information about the installation of the software used in this tutorial, please refer to the wiki page in Shirts group.

Section 2. Modeling of the missing residues in a protein

Section 2-1. System of interest

When simulating a protein-ligand binding complex, the first thing to check is the missing residues and the missing atoms, which should be documented in the header of the PDB file. For example, REMARK 465 shows the missing residues of the binding complex as follows:

REMARK 465   M RES C SSSEQI                                                     
REMARK 465     SER A   661                                                      
REMARK 465     ILE A   764                                                      
REMARK 465     GLY A   765                                                      
REMARK 465     THR A   766                                                      
REMARK 465     ASP B   770                                                      
REMARK 465     TYR B   771                                                      
REMARK 465     GLY B   772                                                      
REMARK 465     ALA B   773                                                      
REMARK 465     LEU B   774                                                      
REMARK 465     TYR B   775                                                      
REMARK 465     GLU B   776                                                      
REMARK 465     GLY B   777                                                      
REMARK 465     ARG B   778                                                      
REMARK 465     ASN B   779

As shown above, 4 residues of the protein (PLCC, chain A) were missing (one at the beginning and 3 in the end). On the other hand, the first 10 residues of the peptide (PLCG1-pep, chain B) were missing. These residues are missing generally because they are too flexible and floppy, so the electron density for these residues is too low to resolve the structure. To make the structure for the simulation consistent with the one investigated by the experimental methods, our goal in Section 2., in this case, is to use the software MODELLER to rebuild all the 4 missing residues of the protein, and the asparagine residue of the peptide (residue 779). Once the rebuilding of the missing residues is completed, you should also found that MODELLER also automatically rebuilt all the missing atoms of side chains of the protein. To start with, make a folder Build, download the PDB file 4k45.pdb from the link above to the folder and follow the steps below in the folder.

Section 2-2. Preparation of the alignment file

To generate an alignment file, first, we have to generate a sequence file from 4k45.pdb. To do so, run the Python script sequence.py which has the following content

from modeller import * # Get the sequence of the 1qg8 PDB file and write to an alignment file code = '4k45' e = environ() m = model(e, file=code) aln = alignment(e) aln.append_model(m, align_codes=code) aln.write(file=code+'.seq')

As a result, a sequence file 4k45.seq will be produced, which contains the sequence of the structure in 4k45.pdb (not including the missing residues) as below.

>P1;4k45
structureX:4k45: 662 :A:+109 :B:MOL_ID  1; MOLECULE  1-PHOSPHATIDYLINOSITOL 4,5-BISPHOSPHATE PHOSPHODI GAMMA-1; CHAIN  A; FRAGMENT  C-TERMINAL SH2 (CSH2) DOMAIN; SYNONYM  PHOSPHOINOSITIDE PHOSPHOLIPASE C-GAMMA-1, PHOSPHOL GAMMA-1, PLC-GAMMA-1; EC  3.1.4.11; ENGINEERED  YES; MOL_ID  2; MOLECULE  1-PHOSPHATIDYLINOSITOL 4,5-BISPHOSPHATE PHOSPHODI GAMMA-1, SHORT PEPTIDE; CHAIN  B; FRAGMENT  RESIDUES 770 TO 787 OF PLC-GAMMA1; SYNONYM  PHOSPHOINOSITIDE PHOSPHOLIPASE C-GAMMA-1, PHOSPHOL GAMMA-1, PLC-GAMMA-1; EC  3.1.4.11; ENGINEERED  YES:MOL_ID  1; ORGANISM_SCIENTIFIC  RATTUS NORVEGICUS; ORGANISM_COMMON  RAT; ORGANISM_TAXID  10116; GENE  PLCG1; EXPRESSION_SYSTEM  ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID  562; MOL_ID  2; SYNTHETIC  YES; ORGANISM_SCIENTIFIC  RATTUS NORVEGICUS; ORGANISM_COMMON  RAT; ORGANISM_TAXID  10116: 1.50: 0.16
QAESKEWYHASLTRAQAEHMLMRVPRDGAFLVRKRNEPNSYAISFRAEGKIKHCRVQQEGQTVMLGNSEFDSLVD
LISYYEKHPLYRKMKLRYPINEEALEK/PGFVEAN*

Note that the sequences should be ended up with an *, and the forward-slash / in the sequence serves as a breaker between different chains. So the part before the slash is the sequence of the protein, while the remaining part is the sequence of the peptide. In the sequence of the peptide, it should be noticed that the phosphotyrosine residue was ignored by MODELLER, which is a result of the HETATM records of the phosphotyrosine.

In the alignment file alignment.ali, which can be manually made from the content in 4k45.seq, there should be two sequences shown as follows:

>P1;4k45
structureX:4k45: 662 :A:+110 :B:MOL_ID  1; MOLECULE  1-PHOSPHATIDYLINOSITOL 4,5-BISPHOSPHATE PHOSPHODI GAMMA-1; CHAIN  A; FRAGMENT  C-TERMINAL SH2 (CSH2) DOMAIN; SYNONYM  PHOSPHOINOSITIDE PHOSPHOLIPASE C-GAMMA-1, PHOSPHOL GAMMA-1, PLC-GAMMA-1; EC  3.1.4.11; ENGINEERED  YES; MOL_ID  2; MOLECULE  1-PHOSPHATIDYLINOSITOL 4,5-BISPHOSPHATE PHOSPHODI GAMMA-1, SHORT PEPTIDE; CHAIN  B; FRAGMENT  RESIDUES 770 TO 787 OF PLC-GAMMA1; SYNONYM  PHOSPHOINOSITIDE PHOSPHOLIPASE C-GAMMA-1, PHOSPHOL GAMMA-1, PLC-GAMMA-1; EC  3.1.4.11; ENGINEERED  YES:MOL_ID  1; ORGANISM_SCIENTIFIC  RATTUS NORVEGICUS; ORGANISM_COMMON  RAT; ORGANISM_TAXID  10116; GENE  PLCG1; EXPRESSION_SYSTEM  ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID  562; MOL_ID  2; SYNTHETIC  YES; ORGANISM_SCIENTIFIC  RATTUS NORVEGICUS; ORGANISM_COMMON  RAT; ORGANISM_TAXID  10116: 1.50: 0.16
-QAESKEWYHASLTRAQAEHMLMRVPRDGAFLVRKRNEPNSYAISFRAEGKIKHCRVQQEGQTVMLGNSEFDSLVD
LISYYEKHPLYRKMKLRYPINEEALEK---/-PGF.VEAN*

>P1;4k45_fill
structureX:4k45: 662 :A:+110 :B:MOL_ID  1; MOLECULE  1-PHOSPHATIDYLINOSITOL 4,5-BISPHOSPHATE PHOSPHODI GAMMA-1; CHAIN  A; FRAGMENT  C-TERMINAL SH2 (CSH2) DOMAIN; SYNONYM  PHOSPHOINOSITIDE PHOSPHOLIPASE C-GAMMA-1, PHOSPHOL GAMMA-1, PLC-GAMMA-1; EC  3.1.4.11; ENGINEERED  YES; MOL_ID  2; MOLECULE  1-PHOSPHATIDYLINOSITOL 4,5-BISPHOSPHATE PHOSPHODI GAMMA-1, SHORT PEPTIDE; CHAIN  B; FRAGMENT  RESIDUES 770 TO 787 OF PLC-GAMMA1; SYNONYM  PHOSPHOINOSITIDE PHOSPHOLIPASE C-GAMMA-1, PHOSPHOL GAMMA-1, PLC-GAMMA-1; EC  3.1.4.11; ENGINEERED  YES:MOL_ID  1; ORGANISM_SCIENTIFIC  RATTUS NORVEGICUS; ORGANISM_COMMON  RAT; ORGANISM_TAXID  10116; GENE  PLCG1; EXPRESSION_SYSTEM  ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID  562; MOL_ID  2; SYNTHETIC  YES; ORGANISM_SCIENTIFIC  RATTUS NORVEGICUS; ORGANISM_COMMON  RAT; ORGANISM_TAXID  10116: 1.50: 0.16
SQAESKEWYHASLTRAQAEHMLMRVPRDGAFLVRKRNEPNSYAISFRAEGKIKHCRVQQEGQTVMLGNSEFDSLVD
LISYYEKHPLYRKMKLRYPINEEALEKIGT/NPGF.VEAN*

As shown above, several changes need to be made to make an alignment file from 4k45.seq:

  • To make the sequence 4k45 in the alignment file, just copy the content of 4k45.seq, identify the missing residues and add a - to represent each residue.
  • Note that in both sequences, we use . to represent the HETATM residue (e.g. DNA, RNA, ligand, etc.), which is the phosphotyrosine residue in this case. In MODELLER, . is called BLK or block residue type. Accordingly, since the phosphotyrosine was ignored in 4k45.seq, we have to correct the number of residues once we include phosphotyrosine. Therefore, change A:+109 into A:+110.
  • The second sequence in the alignment file, 4k45_fill is a complete sequence with all the missing residues filled that serves as a template. Make sure that the two sequences have equal numbers of residues and are aligned with each other.

Section 2-3. Modeling of the missing residues using automodel

After getting the alignment file, we should be able to use it as the input to the modeling using automodel. Specifically, we run the script build.py which has the following content:

from modeller import * from modeller.automodel import * # Load the automodel class log.verbose() env = environ() # directories for input atom files env.io.atom_files_directory = ['.', '../atom_files'] env.io.hetatm = True a = automodel(env, alnfile = 'alignment.ali', knowns = '4k45', sequence = '4k45_fill') a.starting_model = 1 a.ending_model = 1 a.make()

Remember that by default all HETATM records are ignored. Therefore, to read all the HETATM from the template, we have to turn on env.io.hetatm by setting env.io.hetatm = True. After executing build.py, we should get only one PDB file, 4k45_fill.B99990001.pdb as a result of setting a.starting_model = 1 and a.ending_model = 1. (Also note that all the water molecules have been removed.) Use VMD (Visual Molecular Dynamics) to visualize the structure to see it seems reasonable, which means that there should be no broken molecules, missing atoms, or other anomalies. (In this case, check if the phosphotyrosine still stays inside the binding pocket composed of 4 arginine residues.) As a record, the following is the output of using build.py:

<< end of ENERGY.
openf___224_> Open           4k45_fill.B99990001.pdb
wrpdb___568_> Residues, atoms, selected atoms:      115      949      949

>> Summary of successfully produced models:
Filename                          molpdf
----------------------------------------
4k45_fill.B99990001.pdb        719.84387

Note that if a wrong is used in build.py, then the atoms might be erroneously added, which might lead an error when using H++ web sever to perform parameterization. (An error like FATAL: Atom .R<NASN 107>.A<OXT 17> does not have a type. might occur.)

Section 3. Topology preparation of PLCC/PLCG1-pep

Section 3-1: Concepts of parameterization by AmberTools

To prepare the topology file of a structure, we first have to decide which force field to use. In the case of protein-ligand binding complexes, one of the frequently chosen force fields is AMBER99SB-ILDN, which is an all-atom force field. However, simply using the GROMACS program pdb2gmx will not work, since the peptide in our case contains a non-standard residue, phosphotyrosine (PTR), which is not included in the residue topology database (aminoacids.rtp in the folder /usr/share/gromacs/top/amber99sb-ildn.ff). of AMBER99SB-ILDN force field. Generally, there are multiple ways to deal with this issue, and the most intuitive way might be manually adding the parameters of PTR to aminoacids.rtp and modifying residuetypes.dat in the folder /usr/share/gromacs/top then running pdb2gmx. However, it might be tedious to get the parameters of PTR, so it could be a better choice to rely on other tools (so that pdb2gmx is not required).

When it comes to parameterizing a small molecule or a non-standard amino acid, Amber is often a useful tool. Amber is another molecule dynamics simulation engine like GROMACS and it also contains package (AmberTools) that is able to generate topology and coordinate files convertible to GROMACS input files. While the procedure of parameterization using AmberTools could be complicated, a web server H++ developed by Onufriev et al. is able to automize the whole process of the parameterization of either proteins or non-protein structures. However, it is still recommended to know how Amber works to parametrize a structure rather than using H++ as a black box. Therefore, here we introduce the core concepts of parameterization using AmberTools as below. If you are new to Amber or AmberTools, it is highly recommended to refer the manual of Amber 2018 or 2019, especially Section 1, 12, 13, 14, and 15. Also, the following tutorials are recommended to get familiarized with the fundamentals using AmberTools, especially parameterization:

Section 3-1-1. Parameterization of small molecules

Generally, when parameterizing a small molecule (instead of a polypeptide), we have to separate the part of the small molecule from the original structure and use several programs in Amber following the procedures as below to independently parameterize it:

  • Step 1: Use antechamber to assign charges and atom types.
    • Sample command: antechamber -i ${name}.pdb -fi pdb -o ${name}.mol2 -fo mol2 -c bcc -s 2 -nc ${nc} -pl 15
    • Input: a PDB file; output: a .mol2 file.
  • Step 2: Use parmchk2 to build the missing parameters.
    • Sample command: parmchk2 -i ${name}.mol2 -f mol2 -o ${name}.frcmod
    • Input: a .mol2 file; Output: a .frcmod file (parameter file)
  • Step 3: Use tLEaP to generate the topology and updated coordinate files.
    • Sample command: tleap -f tleap.in, which requires a tLEaP input file looks like follows:
      ​​​​​​​source oldff/leaprc.ff99SB                      # load in the protein force field parameters
      ​​​​​​​source leaprc.gaff                              # load the GAFF parameters
      ​​​​​​​mol = loadmol2 input.mol2                       # mol is a variable name (generally the name of the small molecule)
      ​​​​​​​check mol                                       # a check for missing parameters
      ​​​​​​​loadamberparams input.frcmod                    # load the parameters of the small molecule from the frcmod file 
      ​​​​​​​saveoff mol mol.lib                             # create a library file for the small molecule (off = object file format)
      ​​​​​​​saveamberparm mol output.prmtop output.inpcrd   # generate Amber topology file (.prmtop) and coordinate file (inpcrd) (One could also use savepdb instead)
      ​​​​​​​quit                                            # exit out of tleap
      
    • The output files including an Amber topology file (.prmtop file) and an Amber coordinate file (.inpcrd) file. Both files are convertible using InterMol or acpype.py. Note that the missing atoms and the hydrogens will be builts by tLEaP automatically.
  • Refer to the tutorial of Amber for more information.
  • Some useful scripts could be found in the private GitHub repository of Shirts Group. (Directory: Parameterization/GAFF)

Section 3-1-2. Parameterization of peptides containing non-standard amino acids

In our case, however, the ligand is a peptide with a non-standard amino acid (PTR). To generate a topology file of the whole binding complex, there are several ways to do using AmberTools (without using H++):

  • Use AmberTools to parameterize the whole protein-ligand binding complex.
  • Use AmberTools to parameterize the whole peptide ligand and use pdb2gmx to parameterize the protein (with AMBER99SB-ILDN force field), then combine the coordinate files and the topology files output by different parameterizing tools.
  • Use AmberTools to parameterize the phosphotyrosine (instead of the whole ligand) and parameterize the remaining part of the binding complex using pdb2gmx, then combine the coordinate files and the topology files. However, in this case, PTR is not a terminal residue, which means that the peptide ligand will be broken into two pieces when we separate PTR from the original PDB for AmberTools parametrization. To parameterize the remaining part other than PTR, we might have to label the two pieces of the peptide as two different chains (since there is no covalent bond in between).

The procedure of parameterizing using AmberTools in all of these three methods resembles the procedure of parameterizing the small molecule as shown previously, except that antechamber and parmchk2 might not be required. Note that, however, it could be tedious and error-prone when combining the coordinate files if different parts of the peptide are parameterized by different tools. Therefore, the last method is not recommended.

Section 3-2. Preparation of the complex topology file using H++ server

Section 3-2-1. Preparation of the input structure of H++

While H++ web server is able to process the whole complex at a time, it is recommended to parameterize the protein and the peptide ligand separately so that we can have these two entities as two different [ molecultype ] directives in the topology file. Parameterizing the whole complex would lead to only one [ moleculetype ] directive corresponding to the whole complex, which makes it hard for us to only decouple the interactions of the peptide ligand in replica exchange or expanded ensemble simulations.

To separately parameterize the protein and the peptide ligand, we first have to separate the coordinate file. To do so, execute the following command in folder Topology to copy 4k45_fill.B99990001.pdb and save as PLCC.pdb. (Before this, go back to the parent folder of the current folder Build and make another folder Topology, in which all the steps in this section will be carried out.)

cp ../Build/4k_45_fill.B99990001.pdb ./PLCC.pdb

Then, cut the part of the peptide ligand and save it as pep7.pdb.

Note that if there are several images/copies in the original PDB file downloaded from Protein Data Bank, usually there is only one image chosen to proceed to simulations. Since the chosen image is usually not at the center of the simulation box, it is recommended to use the GROMACS program editconf to center the structure to prevent the structure cross the periodic boundaries during energy minimization or other following steps:

gmx editconf -f complex.pdb -c -o complex_center.pdb

However, in our case, there is only one image in the original PDB file, which should be already close to the center of the box, so it is not necessary to do so.

When preparing the structure input file for any service using AmberTools, there are several things to note:

  • If the X-ray unit cell in the PDB file contains more than one image, choose the entity of interest and delete the other(s).
  • If the ligand is a small molecule rather than a peptide, than we have to use software like Avogadro to build the hydrogen atoms (and OpenBabel to convert the file if needed) before using AmberTools, since all the Amber force fields are all-atom force fields.
  • In contrast, there is no need to add hydrogen atoms for proteins and peptides. (Actually, it is recommended to get rid of all protein (or peptide) hydrogens that are explicitly expressed in the PDB file.) The reduce and LEaP utilities in AmberTools add hydrogens automatically with predefined names. Having hydrogens in PDB files with names that LEaP does not recognize within its residue libraries leads to a total mess.
  • Make sure all connectivity records in the PDB file are removed. These are mostly referring to ligands, or, in some cases, to disulfide links. The latter should be explicitly re-connected without relying on connectivity records in the PDB file. (In our case, the connectivity records in pep7.pdb have been removed by MODELLER automatically.)
  • In this case, clean and prepare the input structure PDB file is relatively easy. For other cases, consider pdb4amber, which is introduced in Section 12.4 in the manual.

Obviously, different concerns might need to be taken care of in different cases. More details about the system preparation for AmberTools are elaborated in Section 12 in the manual of Amber 2018. To name just a few,

  • Tautomeric and protonation states are not rendered in PDB files. If a defined state for a residue is required, its name in the PDB file must reflect the choice. See Section 12.2 in the manual of Amber 2018 for more information.
  • Amber preparation modules assume that residues in a PDB file are connected and appear sequentially in the file. If not covalently connect (i.e., linked by an amide bond), the residues must be separated by "TER" records in the PDB file. (Alternatively, the chain ID must change on going from one chain to the next chain.)
  • Similarly, if residues are missing (e.g., not detected in x-ray, or cut by the user), the gap should also be separated by a "TER" record. Terminal residues will be charged by default. If the user wants to avoid this (especially for gaps), these residues should be capped (by ACE and NHE or NME).

With PLCC.pdb and pep7.pdb to start with, now we can use H++ the parameterize them separately.

Section 3-2-2. Preparation of the protein topology file

First, let's parameterize the protein part (PLCC.pdb) using H++. To begin with, click on the tab "Process a structure" on the H++ web server, upload PLCC.pdb and click on "Process File". After that, the user will be directed to a page in which the user can specify parameters used during the parameterization, including salinity, internal dielectric and external dielectric constants, pH value and so on. H++ will decide the protonation states of each residue based on these parameters (mainly the pH value).

For typical physiological conditions, using the default value of 80 for the external dielectric (water) and salinity = 0.15 M is reasonable. The situation is less straightforward with the internal dielectric. If you are mostly interested in deeply buried residues, a lower value is recommended such as 4. If your focus is residues closer to the surface, a higher value is appropriate, such as 10 or even 20. In our case, we have to make sure that the parameters used here are consistent with the experimental conditions. According to Section 2.2.6. (page 37) of Dr. McKercher's thesis, PLCC was dialyzed overnight at 4 Β°C into buffer containing 25 mM Tris, pH 7.4, and 200 mM NaCl and the binding reactions were performed at 25 Β°C (298 K). Here, we will just ignore Tris buffer and set the salinity in the web server as 0.2 (M) and the pH value as 7.4. As for the dieletric constants, we would just use the default values. Note that here we adopt the experimental conditions of ITC instead of X-ray crystallography because what we want to simulate is the binding complex in a solution instead of in a crystal form. ITC and NMR experiments are conducted in solution phase, so we have to make sure that our simulation parameters consistent with the ones used in at least one of these experiments.

Note that H++ is more than parameterizing a structure. As you can see on the page for specifying parameters, it is also able to construct a simulation box, solvate the system and add ions to the box. However, the box types available in H++ are limited to cubic and octahedral boxes. While the volume of an octahedral box is about 77% of a cubic box and can, therefore, decrease computational cost, it is not the best choice compared with a rhombic dodecahedral box available in GROMACS, whose volume is only about 70.7% of a cubic box. Therefore, we are not going to use H++ to solvate the system in a simulation box, but just parameterize the structure and use GROMACS to subsequently complete solvation and neutralization. After setting up all these parameters, click on the tab "Process …" to process the file. (Note that it is recommended to register an account on H++, which would make it easier to manage the submissions.)

As a result, in the results page, the titration sites and their titration curves, isoelectric point (8.69 in our case) and the total charge at pH 7.4 (+3 in our case) will be shown. Also, there are some key files available to be downloaded, including an Amber topology file 0.2_80_10_pH7.4_complex.top and an Amber coordinate file 0.2_80_10_pH7.4_complex.crd. Accordingly, save the Amber topology file and the Amber coordinate file as PLCC.top and PLCC.crd, respectively. However, note that these files are for Amber simulation engine instead of GROMACS, we have to use convert the files to the extensions that are compatible with GROMACS. Since InterMol is only able to convert prmtop file, here we have to use acpype.py to convert the files. To do this, clone the repository that is called "useful-scripts" in Shirts group, copy the folder GAFF in the folder Parameterization and execute the command

python GAFF/acpype.py -x PLCC.crd -p PLCC.top 

As a result, several files will be output, including a GROMACS topology file PLCC_GMX.top, a GROMACS coordinate file PLCC_GMX.gro, and two parameter files for testing the output files with GROAMCS (em.mdp and md.mdp). Since we will also perform energy minimization and short MD simulation later, we are not going to use em.mdp and md.mdp generated here, but the files PLCC_GMX.gro and PLCC_GMX.top are exactly what we want for now.

3-2-3. Preparation of the peptide ligand topology

Basically, the parameters (salinity, pH value and so on) used in the parameterization of the peptide ligand are exactly the same as the ones used in the parameterization of the protein except that several changes have to be made:

  • Change HETATM records to ATOM records
    Note that the records of the PTR residue in pep7.pdb are HETATM instead of ATOM. Since the HETATM records will be ignored by AmberTools during parameterization and would cause a discontinuity in the peptide in our case, we have to change HETATM records of PTR into ATOM records.
  • Deselect the option for correcting residue orientation in H++
    If we process pep7.pdb using H++ using exactly the same settings as we used previously in the parameterization of PLCC.pdb, the following error would occur and be shown in 0.15_80_10_pH6_pep7.tleap.err:
    ​​​​FATAL:  Atom .R<PTR 111>.A<HB1 25> does not have a type.
    
    This error indicates that HB1 atom in PTR does not have a type. Note that this atom was not present in the input PDB file but added by the reduce program in AmberTools used by H++ to correct the orientation of the ASN residue. To avoid this error, we have to deselect the option "Correct orientation of ASN, GLN and HIS groups …" on the page that we specify the parameters. (All the other parameters should remain the same.) While this option is selected by default, skipping it is generally ok. It may affect the accuracy of the predicted protonation state, but the predicted protonation state is an approximation anyway. If you are concerned, you can run with the option turned on. Then look in the reduce.log file under "listings". Search for "FLIP" in this file. It will tell you which groups were flipped. Then you can load your coordinate files in VMD or other viewers to see which groups are near the flipped groups. if the pKa for the nearby groups are far from your pH of interest, then it should not affect predicted protonation state. Very rarely is this not the case. But if it is you can manually change the orientation of the groups.

Given that all the considerations above are taken care of, the parameterization should work well and the total charge at pH 7.4 is -3. Similarly, save the Amber topology file and the Amber coordinate file as pep7.top and pep7.crd, respectively, and execute the following command to convert the files:

python GAFF/acpype.py -x pep7.crd -p pep7.top

As a result, a GROMACS topology file pep7_GMX.top and a GROMACS coordinate file pep7_GMX.gro will be produced.

Section 3-2-4. Combination of the coordinate files and the topology files

Since the simulations will be performed on the whole binding complex, not either only the protein or only the peptide ligand, we have to combine the topology files and the coordinate files of these two entities. To begin with, execute the following command:

cp PLCC_GMX.gro complex_GMX.gro
cp PLCC_GMX.top complex_GMX.top

To combine the coordinate files, we only have to cut the coordinates in pep7_GMX.gro and append to the bottom of complex_GMX.gro, change the number of atoms shown at the top of the file from 1744 to 1878 and save the file. Note that the numbering of the residues of the peptide ligand starts from 1 instead of 107. This is exactly what we want, since this is how the topology file that we are going to make distinguishes different molecule types defined in [ moleculetype ] directive. The numbering of the peptide also starts from 1 instead of 1745, but this will be fixed in the next step using the GROMACS program editconf when we solvate the system. Note that even if the box dimensions in PLCC_GMX.gro and pep7_GMX.gro are different, but it is just a result of different molecule sizes. The relative distance between the protein and the peptide remains the same. To examine this, it is recommended to load complex_GMX.gro and 4k45_fill.B99990001.pdb in VMD. As shown below, it can be seen that the two structures align with each other perfectly. (The one in blue is complex_GMX.gro while the other in red is 4k45_fill.B99990001.pdb.)

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More β†’

Combining the topology files is also straightforward. While making pep7_GMX.top into an .itp file (include topology file) and include the .itp file in complex_GMX.top (using the statement like #include "pep7.itp") should work, here we will just copy the content of pep_GMX.top and paste that to complex_GMX.top so that we don't have to always have both the .itp and .top file in the same directory. However, there are still several things to note. The first is that in complex_GMX.top, there should only be one [ atomtypes ] directive that includes all the atom types in both the protein and the peptide ligand. Therefore, here we adopt the union set of the [ atomtypes ] directives of the two topology files. To do so, we have to append the atom types that only appear in pep7_GMX.top to the [ atomtypes ] directive of complex_GMX.top. Specifically, the appended atom types are:

 OV       OV          0.00000  0.00000   A     2.95101e-01   7.11280e-01 ; 1.66  0.1700
 P        P           0.00000  0.00000   A     3.74177e-01   8.36800e-01 ; 2.10  0.2000
 OT       OT          0.00000  0.00000   A     2.91092e-01   8.78640e-01 ; 1.63  0.2100

Then, copy all the directives other than [ defaults ] and [ atomtypes ] in pep7_GMX.top, including [ atoms ], [ pairs ], [ bonds ], [ angles ] and [ dihedrals ] (including propers and impropers) and paste them right after the last directive (should be [ dihedrals ] directive for the impropers) of the protein. At last, change the system name in [ system ] directive from PLCC to complex in water and add pep7 1 to the bottom of the file. (Note that the molecule type of the peptide ligand is pep7 defined in the [ moleculetype ] directive of the ligand.) As a result, the bottom of the topology file should be:

[ system ]
complex in water

[ molecules ]
; Compound        nmols
 PLCC             1
 pep7             1

Up to this point, the combination of the coordinate files and the topology files should be complete.

Section 4. Solvation and neutralization of the system

Section 4-1. Solvation of the system

To solvate the system in water molecules, we first have to construct a simulation box. (Accordingly, create a new folder Solvation in the directory same level as Topology.) As mentioned, a rhombic dodecahedral box is one of the best choices. To center the protein in a dodecahedral box with a volume such that the distance between the solute and the box is 1.0 nm, execute the following command in folder Solvation:

gmx editconf -f ../Topology/complex_GMX.gro -o complex_box.gro -bt dodecahedron -d 1.0 -c

As a result, a coordinate file complex_box.gro will be generated and it is shown that the volume of the box is 212.339

nm3.

Then, to add water molecules to the system, execute:

gmx solvate -cp ../Topology/complex_em.gro -p ../Topology/complex_GMX.top -o complex_sol.gro -cs

Accordingly, a coordinate file complex_sol.gro and a new topology file complex_GMX.top (in folder Topology) will be produced, both with water molecules added. Note that the old topology file is not overwritten but renamed as #complex_GMX.top.1#. (Note that it is recommnded to make sure that [ molecules ] directive in complex_GMX.top is well-formatted after the solvents are added.)

Section 4-2. Modification of the topology file

After adding water molecules in the simulation box, we then have to add counterions to neutralize the system, which can be done by the GROMACS program genion. To run genion, we need a .tpr file, which can be produced by the grompp command. Basically, grompp command reads a molecule topology file, checks the validity of the file, expands the topology from a molecular description to an atomic description. However, at this point, our topology file does not have atom types and molecule types (like SOL in complex_GMX.top) for the ions and solvent molecules, but only the ones for the binding complex. Therefore, to enable GROMACS to recognize moleculetype such as SOL and CL when we run the grompp command, we have to modify the topology file complex_GMX.top.

In this case, we will borrow these parameters (moleculetype and atomtypes) from AMBER99SB-ILDN force field and use TIP3P water model. Accordingly, add the following lines to the topology file complex_GMX.top, right before the [ system ] directive:

; Include water topology
#include "amber99sb-ildn.ff/tip3p.itp"

; Include topology for ions
#include "amber99sb-ildn.ff/ions.itp"

If these lines are not added, errors like No such moleculetype SOL (which could be avoided by including tip3p.itp) and No such moleculetype CL (which could be avoided by including ions.itp would occur. (Note that it is not recommended to delete the [ defaults ] directive in the topology file and add #include "amber99sb-ildn.ff/forcefield.itp", since there might be duplicates of atom types and the original ones might be overridden.)

To make the include topology tip3p.itp and ions.itp effective, all the atom types used in defining the moleculetypes in these files must be defined in the [ atomtypes ] directive in our topology file complex_GMX.top. However, obviously most atom types used in these two include topology files are not defined in complex_GMX.top. Since all the atom types used in these .itp files can be found in ffnonbonded.itp, to solve the issue, just copy the parameters needed from ffnonbonded.itp and append to the [ atomtypes ] directive in complex_GMX.top with a format consistent with the existing parameters. Specifically, the following lines are add:

 OW       OW          0.00000  0.00000   A      0.315075      0.635968
 HW       HW          0.00000  0.00000   A             0             0
 Cl       Cl          35.45    0.00000   A    4.40104e-01    4.18400e-01
 Na       Na          22.99    0.00000   A    3.32840e-01    1.15897e-02
 IB       IB          0.00000  0.00000   A    8.90899e-01    4.18400e-01
 C0       C0          0.00000  0.00000   A    3.05240e-01    1.92376e+00
 MG       MG          0.00000  0.00000   A    1.41225e-01    3.74342e+00
 K        K           0.00000  0.00000   A    4.73602e-01    1.37235e-03
 Rb       Rb          0.00000  0.00000   A    5.26699e-01    7.11280e-04
 Cs       Cs          0.00000  0.00000   A    6.04920e-01    3.37230e-04
 Li       Li          0.00000  0.00000   A    2.02590e-01    7.65672e-02
 Zn       Zn          0.00000  0.00000   A    1.95998e-01    5.23000e-02

If the lines above are not added, an error like Atomtype XX not found would occur. One thing to note here is that in complex_GMX.top, the mass and charge in the [ atomtypes ] directive can be modified in [ atoms ] directive. If masses are not provided in [ atoms ], they will be referenced from [ atomtypes ]. On the other hand, charges present in [ atomtypes ] are never used for anything and must be specified in [ atoms ]. Therefore, it is fine to have zero masses and charges in [ atomtypes ], as you can see in complex_GMX.top in our case. (MODELLER has automatically assigned zero masses and charges.) However, at this point, [ atoms ] directive only contains atoms in the binding complex. Therefore, when grompp after the ions (like sodium ion(type Na) or chloride ion(type Cl)) are added, the [ atomtypes ] will be referenced for the mass of the ions. Accordingly, to avoid errors like atom CL (Res CL-1) has mass 0 (state A) / 0 (state B), the mass of the atom types Cl and Na should not be zero, as you can see in the added lines above. (For the mass of any atom type, refer to ffnonbonded.itp.) For more information, check one of the discussion on GROMACS mailing list.

In addition, note that it is recommended to check if there are duplicates of atom types and if the atom types in each residue defined in aminoacids.rtp match the atom types of the equivalent atoms in the topology file complex_GMX.top.

Note that in aminoacids.rtp, the atomtype of the CA atom of LEU is CT, while that of LEU is CX in complex_GMX.top. Similarly, the atom type of the CB atom of LEU in aminoacids.rtp and complex_GMX.top is CT and A2C, respectively. Since A2C and CX defined in complex_GMX.top and CT in ffnonbonded.itp have exactly the same parameters (sigma being 3.39967e-01 and epsilon being 4.57730e-01), these three atom types are all effectively the same. At the moment, it is not clear why MODELLER assigned A2C and CX instead of CT, but as long as the parameters are faithfully assigned and the process ends up being straightforward, all other details are flexible and leaving the atom types like this should be fine. After modifying the topology as above, no errors relevant to atom types or molecule types will be produced in the following simulations.

Section 4-3. Addition of the sodium and chloride ions

Section 4-3-1. Tweaking the partial charges in the topology file if needed.

In a lot of cases, it is common to have net charges on the binding complex and the neutralization of the system is required. Generally, the process is carried out by adding sodium ions and the chloride. These added ions not only neutralize th system, but also specify the salinity of the system. Therefore, even if we happen to have neutral binding complex (protein: +3, peptide: -3 at pH 7.4.), we still have to add ions to the system to reach the salinity of 200 mM of NaCl as used in the ITC experiment. To add ions, we first have to run the following command to generate a .tpr file:

gmx grompp -f ions.mdp -c complex_sol.gro -p ../Topology/complex_GMX.top -o complex_ions.tpr

where the parameter file ions.mdp has the following content:

; ions.mdp - used as input into grompp to generate ions.tpr
; All unspecified parameters adopt their own default values.

; Run Control
integrator  = steep         ; Algorithm (steep = steepest descent minimization)
nsteps      = 500000        ; (0) maximum number of steps to integrate or minimize

; Energy minnimization
emtol       = 10.0        ; (10.0) Stop minimization when the maximum force < 100.0 kJ/mol/nm
emstep      = 0.01          ; (0.01) [nm] Minimization step size

; Neighbor searching/Electrostatics/Van der Waals
cutoff-scheme   = Verlet    ; Buffered neighbor searching
nstlist         = 10        ; (10) Frequency to update the neighbor list and long-range forces
ns_type         = grid      ; Method to determine neighbor list (simple, grid)
pbc             = xyz       ; Periodic Boundary Conditions in all 3 dimensions
coulombtype     = PME       ; Treatment of long-range electrostatic interactions
rcoulomb        = 1.0       ; Short-range electrostatic cut-off
rvdw            = 1.0       ; Short-range Van der Waals cut-off

In our case, no errors should occur when generating a .tpr file. However, in some other cases, it is possible the execution of the command will be stopped due to a warning like below: (The warning below is from the case of the binding complex at pH 6.)

WARNING 1 [file complex_GMX.top, line 17625]:
  You are using Ewald electrostatics in a system with net charge. This can
  lead to severe artifacts, such as ions moving into regions with low
  dielectric, due to the uniform background charge. We suggest to
  neutralize your system with counter ions, possibly in combination with a
  physiological salt concentration.

If this warning occurs, generally it should be fine to just ignore it, since the next step is to use genion program in GROMACS to neutralize the system. After all, we have to run grompp before running genion, so it is natural to get this warning at this point. To ignore the warning, just use the flag -maxwarn so that the command would be

gmx grompp -f ions.mdp -c complex_sol.gro -p ../Topology/complex_GMX.top -o complex_ions.tpr -maxwarn 1

However, before running the command above, usually there is another thing that we should take care of. In some cases, you might see that in addition to a warning, there is also a note in the STDOUT of grompp which looks like this:

NOTE 1 [file complex_GMX.top, line 17625]:
  System has non-zero total charge: 2.999915
  Total charge should normally be an integer. See
  http://www.gromacs.org/Documentation/Floating_Point_Arithmetic
  for discussion on how close it should be to an integer.

This note indicates that the total charge of the system is not an integer as it should be. In fact, it is common to get charges that are not exactly an integer, which is just a result of rounding. While the difference between the charges in the topology file (2.999915) and the real value (in this case, 3) is very small, it might be better to spread this difference throughout several atoms. Since the difference is quite small, it would not matter too much which atoms are selected and modified accordingly. To minimize the influence of adjusting the charges, here we can choose to spread the charge difference (0.000085) throughout the protein that is furthest from the peptide, which is residue 106 THR. Since THR has 15 atoms, to spread out the charge difference as even as possible, we choose to add a charge of 0.000005 to each of the first 5 atoms and 0.000006 to the remaining 10 atoms by modifying the topology file complex_GMX.top. Although the note will still probably present after the adjustment (showing that a total charge is closer to 3), at least we do out best we can do here. In fact, such a small charge difference should not cause any observable influence in the following simulations.

Section 4-3-2. Addition of sodium and chlordie ions

To use genion to add sodium and chloride ions, we first have to estimate how many ions should be added, which requires the consideration of the volume of the box and the physiological concentration used. Since we just set the salinity as 0.15 M when using H++ (which means that [NaCl] = 0.15 M), given the volume of the box (222.58

nm3), we can calculate the number of the sodium ions as follows:
(212.339Γ—10βˆ’27)[m3]Γ—1000[Lm3]Γ—0.2[molL]Γ—(6.02Γ—1023)[1mol]β‰ˆ25.565

As a result, since the net charge of the whole binding complex is 0 at pH 7.4 (+3 for the protein and -3 for the peptide ligand), 26 sodium ions and 26 chloride ions should be added. Accordingly, in the same folder Solvation, execute the following command and select Group 15 (SOL) as the continuous group of solvent molecules in the prompt:

gmx genion -s complex_ions.tpr -o complex_ions.gro -p complex_GMX.top -pname NA -nname CL -np 26 -nn 26

Similarly, in folder Topology, a new topology file complex_GMX.top will be produced with ions added and the old one will be renamed as #complex_GMX.top.2#.

Section 5. Energy minimization, equilibration and a short molecular dynamics (MD) simulation

Section 5-1. Energy minimization

Having the solvated, electroneutral system assembled, before we begin dynamics, we must ensure that the system has no steric clashes or inappropriate geometry by relaxing the solvated system to a low-energy state through energy minimization. To start with, in a newly created folder EM, execute a grompp command first to get a .tpr file:

gmx grompp -f em.mdp -c ../Solvation/complex_ions.gro -p ../Topology/complex_GMX.top -o complex_em.tpr

where the parameter file em.mdp has the following content:

; em.mdp - used as input into grompp to generate em.tpr
; All unspecified parameters adopt their own default values.

; Run Control
integrator  = steep         ; Algorithm (steep = steepest descent minimization)
nsteps      = 500000        ; (0) maximum number of steps to integrate or minimize

; Energy minnimization
emtol       = 100.0        ; (10.0) Stop minimization when the maximum force < 100.0 kJ/mol/nm
emstep      = 0.01          ; (0.01) [nm] Minimization step size

; Neighbor searching/Electrostatics/Van der Waals
cutoff-scheme   = Verlet    ; Buffered neighbor searching
nstlist         = 10        ; (10) Frequency to update the neighbor list and long range forces
ns_type         = grid      ; Method to determine neighbor list (simple, grid)
pbc             = xyz       ; Periodic Boundary Conditions in all 3 dimensions
coulombtype     = PME       ; Treatment of long range electrostatic interactions
rcoulomb        = 1.0       ; Short-range electrostatic cut-off
rvdw            = 1.0       ; Short-range Van der Waals cut-off

Then, run mdrun command that requires the .tpr file just produced:

gmx mdrun -s complex_em.tpr -o complex_em.trr -c complex_em.gro -g em.log -e em.edr

As a result, the following information will be printed out as the energy minimization is finished:

Steepest Descents converged to Fmax < 100 in 10645 steps
Potential Energy  = -3.5183519e+05
Maximum force     =  9.8165977e+01 on atom 1876
Norm of force     =  3.6908679e+00

Section 5-2. Equilibration

Energy minimization ensures that we have a reasonable starting structure, in terms of geometry and solvent orientation. To begin real dynamics, we must equilibrate the solvent and ions around the protein and the ligand. If we were to attempt unrestrained dynamics at this point, the system may collapse. The reason is that the solvent is mostly optimized within itself, and not necessarily with the solute. It needs to be brought to the temperature we wish to simulate and establish the proper orientation about the solute (the protein and the ligand). Since NPT ensemble is relatively close to a typical experimental setup, in our case, we will perform equilibration and the subsequent short MD simulation both in NPT ensemble.

While this is generally not necessary, instead of performing NPT equilibration directly, in this case, we choose to perform NVT equilibration first (and then NPT equilibration), since it is often more robust to first equilibrate the temperature of the system under an NVT ensemble before applying a barostat to control the temperature. The simultaneous combination of velocity generation and coordinate scaling under the influence of barostat can introduce instabilities in a system that may be far from equilibrium. To get started, create a folder called Equil in the directory same level of Topology. Then, in folder Equil, create another tow folders: NVT and NPT.

Section 5-2-1. NVT equilibration

To run an NVT equilibration, execute the following command in the folder NVT to generate a .tpr file to run mdrun:

gmx grompp -f nvt_equil.mdp -c ../EM/complex_em.gro -p ../Topology/complex_GMX.top -o complex_equil.tpr

where nvt_equil.mdp has the following content:

; Run control
integrator               = md-vv
tinit                    = 0
dt                       = 0.002
nsteps                   = 100000  ; 200 ps
comm-mode                = Linear
nstcomm                  = 1

; Output control
nstlog                   = 1000
nstcalcenergy            = 1
nstenergy                = 1000
nstxout-compressed       = 1000

; Neighborsearching and short-range nonbonded interactions
nstlist                  = 10
ns_type                  = grid
pbc                      = xyz
rlist                    = 1.0

; Electrostatics
cutoff-scheme         = verlet
coulombtype              = PME
coulomb-modifier     = Potential-shift-Verlet
rcoulomb-switch          = 0.89
rcoulomb                 = 0.9

; van der Waals
vdw-type                 = Cut-off
vdw-modifier             = Potential-switch
rvdw-switch              = 0.85
rvdw                     = 0.9

; Apply long-range dispersion corrections for Energy and Pressure 
DispCorr                 = AllEnerPres

; Spacing for the PME/PPPM FFT grid
fourierspacing           = 0.10

; EWALD/PME/PPPM parameters
fourier_nx               = 0
fourier_ny               = 0
fourier_nz               = 0
pme_order                = 4
ewald_rtol               = 1e-05
ewald_geometry           = 3d
epsilon_surface          = 0

; Temperature coupling
tcoupl                   = v-rescale
nsttcouple               = 1
tc_grps                  = System
tau_t                    = 0.5
ref_t                    = 298

; Pressure coupling is on for NPT
pcoupl                   = no   ; no pressure for NVT equilibration

gen_vel                  = yes
gen-temp                 = 298
gen-seed                 = 6722267; need to randomize the seed each time.

; options for bonds
constraints              = h-bonds  ; we only have C-H bonds here
; Type of constraint algorithm
constraint-algorithm     = lincs
; Highest order in the expansion of the constraint coupling matrix
lincs-order              = 12
lincs-iter               = 2

With complex_equil.tpr, run the following command to perform NVT equilibration:

gmx mdrun -s complex_equil.tpr -o complex_equil.trr -c complex_equil.gro -g equil.log -e equil.edr

As a result, it took about 9 to 10 minutes to finish the equilibration. After the equilibration is complete, we can use gmx energy to output xvg files containing the data of potential and temperature as a function of time. As shown below, it is clear that the temperature of the system quickly reached the target value (298 K) and remained stable over the remainder of the equilibration. (Although there is fluctuation, the average is no longer drifting. The average temperature over the last 100 ps of equilibration is 297.316 K.) Similarly, the potential energy quickly reached a plateau, which indicates that the NVT equilibration here should be adequate.

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More β†’
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More β†’

Section 5-2-2. NPT equilibration

After the NVT equilibration is completed, we then want to run another NPT equilibration to reach the same ensemble in the MD simulation later (also NPT ensemble). To generate a .tpr file for running an NPT equilibration, in folder NPT, execute the following command:

gmx grompp -f npt_equil.mdp -c ../NVT/complex_equil.gro -t ../NVT/state.cpt -p ../../Topology/complex_GMX.top -o complex_equil.tpr

Note that in the command above, the checkpoint file NVT/state.cpt from NVT equilibration is passed to the -t option. The checkpoint file contains a complete description of the thermodynamic state of the system, in high precision. To continue a simulation, the checkpoint file must be passed to grompp. If it is omitted, information regarding velocities, thermostat variables, and high-precision coordinates will be lost.

In addition, the parameter file md.mdp is exactly the same as nvt_equil.mdp except (check one of the discussions on GROMACS mailing list or a GROMACS tutorial by Dr. Lemkul for more information about NPT equilibration):

  • Pressure coupling
    Instead of using pcoupl = no in the NVT equilibration, here we use Berendsen barostat (pcoupl = Berendsen). Although Berendsen barostat is actually not a good barostat for MD simulation since it artificially suppresses fluctuations, in the equilibration process fluctuation, suppression is wanted. (In production MD simulation, Parrinello-Rahman barostat is recommended.)
  • continuation = yes
    In the section of constraint algorithm, we set the parameter continuation to yes. (In the previous NVT equilibration, the default value no was adopted, since the constraints need to be solved at the first time step.) This setting indicates that this simulation run is the continuation of a previous run (NVT equilibration), in which constraints were applied. To preserve continuity, the constraints should not be re-solved, and the input coordinate should be assumed to satisfy the constraint algorithm being applied.
  • gen_vel = no
    Velocities are read from the checkpoint file of the NVT equilibration, so there is no need to resample the velocity from the Maxwell-Boltzmann distribution. Re-generating the velocity negates the purpose of prior equilibration completely.

With complex_equil.tpr, we can perform an NPT equilibration by running the following command:

gmx mdrun -s complex_equil.tpr -o complex_equil.trr -c complex_equil.gro -g equil.log -e equil.edr

As a result, it also took about 9 to 10 minutes to complete the equilibration. Similarly, using gmx energy, we can analyze the pressure and the volume as a function of time. As shown below, the pressure of the system is

βˆ’8.1613Β±174.302 (bar), which seems very different from our target pressure (1 bar). However, the pressure itself is a quantity that fluctuates wildly in a microscopic system. As a result, instantaneous values are going to vary considerably, and average values may not correspond exactly to the desired target value. However, given the huge standard deviation of pressure during NPT equilibration, the average obtained during the simulation is statistically indistinguishable from the target of 1 bar. An additional indicator of the convergence of NPT equilibration is the density or the volume of the system, which are less prone to the fluctuations observed in the instantaneous pressure values. As shown in the figure on the right, the volume shrank to about 209
nm3
and the average stopped drifting after about 30 ps of equilibration. (The average volume over the last 100 ps is 208.793
nm3
.) Since the volume is essentially constant over the duration of NPT equilibration, it is reasonable to conclude that the system is adequately equilibrated and unrestrained simulation can be initiated.

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More β†’
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More β†’

Section 5-3. A short molecular dynamics (MD) simulation

To provide a reasonable structure as the input to further advanced sampling simulations, after equilibration, we conduct a short MD simulation of 5 ns, which should be sufficient for the system to relax. To run the simulation, first, create another folder short_MD and in the directory same level as Topology and execute the following command in the folder:

gmx grompp -f md.mdp -c ../Equil/NPT/complex_equil.gro -p ../Topology/complex_GMX.top -o complex_md.tpr -t ../Equil/NPT/state.cpt

Similarly, -t flag is used to preserve the previous work of NPT equilibration. The parameter file (md.mdp) used here is exactly the same as npt_equil.mdp except for the number of time steps (2500000, 5 ns instead of 100000, 200 ps) and the barostat (Parrinello-Rahman barostat instead of Berendsen barostat). As mentioned, a P-R barostat is generally recommended in production MD simulations.

gmx mdrun -s complex_md.tpr -o complex_md.trr -c complex_md.gro -g md.log -e md.edr

Section 5-4. Postprocessing of the simulation outputs

Section 5-4-1. Preparation of an index file

Index groups are necessary for almost every GROMACS program. All these programs can generate default index groups. However, when analyzing the data or apply restraint as we will do in the following advanced sampling simulations, sometimes we need index groups other than the ones defined in the default index file. In this case, we have to use make_ndx command to define our own special index groups.

First of all, we might need to center the whole binding complex in the simulation box when visualizing the trajectory later. While there is a group called "Protein" in the default index file that should include both the protein and the peptide ligand in our case, it actually does not include phosphotyrosine in the ligand, since phosphotyrosine was not recognized by GROMACS but with aids of AmberTools as we perform parameterization previously. Therefore, we have to define a group for the binding complex on our own. To start with, execute the command:

gmx make_ndx -f complex_md.gro -o complex.ndx

In the prompt, type a 1-1878 so that a new group, Group 24, will be created. To rename this newly created group, type name 24 Binding_Complex. To save the file and exit, type q.

Note that, however, we still need other special index groups defined in the index file. Specifically, in both Hamiltonian replica exchange and expanded ensemble, we will have to apply a harmonic distance restraint between the phosphorus atom of PTR and each of the 4 arginine residues that compose the binding pocket to restrain the peptide in the binding pocket to prevent the greasy core problem. Therefore, we need to define these groups using the command below:

gmx make_ndx -f complex_md.gro -n complex.ndx -o complex.ndx

Note that in the command above, the newly created index file we just made is passed by the -n flag and a new index file with the same name will be created. (In this case, the old index file will be renamed as #complex.ndx.1#.) Without using the flag -n, the group Binding_Complex we just defined in the old index file will not appear in the new index file. To define the groups for applying the distance restraints, we first have to find out atoms involved in the 4 arginine residues and the phosphorus atom in PTR. Typically, it is safer to identify these atoms from a .gro file with the help of VMD visualization (as elaborated in the next section) to make sure that the right atoms are chosen. The following are the commands for choosing the atoms involved in each entity participating the distance restraints to be applied in advanced sampling simulations:

  • Peptide ligand (pep7, not needed for the restraints we are going to use but recommended to be set up): a 1742-1878, name 25 pep7
  • Phosphorus atom in PTR (P_atom): a 1817, name 26 P_atom
  • Arg1: ri 15, name 27 Arg1
  • Arg2: ri 34, name 28 Arg2
  • Arg3: ri 36, name 29 Arg3
  • Arg4: ri 56, name 30 Arg4

After the execution of the command make_ndx, the bottom part of the index file complex.ndx should be as follows (the following is the output of the command tail complex.ndx -n 25):

[ pep7 ]
1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756
1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771
1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786
1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801
1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816
1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831
1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846
1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861
1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876
1877 1878
[ P_atom ]
1817
[ Arg1 ]
 220  221  222  223  224  225  226  227  228  229  230  231  232  233  234
 235  236  237  238  239  240  241  242  243
[ Arg2 ]
 366  367  368  369  370  371  372  373  374  375  376  377  378  379  380
 381  382  383  384  385  386  387  388  389
[ Arg3 ]
 574  575  576  577  578  579  580  581  582  583  584  585  586  587  588
 589  590  591  592  593  594  595  596  597
[ Arg4 ]
 894  895  896  897  898  899  900  901  902  903  904  905  906  907  908
 909  910  911  912  913  914  915  916  917

Section 5-4-2. Visualization of the output structure

(Note: the figures and the data shown in Section 5-4-2. are actually from the simulation of the binding complex at pH6, but the visuzliation method and the things we should care about are exactly the same.) After the simulation is complete, it is crucial to see if the simulation is at least visually reasonable. Using Visual Molecular Dynamics (VMD), we are able to visualize the output structure from the short MD simulation. As a result, as you can see below, the box is rectangular rather than dodecahedral as we specified when using editconf command, which is a result of the fact that GROMACS always keeps the particles in a brick-shaped volume for efficiency even when simulation using a triclinic box. (Note: rhombic dodecahedra are special cases of triclinic unit cells.) One can use trjconv program to convert the trajectory to a different unit-cell representation. (See GROMACS documentation) for more information.)

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More β†’

From the figure, it can also be observed that the protein is out of the box and detached from the peptide ligand, this is not a result of the peptide unbound from the protein, but a result of the protein crossing the periodic boundaries of the box. (To confirm that, just observe which side of the protein the peptide is on and compare that to the initial PDB structure, say complex_GMX.gro.) To recenter the protein and wrap the periodic boundaries, execute the command below in the Tk Console in VMD:

pbc wrap -centersel "protein" -center com -compound residue -all

As a result shown below, a triclinic representation of the system is recovered, with protein centered.

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More β†’

In addition, we can use the command as follows in Tk Console to save the structure as a .gro file if needed after the adjustment so that we don't have to adjust the visualization every time for this structure:

animate write gro complex_center_vmd.gro

Section 5-4-3. Visualization of the trajectory

As in any simulation conducted with periodic boundary conditions, molecules may appear "broken" or may "jump" back and forth across the box. To recenter the protein and rewrap the molecules within the unit cell to recover the desired rhombic dodecahedral shape, invoke trjconv in the following two-step procedures (Create the folder animated_traj in short_MD to start with.):

  • Step 1: Center on the binding complex and remove jumps by -pbc nojump
    ​​​​gmx trjconv -s ../complex_md.tpr -f ../traj_comp.xtc -o complex_nojump.xtc -center -pbc nojump -n ../complex.ndx
    
  • Step 2: Center on the binding complex and remove pbc by -pbc mol -ur compact
    ​​​​gmx trjconv -s ../complex_md.tpr -f complex_nojump.xtc -o complex_center.xtc -center -pbc mol -ur compact -n ../complex.ndx -dt 250
    

There are are two prompts for both commands. the first one is "Select group for centering" and the second one is "Select group for output". Enter 24 and 0 sequentially to select Binding_Complex and System, respectively.

To visualize the trajectory, execute the command:

vmd ../complex_md.gro complex_center.xtc

As a result, after using trjconv commands, the dodecahedral box is recovered and no molecules are broken. (Originally, both the protein and the ligand were broken in the trajectory traj_comp.xtc. Check this by the command vmd complex_md.gro traj_comp.xtc.)

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More β†’

However, as can be seen in the figure above, the binding complex is not at the center of the box. To fix this, execute the exactly the same command that we've just used in Tk Console in VMD:

pbc wrap -centersel "protein" -center com -compound residue -all

Also, it is recommended to output a trajectory (complex_center_vmd.trr) after the adjustment, so we don't have to recenter the complex every time we visualize it:

animate write trr complex_center_vmd.trr

(Note that VMD is not about to write an .xtc file.)

From the animate trajectory, we can see that the peptide is stably bound to the protein and the N-terminal of the peptide has a much larger fluctuation than the C-terminal, which just validated the reason that the first 10 residues were missing in the X-ray crystallography experiment. (Note that only the ASN residue was rebuilt though.)

One important thing to note here is that GROMACS didn't do anything wrong in our simulation. The "problems" we saw before the adjustment by VMD are all just a matter of VMD visualization. The correct physics is still preserved in complex_md.gro and traj_comp.xtc. However, having more straightforward visualization, complex_center_vmd.gro and complex_center_vmd.trr do not have the correct physics after the adjustment, so the files are only for visualization purposes. To validate this, we can examine the energy as a function of time of the trajectory before and after the adjustment. (Also note that the atom coordinates in complex_center_vmd.gro are different from the ones in complex_md.gro.) To get the energy data of complex_md.gro, execute the following command and enter 13 to choose to output the data of Total-Energy to the .xvg file:

gmx energy -f md.edr -o energy.xvg

However, right now we don't have the energy data of the trajectory after the VMD adjustment (complex_center_vmd.trr), so we have to execute the following command the generate an .edr file for the adjusted trajectory:

gmx mdrun -rerun complex_center_vmd.trr -s ../complex_md.tpr

However, the following error would occur:

Fatal error:
Step 0: The total potential energy is 1.08418e+14, which is extremely high.
The LJ and electrostatic contributions to the energy are 1.08418e+14 and
-365625, respectively. A very high potential energy can be caused by
overlapping interactions in bonded interactions or very large coordinate
values. Usually, this is caused by a badly- or non-equilibrated initial
configuration, incorrect interactions or parameters in the topology.

Using gmx energy to get the data of the Coulombic potential (enter 9 to choose Coulombic-(SR)) of the original trajectory (traj_comp.xtc), we can obtain that the average of the Coulombic potential is -371339. Therefore, the VMD adjustment didn't change the Coulombic potential too much and the high total energy almost came from the excessively high/erroneous Lennard-Jones potential. Therefore, it is apparent that the trajectory after VMD adjustment is not physically correct and can only serve as a trajectory for visualization purposes.

Similarly, we can also use -rerun option to get .edr files for complex_nojump.xtc and comple_center.xtc and use gmx energy to get the energy data.


As shown above, although the energy of complex_center.xtc and complex_nojump.xtc seems reasonable, trjconv command still changed the physics somehow. In addition, for both the "rerun" simulations, the following warning was shown:

WARNING: Some frames do not contain velocities.
         Ekin, temperature and pressure are incorrect,
         the virial will be incorrect when constraints are present.

Therefore, to extract a snapshot that is physically correct to proceed with further advanced sampling simulations, utilizing the original trajectory traj_comp.xtc is the safest way.

Section 5-4-5. Extraction of a physically reasonable snapshot

As mentioned, the goal of this tutorial is to provide input files for Hamiltonian replica exchange and expanded ensemble simulations. As running these two types of advanced sampling simulations in an NPT ensemble would generate problematic results, here we must perform the simulations in an NVT ensemble. Since the vanilla simulation was performed in an NPT ensemble, we have to make sure that the free energy in the NPT vanilla simulation is equivalent to the free energy in the NVT Hamiltonian replica exchange/expanded ensemble simulations. To do this, consider:

G=G(N,P,T)=VdPβˆ’SdT+ΞΌdN

A=A(N,V,T)=βˆ’PdVβˆ’SdT+ΞΌdN

Obviously, the Gibbs free energy in an NPT ensemble is equal to the Helmholtz free energy in an NVT ensemble given that the
fixed in the NVT ensemble corresponds to the pressure in the NPT ensemble
. Therefore, to conduct an NVT Hamiltonian replica exchange/expanded ensemble simulation, we have to extract a snapshot from the trajectory of the NPT vanilla simulation and make sure that the volume of that snapshot is close to the average volume in the NPT simulation. Extracting the energy data using gmx energy from md.edr and using gmx_plot2d.py to analyze the data, we can obtain that the average volume over 5 ns of MD simulation is 208.8357

nm3 and the snapshot that has a volume closet to this value is the snapshot at 3.478 ns, whose volume is 208.8350
nm3
. Therefore, we extract this snapshot using the following command (enter 0 to select the whole system in the prompt):

gmx trjconv -s complex_md.tpr -f traj_comp.xtc -o complex_avg.gro -dump 3812

This is the structure file that can finally serve as the input structure of further advanced sampling simulations!

To see how it looks like, we similarly use trjconv commands as below and use pbc wrap command in the Tk Console of VMD:

gmx trjconv -s complex_md.tpr -f complex_avg.gro -o complex_avg_nojump.gro -center -pbc nojump -n complex.ndx
gmx trjconv -s complex_md.tpr -f complex_avg_nojump.gro -o complex_avg_center.gro -center -pbc mol -ur compact -n complex.ndx

Section 5-4-6. Collection of all the input files for advanced sampling methods

After we finally obtain all the input files required for further advanced sampling simulations, it is recommended to collect all the final input files in one folder. To do this, start with creating the folder Final_inputs and execute the following commands in the folder:

cp ../Topology/complex_GMX.top PLCpep7.top
cp ../short_MD/complex_avg.gro PLCpep7.gro
cp ../short_MD/complex.ndx PLCpep7.ndx

Section 6. The preparation process in brief

In this section, we provide the reader a quick look at all the commands used in the preparation process in Section 2 to Section 5 in case that the reader is already familiar with the process.

  • Modeling of the missing residues

    • Execute python sequence.py to get an alignment file.
      ​​​​from modeller import * ​​​​# Get the sequence of the 1qg8 PDB file and write to an sequence file ​​​​code = '4k45' ​​​​e = environ() ​​​​m = model(e, file=code) ​​​​aln = alignment(e) ​​​​aln.append_model(m, align_codes=code) ​​​​aln.write(file=code+'.seq')
    • Use the sequence file to prepare an alignment file. Remember to change the number of residue since we have a non-standard amino acids. (In this case, change A:+109 into A:+110 in the sequence file.)
    • Execute python build.py to model the missing residues
      ​​​​from modeller import * ​​​​from modeller.automodel import * # Load the automodel class ​​​​log.verbose() ​​​​env = environ() ​​​​# directories for input atom files ​​​​env.io.atom_files_directory = ['.', '../atom_files'] ​​​​env.io.hetatm = True ​​​​a = automodel(env, alnfile = 'alignment.ali', ​​​​ knowns = '4k45', sequence = '4k45_fill') ​​​​a.starting_model = 1 ​​​​a.ending_model = 1 ​​​​a.make()
  • Topology preparation

    • First create a folder for this step: Topology.
    • Preparation of the protein topology file
      • Execute cp ../Build/4k45_fill.B99990001.pdb ./PLCC.pdb
      • Cut the part of peptide and save it as pep7.pdb.
      • Process PLCC.pdb with H++ web server (pH = 7.4).
      • Download the output files as PLCC.top and PLCC.crd. Convert the files by executing python GAFF/acpype.py -x PLCC.crd -p PLCC.top.
    • Preparation of the peptide topology file
      • Change HETATM records in pep7.pdb to ATOM records.
      • Download the output files as pep7.top and pep7.crd. Convert the files by executing python GAFF/acpype.py -x pep7.crd -p pep7.top.
    • Combination of the topology files
      • Execute the commands below:
        ​​​​​​cp PLCC_GMX.gro complex_GMX.gro
        ​​​​​​cp PLCC_GMX.top complex_GMX.top
        
      • Cut the coordinates of pep7_GMX.gro and append to the bottom of complex_GMX.gro. Change the numer of atoms from 1744 to 1878 and save the file.
      • Open up complex_GMX.top and pep7_GMX.top and compare the [ atomtypes ] directives, append the atom types that only appear in pep7_GMX.top to the [ atomtypes ] directive of complex_GMX.top. The appdended atom types are:
        ​​​​​​OV       OV          0.00000  0.00000   A     2.95101e-01   7.11280e-01 ; 1.66  0.1700
        ​​​​​​P        P           0.00000  0.00000   A     3.74177e-01   8.36800e-01 ; 2.10  0.2000
        ​​​​​​OT       OT          0.00000  0.00000   A     2.91092e-01   8.78640e-01 ; 1.63  0.2100
        
      • Copy all the directives other than [ defaults ], [ atomtypes ], [ system ] and [ molecules ] in pep7_GMX.top, including [ moleculetype ], [ atoms ], [ pairs ], [ bonds ], [ angles ] and [ dihedrals ] (including the propers and the impropers). Paste all these directives to complex_GMX.top.
      • Change the system name in [ system ] from PLCC to complex in water and add pep7 1 to the bottom of the file. Save the file to complete the combination of the topology files.
    • Modification of the topology file complex_GMX.top
      • Before [ system ] directive, add:
        ​​​​​​; Include water topology
        ​​​​​​#include "amber99sb-ildn.ff/tip3p.itp"
        ​​​​​​; Include topology for ions
        ​​​​​​#include "amber99sb-ildn.ff/ions.itp"
        
      • Add the following atom types to [ atomtypes ] directive:
        ​​​​​​OW       OW          0.00000  0.00000   A      0.315075      0.635968
        ​​​​​​HW       HW          0.00000  0.00000   A             0             0
        ​​​​​​Cl       Cl          35.45    0.00000   A    4.40104e-01    4.18400e-01
        ​​​​​​Na       Na          22.99    0.00000   A    3.32840e-01    1.15897e-02
        ​​​​​​IB       IB          0.00000  0.00000   A    8.90899e-01    4.18400e-01
        ​​​​​​C0       C0          0.00000  0.00000   A    3.05240e-01    1.92376e+00
        ​​​​​​MG       MG          0.00000  0.00000   A    1.41225e-01    3.74342e+00
        ​​​​​​K        K           0.00000  0.00000   A    4.73602e-01    1.37235e-03
        ​​​​​​Rb       Rb          0.00000  0.00000   A    5.26699e-01    7.11280e-04
        ​​​​​​Cs       Cs          0.00000  0.00000   A    6.04920e-01    3.37230e-04
        ​​​​​​Li       Li          0.00000  0.00000   A    2.02590e-01    7.65672e-02
        ​​​​​​Zn       Zn          0.00000  0.00000   A    1.95998e-01    5.23000e-02
        
      • Tweak the partial charges if needed. Spread out the difference of the charges to the residue furtherst to the structural entity of most interest.
      • Save the file to complete the modification.
  • Solvation and neutralization of the system

    • Execute gmx editconf -f ../Topology/complex_GMX.gro -o complex_box.gro -bt dodecahedron -d 1.0 -c
    • Execute gmx solvate -cp complex_box.gro -p ../Topology/complex_GMX.top -o complex_sol.gro -cs
    • After adding the solvent molecules, make sure that [ molecules ] directive in complex_GMX.top is well-formatted.
    • Create ions.mdp and execute gmx grompp -f ions.mdp -c complex_sol.gro -p ../Topology/complex_GMX.top -o complex_ions.tpr -maxwarn 1
    • The content of ions.mdp:
      ​​​​; ions.mdp - used as input into grompp to generate ions.tpr
      ​​​​; All unspecified parameters adopt their own default values.
      
      ​​​​; Run Control
      ​​​​integrator  = steep         ; Algorithm (steep = steepest descent minimization)
      ​​​​nsteps      = 500000        ; (0) maximum number of steps to integrate or minimize
      
      ​​​​; Energy minnimization
      ​​​​emtol       = 10.0        ; (10.0) Stop minimization when the maximum force < 100.0 kJ/mol/nm
      ​​​​emstep      = 0.01          ; (0.01) [nm] Minimization step size
      
      ​​​​; Neighbor searching/Electrostatics/Van der Waals
      ​​​​cutoff-scheme   = Verlet    ; Buffered neighbor searching
      ​​​​nstlist         = 10        ; (10) Frequency to update the neighbor list and long-range forces
      ​​​​ns_type         = grid      ; Method to determine neighbor list (simple, grid)
      ​​​​pbc             = xyz       ; Periodic Boundary Conditions in all 3 dimensions
      ​​​​coulombtype     = PME       ; Treatment of long-range electrostatic interactions
      ​​​​rcoulomb        = 1.0       ; Short-range electrostatic cut-off
      ​​​​rvdw            = 1.0       ; Short-range Van der Waals cut-off
      
    • Execute gmx genion -s complex_ions.tpr -o complex_ions.gro -p ../Topology/complex_GMX.top -pname NA -nname CL -np 26 -nn 26 to add counterions. (Eneter 15 in the prompt to choose SOL.)
  • Energy minimization, equilibration and MD simulation

    • Energy minmization
      • Create em.mdp and execute gmx grompp -f em.mdp -c ../Solvation/complex_ions.gro -p ../Topology/complex_GMX.top -o complex_em.tpr
      • The content of em.mdp:
        ​​​​​​; em.mdp - used as input into grompp to generate em.tpr
        ​​​​​​; All unspecified parameters adopt their own defaults values.
        
        ​​​​​​; Run Control
        ​​​​​​integrator  = steep         ; Algorithm (steep = steepest descent minimization)
        ​​​​​​nsteps      = 500000        ; (0) maximum number of steps to integrate or minimize
        
        ​​​​​​; Energy minnimization
        ​​​​​​emtol       = 100.0        ; (10.0) Stop minimization when the maximum force < 100.0 kJ/mol/nm
        ​​​​​​emstep      = 0.01          ; (0.01) [nm] Minimization step size
        
        ​​​​​​; Neighbor searching/Electrostatics/Van der Waals
        ​​​​​​cutoff-scheme   = Verlet    ; Buffered neighbor searching
        ​​​​​​nstlist         = 10        ; (10) Frequency to update the neighbor list and long range forces
        ​​​​​​ns_type         = grid      ; Method to determine neighbor list (simple, grid)
        ​​​​​​pbc             = xyz       ; Periodic Boundary Conditions in all 3 dimensions
        ​​​​​​coulombtype     = PME       ; Treatment of long range electrostatic interactions
        ​​​​​​rcoulomb        = 1.0       ; Short-range electrostatic cut-off
        ​​​​​​rvdw            = 1.0       ; Short-range Van der Waals cut-off
        
      • Execute gmx mdrun -s complex_em.tpr -o complex_em.trr -c complex_em.gro -g em.log -e em.edr.
    • NVT equilibration
      • Create nvt_equil.mdp and execute gmx grompp -f nvt_equil.mdp -c ../../EM/complex_em.gro -p ../../Topology/complex_GMX.top -o complex_equil.tpr.
      • The content of nvt_equil.mdp:
        ​​​​​​; Run control
        ​​​​​​integrator               = md
        ​​​​​​tinit                    = 0
        ​​​​​​dt                       = 0.002
        ​​​​​​nsteps                   = 100000  ; 200 ps
        ​​​​​​comm-mode                = Linear
        ​​​​​​nstcomm                  = 1
        
        ​​​​​​; Output control
        ​​​​​​nstlog                   = 1000
        ​​​​​​nstcalcenergy            = 1
        ​​​​​​nstenergy                = 1000
        ​​​​​​nstxout-compressed       = 1000
        
        ​​​​​​; Neighborsearching and short-range nonbonded interactions
        ​​​​​​nstlist                  = 10
        ​​​​​​ns_type                  = grid
        ​​​​​​pbc                      = xyz
        ​​​​​​rlist                    = 1.0
        
        ​​​​​​; Electrostatics
        ​​​​​​cutoff-scheme		 = verlet
        ​​​​​​coulombtype              = PME
        ​​​​​​coulomb-modifier	 = Potential-shift-Verlet
        ​​​​​​rcoulomb-switch          = 0.89
        ​​​​​​rcoulomb                 = 0.9
        
        ​​​​​​; van der Waals
        ​​​​​​vdw-type                 = Cut-off
        ​​​​​​vdw-modifier             = Potential-switch
        ​​​​​​rvdw-switch              = 0.85
        ​​​​​​rvdw                     = 0.9
        
        ​​​​​​; Apply long range dispersion corrections for Energy and Pressure 
        ​​​​​​DispCorr                 = AllEnerPres
        
        ​​​​​​; Spacing for the PME/PPPM FFT grid
        ​​​​​​fourierspacing           = 0.10
        
        ​​​​​​; EWALD/PME/PPPM parameters
        ​​​​​​fourier_nx               = 0
        ​​​​​​fourier_ny               = 0
        ​​​​​​fourier_nz               = 0
        ​​​​​​pme_order                = 4
        ​​​​​​ewald_rtol               = 1e-05
        ​​​​​​ewald_geometry           = 3d
        ​​​​​​epsilon_surface          = 0
        
        ​​​​​​; Temperature coupling
        ​​​​​​tcoupl                   = v-rescale
        ​​​​​​nsttcouple               = 1
        ​​​​​​tc_grps                  = System
        ​​​​​​tau_t                    = 0.5
        ​​​​​​ref_t                    = 298
        
        ​​​​​​; Pressure coupling is on for NPT
        ​​​​​​pcoupl                   = no   ; no pressure for NVT equilibration
        
        ​​​​​​gen_vel                  = yes
        ​​​​​​gen-temp                 = 298
        ​​​​​​gen-seed                 = 6722267; need to randomize the seed each time.
        
        ​​​​​​; options for bonds
        ​​​​​​constraints              = h-bonds  ; we only have C-H bonds here
        ​​​​​​; Type of constraint algorithm
        ​​​​​​constraint-algorithm     = lincs
        ​​​​​​; Highest order in the expansion of the constraint coupling matrix
        ​​​​​​lincs-order              = 12
        ​​​​​​lincs-iter               = 2
        
      • Execute gmx mdrun -s complex_equil.tpr -o complex_equil.trr -c complex_equil.gro -g equil.log -e equil.edr.
    • NPT equilibration
      • Create npt_equil.mdp and execute gmx grompp -f npt_equil.mdp -c ../NVT/complex_equil.gro -t ../NVT/state.cpt -p ../../Topology/complex_GMX.top -o complex_equil.tpr
      • The content of `npt_equil.mdp:
        ​​​​​​; Run control
        ​​​​​​integrator               = md
        ​​​​​​tinit                    = 0
        ​​​​​​dt                       = 0.002
        ​​​​​​nsteps                   = 100000  ; 200 ps
        ​​​​​​comm-mode                = Linear
        ​​​​​​nstcomm                  = 1
        
        ​​​​​​; Output control
        ​​​​​​nstlog                   = 1000
        ​​​​​​nstcalcenergy            = 1
        ​​​​​​nstenergy                = 1000
        ​​​​​​nstxout-compressed       = 1000
        
        ​​​​​​; Neighborsearching and short-range nonbonded interactions
        ​​​​​​nstlist                  = 10
        ​​​​​​ns_type                  = grid
        ​​​​​​pbc                      = xyz
        ​​​​​​rlist                    = 1.0
        
        ​​​​​​; Electrostatics
        ​​​​​​cutoff-scheme		     = verlet
        ​​​​​​coulombtype              = PME
        ​​​​​​coulomb-modifier	     = Potential-shift-Verlet
        ​​​​​​rcoulomb-switch          = 0.89
        ​​​​​​rcoulomb                 = 0.9
        
        ​​​​​​; van der Waals
        ​​​​​​vdw-type                 = Cut-off
        ​​​​​​vdw-modifier             = Potential-switch
        ​​​​​​rvdw-switch              = 0.85
        ​​​​​​rvdw                     = 0.9
        
        ​​​​​​; Apply long range dispersion corrections for Energy and Pressure 
        ​​​​​​DispCorr                 = AllEnerPres
        
        ​​​​​​; Spacing for the PME/PPPM FFT grid
        ​​​​​​fourierspacing           = 0.10
        
        ​​​​​​; EWALD/PME/PPPM parameters
        ​​​​​​fourier_nx               = 0
        ​​​​​​fourier_ny               = 0
        ​​​​​​fourier_nz               = 0
        ​​​​​​pme_order                = 4
        ​​​​​​ewald_rtol               = 1e-05
        ​​​​​​ewald_geometry           = 3d
        ​​​​​​epsilon_surface          = 0
        
        ​​​​​​; Temperature coupling
        ​​​​​​tcoupl                   = v-rescale
        ​​​​​​nsttcouple               = 1
        ​​​​​​tc_grps                  = System
        ​​​​​​tau_t                    = 0.5
        ​​​​​​ref_t                    = 298
        
        ​​​​​​; Pressure coupling is on for NPT
        ​​​​​​pcoupl                   = Berendsen
        ​​​​​​pcoupltype               = isotropic
        ​​​​​​tau-p                    = 5.0      
        ​​​​​​ref-p                    = 1.0      ; bar
        ​​​​​​compressibility          = 4.5e-5   ; For water at 1 atm and 300 K the comp  ressibiilty is 4.5e-5 bar^(-1)
        ​​​​​​; refcoord_scaling should do nothing since there are no position restraints.
        
        ​​​​​​gen_vel                  = no
        ​​​​​​; gen-temp                 = 298
        ​​​​​​; gen-seed                 = 6722267; need to randomize the seed each time.
        
        ​​​​​​; options for bonds
        ​​​​​​constraints              = h-bonds  ; we only have C-H bonds here
        
        ​​​​​​; Type of constraint algorithm
        ​​​​​​constraint-algorithm     = lincs
        ​​​​​​continuation             = yes
        
        ​​​​​​; Highest order in the expansion of the constraint coupling matrix
        ​​​​​​lincs-order              = 12
        ​​​​​​lincs-iter               = 2
        
      • Execute gmx mdrun -s complex_equil.tpr -o complex_equil.trr -c complex_equil.gro -g equil.log -e equil.edr
    • MD simulation
      • Create md.mdp and execute gmx grompp -f md.mdp -c ../Equil/NPT/complex_equil.gro -p ../Topology/complex_GMX.top -o complex_md.tpr -t ../Equil/NPT/state.cpt
      • The content of md.mdp:
        ​​​​​​; Run control
        ​​​​​​integrator               = md
        ​​​​​​tinit                    = 0
        ​​​​​​dt                       = 0.002
        ​​​​​​nsteps                   = 2500000  ; 5 ns
        ​​​​​​comm-mode                = Linear
        ​​​​​​nstcomm                  = 1
        
        ​​​​​​; Output control
        ​​​​​​nstlog                   = 1000
        ​​​​​​nstcalcenergy            = 1
        ​​​​​​nstenergy                = 1000
        ​​​​​​nstxout-compressed       = 1000
        
        ​​​​​​; Neighborsearching and short-range nonbonded interactions
        ​​​​​​nstlist                  = 10
        ​​​​​​ns_type                  = grid
        ​​​​​​pbc                      = xyz
        ​​​​​​rlist                    = 1.0
        
        ​​​​​​; Electrostatics
        ​​​​​​cutoff-scheme		       = verlet
        ​​​​​​coulombtype              = PME
        ​​​​​​coulomb-modifier	       = Potential-shift-Verlet
        ​​​​​​rcoulomb-switch          = 0.89
        ​​​​​​rcoulomb                 = 0.9
        
        ​​​​​​; van der Waals
        ​​​​​​vdw-type                 = Cut-off
        ​​​​​​vdw-modifier             = Potential-switch
        ​​​​​​rvdw-switch              = 0.85
        ​​​​​​rvdw                     = 0.9
        
        ​​​​​​; Apply long range dispersion corrections for Energy and Pressure 
        ​​​​​​DispCorr                 = AllEnerPres
        
        ​​​​​​; Spacing for the PME/PPPM FFT grid
        ​​​​​​fourierspacing           = 0.10
        
        ​​​​​​; EWALD/PME/PPPM parameters
        ​​​​​​fourier_nx               = 0
        ​​​​​​fourier_ny               = 0
        ​​​​​​fourier_nz               = 0
        ​​​​​​pme_order                = 4
        ​​​​​​ewald_rtol               = 1e-05
        ​​​​​​ewald_geometry           = 3d
        ​​​​​​epsilon_surface          = 0
        
        ​​​​​​; Temperature coupling
        ​​​​​​tcoupl                   = v-rescale
        ​​​​​​nsttcouple               = 1
        ​​​​​​tc_grps                  = System
        ​​​​​​tau_t                    = 0.5
        ​​​​​​ref_t                    = 298
        
        ​​​​​​; Pressure coupling is on for NPT
        ​​​​​​pcoupl                   = Parrinello-Rahman
        ​​​​​​pcoupltype               = isotropic
        ​​​​​​tau-p                    = 5.0      
        ​​​​​​ref-p                    = 1.0      ; bar
        ​​​​​​compressibility          = 4.5e-5   ; For water at 1 atm and 300 K the compressibiilty is 4.5e-5 bar^(-1)
        ​​​​​​; refcoord_scaling should do nothing since there are no position restraints.
        
        ​​​​​​gen_vel                  = no
        ​​​​​​; gen-temp                 = 298
        ​​​​​​; gen-seed                 = 6722267; need to randomize the seed each time.
        
        ​​​​​​; options for bonds
        ​​​​​​constraints              = h-bonds  ; we only have C-H bonds here
        
        ​​​​​​; Type of constraint algorithm
        ​​​​​​constraint-algorithm     = lincs
        ​​​​​​continuation             = yes
        
        ​​​​​​; Highest order in the expansion of the constraint coupling matrix
        ​​​​​​lincs-order              = 12
        ​​​​​​lincs-iter               = 2
        
      • Execute gmx mdrun -s complex_md.tpr -o complex_md.trr -c complex_md.gro -g md.log -e md.edr
  • Postprocessing of the MD simulation

    • Preparation of the index file
      Execute gmx make_ndx -f complex_md.gro -o complex.ndx. In the prompt, specify the index groups as follows:

      ​​​​a 1-1878
      ​​​​name 24 Binding_Complex
      ​​​​  
      ​​​​a 1742-1878
      ​​​​name 25 pep7
      ​​​​  
      ​​​​a 1817
      ​​​​name 26 P_atom
      ​​​​  
      ​​​​ri 15 
      ​​​​name 27 Arg1
      ​​​​  
      ​​​​ri 34
      ​​​​name 28 Arg2
      ​​​​  
      ​​​​ri 36
      ​​​​name 29 Arg3
      ​​​​  
      ​​​​ri 56
      ​​​​name 30 Arg4
      
    • Extraction of a reasonable snapshot

      • Execute gmx energy -f complex_md.gro -o volume.xvg and select Volume in the prompt.
      • Use Python script gmx_plot2d to plot the volume as a function of time and determine the average volume and the sanpshop having a volume cloest to the average volume.
      • Execute gmx trjconv -s complex_md.tpr -f traj_comp.xtc -o complex_avg.gro -dump 3478 to extract the snapshot.
    • Collection of all the final input files
      Execute the following commands

      ​​​​mkdir Final_inputs && Final_inputs
      ​​​​cp ../Topology/complex_GMX.top PLCpep7.top
      ​​​​cp ../short_MD/complex_avg.gro PLCpep7.gro
      ​​​​cp ../short_MD/complex.ndx PLCpep7.ndx
      

This is the end of the first part of the tutorial! For detailed methods for preparing simulation inputs for advanced sampling methods and the application of the workflows introduced here to other similar systems, please refer to the second part of the tutorial.


If you found the article useful, you're very welcome to share the article or leave a comment below so that I will be motivated to write more articles! If you have any questions about the tutorial, please feel free to contact weitse.hsu@colorado.edu. Thank you for reading this far!