Preparation of the Inputs Files for Advanced Sampling Simulations - Part 1

--- disqus: weitse --- # Preparation of the Inputs Files for Advanced Sampling Simulations - Part 1 ###### tags: `MDTutorials` In this tutorial, we focus on a detailed procedure for preparing input files for advanced sampling simulation. There are 2 parts in this tutorial: - [Preparation of the Input Files for Advanced Sampling Simulations - Part 1](https://hackmd.io/@WeiTseHsu/BJF0tBizU) This part of the tutorial focuses on the workflow starting from downloading the PDB file from Protein Data Bank to prepare input files for advanced sampling methods in general, including a coordinate file, a topology file, and an index file. We choose PLCC/PLCG1 protein-ligand binding complex as our system of interest in this part of the tutorial. Sections in this part of the tutorial include: - Section 1. Introduction - Section 2. Modeling the missing residues in a protein - Section 3. Topology preparation of PLCC/PLCG1-pep - Section 4. Solvation and neutralization of the system - Section 5. Energy minimization, equilibration and a short molecular dynamics (MD) simulation - Section 6. The preparation process in brief - [Preparation of the Input Files for Advanced Sampling Simulations - Part 2](https://hackmd.io/@WeiTseHsu) (unpublished for now) In this part of the tutorial, we will first introduce methods specifically for preparing the input files for Hamiltonian replica exchange and expanded ensemble simulations, given the files prepared in the first part of the tutorial using PLCC/PLCG1 protein-ligand binding complex as the system of interest. Then, we will utilize the workflows in both parts of the tutorial to different but similar systems, including the protein-ligand binding complexes like PLCC/ErbB2-pep and PLCC/IR-pep in a more brief way compared to PLCC/PLCG1-pep. Sections in this part of the tutorial include: - Section 1. Preparation of the input files for Hamiltonian replica exchange molecular dynamics (HREMD) - Section 2. Preparation of the input files for expanded ensemble simulations - Section 3. Other similar protein-ligand binding complexes ## Section 1. Introduction In this part of tutorial, our system of interest, PLCC/PLCG1-pep is the binding complex between the C-terminal SH2 domain of Phosphoplipase C, $\gamma$ 1 (PLCC) and its own intraprotein recognition site (PLCG1-pep). The X-ray crystallography structure of this binding complex could be downloaded from [Protein Data Bank (PDB)](https://www.rcsb.org/structure/4K45) with PDB ID as 4k45. The target of this tutorial is to provide a detailed method to prepare the GROMACS input files for further advanced sampling simulations, in our case, Hailtonian replica exchange and expanded ensemble simulations. These input files include a topology file (`.top`), a configuration file (`.gro`) file and an index file (`.ndx`) file. If you are already familiar with the reasons for each step in the whole process and just want a quick look at all the command used, please directly jump to Section 6. For information about the installation of the software used in this tutorial, please refer to the [wiki page in Shirts group](https://github.com/shirtsgroup/shirts-group-wiki/wiki/Tutorials-and-Installation#simulation-tools). ## Section 2. Modeling of the missing residues in a protein ### Section 2-1. System of interest When simulating a protein-ligand binding complex, the first thing to check is the missing residues and the missing atoms, which should be documented in the header of the PDB file. For example, `REMARK 465` shows the missing residues of the binding complex as follows: ``` REMARK 465 M RES C SSSEQI REMARK 465 SER A 661 REMARK 465 ILE A 764 REMARK 465 GLY A 765 REMARK 465 THR A 766 REMARK 465 ASP B 770 REMARK 465 TYR B 771 REMARK 465 GLY B 772 REMARK 465 ALA B 773 REMARK 465 LEU B 774 REMARK 465 TYR B 775 REMARK 465 GLU B 776 REMARK 465 GLY B 777 REMARK 465 ARG B 778 REMARK 465 ASN B 779 ``` As shown above, 4 residues of the protein (PLCC, chain A) were missing (one at the beginning and 3 in the end). On the other hand, the first 10 residues of the peptide (PLCG1-pep, chain B) were missing. These residues are missing generally because they are too flexible and floppy, so the electron density for these residues is too low to resolve the structure. To make the structure for the simulation consistent with the one investigated by the experimental methods, our goal in Section 2., in this case, is to use the software MODELLER to rebuild all the 4 missing residues of the protein, and the asparagine residue of the peptide (residue 779). Once the rebuilding of the missing residues is completed, you should also found that MODELLER also automatically rebuilt all the missing atoms of side chains of the protein. To start with, make a folder `Build`, download the PDB file `4k45.pdb` from the link above to the folder and follow the steps below in the folder. ### Section 2-2. Preparation of the alignment file To generate an alignment file, first, we have to generate a sequence file from `4k45.pdb`. To do so, run the Python script `sequence.py` which has the following content ```python=1 from modeller import * # Get the sequence of the 1qg8 PDB file and write to an alignment file code = '4k45' e = environ() m = model(e, file=code) aln = alignment(e) aln.append_model(m, align_codes=code) aln.write(file=code+'.seq') ``` As a result, a sequence file `4k45.seq` will be produced, which contains the sequence of the structure in `4k45.pdb` (not including the missing residues) as below. ``` >P1;4k45 structureX:4k45: 662 :A:+109 :B:MOL_ID 1; MOLECULE 1-PHOSPHATIDYLINOSITOL 4,5-BISPHOSPHATE PHOSPHODI GAMMA-1; CHAIN A; FRAGMENT C-TERMINAL SH2 (CSH2) DOMAIN; SYNONYM PHOSPHOINOSITIDE PHOSPHOLIPASE C-GAMMA-1, PHOSPHOL GAMMA-1, PLC-GAMMA-1; EC 3.1.4.11; ENGINEERED YES; MOL_ID 2; MOLECULE 1-PHOSPHATIDYLINOSITOL 4,5-BISPHOSPHATE PHOSPHODI GAMMA-1, SHORT PEPTIDE; CHAIN B; FRAGMENT RESIDUES 770 TO 787 OF PLC-GAMMA1; SYNONYM PHOSPHOINOSITIDE PHOSPHOLIPASE C-GAMMA-1, PHOSPHOL GAMMA-1, PLC-GAMMA-1; EC 3.1.4.11; ENGINEERED YES:MOL_ID 1; ORGANISM_SCIENTIFIC RATTUS NORVEGICUS; ORGANISM_COMMON RAT; ORGANISM_TAXID 10116; GENE PLCG1; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; MOL_ID 2; SYNTHETIC YES; ORGANISM_SCIENTIFIC RATTUS NORVEGICUS; ORGANISM_COMMON RAT; ORGANISM_TAXID 10116: 1.50: 0.16 QAESKEWYHASLTRAQAEHMLMRVPRDGAFLVRKRNEPNSYAISFRAEGKIKHCRVQQEGQTVMLGNSEFDSLVD LISYYEKHPLYRKMKLRYPINEEALEK/PGFVEAN* ``` Note that the sequences should be ended up with an `*`, and the forward-slash `/` in the sequence serves as a breaker between different chains. So the part before the slash is the sequence of the protein, while the remaining part is the sequence of the peptide. In the sequence of the peptide, it should be noticed that the phosphotyrosine residue was ignored by MODELLER, which is a result of the HETATM records of the phosphotyrosine. In the alignment file `alignment.ali`, which can be manually made from the content in `4k45.seq`, there should be two sequences shown as follows: ``` >P1;4k45 structureX:4k45: 662 :A:+110 :B:MOL_ID 1; MOLECULE 1-PHOSPHATIDYLINOSITOL 4,5-BISPHOSPHATE PHOSPHODI GAMMA-1; CHAIN A; FRAGMENT C-TERMINAL SH2 (CSH2) DOMAIN; SYNONYM PHOSPHOINOSITIDE PHOSPHOLIPASE C-GAMMA-1, PHOSPHOL GAMMA-1, PLC-GAMMA-1; EC 3.1.4.11; ENGINEERED YES; MOL_ID 2; MOLECULE 1-PHOSPHATIDYLINOSITOL 4,5-BISPHOSPHATE PHOSPHODI GAMMA-1, SHORT PEPTIDE; CHAIN B; FRAGMENT RESIDUES 770 TO 787 OF PLC-GAMMA1; SYNONYM PHOSPHOINOSITIDE PHOSPHOLIPASE C-GAMMA-1, PHOSPHOL GAMMA-1, PLC-GAMMA-1; EC 3.1.4.11; ENGINEERED YES:MOL_ID 1; ORGANISM_SCIENTIFIC RATTUS NORVEGICUS; ORGANISM_COMMON RAT; ORGANISM_TAXID 10116; GENE PLCG1; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; MOL_ID 2; SYNTHETIC YES; ORGANISM_SCIENTIFIC RATTUS NORVEGICUS; ORGANISM_COMMON RAT; ORGANISM_TAXID 10116: 1.50: 0.16 -QAESKEWYHASLTRAQAEHMLMRVPRDGAFLVRKRNEPNSYAISFRAEGKIKHCRVQQEGQTVMLGNSEFDSLVD LISYYEKHPLYRKMKLRYPINEEALEK---/-PGF.VEAN* >P1;4k45_fill structureX:4k45: 662 :A:+110 :B:MOL_ID 1; MOLECULE 1-PHOSPHATIDYLINOSITOL 4,5-BISPHOSPHATE PHOSPHODI GAMMA-1; CHAIN A; FRAGMENT C-TERMINAL SH2 (CSH2) DOMAIN; SYNONYM PHOSPHOINOSITIDE PHOSPHOLIPASE C-GAMMA-1, PHOSPHOL GAMMA-1, PLC-GAMMA-1; EC 3.1.4.11; ENGINEERED YES; MOL_ID 2; MOLECULE 1-PHOSPHATIDYLINOSITOL 4,5-BISPHOSPHATE PHOSPHODI GAMMA-1, SHORT PEPTIDE; CHAIN B; FRAGMENT RESIDUES 770 TO 787 OF PLC-GAMMA1; SYNONYM PHOSPHOINOSITIDE PHOSPHOLIPASE C-GAMMA-1, PHOSPHOL GAMMA-1, PLC-GAMMA-1; EC 3.1.4.11; ENGINEERED YES:MOL_ID 1; ORGANISM_SCIENTIFIC RATTUS NORVEGICUS; ORGANISM_COMMON RAT; ORGANISM_TAXID 10116; GENE PLCG1; EXPRESSION_SYSTEM ESCHERICHIA COLI; EXPRESSION_SYSTEM_TAXID 562; MOL_ID 2; SYNTHETIC YES; ORGANISM_SCIENTIFIC RATTUS NORVEGICUS; ORGANISM_COMMON RAT; ORGANISM_TAXID 10116: 1.50: 0.16 SQAESKEWYHASLTRAQAEHMLMRVPRDGAFLVRKRNEPNSYAISFRAEGKIKHCRVQQEGQTVMLGNSEFDSLVD LISYYEKHPLYRKMKLRYPINEEALEKIGT/NPGF.VEAN* ``` As shown above, several changes need to be made to make an alignment file from `4k45.seq`: - To make the sequence `4k45` in the alignment file, just copy the content of `4k45.seq`, identify the missing residues and add a `-` to represent each residue. - Note that in both sequences, we use `.` to represent the HETATM residue (e.g. DNA, RNA, ligand, etc.), which is the phosphotyrosine residue in this case. In MODELLER, `.` is called BLK or block residue type. Accordingly, since the phosphotyrosine was ignored in `4k45.seq`, we have to correct the number of residues once we include phosphotyrosine. Therefore, change `A:+109` into `A:+110`. - The second sequence in the alignment file, `4k45_fill` is a complete sequence with all the missing residues filled that serves as a template. Make sure that the two sequences have equal numbers of residues and are aligned with each other. ### Section 2-3. Modeling of the missing residues using `automodel` After getting the alignment file, we should be able to use it as the input to the modeling using `automodel`. Specifically, we run the script `build.py` which has the following content: ```python=1 from modeller import * from modeller.automodel import * # Load the automodel class log.verbose() env = environ() # directories for input atom files env.io.atom_files_directory = ['.', '../atom_files'] env.io.hetatm = True a = automodel(env, alnfile = 'alignment.ali', knowns = '4k45', sequence = '4k45_fill') a.starting_model = 1 a.ending_model = 1 a.make() ``` Remember that by default all HETATM records are ignored. Therefore, to read all the HETATM from the template, we have to turn on `env.io.hetatm` by setting `env.io.hetatm = True`. After executing `build.py`, we should get only one PDB file, `4k45_fill.B99990001.pdb` as a result of setting `a.starting_model = 1` and `a.ending_model = 1`. (Also note that all the water molecules have been removed.) Use VMD (Visual Molecular Dynamics) to visualize the structure to see it seems reasonable, which means that there should be no broken molecules, missing atoms, or other anomalies. (In this case, check if the phosphotyrosine still stays inside the binding pocket composed of 4 arginine residues.) As a record, the following is the output of using `build.py`: ``` << end of ENERGY. openf___224_> Open 4k45_fill.B99990001.pdb wrpdb___568_> Residues, atoms, selected atoms: 115 949 949 >> Summary of successfully produced models: Filename molpdf ---------------------------------------- 4k45_fill.B99990001.pdb 719.84387 ``` Note that if a wrong is used in `build.py`, then the atoms might be erroneously added, which might lead an error when using H++ web sever to perform parameterization. (An error like `FATAL: Atom .R<NASN 107>.A<OXT 17> does not have a type.` might occur.) ## Section 3. Topology preparation of PLCC/PLCG1-pep ### Section 3-1: Concepts of parameterization by AmberTools To prepare the topology file of a structure, we first have to decide which force field to use. In the case of protein-ligand binding complexes, one of the frequently chosen force fields is AMBER99SB-ILDN, which is an all-atom force field. However, simply using the GROMACS program `pdb2gmx` will not work, since the peptide in our case contains a non-standard residue, phosphotyrosine (PTR), which is not included in the residue topology database (`aminoacids.rtp` in the folder `/usr/share/gromacs/top/amber99sb-ildn.ff`). of AMBER99SB-ILDN force field. Generally, there are multiple ways to deal with this issue, and the most intuitive way might be manually adding the parameters of PTR to `aminoacids.rtp` and modifying `residuetypes.dat` in the folder `/usr/share/gromacs/top` then running `pdb2gmx`. However, it might be tedious to get the parameters of PTR, so it could be a better choice to rely on other tools (so that `pdb2gmx` is not required). When it comes to parameterizing a small molecule or a non-standard amino acid, Amber is often a useful tool. Amber is another molecule dynamics simulation engine like GROMACS and it also contains package (AmberTools) that is able to generate topology and coordinate files convertible to GROMACS input files. While the procedure of parameterization using AmberTools could be complicated, a web server [H++](http://biophysics.cs.vt.edu/) developed by Onufriev et al. is able to automize the whole process of the parameterization of either proteins or non-protein structures. However, it is still recommended to know how Amber works to parametrize a structure rather than using H++ as a black box. Therefore, here we introduce the core concepts of parameterization using AmberTools as below. If you are new to Amber or AmberTools, it is highly recommended to refer [the manual of Amber 2018 or 2019](https://ambermd.org/doc12/Amber19.pdf), especially Section 1, 12, 13, 14, and 15. Also, the following tutorials are recommended to get familiarized with the fundamentals using AmberTools, especially parameterization: - [Tutorial for LEaP program](http://ambermd.org/tutorials/pengfei/index.htm) - [Fundametnals of LEaP](http://ambermd.org/tutorials/pengfei/index.php) - [Antechamber Tutorial](http://ambermd.org/tutorials/basic/tutorial4b/) - [TUTORIAL B5: Simulating the Green Fluorescent Protein](http://ambermd.org/tutorials/basic/tutorial5/index.htm) - [An Introduction to Molecular Dynamics Simulations using AMBER](http://ambermd.org/tutorials/basic/tutorial0/index.htm) #### Section 3-1-1. Parameterization of small molecules Generally, when parameterizing a small molecule (instead of a polypeptide), we have to separate the part of the small molecule from the original structure and use several programs in Amber following the procedures as below to independently parameterize it: - **Step 1**: Use `antechamber` to assign charges and atom types. - Sample command: `antechamber -i ${name}.pdb -fi pdb -o ${name}.mol2 -fo mol2 -c bcc -s 2 -nc ${nc} -pl 15` - Input: a PDB file; output: a `.mol2` file. - **Step 2**: Use `parmchk2` to build the missing parameters. - Sample command: `parmchk2 -i ${name}.mol2 -f mol2 -o ${name}.frcmod` - Input: a `.mol2` file; Output: a `.frcmod` file (parameter file) - **Step 3**: Use tLEaP to generate the topology and updated coordinate files. - Sample command: `tleap -f tleap.in`, which requires a tLEaP input file looks like follows: ``` source oldff/leaprc.ff99SB # load in the protein force field parameters source leaprc.gaff # load the GAFF parameters mol = loadmol2 input.mol2 # mol is a variable name (generally the name of the small molecule) check mol # a check for missing parameters loadamberparams input.frcmod # load the parameters of the small molecule from the frcmod file saveoff mol mol.lib # create a library file for the small molecule (off = object file format) saveamberparm mol output.prmtop output.inpcrd # generate Amber topology file (.prmtop) and coordinate file (inpcrd) (One could also use savepdb instead) quit # exit out of tleap ``` - The output files including an Amber topology file (`.prmtop` file) and an Amber coordinate file (`.inpcrd`) file. Both files are convertible using [InterMol](https://github.com/shirtsgroup/InterMol.git) or `acpype.py`. Note that the missing atoms and the hydrogens will be builts by tLEaP automatically. - Refer to the [tutorial of Amber](http://ambermd.org/tutorials/basic/tutorial4b/) for more information. - Some useful scripts could be found in the [private GitHub repository of Shirts Group](https://github.com/shirtsgroup/useful-scripts.git). (Directory: `Parameterization/GAFF`) #### Section 3-1-2. Parameterization of peptides containing non-standard amino acids In our case, however, the ligand is a peptide with a non-standard amino acid (PTR). To generate a topology file of the whole binding complex, there are several ways to do using AmberTools (without using H++): - Use AmberTools to parameterize the whole protein-ligand binding complex. - Use AmberTools to parameterize the whole peptide ligand and use `pdb2gmx` to parameterize the protein (with AMBER99SB-ILDN force field), then combine the coordinate files and the topology files output by different parameterizing tools. - Use AmberTools to parameterize the phosphotyrosine (instead of the whole ligand) and parameterize the remaining part of the binding complex using `pdb2gmx`, then combine the coordinate files and the topology files. However, in this case, PTR is not a terminal residue, which means that the peptide ligand will be broken into two pieces when we separate PTR from the original PDB for AmberTools parametrization. To parameterize the remaining part other than PTR, we might have to label the two pieces of the peptide as two different chains (since there is no covalent bond in between). The procedure of parameterizing using AmberTools in all of these three methods resembles the procedure of parameterizing the small molecule as shown previously, except that `antechamber` and `parmchk2` might not be required. Note that, however, it could be tedious and error-prone when combining the coordinate files if different parts of the peptide are parameterized by different tools. Therefore, the last method is not recommended. ### Section 3-2. Preparation of the complex topology file using H++ server #### Section 3-2-1. Preparation of the input structure of H++ While H++ web server is able to process the whole complex at a time, it is recommended to parameterize the protein and the peptide ligand separately so that we can have these two entities as two different `[ molecultype ]` directives in the topology file. Parameterizing the whole complex would lead to only one `[ moleculetype ]` directive corresponding to the whole complex, which makes it hard for us to only decouple the interactions of the peptide ligand in replica exchange or expanded ensemble simulations. To separately parameterize the protein and the peptide ligand, we first have to separate the coordinate file. To do so, execute the following command in folder `Topology` to copy `4k45_fill.B99990001.pdb` and save as `PLCC.pdb`. (Before this, go back to the parent folder of the current folder `Build` and make another folder `Topology`, in which all the steps in this section will be carried out.) ``` cp ../Build/4k_45_fill.B99990001.pdb ./PLCC.pdb ``` Then, cut the part of the peptide ligand and save it as `pep7.pdb`. Note that if there are several images/copies in the original PDB file downloaded from Protein Data Bank, usually there is only one image chosen to proceed to simulations. Since the chosen image is usually not at the center of the simulation box, it is recommended to use the GROMACS program `editconf` to center the structure to prevent the structure cross the periodic boundaries during energy minimization or other following steps: ``` gmx editconf -f complex.pdb -c -o complex_center.pdb ``` However, in our case, there is only one image in the original PDB file, which should be already close to the center of the box, so it is not necessary to do so. When preparing the structure input file for any service using AmberTools, there are several things to note: - If the X-ray unit cell in the PDB ﬁle contains more than one image, choose the entity of interest and delete the other(s). - If the ligand is a small molecule rather than a peptide, than we have to use software like Avogadro to build the hydrogen atoms (and OpenBabel to convert the file if needed) before using AmberTools, since all the Amber force fields are all-atom force fields. - In contrast, there is no need to add hydrogen atoms for proteins and peptides. (Actually, it is recommended to get rid of all protein (or peptide) hydrogens that are explicitly expressed in the PDB file.) The `reduce` and LEaP utilities in AmberTools add hydrogens automatically with predefined names. Having hydrogens in PDB files with names that LEaP does not recognize within its residue libraries leads to a total mess. - Make sure all connectivity records in the PDB file are removed. These are mostly referring to ligands, or, in some cases, to disulﬁde links. The latter should be explicitly re-connected without relying on connectivity records in the PDB ﬁle. (In our case, the connectivity records in `pep7.pdb` have been removed by MODELLER automatically.) - In this case, clean and prepare the input structure PDB file is relatively easy. For other cases, consider `pdb4amber`, which is introduced in Section 12.4 in the manual. Obviously, different concerns might need to be taken care of in different cases. More details about the system preparation for AmberTools are elaborated in Section 12 in the manual of Amber 2018. To name just a few, - Tautomeric and protonation states are not rendered in PDB ﬁles. If a deﬁned state for a residue is required, its name in the PDB ﬁle must reﬂect the choice. See Section 12.2 in the manual of Amber 2018 for more information. - Amber preparation modules assume that residues in a PDB file are connected and appear sequentially in the file. If not covalently connect (i.e., linked by an amide bond), the residues must be separated by "TER" records in the PDB file. (Alternatively, the chain ID must change on going from one chain to the next chain.) - Similarly, if residues are missing (e.g., not detected in x-ray, or cut by the user), the gap should also be separated by a "TER" record. Terminal residues will be charged by default. If the user wants to avoid this (especially for gaps), these residues should be capped (by ACE and NHE or NME). With `PLCC.pdb` and `pep7.pdb` to start with, now we can use H++ the parameterize them separately. #### Section 3-2-2. Preparation of the protein topology file First, let's parameterize the protein part (`PLCC.pdb`) using H++. To begin with, click on the tab "Process a structure" on the [H++ web server](http://biophysics.cs.vt.edu/), upload `PLCC.pdb` and click on "Process File". After that, the user will be directed to a page in which the user can specify parameters used during the parameterization, including salinity, internal dielectric and external dielectric constants, pH value and so on. H++ will decide the protonation states of each residue based on these parameters (mainly the pH value). For typical physiological conditions, using the default value of 80 for the external dielectric (water) and salinity = 0.15 M is reasonable. The situation is less straightforward with the internal dielectric. If you are mostly interested in deeply buried residues, a lower value is recommended such as 4. If your focus is residues closer to the surface, a higher value is appropriate, such as 10 or even 20. In our case, we have to make sure that the parameters used here are consistent with the experimental conditions. According to Section 2.2.6. (page 37) of Dr. McKercher's thesis, PLCC was dialyzed overnight at 4 °C into buffer containing **25 mM Tris**, **pH 7.4**, and **200 mM NaCl** and the binding reactions were performed at **25 °C (298 K)**. Here, we will just ignore Tris buffer and set the salinity in the web server as 0.2 (M) and the pH value as 7.4. As for the dieletric constants, we would just use the default values. Note that here we adopt the experimental conditions of ITC instead of X-ray crystallography because what we want to simulate is the binding complex in a solution instead of in a crystal form. ITC and NMR experiments are conducted in solution phase, so we have to make sure that our simulation parameters consistent with the ones used in at least one of these experiments. Note that H++ is more than parameterizing a structure. As you can see on the page for specifying parameters, it is also able to construct a simulation box, solvate the system and add ions to the box. However, the box types available in H++ are limited to cubic and octahedral boxes. While the volume of an octahedral box is about 77% of a cubic box and can, therefore, decrease computational cost, it is not the best choice compared with a rhombic dodecahedral box available in GROMACS, whose volume is only about 70.7% of a cubic box. Therefore, we are not going to use H++ to solvate the system in a simulation box, but just parameterize the structure and use GROMACS to subsequently complete solvation and neutralization. After setting up all these parameters, click on the tab "Process ..." to process the file. (Note that it is recommended to register an account on H++, which would make it easier to manage the submissions.) As a result, in the results page, the titration sites and their titration curves, isoelectric point (8.69 in our case) and the total charge at pH 7.4 (+3 in our case) will be shown. Also, there are some key files available to be downloaded, including an Amber topology file `0.2_80_10_pH7.4_complex.top` and an Amber coordinate file `0.2_80_10_pH7.4_complex.crd`. Accordingly, save the Amber topology file and the Amber coordinate file as `PLCC.top` and `PLCC.crd`, respectively. However, note that these files are for Amber simulation engine instead of GROMACS, we have to use convert the files to the extensions that are compatible with GROMACS. Since InterMol is only able to convert `prmtop` file, here we have to use `acpype.py` to convert the files. To do this, clone the [repository that is called "useful-scripts" in Shirts group](https://github.com/shirtsgroup/useful-scripts.git), copy the folder `GAFF` in the folder `Parameterization` and execute the command ``` python GAFF/acpype.py -x PLCC.crd -p PLCC.top ``` As a result, several files will be output, including a GROMACS topology file `PLCC_GMX.top`, a GROMACS coordinate file `PLCC_GMX.gro`, and two parameter files for testing the output files with GROAMCS (`em.mdp` and `md.mdp`). Since we will also perform energy minimization and short MD simulation later, we are not going to use `em.mdp` and `md.mdp` generated here, but the files `PLCC_GMX.gro` and `PLCC_GMX.top` are exactly what we want for now. #### 3-2-3. Preparation of the peptide ligand topology Basically, the parameters (salinity, pH value and so on) used in the parameterization of the peptide ligand are exactly the same as the ones used in the parameterization of the protein except that several changes have to be made: - **Change HETATM records to ATOM records** Note that the records of the PTR residue in `pep7.pdb` are HETATM instead of ATOM. Since the HETATM records will be ignored by AmberTools during parameterization and would cause a discontinuity in the peptide in our case, we have to change HETATM records of PTR into ATOM records. - **Deselect the option for correcting residue orientation in H++** If we process `pep7.pdb` using H++ using exactly the same settings as we used previously in the parameterization of `PLCC.pdb`, the following error would occur and be shown in `0.15_80_10_pH6_pep7.tleap.err`: ``` FATAL: Atom .R<PTR 111>.A<HB1 25> does not have a type. ``` This error indicates that HB1 atom in PTR does not have a type. Note that this atom was not present in the input PDB file but added by the `reduce` program in AmberTools used by H++ to correct the orientation of the ASN residue. To avoid this error, we have to deselect the option "Correct orientation of ASN, GLN and HIS groups ..." on the page that we specify the parameters. (All the other parameters should remain the same.) While this option is selected by default, skipping it is generally ok. It may affect the accuracy of the predicted protonation state, but the predicted protonation state is an approximation anyway. If you are concerned, you can run with the option turned on. Then look in the `reduce.log` file under "listings". Search for "FLIP" in this file. It will tell you which groups were flipped. Then you can load your coordinate files in VMD or other viewers to see which groups are near the flipped groups. if the pKa for the nearby groups are far from your pH of interest, then it should not affect predicted protonation state. Very rarely is this not the case. But if it is you can manually change the orientation of the groups. Given that all the considerations above are taken care of, the parameterization should work well and the total charge at pH 7.4 is -3. Similarly, save the Amber topology file and the Amber coordinate file as `pep7.top` and `pep7.crd`, respectively, and execute the following command to convert the files: ``` python GAFF/acpype.py -x pep7.crd -p pep7.top ``` As a result, a GROMACS topology file `pep7_GMX.top` and a GROMACS coordinate file `pep7_GMX.gro` will be produced. #### Section 3-2-4. Combination of the coordinate files and the topology files Since the simulations will be performed on the whole binding complex, not either only the protein or only the peptide ligand, we have to combine the topology files and the coordinate files of these two entities. To begin with, execute the following command: ``` cp PLCC_GMX.gro complex_GMX.gro cp PLCC_GMX.top complex_GMX.top ``` To combine the coordinate files, we only have to cut the coordinates in `pep7_GMX.gro` and append to the bottom of `complex_GMX.gro`, change the number of atoms shown at the top of the file from 1744 to 1878 and save the file. Note that the numbering of the residues of the peptide ligand starts from 1 instead of 107. This is exactly what we want, since this is how the topology file that we are going to make distinguishes different molecule types defined in `[ moleculetype ]` directive. The numbering of the peptide also starts from 1 instead of 1745, but this will be fixed in the next step using the GROMACS program `editconf` when we solvate the system. Note that even if the box dimensions in `PLCC_GMX.gro` and `pep7_GMX.gro` are different, but it is just a result of different molecule sizes. The relative distance between the protein and the peptide remains the same. To examine this, it is recommended to load `complex_GMX.gro` and `4k45_fill.B99990001.pdb` in VMD. As shown below, it can be seen that the two structures align with each other perfectly. (The one in blue is `complex_GMX.gro` while the other in red is `4k45_fill.B99990001.pdb`.) <center><img src=https://i.imgur.com/xYbcQmZ.png width=600></center> </br> Combining the topology files is also straightforward. While making `pep7_GMX.top` into an `.itp` file (include topology file) and include the `.itp` file in `complex_GMX.top` (using the statement like `#include "pep7.itp"`) should work, here we will just copy the content of `pep_GMX.top` and paste that to `complex_GMX.top` so that we don't have to always have both the `.itp` and `.top` file in the same directory. However, there are still several things to note. The first is that in `complex_GMX.top`, there should only be one `[ atomtypes ]` directive that includes all the atom types in both the protein and the peptide ligand. Therefore, here we adopt the union set of the `[ atomtypes ]` directives of the two topology files. To do so, we have to append the atom types that only appear in `pep7_GMX.top` to the `[ atomtypes ]` directive of `complex_GMX.top`. Specifically, the appended atom types are: ``` OV OV 0.00000 0.00000 A 2.95101e-01 7.11280e-01 ; 1.66 0.1700 P P 0.00000 0.00000 A 3.74177e-01 8.36800e-01 ; 2.10 0.2000 OT OT 0.00000 0.00000 A 2.91092e-01 8.78640e-01 ; 1.63 0.2100 ``` Then, copy all the directives other than `[ defaults ]` and `[ atomtypes ]` in `pep7_GMX.top`, including `[ atoms ]`, `[ pairs ]`, `[ bonds ]`, `[ angles ]` and `[ dihedrals ]` (including propers and impropers) and paste them right after the last directive (should be `[ dihedrals ]` directive for the impropers) of the protein. At last, change the system name in `[ system ]` directive from `PLCC` to `complex in water` and add `pep7 1` to the bottom of the file. (Note that the molecule type of the peptide ligand is `pep7` defined in the `[ moleculetype ]` directive of the ligand.) As a result, the bottom of the topology file should be: ``` [ system ] complex in water [ molecules ] ; Compound nmols PLCC 1 pep7 1 ``` Up to this point, the combination of the coordinate files and the topology files should be complete. ## Section 4. Solvation and neutralization of the system ### Section 4-1. Solvation of the system To solvate the system in water molecules, we first have to construct a simulation box. (Accordingly, create a new folder `Solvation` in the directory same level as `Topology`.) As mentioned, a rhombic dodecahedral box is one of the best choices. To center the protein in a dodecahedral box with a volume such that the distance between the solute and the box is 1.0 nm, execute the following command in folder `Solvation`: ``` gmx editconf -f ../Topology/complex_GMX.gro -o complex_box.gro -bt dodecahedron -d 1.0 -c ``` As a result, a coordinate file `complex_box.gro` will be generated and it is shown that the volume of the box is 212.339 $nm^{3}$. Then, to add water molecules to the system, execute: ``` gmx solvate -cp ../Topology/complex_em.gro -p ../Topology/complex_GMX.top -o complex_sol.gro -cs ``` Accordingly, a coordinate file `complex_sol.gro` and a new topology file `complex_GMX.top` (in folder `Topology`) will be produced, both with water molecules added. Note that the old topology file is not overwritten but renamed as `#complex_GMX.top.1#`. (Note that it is recommnded to make sure that `[ molecules ]` directive in `complex_GMX.top` is well-formatted after the solvents are added.) ### Section 4-2. Modification of the topology file After adding water molecules in the simulation box, we then have to add counterions to neutralize the system, which can be done by the GROMACS program `genion`. To run `genion`, we need a `.tpr` file, which can be produced by the `grompp` command. Basically, `grompp` command reads a molecule topology file, checks the validity of the file, expands the topology from a molecular description to an atomic description. However, at this point, our topology file does not have atom types and molecule types (like `SOL` in `complex_GMX.top`) for the ions and solvent molecules, but only the ones for the binding complex. Therefore, to enable GROMACS to recognize `moleculetype` such as `SOL` and `CL` when we run the `grompp` command, we have to modify the topology file `complex_GMX.top`. In this case, we will borrow these parameters (`moleculetype` and `atomtypes`) from AMBER99SB-ILDN force field and use TIP3P water model. Accordingly, add the following lines to the topology file `complex_GMX.top`, right before the `[ system ]` directive: ``` ; Include water topology #include "amber99sb-ildn.ff/tip3p.itp" ; Include topology for ions #include "amber99sb-ildn.ff/ions.itp" ``` If these lines are not added, errors like `No such moleculetype SOL` (which could be avoided by including `tip3p.itp`) and `No such moleculetype CL` (which could be avoided by including `ions.itp` would occur. (Note that it is not recommended to delete the `[ defaults ]` directive in the topology file and add `#include "amber99sb-ildn.ff/forcefield.itp"`, since there might be duplicates of atom types and the original ones might be overridden.) To make the include topology `tip3p.itp` and `ions.itp` effective, all the atom types used in defining the `moleculetypes` in these files must be defined in the `[ atomtypes ]` directive in our topology file `complex_GMX.top`. However, obviously most atom types used in these two include topology files are not defined in `complex_GMX.top`. Since all the atom types used in these `.itp` files can be found in `ffnonbonded.itp`, to solve the issue, just copy the parameters needed from `ffnonbonded.itp` and append to the `[ atomtypes ]` directive in `complex_GMX.top` with a format consistent with the existing parameters. Specifically, the following lines are add: ``` OW OW 0.00000 0.00000 A 0.315075 0.635968 HW HW 0.00000 0.00000 A 0 0 Cl Cl 35.45 0.00000 A 4.40104e-01 4.18400e-01 Na Na 22.99 0.00000 A 3.32840e-01 1.15897e-02 IB IB 0.00000 0.00000 A 8.90899e-01 4.18400e-01 C0 C0 0.00000 0.00000 A 3.05240e-01 1.92376e+00 MG MG 0.00000 0.00000 A 1.41225e-01 3.74342e+00 K K 0.00000 0.00000 A 4.73602e-01 1.37235e-03 Rb Rb 0.00000 0.00000 A 5.26699e-01 7.11280e-04 Cs Cs 0.00000 0.00000 A 6.04920e-01 3.37230e-04 Li Li 0.00000 0.00000 A 2.02590e-01 7.65672e-02 Zn Zn 0.00000 0.00000 A 1.95998e-01 5.23000e-02 ``` If the lines above are not added, an error like `Atomtype XX not found` would occur. One thing to note here is that in `complex_GMX.top`, the mass and charge in the `[ atomtypes ]` directive can be modified in `[ atoms ]` directive. If masses are not provided in `[ atoms ]`, they will be referenced from `[ atomtypes ]`. On the other hand, charges present in `[ atomtypes ]` are never used for anything and must be specified in `[ atoms ]`. Therefore, it is fine to have zero masses and charges in `[ atomtypes ]`, as you can see in `complex_GMX.top` in our case. (MODELLER has automatically assigned zero masses and charges.) However, at this point, `[ atoms ]` directive only contains atoms in the binding complex. Therefore, when `grompp` after the ions (like sodium ion(type Na) or chloride ion(type Cl)) are added, the `[ atomtypes ]` will be referenced for the mass of the ions. Accordingly, to avoid errors like `atom CL (Res CL-1) has mass 0 (state A) / 0 (state B)`, the mass of the atom types `Cl` and `Na` should not be zero, as you can see in the added lines above. (For the mass of any atom type, refer to `ffnonbonded.itp`.) For more information, check one of the [discussion on GROMACS mailing list](https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2016-October/108866.html). In addition, note that it is recommended to check if there are duplicates of atom types and if the atom types in each residue defined in `aminoacids.rtp` match the atom types of the equivalent atoms in the topology file `complex_GMX.top`. Note that in `aminoacids.rtp`, the atomtype of the CA atom of LEU is CT, while that of LEU is CX in `complex_GMX.top`. Similarly, the atom type of the CB atom of LEU in `aminoacids.rtp` and `complex_GMX.top` is CT and A2C, respectively. Since A2C and CX defined in `complex_GMX.top` and CT in `ffnonbonded.itp` have exactly the same parameters (sigma being 3.39967e-01 and epsilon being 4.57730e-01), these three atom types are all effectively the same. At the moment, it is not clear why MODELLER assigned A2C and CX instead of CT, but as long as the parameters are faithfully assigned and the process ends up being straightforward, all other details are flexible and leaving the atom types like this should be fine. After modifying the topology as above, no errors relevant to atom types or molecule types will be produced in the following simulations. ### Section 4-3. Addition of the sodium and chloride ions #### Section 4-3-1. Tweaking the partial charges in the topology file if needed. In a lot of cases, it is common to have net charges on the binding complex and the neutralization of the system is required. Generally, the process is carried out by adding sodium ions and the chloride. These added ions not only neutralize th system, but also specify the salinity of the system. Therefore, even if we happen to have neutral binding complex (protein: +3, peptide: -3 at pH 7.4.), we still have to add ions to the system to reach the salinity of 200 mM of NaCl as used in the ITC experiment. To add ions, we first have to run the following command to generate a `.tpr` file: ``` gmx grompp -f ions.mdp -c complex_sol.gro -p ../Topology/complex_GMX.top -o complex_ions.tpr ``` where the parameter file `ions.mdp` has the following content: ``` ; ions.mdp - used as input into grompp to generate ions.tpr ; All unspecified parameters adopt their own default values. ; Run Control integrator = steep ; Algorithm (steep = steepest descent minimization) nsteps = 500000 ; (0) maximum number of steps to integrate or minimize ; Energy minnimization emtol = 10.0 ; (10.0) Stop minimization when the maximum force < 100.0 kJ/mol/nm emstep = 0.01 ; (0.01) [nm] Minimization step size ; Neighbor searching/Electrostatics/Van der Waals cutoff-scheme = Verlet ; Buffered neighbor searching nstlist = 10 ; (10) Frequency to update the neighbor list and long-range forces ns_type = grid ; Method to determine neighbor list (simple, grid) pbc = xyz ; Periodic Boundary Conditions in all 3 dimensions coulombtype = PME ; Treatment of long-range electrostatic interactions rcoulomb = 1.0 ; Short-range electrostatic cut-off rvdw = 1.0 ; Short-range Van der Waals cut-off ``` In our case, no errors should occur when generating a `.tpr` file. However, in some other cases, it is possible the execution of the command will be stopped due to a warning like below: (The warning below is from the case of the binding complex at pH 6.) ``` WARNING 1 [file complex_GMX.top, line 17625]: You are using Ewald electrostatics in a system with net charge. This can lead to severe artifacts, such as ions moving into regions with low dielectric, due to the uniform background charge. We suggest to neutralize your system with counter ions, possibly in combination with a physiological salt concentration. ``` If this warning occurs, generally it should be fine to just ignore it, since the next step is to use `genion` program in GROMACS to neutralize the system. After all, we have to run `grompp` before running `genion`, so it is natural to get this warning at this point. To ignore the warning, just use the flag `-maxwarn` so that the command would be ``` gmx grompp -f ions.mdp -c complex_sol.gro -p ../Topology/complex_GMX.top -o complex_ions.tpr -maxwarn 1 ``` However, before running the command above, usually there is another thing that we should take care of. In some cases, you might see that in addition to a warning, there is also a note in the `STDOUT` of `grompp` which looks like this: ``` NOTE 1 [file complex_GMX.top, line 17625]: System has non-zero total charge: 2.999915 Total charge should normally be an integer. See http://www.gromacs.org/Documentation/Floating_Point_Arithmetic for discussion on how close it should be to an integer. ``` This note indicates that the total charge of the system is not an integer as it should be. In fact, it is common to get charges that are not exactly an integer, which is just a result of rounding. While the difference between the charges in the topology file (2.999915) and the real value (in this case, 3) is very small, it might be better to spread this difference throughout several atoms. Since the difference is quite small, it would not matter too much which atoms are selected and modified accordingly. To minimize the influence of adjusting the charges, here we can choose to spread the charge difference (0.000085) throughout the protein that is furthest from the peptide, which is residue 106 THR. Since THR has 15 atoms, to spread out the charge difference as even as possible, we choose to add a charge of 0.000005 to each of the first 5 atoms and 0.000006 to the remaining 10 atoms by modifying the topology file `complex_GMX.top`. Although the note will still probably present after the adjustment (showing that a total charge is closer to 3), at least we do out best we can do here. In fact, such a small charge difference should not cause any observable influence in the following simulations. #### Section 4-3-2. Addition of sodium and chlordie ions To use `genion` to add sodium and chloride ions, we first have to estimate how many ions should be added, which requires the consideration of the volume of the box and the physiological concentration used. Since we just set the salinity as 0.15 M when using H++ (which means that [NaCl] = 0.15 M), given the volume of the box (222.58 $nm^{3}$), we can calculate the number of the sodium ions as follows: $$(212.339 \times 10^{-27}) [m^{3}] \times 1000 [\frac{L}{m^{3}}] \times 0.2 [\frac{mol}{L}] \times (6.02 \times 10^{23})[\frac{1}{mol}] \approx25.565$$ As a result, since the net charge of the whole binding complex is 0 at pH 7.4 (+3 for the protein and -3 for the peptide ligand), 26 sodium ions and 26 chloride ions should be added. Accordingly, in the same folder `Solvation`, execute the following command and select Group 15 (`SOL`) as the continuous group of solvent molecules in the prompt: ``` gmx genion -s complex_ions.tpr -o complex_ions.gro -p complex_GMX.top -pname NA -nname CL -np 26 -nn 26 ``` Similarly, in folder `Topology`, a new topology file `complex_GMX.top` will be produced with ions added and the old one will be renamed as `#complex_GMX.top.2#`. ## Section 5. Energy minimization, equilibration and a short molecular dynamics (MD) simulation ### Section 5-1. Energy minimization Having the solvated, electroneutral system assembled, before we begin dynamics, we must ensure that the system has no steric clashes or inappropriate geometry by relaxing the solvated system to a low-energy state through energy minimization. To start with, in a newly created folder `EM`, execute a `grompp` command first to get a `.tpr` file: ``` gmx grompp -f em.mdp -c ../Solvation/complex_ions.gro -p ../Topology/complex_GMX.top -o complex_em.tpr ``` where the parameter file `em.mdp` has the following content: ``` ; em.mdp - used as input into grompp to generate em.tpr ; All unspecified parameters adopt their own default values. ; Run Control integrator = steep ; Algorithm (steep = steepest descent minimization) nsteps = 500000 ; (0) maximum number of steps to integrate or minimize ; Energy minnimization emtol = 100.0 ; (10.0) Stop minimization when the maximum force < 100.0 kJ/mol/nm emstep = 0.01 ; (0.01) [nm] Minimization step size ; Neighbor searching/Electrostatics/Van der Waals cutoff-scheme = Verlet ; Buffered neighbor searching nstlist = 10 ; (10) Frequency to update the neighbor list and long range forces ns_type = grid ; Method to determine neighbor list (simple, grid) pbc = xyz ; Periodic Boundary Conditions in all 3 dimensions coulombtype = PME ; Treatment of long range electrostatic interactions rcoulomb = 1.0 ; Short-range electrostatic cut-off rvdw = 1.0 ; Short-range Van der Waals cut-off ``` Then, run `mdrun` command that requires the `.tpr` file just produced: ``` gmx mdrun -s complex_em.tpr -o complex_em.trr -c complex_em.gro -g em.log -e em.edr ``` As a result, the following information will be printed out as the energy minimization is finished: ``` Steepest Descents converged to Fmax < 100 in 10645 steps Potential Energy = -3.5183519e+05 Maximum force = 9.8165977e+01 on atom 1876 Norm of force = 3.6908679e+00 ``` ### Section 5-2. Equilibration Energy minimization ensures that we have a reasonable starting structure, in terms of geometry and solvent orientation. To begin real dynamics, we must equilibrate the solvent and ions around the protein and the ligand. If we were to attempt unrestrained dynamics at this point, the system may collapse. The reason is that the solvent is mostly optimized within itself, and not necessarily with the solute. It needs to be brought to the temperature we wish to simulate and establish the proper orientation about the solute (the protein and the ligand). Since NPT ensemble is relatively close to a typical experimental setup, in our case, we will perform equilibration and the subsequent short MD simulation both in NPT ensemble. While this is generally not necessary, instead of performing NPT equilibration directly, in this case, we choose to perform NVT equilibration first (and then NPT equilibration), since it is often more robust to first equilibrate the temperature of the system under an NVT ensemble before applying a barostat to control the temperature. The simultaneous combination of velocity generation and coordinate scaling under the influence of barostat can introduce instabilities in a system that may be far from equilibrium. To get started, create a folder called `Equil` in the directory same level of `Topology`. Then, in folder `Equil`, create another tow folders: `NVT` and `NPT.` #### Section 5-2-1. NVT equilibration To run an NVT equilibration, execute the following command in the folder `NVT` to generate a `.tpr` file to run `mdrun`: ``` gmx grompp -f nvt_equil.mdp -c ../EM/complex_em.gro -p ../Topology/complex_GMX.top -o complex_equil.tpr ``` where `nvt_equil.mdp` has the following content: ``` ; Run control integrator = md-vv tinit = 0 dt = 0.002 nsteps = 100000 ; 200 ps comm-mode = Linear nstcomm = 1 ; Output control nstlog = 1000 nstcalcenergy = 1 nstenergy = 1000 nstxout-compressed = 1000 ; Neighborsearching and short-range nonbonded interactions nstlist = 10 ns_type = grid pbc = xyz rlist = 1.0 ; Electrostatics cutoff-scheme = verlet coulombtype = PME coulomb-modifier = Potential-shift-Verlet rcoulomb-switch = 0.89 rcoulomb = 0.9 ; van der Waals vdw-type = Cut-off vdw-modifier = Potential-switch rvdw-switch = 0.85 rvdw = 0.9 ; Apply long-range dispersion corrections for Energy and Pressure DispCorr = AllEnerPres ; Spacing for the PME/PPPM FFT grid fourierspacing = 0.10 ; EWALD/PME/PPPM parameters fourier_nx = 0 fourier_ny = 0 fourier_nz = 0 pme_order = 4 ewald_rtol = 1e-05 ewald_geometry = 3d epsilon_surface = 0 ; Temperature coupling tcoupl = v-rescale nsttcouple = 1 tc_grps = System tau_t = 0.5 ref_t = 298 ; Pressure coupling is on for NPT pcoupl = no ; no pressure for NVT equilibration gen_vel = yes gen-temp = 298 gen-seed = 6722267; need to randomize the seed each time. ; options for bonds constraints = h-bonds ; we only have C-H bonds here ; Type of constraint algorithm constraint-algorithm = lincs ; Highest order in the expansion of the constraint coupling matrix lincs-order = 12 lincs-iter = 2 ``` With `complex_equil.tpr`, run the following command to perform NVT equilibration: ``` gmx mdrun -s complex_equil.tpr -o complex_equil.trr -c complex_equil.gro -g equil.log -e equil.edr ``` As a result, it took about 9 to 10 minutes to finish the equilibration. After the equilibration is complete, we can use `gmx energy` to output `xvg` files containing the data of potential and temperature as a function of time. As shown below, it is clear that the temperature of the system quickly reached the target value (298 K) and remained stable over the remainder of the equilibration. (Although there is fluctuation, the average is no longer drifting. The average temperature over the last 100 ps of equilibration is 297.316 K.) Similarly, the potential energy quickly reached a plateau, which indicates that the NVT equilibration here should be adequate. <img src=https://i.imgur.com/URkf2If.png width="345"> <img src=https://i.imgur.com/V1DRM9l.png width="345"> #### Section 5-2-2. NPT equilibration After the NVT equilibration is completed, we then want to run another NPT equilibration to reach the same ensemble in the MD simulation later (also NPT ensemble). To generate a `.tpr` file for running an NPT equilibration, in folder `NPT`, execute the following command: ``` gmx grompp -f npt_equil.mdp -c ../NVT/complex_equil.gro -t ../NVT/state.cpt -p ../../Topology/complex_GMX.top -o complex_equil.tpr ``` Note that in the command above, the checkpoint file `NVT/state.cpt` from NVT equilibration is passed to the `-t` option. The checkpoint file contains a complete description of the thermodynamic state of the system, in high precision. To continue a simulation, the checkpoint file must be passed to `grompp`. If it is omitted, information regarding velocities, thermostat variables, and high-precision coordinates will be lost. In addition, the parameter file `md.mdp` is exactly the same as `nvt_equil.mdp` except (check one of the [discussions on GROMACS mailing list](https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2014-April/088634.html) or [a GROMACS tutorial by Dr. Lemkul](http://www.mdtutorials.com/gmx/complex/06_equil.html) for more information about NPT equilibration): - Pressure coupling Instead of using `pcoupl = no` in the NVT equilibration, here we use Berendsen barostat (`pcoupl = Berendsen`). Although Berendsen barostat is actually not a good barostat for MD simulation since it artificially suppresses fluctuations, in the equilibration process fluctuation, suppression is wanted. (In production MD simulation, Parrinello-Rahman barostat is recommended.) - `continuation = yes` In the section of constraint algorithm, we set the parameter `continuation` to `yes`. (In the previous NVT equilibration, the default value `no` was adopted, since the constraints need to be solved at the first time step.) This setting indicates that this simulation run is the continuation of a previous run (NVT equilibration), in which constraints were applied. To preserve continuity, the constraints should not be re-solved, and the input coordinate should be assumed to satisfy the constraint algorithm being applied. - `gen_vel = no` Velocities are read from the checkpoint file of the NVT equilibration, so there is no need to resample the velocity from the Maxwell-Boltzmann distribution. Re-generating the velocity negates the purpose of prior equilibration completely. With `complex_equil.tpr`, we can perform an NPT equilibration by running the following command: ``` gmx mdrun -s complex_equil.tpr -o complex_equil.trr -c complex_equil.gro -g equil.log -e equil.edr ``` As a result, it also took about 9 to 10 minutes to complete the equilibration. Similarly, using `gmx energy`, we can analyze the pressure and the volume as a function of time. As shown below, the pressure of the system is $-8.1613 \pm 174.302$ (bar), which seems very different from our target pressure (1 bar). However, the pressure itself is a quantity that fluctuates wildly in a microscopic system. As a result, instantaneous values are going to vary considerably, and average values may not correspond exactly to the desired target value. However, given the huge standard deviation of pressure during NPT equilibration, the average obtained during the simulation is statistically indistinguishable from the target of 1 bar. An additional indicator of the convergence of NPT equilibration is the density or the volume of the system, which are less prone to the fluctuations observed in the instantaneous pressure values. As shown in the figure on the right, the volume shrank to about 209 $nm^{3}$ and the average stopped drifting after about 30 ps of equilibration. (The average volume over the last 100 ps is 208.793 $nm^{3}$.) Since the volume is essentially constant over the duration of NPT equilibration, it is reasonable to conclude that the system is adequately equilibrated and unrestrained simulation can be initiated. <img src=https://i.imgur.com/S35pQiJ.png width=345> <img src=https://i.imgur.com/N7Yg8pH.png width=345> ### Section 5-3. A short molecular dynamics (MD) simulation To provide a reasonable structure as the input to further advanced sampling simulations, after equilibration, we conduct a short MD simulation of 5 ns, which should be sufficient for the system to relax. To run the simulation, first, create another folder `short_MD` and in the directory same level as `Topology` and execute the following command in the folder: ``` gmx grompp -f md.mdp -c ../Equil/NPT/complex_equil.gro -p ../Topology/complex_GMX.top -o complex_md.tpr -t ../Equil/NPT/state.cpt ``` Similarly, `-t` flag is used to preserve the previous work of NPT equilibration. The parameter file (`md.mdp`) used here is exactly the same as `npt_equil.mdp` except for the number of time steps (2500000, 5 ns instead of 100000, 200 ps) and the barostat (Parrinello-Rahman barostat instead of Berendsen barostat). As mentioned, a P-R barostat is generally recommended in production MD simulations. ``` gmx mdrun -s complex_md.tpr -o complex_md.trr -c complex_md.gro -g md.log -e md.edr ``` ### Section 5-4. Postprocessing of the simulation outputs #### Section 5-4-1. Preparation of an index file Index groups are necessary for almost every GROMACS program. All these programs can generate default index groups. However, when analyzing the data or apply restraint as we will do in the following advanced sampling simulations, sometimes we need index groups other than the ones defined in the default index file. In this case, we have to use `make_ndx` command to define our own special index groups. First of all, we might need to center the whole binding complex in the simulation box when visualizing the trajectory later. While there is a group called "Protein" in the default index file that should include both the protein and the peptide ligand in our case, it actually does not include phosphotyrosine in the ligand, since phosphotyrosine was not recognized by GROMACS but with aids of AmberTools as we perform parameterization previously. Therefore, we have to define a group for the binding complex on our own. To start with, execute the command: ``` gmx make_ndx -f complex_md.gro -o complex.ndx ``` In the prompt, type `a 1-1878` so that a new group, Group 24, will be created. To rename this newly created group, type `name 24 Binding_Complex`. To save the file and exit, type `q`. Note that, however, we still need other special index groups defined in the index file. Specifically, in both Hamiltonian replica exchange and expanded ensemble, we will have to apply a harmonic distance restraint between the phosphorus atom of PTR and each of the 4 arginine residues that compose the binding pocket to restrain the peptide in the binding pocket to prevent the greasy core problem. Therefore, we need to define these groups using the command below: ``` gmx make_ndx -f complex_md.gro -n complex.ndx -o complex.ndx ``` Note that in the command above, the newly created index file we just made is passed by the `-n` flag and a new index file with the same name will be created. (In this case, the old index file will be renamed as `#complex.ndx.1#`.) Without using the flag `-n`, the group `Binding_Complex` we just defined in the old index file will not appear in the new index file. To define the groups for applying the distance restraints, we first have to find out atoms involved in the 4 arginine residues and the phosphorus atom in PTR. Typically, it is safer to identify these atoms from a `.gro` file with the help of VMD visualization (as elaborated in the next section) to make sure that the right atoms are chosen. The following are the commands for choosing the atoms involved in each entity participating the distance restraints to be applied in advanced sampling simulations: - Peptide ligand (`pep7`, not needed for the restraints we are going to use but recommended to be set up): `a 1742-1878`, `name 25 pep7` - Phosphorus atom in PTR (`P_atom`): `a 1817`, `name 26 P_atom` - `Arg1`: `ri 15`, `name 27 Arg1` - `Arg2`: `ri 34`, `name 28 Arg2` - `Arg3`: `ri 36`, `name 29 Arg3` - `Arg4`: `ri 56`, `name 30 Arg4` After the execution of the command `make_ndx`, the bottom part of the index file `complex.ndx` should be as follows (the following is the output of the command `tail complex.ndx -n 25`): ``` [ pep7 ] 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 [ P_atom ] 1817 [ Arg1 ] 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 [ Arg2 ] 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 [ Arg3 ] 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 [ Arg4 ] 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 ``` #### Section 5-4-2. Visualization of the output structure (Note: the figures and the data shown in Section 5-4-2. are actually from the simulation of the binding complex at pH6, but the visuzliation method and the things we should care about are exactly the same.) After the simulation is complete, it is crucial to see if the simulation is at least visually reasonable. Using Visual Molecular Dynamics (VMD), we are able to visualize the output structure from the short MD simulation. As a result, as you can see below, the box is rectangular rather than dodecahedral as we specified when using `editconf` command, which is a result of the fact that GROMACS always keeps the particles in a brick-shaped volume for efficiency even when simulation using a triclinic box. (Note: rhombic dodecahedra are special cases of triclinic unit cells.) One can use `trjconv` program to convert the trajectory to a different unit-cell representation. (See [GROMACS documentation](http://manual.gromacs.org/documentation/2019/reference-manual/algorithms/periodic-boundary-conditions.html)) for more information.) <center><img src=https://i.imgur.com/GAdu72Q.png width=600></center> </br> From the figure, it can also be observed that the protein is out of the box and detached from the peptide ligand, this is not a result of the peptide unbound from the protein, but a result of the protein crossing the periodic boundaries of the box. (To confirm that, just observe which side of the protein the peptide is on and compare that to the initial PDB structure, say `complex_GMX.gro`.) To recenter the protein and wrap the periodic boundaries, execute the command below in the Tk Console in VMD: ``` pbc wrap -centersel "protein" -center com -compound residue -all ``` As a result shown below, a triclinic representation of the system is recovered, with protein centered. <center><img src=https://i.imgur.com/qUY2btY.png width=600></center> </br> In addition, we can use the command as follows in Tk Console to save the structure as a `.gro` file if needed after the adjustment so that we don't have to adjust the visualization every time for this structure: ``` animate write gro complex_center_vmd.gro ``` #### Section 5-4-3. Visualization of the trajectory As in any simulation conducted with periodic boundary conditions, molecules may appear "broken" or may "jump" back and forth across the box. To recenter the protein and rewrap the molecules within the unit cell to recover the desired rhombic dodecahedral shape, invoke `trjconv` in the following two-step procedures (Create the folder `animated_traj` in `short_MD` to start with.): - **Step 1**: Center on the binding complex and remove jumps by `-pbc nojump` ``` gmx trjconv -s ../complex_md.tpr -f ../traj_comp.xtc -o complex_nojump.xtc -center -pbc nojump -n ../complex.ndx ``` - **Step 2**: Center on the binding complex and remove pbc by `-pbc mol -ur compact` ``` gmx trjconv -s ../complex_md.tpr -f complex_nojump.xtc -o complex_center.xtc -center -pbc mol -ur compact -n ../complex.ndx -dt 250 ``` There are are two prompts for both commands. the first one is "Select group for centering" and the second one is "Select group for output". Enter `24` and `0` sequentially to select `Binding_Complex` and `System`, respectively. To visualize the trajectory, execute the command: ``` vmd ../complex_md.gro complex_center.xtc ``` As a result, after using `trjconv` commands, the dodecahedral box is recovered and no molecules are broken. (Originally, both the protein and the ligand were broken in the trajectory `traj_comp.xtc`. Check this by the command `vmd complex_md.gro traj_comp.xtc`.) <center><img src=https://i.imgur.com/Z81uNED.png width=600></center> </br> However, as can be seen in the figure above, the binding complex is not at the center of the box. To fix this, execute the exactly the same command that we've just used in Tk Console in VMD: ``` pbc wrap -centersel "protein" -center com -compound residue -all ``` Also, it is recommended to output a trajectory (`complex_center_vmd.trr`) after the adjustment, so we don't have to recenter the complex every time we visualize it: ``` animate write trr complex_center_vmd.trr ``` (Note that VMD is not about to write an `.xtc` file.) From the animate trajectory, we can see that the peptide is stably bound to the protein and the N-terminal of the peptide has a much larger fluctuation than the C-terminal, which just validated the reason that the first 10 residues were missing in the X-ray crystallography experiment. (Note that only the ASN residue was rebuilt though.) One important thing to note here is that GROMACS didn't do anything wrong in our simulation. The "problems" we saw before the adjustment by VMD are all just a matter of VMD visualization. The correct physics is still preserved in `complex_md.gro` and `traj_comp.xtc`. However, having more straightforward visualization, `complex_center_vmd.gro` and `complex_center_vmd.trr` do not have the correct physics after the adjustment, so the files are only for visualization purposes. To validate this, we can examine the energy as a function of time of the trajectory before and after the adjustment. (Also note that the atom coordinates in `complex_center_vmd.gro` are different from the ones in `complex_md.gro`.) To get the energy data of `complex_md.gro`, execute the following command and enter `13` to choose to output the data of `Total-Energy` to the `.xvg` file: ``` gmx energy -f md.edr -o energy.xvg ``` However, right now we don't have the energy data of the trajectory after the VMD adjustment (`complex_center_vmd.trr`), so we have to execute the following command the generate an `.edr` file for the adjusted trajectory: ``` gmx mdrun -rerun complex_center_vmd.trr -s ../complex_md.tpr ``` However, the following error would occur: ``` Fatal error: Step 0: The total potential energy is 1.08418e+14, which is extremely high. The LJ and electrostatic contributions to the energy are 1.08418e+14 and -365625, respectively. A very high potential energy can be caused by overlapping interactions in bonded interactions or very large coordinate values. Usually, this is caused by a badly- or non-equilibrated initial configuration, incorrect interactions or parameters in the topology. ``` Using `gmx energy` to get the data of the Coulombic potential (enter 9 to choose `Coulombic-(SR)`) of the original trajectory (`traj_comp.xtc`), we can obtain that the average of the Coulombic potential is -371339. Therefore, the VMD adjustment didn't change the Coulombic potential too much and the high total energy almost came from the excessively high/erroneous Lennard-Jones potential. Therefore, it is apparent that the trajectory after VMD adjustment is not physically correct and can only serve as a trajectory for visualization purposes. Similarly, we can also use `-rerun` option to get `.edr` files for `complex_nojump.xtc` and `comple_center.xtc` and use `gmx energy` to get the energy data. ![](https://i.imgur.com/kqpI5qb.png) As shown above, although the energy of `complex_center.xtc` and `complex_nojump.xtc` seems reasonable, `trjconv` command still changed the physics somehow. In addition, for both the "rerun" simulations, the following warning was shown: ``` WARNING: Some frames do not contain velocities. Ekin, temperature and pressure are incorrect, the virial will be incorrect when constraints are present. ``` Therefore, to extract a snapshot that is physically correct to proceed with further advanced sampling simulations, utilizing the original trajectory `traj_comp.xtc` is the safest way. #### Section 5-4-5. Extraction of a physically reasonable snapshot As mentioned, the goal of this tutorial is to provide input files for Hamiltonian replica exchange and expanded ensemble simulations. As running these two types of advanced sampling simulations in an NPT ensemble would generate problematic results, here we must perform the simulations in an NVT ensemble. Since the vanilla simulation was performed in an NPT ensemble, we have to make sure that the free energy in the NPT vanilla simulation is equivalent to the free energy in the NVT Hamiltonian replica exchange/expanded ensemble simulations. To do this, consider: $$G=G(N, P, T) = VdP -SdT + \mu dN$$ $$A=A(N, V, T) = -PdV -SdT + \mu dN$$ Obviously, the Gibbs free energy in an NPT ensemble is equal to the Helmholtz free energy in an NVT ensemble **given that the fixed in the NVT ensemble corresponds to the pressure in the NPT ensemble**. Therefore, to conduct an NVT Hamiltonian replica exchange/expanded ensemble simulation, we have to extract a snapshot from the trajectory of the NPT vanilla simulation and make sure that the volume of that snapshot is close to the average volume in the NPT simulation. Extracting the energy data using `gmx energy` from `md.edr` and using `gmx_plot2d.py` to analyze the data, we can obtain that the average volume over 5 ns of MD simulation is 208.8357 $nm^{3}$ and the snapshot that has a volume closet to this value is the snapshot at 3.478 ns, whose volume is 208.8350 $nm^{3}$. Therefore, we extract this snapshot using the following command (enter `0` to select the whole system in the prompt): ``` gmx trjconv -s complex_md.tpr -f traj_comp.xtc -o complex_avg.gro -dump 3812 ``` This is the structure file that can finally serve as the input structure of further advanced sampling simulations! To see how it looks like, we similarly use `trjconv` commands as below and use `pbc wrap` command in the Tk Console of VMD: ``` gmx trjconv -s complex_md.tpr -f complex_avg.gro -o complex_avg_nojump.gro -center -pbc nojump -n complex.ndx gmx trjconv -s complex_md.tpr -f complex_avg_nojump.gro -o complex_avg_center.gro -center -pbc mol -ur compact -n complex.ndx ``` #### Section 5-4-6. Collection of all the input files for advanced sampling methods After we finally obtain all the input files required for further advanced sampling simulations, it is recommended to collect all the final input files in one folder. To do this, start with creating the folder `Final_inputs` and execute the following commands in the folder: ``` cp ../Topology/complex_GMX.top PLCpep7.top cp ../short_MD/complex_avg.gro PLCpep7.gro cp ../short_MD/complex.ndx PLCpep7.ndx ``` ## Section 6. The preparation process in brief In this section, we provide the reader a quick look at all the commands used in the preparation process in Section 2 to Section 5 in case that the reader is already familiar with the process. - **Modeling of the missing residues** - Execute `python sequence.py` to get an alignment file. ```python=1 from modeller import * # Get the sequence of the 1qg8 PDB file and write to an sequence file code = '4k45' e = environ() m = model(e, file=code) aln = alignment(e) aln.append_model(m, align_codes=code) aln.write(file=code+'.seq') ``` - Use the sequence file to prepare an alignment file. Remember to change the number of residue since we have a non-standard amino acids. (In this case, change `A:+109` into `A:+110` in the sequence file.) - Execute `python build.py` to model the missing residues ```python=1 from modeller import * from modeller.automodel import * # Load the automodel class log.verbose() env = environ() # directories for input atom files env.io.atom_files_directory = ['.', '../atom_files'] env.io.hetatm = True a = automodel(env, alnfile = 'alignment.ali', knowns = '4k45', sequence = '4k45_fill') a.starting_model = 1 a.ending_model = 1 a.make() ``` - **Topology preparation** - First create a folder for this step: `Topology`. - Preparation of the protein topology file - Execute `cp ../Build/4k45_fill.B99990001.pdb ./PLCC.pdb` - Cut the part of peptide and save it as `pep7.pdb`. - Process `PLCC.pdb` with H++ web server (pH = 7.4). - Download the output files as `PLCC.top` and `PLCC.crd`. Convert the files by executing `python GAFF/acpype.py -x PLCC.crd -p PLCC.top`. - Preparation of the peptide topology file - Change HETATM records in `pep7.pdb` to ATOM records. - Download the output files as `pep7.top` and `pep7.crd`. Convert the files by executing `python GAFF/acpype.py -x pep7.crd -p pep7.top`. - Combination of the topology files - Execute the commands below: ``` cp PLCC_GMX.gro complex_GMX.gro cp PLCC_GMX.top complex_GMX.top ``` - Cut the coordinates of `pep7_GMX.gro` and append to the bottom of `complex_GMX.gro`. Change the numer of atoms from 1744 to 1878 and save the file. - Open up `complex_GMX.top` and `pep7_GMX.top` and compare the `[ atomtypes ]` directives, append the atom types that only appear in `pep7_GMX.top` to the `[ atomtypes ]` directive of `complex_GMX.top`. The appdended atom types are: ``` OV OV 0.00000 0.00000 A 2.95101e-01 7.11280e-01 ; 1.66 0.1700 P P 0.00000 0.00000 A 3.74177e-01 8.36800e-01 ; 2.10 0.2000 OT OT 0.00000 0.00000 A 2.91092e-01 8.78640e-01 ; 1.63 0.2100 ``` - Copy all the directives other than `[ defaults ]`, `[ atomtypes ]`, `[ system ]` and `[ molecules ]` in `pep7_GMX.top`, including `[ moleculetype ]`, `[ atoms ]`, `[ pairs ]`, `[ bonds ]`, `[ angles ]` and `[ dihedrals ]` (including the propers and the impropers). Paste all these directives to `complex_GMX.top`. - Change the system name in `[ system ]` from `PLCC` to `complex in water` and add `pep7 1` to the bottom of the file. Save the file to complete the combination of the topology files. - Modification of the topology file `complex_GMX.top` - Before `[ system ]` directive, add: ``` ; Include water topology #include "amber99sb-ildn.ff/tip3p.itp" ; Include topology for ions #include "amber99sb-ildn.ff/ions.itp" ``` - Add the following atom types to `[ atomtypes ]` directive: ``` OW OW 0.00000 0.00000 A 0.315075 0.635968 HW HW 0.00000 0.00000 A 0 0 Cl Cl 35.45 0.00000 A 4.40104e-01 4.18400e-01 Na Na 22.99 0.00000 A 3.32840e-01 1.15897e-02 IB IB 0.00000 0.00000 A 8.90899e-01 4.18400e-01 C0 C0 0.00000 0.00000 A 3.05240e-01 1.92376e+00 MG MG 0.00000 0.00000 A 1.41225e-01 3.74342e+00 K K 0.00000 0.00000 A 4.73602e-01 1.37235e-03 Rb Rb 0.00000 0.00000 A 5.26699e-01 7.11280e-04 Cs Cs 0.00000 0.00000 A 6.04920e-01 3.37230e-04 Li Li 0.00000 0.00000 A 2.02590e-01 7.65672e-02 Zn Zn 0.00000 0.00000 A 1.95998e-01 5.23000e-02 ``` - Tweak the partial charges if needed. Spread out the difference of the charges to the residue furtherst to the structural entity of most interest. - Save the file to complete the modification. - **Solvation and neutralization of the system** - Execute `gmx editconf -f ../Topology/complex_GMX.gro -o complex_box.gro -bt dodecahedron -d 1.0 -c` - Execute `gmx solvate -cp complex_box.gro -p ../Topology/complex_GMX.top -o complex_sol.gro -cs ` - After adding the solvent molecules, make sure that `[ molecules ]` directive in `complex_GMX.top` is well-formatted. - Create `ions.mdp` and execute `gmx grompp -f ions.mdp -c complex_sol.gro -p ../Topology/complex_GMX.top -o complex_ions.tpr -maxwarn 1` - The content of `ions.mdp`: ``` ; ions.mdp - used as input into grompp to generate ions.tpr ; All unspecified parameters adopt their own default values. ; Run Control integrator = steep ; Algorithm (steep = steepest descent minimization) nsteps = 500000 ; (0) maximum number of steps to integrate or minimize ; Energy minnimization emtol = 10.0 ; (10.0) Stop minimization when the maximum force < 100.0 kJ/mol/nm emstep = 0.01 ; (0.01) [nm] Minimization step size ; Neighbor searching/Electrostatics/Van der Waals cutoff-scheme = Verlet ; Buffered neighbor searching nstlist = 10 ; (10) Frequency to update the neighbor list and long-range forces ns_type = grid ; Method to determine neighbor list (simple, grid) pbc = xyz ; Periodic Boundary Conditions in all 3 dimensions coulombtype = PME ; Treatment of long-range electrostatic interactions rcoulomb = 1.0 ; Short-range electrostatic cut-off rvdw = 1.0 ; Short-range Van der Waals cut-off ``` - Execute `gmx genion -s complex_ions.tpr -o complex_ions.gro -p ../Topology/complex_GMX.top -pname NA -nname CL -np 26 -nn 26` to add counterions. (Eneter `15` in the prompt to choose `SOL`.) - **Energy minimization, equilibration and MD simulation** - Energy minmization - Create `em.mdp` and execute `gmx grompp -f em.mdp -c ../Solvation/complex_ions.gro -p ../Topology/complex_GMX.top -o complex_em.tpr` - The content of `em.mdp`: ``` ; em.mdp - used as input into grompp to generate em.tpr ; All unspecified parameters adopt their own defaults values. ; Run Control integrator = steep ; Algorithm (steep = steepest descent minimization) nsteps = 500000 ; (0) maximum number of steps to integrate or minimize ; Energy minnimization emtol = 100.0 ; (10.0) Stop minimization when the maximum force < 100.0 kJ/mol/nm emstep = 0.01 ; (0.01) [nm] Minimization step size ; Neighbor searching/Electrostatics/Van der Waals cutoff-scheme = Verlet ; Buffered neighbor searching nstlist = 10 ; (10) Frequency to update the neighbor list and long range forces ns_type = grid ; Method to determine neighbor list (simple, grid) pbc = xyz ; Periodic Boundary Conditions in all 3 dimensions coulombtype = PME ; Treatment of long range electrostatic interactions rcoulomb = 1.0 ; Short-range electrostatic cut-off rvdw = 1.0 ; Short-range Van der Waals cut-off ``` - Execute `gmx mdrun -s complex_em.tpr -o complex_em.trr -c complex_em.gro -g em.log -e em.edr`. - NVT equilibration - Create `nvt_equil.mdp` and execute `gmx grompp -f nvt_equil.mdp -c ../../EM/complex_em.gro -p ../../Topology/complex_GMX.top -o complex_equil.tpr`. - The content of `nvt_equil.mdp`: ``` ; Run control integrator = md tinit = 0 dt = 0.002 nsteps = 100000 ; 200 ps comm-mode = Linear nstcomm = 1 ; Output control nstlog = 1000 nstcalcenergy = 1 nstenergy = 1000 nstxout-compressed = 1000 ; Neighborsearching and short-range nonbonded interactions nstlist = 10 ns_type = grid pbc = xyz rlist = 1.0 ; Electrostatics cutoff-scheme = verlet coulombtype = PME coulomb-modifier = Potential-shift-Verlet rcoulomb-switch = 0.89 rcoulomb = 0.9 ; van der Waals vdw-type = Cut-off vdw-modifier = Potential-switch rvdw-switch = 0.85 rvdw = 0.9 ; Apply long range dispersion corrections for Energy and Pressure DispCorr = AllEnerPres ; Spacing for the PME/PPPM FFT grid fourierspacing = 0.10 ; EWALD/PME/PPPM parameters fourier_nx = 0 fourier_ny = 0 fourier_nz = 0 pme_order = 4 ewald_rtol = 1e-05 ewald_geometry = 3d epsilon_surface = 0 ; Temperature coupling tcoupl = v-rescale nsttcouple = 1 tc_grps = System tau_t = 0.5 ref_t = 298 ; Pressure coupling is on for NPT pcoupl = no ; no pressure for NVT equilibration gen_vel = yes gen-temp = 298 gen-seed = 6722267; need to randomize the seed each time. ; options for bonds constraints = h-bonds ; we only have C-H bonds here ; Type of constraint algorithm constraint-algorithm = lincs ; Highest order in the expansion of the constraint coupling matrix lincs-order = 12 lincs-iter = 2 ``` - Execute `gmx mdrun -s complex_equil.tpr -o complex_equil.trr -c complex_equil.gro -g equil.log -e equil.edr`. - NPT equilibration - Create `npt_equil.mdp` and execute `gmx grompp -f npt_equil.mdp -c ../NVT/complex_equil.gro -t ../NVT/state.cpt -p ../../Topology/complex_GMX.top -o complex_equil.tpr` - The content of `npt_equil.mdp: ``` ; Run control integrator = md tinit = 0 dt = 0.002 nsteps = 100000 ; 200 ps comm-mode = Linear nstcomm = 1 ; Output control nstlog = 1000 nstcalcenergy = 1 nstenergy = 1000 nstxout-compressed = 1000 ; Neighborsearching and short-range nonbonded interactions nstlist = 10 ns_type = grid pbc = xyz rlist = 1.0 ; Electrostatics cutoff-scheme = verlet coulombtype = PME coulomb-modifier = Potential-shift-Verlet rcoulomb-switch = 0.89 rcoulomb = 0.9 ; van der Waals vdw-type = Cut-off vdw-modifier = Potential-switch rvdw-switch = 0.85 rvdw = 0.9 ; Apply long range dispersion corrections for Energy and Pressure DispCorr = AllEnerPres ; Spacing for the PME/PPPM FFT grid fourierspacing = 0.10 ; EWALD/PME/PPPM parameters fourier_nx = 0 fourier_ny = 0 fourier_nz = 0 pme_order = 4 ewald_rtol = 1e-05 ewald_geometry = 3d epsilon_surface = 0 ; Temperature coupling tcoupl = v-rescale nsttcouple = 1 tc_grps = System tau_t = 0.5 ref_t = 298 ; Pressure coupling is on for NPT pcoupl = Berendsen pcoupltype = isotropic tau-p = 5.0 ref-p = 1.0 ; bar compressibility = 4.5e-5 ; For water at 1 atm and 300 K the comp ressibiilty is 4.5e-5 bar^(-1) ; refcoord_scaling should do nothing since there are no position restraints. gen_vel = no ; gen-temp = 298 ; gen-seed = 6722267; need to randomize the seed each time. ; options for bonds constraints = h-bonds ; we only have C-H bonds here ; Type of constraint algorithm constraint-algorithm = lincs continuation = yes ; Highest order in the expansion of the constraint coupling matrix lincs-order = 12 lincs-iter = 2 ``` - Execute `gmx mdrun -s complex_equil.tpr -o complex_equil.trr -c complex_equil.gro -g equil.log -e equil.edr` - MD simulation - Create `md.mdp` and execute `gmx grompp -f md.mdp -c ../Equil/NPT/complex_equil.gro -p ../Topology/complex_GMX.top -o complex_md.tpr -t ../Equil/NPT/state.cpt` - The content of `md.mdp`: ``` ; Run control integrator = md tinit = 0 dt = 0.002 nsteps = 2500000 ; 5 ns comm-mode = Linear nstcomm = 1 ; Output control nstlog = 1000 nstcalcenergy = 1 nstenergy = 1000 nstxout-compressed = 1000 ; Neighborsearching and short-range nonbonded interactions nstlist = 10 ns_type = grid pbc = xyz rlist = 1.0 ; Electrostatics cutoff-scheme = verlet coulombtype = PME coulomb-modifier = Potential-shift-Verlet rcoulomb-switch = 0.89 rcoulomb = 0.9 ; van der Waals vdw-type = Cut-off vdw-modifier = Potential-switch rvdw-switch = 0.85 rvdw = 0.9 ; Apply long range dispersion corrections for Energy and Pressure DispCorr = AllEnerPres ; Spacing for the PME/PPPM FFT grid fourierspacing = 0.10 ; EWALD/PME/PPPM parameters fourier_nx = 0 fourier_ny = 0 fourier_nz = 0 pme_order = 4 ewald_rtol = 1e-05 ewald_geometry = 3d epsilon_surface = 0 ; Temperature coupling tcoupl = v-rescale nsttcouple = 1 tc_grps = System tau_t = 0.5 ref_t = 298 ; Pressure coupling is on for NPT pcoupl = Parrinello-Rahman pcoupltype = isotropic tau-p = 5.0 ref-p = 1.0 ; bar compressibility = 4.5e-5 ; For water at 1 atm and 300 K the compressibiilty is 4.5e-5 bar^(-1) ; refcoord_scaling should do nothing since there are no position restraints. gen_vel = no ; gen-temp = 298 ; gen-seed = 6722267; need to randomize the seed each time. ; options for bonds constraints = h-bonds ; we only have C-H bonds here ; Type of constraint algorithm constraint-algorithm = lincs continuation = yes ; Highest order in the expansion of the constraint coupling matrix lincs-order = 12 lincs-iter = 2 ``` - Execute `gmx mdrun -s complex_md.tpr -o complex_md.trr -c complex_md.gro -g md.log -e md.edr` - Postprocessing of the MD simulation - Preparation of the index file Execute `gmx make_ndx -f complex_md.gro -o complex.ndx`. In the prompt, specify the index groups as follows: ``` a 1-1878 name 24 Binding_Complex a 1742-1878 name 25 pep7 a 1817 name 26 P_atom ri 15 name 27 Arg1 ri 34 name 28 Arg2 ri 36 name 29 Arg3 ri 56 name 30 Arg4 ``` - Extraction of a reasonable snapshot - Execute `gmx energy -f complex_md.gro -o volume.xvg` and select `Volume` in the prompt. - Use Python script `gmx_plot2d` to plot the volume as a function of time and determine the average volume and the sanpshop having a volume cloest to the average volume. - Execute `gmx trjconv -s complex_md.tpr -f traj_comp.xtc -o complex_avg.gro -dump 3478` to extract the snapshot. - Collection of all the final input files Execute the following commands ``` mkdir Final_inputs && Final_inputs cp ../Topology/complex_GMX.top PLCpep7.top cp ../short_MD/complex_avg.gro PLCpep7.gro cp ../short_MD/complex.ndx PLCpep7.ndx ``` **This is the end of the first part of the tutorial!** For detailed methods for preparing simulation inputs for advanced sampling methods and the application of the workflows introduced here to other similar systems, please refer to [the second part of the tutorial](https://hackmd.io/@WeiTseHsu). --- *If you found the article useful, you're very welcome to share the article or leave a comment below so that I will be motivated to write more articles! If you have any questions about the tutorial, please feel free to contact weitse.hsu@colorado.edu. Thank you for reading this far!*