NMRLipids + PCA

# NMRLipids + PCA The basic strcture and interaction of the PCA elements is as follows. ```sequence participant Loop Loop->>Parser: Call parser Parser->>Loop: Check trajectory Parser->>Loop: AMBER mergerer Parser->>Loop: Concatenated trajectories Loop->>PCA: Call PCA PCA->>Loop: Calculate vectors PCA->>Loop: Calculate projections PCA->>Loop: Calculate autocorrelations Loop->>Time estimator: Call Time Time estimator->>Loop: Equilibration time ``` There are three main classes: - `Parser` - `PCA` - `TimeEstimator` ## `Parser` class Class `Parser` is a basic class to work with the trajectory. It has basic utility for preparing trajectories: 1. Checking if simulation has a correct name 2. Checking if trajectory is downloaded. Downloading, if not 3. Calling AmberMerger to merge head groups and tails for Amber trajectories 4. Concatenate trajectories ```python! class Parser: """ Constructor for Parser: root - path to the directory with trajectories readme - simulation data path - name of a particular trajectory v - verbosity for testing """ def __init__(self, root, readme, path=None, v=True) """ Path validation. Behaviour depends on the input. 1. If ther were any errors, validation fails. Currently the only error tested is that .xtc and .tpr files are not present. This is the case for non-GROMACS trajectories. 2. If path is not provided, parser is iterating over all trajectories. 3. If path is provided and current path is equal to the provided one, parser reports that it finds the trajectory for the analysis. """ def validatePath(self) """ Basic trajectory and TPR download. """ def downloadTraj(self) """ Preparing trajectory. If centered trajectory is found, use it. If whole trajectory is found, use it. Otherwise, call gmx trjconv to make whole trajectory. The selected trajectory is loaded into Universe. """ def prepareTraj(self) """ Create Concatenator and corresponding concatenated trajectories for all lipids available for the trajectory. """ def concatenateTraj(self) ``` Class `Parser` interacts with 2 helper classes `Topology` and `Concatenator`. ### `Topology` class Class `Topology` is a class needed to extract lipid specific data and also to merge lipids from Amber trajectories, where lipid is often represented as 3 residue types. It has the following methods: 1. Simple constructor, which sets the force field,residue names, trajectory, and loads mapping_file 2. Mapping file loader 3. Function that outputs atomNames 4. Checker if the merge is needed. Currently defunct 5. Runner, that returns lists for head, tail1, tail2 orders for merging Currently defunct ```python! class Topology: """ Constructor for Topology: ff - force field name traj - MDAnalysis trajectory lipid_resname - name of lipid mapping_file - path to maping file """ def __init__(self, ff, traj, lipid_resname, mapping_file) """ Basic loader of the mapping file. """ def loadMappingFile(self, mapping_file) """ Extract all names of heavy atoms from the mapping """ def atomNames(self) """ Checker for merge. Currently defunct. TODO: return True for Amber force fields """ def isMergeNeeded(self, fnames=[]) """ Find lipid tails that correspond to a particular head-group. Currentlyy defunct. TODO: get the correspondence from structure """ def runMerger(self) ``` ### `Concatenator` class lass `Concatenator` is a class needed to concatenate trajectory for lipid types. It has the following methods: 1. Simple constructor, which sets the topology,residue names, and trajectory 2. `concatenateTraj` method to do the basic by lipid concatenation 3. `alignTraj` method to perform the alignment of the concatenated trajectory 4. The enveloping `concatenate` method ```python! class Concatenator: """ Constructor for Concatenator: topology - topology for lipid traj - MDAnalysis trajectory lipid_resname - name of lipid """ def __init__(self, topology, traj, lipid_resname) """ concatenateTraj performs basic trajectory concatination. First, it extracts coordinates from trajectory, next, it reshapes the coordinate array, swaps time and resid axes to obtain continuous trajectories of individual lipids (this is needed for autocorrelation time analysis), and finally merges individual lipid trajectories. --- TODO: add merge for Amber """ def concatenateTraj(self) """ alignTraj alignes the concatenated trajectory in two steps: (1) it computes average structure after alignment to the first frame, and (2) it alignes the structure to the calculated average structure in (1). """ def alignTraj(self, concatenated_traj) """ Simple enveloping function to perform concatenation """ def concatenate(self) ``` ## `PCA` class Class `PCA` is a class that actually performs PCA. It has the following methods: 1. Simple constructor, which sets the aligned trajtory, average coordinates, number of lipids, number of frames in the concatenated trajectory and trajectory length in ns 2. `PCA` prepares the trajectory coordinates for analysis and calculates principal components 3. `get_proj projects` the trajectory on principal components 4. `get_lipid_autocorrelation` calculates the autocorrelation timeseries for individual lipid 5. `get_autocorrelations` calculates the autocorrelation timeseries for trajectory ```python! class PCA: """ Constructor for PCA: aligned_traj - np.array with positions of concatenated and aligned trajectory av_pos - np.array with average positions for the lipid n_lipid - number of lipids of particular type in the system n_frames - number of frames in the concatenated trajectory traj_time - trajectory length in ns """ def __init__(self, aligned_traj, av_pos, n_lipid, n_frames, traj_time) """ PCA calculates the PCA. First the data is centered and then covariance matrix is calculated manually. """ def PCA(self) """ Projecting the trajectory on the 1st principal component """ def get_proj(self, cdata) """ Autocorrelation calculation for individual lipid. """ def get_lipid_autocorrelation(self, data, variance, mean) """ Autocorrelation calculation for the trajectory. """ def get_autocorrelations(self) ``` ## `TimeEstimator` class Class `TimeEstimator` is a class that estimates equilbration time from autocorrelation data. It includes the following methods: 1. Simple constructor, which sets the autocorrelation data 2. Helper method `get_nearest_value` that finds the interval in the data where the autocorrelation falls below specific value 3. `timerelax` method calculates the logarithms of autocorrelation and calculates the decay time 4. `calculate_time` method calculates the estimated equilibration time ```python! class TimeEstimator: """ Constructor for PCA: autocorrelation - autocorrelation data """ def __init__(self, autocorrelation) """ get_nearest_value method return the indices that frame the particular value. As an input it gets an array, which is expected to decay. The method tries to find and index (index_A), for which the array gets below the cutoff value for the first time. Then, for index_A-1 the array was larger than the cutoff value for the last time. It means that the function, represented by the array data is equal to cutoff somewhere between timesteps that correspond to index_A-1 and index_A. It can happen, that this index_A is equal 0. Then, method searches for the first occurance, where array is smaller than array[0]. The method returns the interval inbetween the array gets below cutoff. """ def get_nearest_value(self, iterable, value) """ Estimates autocorrelation decay time. """ def timerelax(self) """ Basic enveloping method that estimates equilibration time from autocorrelation decay time. They are linearly connected and the coefficient is calculated experimentally. """ def calculate_time(self) ``` ## Global comunication between classes ```graphviz digraph chargesetter { pad= 0.1 splines= line nodesep= 0.9 ranksep= 0.4 fontname= "Sans-Serif" fontsize= 13 fontcolor= "#2D3436" node [shape = box style = rounded fixedsize = true width = 1.4 height = .7 fontname = "Sans-Serif" fontsize = 13 fontcolor = "#2D4436"] MainLoop [color = "red", label="main", pos="0,10!", constraint=false] DataBankEntry [color = "blue", label="DataBank\nentry"] Parser [color = "black", label="Parser"] lipid1 [color = "magenta", label="lipid1"] lipid2 [color = "magenta", label="lipid2"] Topology [color = "black", label="Topology"] Concatenator [color = "black", label="Concatenator"] conctrj1 [color = "magenta", label="concatenated\ntrajectory 1"] conctrj2 [color = "magenta", label="concatenated\ntrajectory 2"] PCA [color = "black"] TimeEstimator [color = "black"] acordata1 [color = "magenta", label="Autocorrelation\ndata 1"] acordata2 [color = "magenta", label="Autocorrelation\ndata 2"] MainLoop->DataBankEntry DataBankEntry->Parser [label="check trajectory"] Parser->lipid1 [label="get lipids"] Parser->lipid2 [label="get lipids"] lipid1->Topology lipid2->Topology lipid1->Concatenator lipid2->Concatenator Topology->Concatenator [label="Merging"] Concatenator->conctrj1 Concatenator->conctrj2 conctrj1->PCA conctrj2->PCA PCA->acordata1->TimeEstimator PCA->acordata2->TimeEstimator } ```