# Identification utility functions

Some functions which might come in handy to semi-automate manual MS2 identification analysis. All scripts and a template metaData file are available [here](https://chalmersuniversity.box.com/s/g27u372gxhm0qoe64mrxs3f696bcvdpk).

## calcAdductMass()

Calculates the exact mass of a range of adducts for masses of interest (MoI) supplied by the user as chemical formulas. The user supplies a '.csv' file containing columns for the adducts of interest. Exact masses are only calculated for the adduct columns present in the .csv file.

**Input**

* metaData - dataframe with the following columns:
  * $ChemForm: chemical formula of the compound as a string. Mandatory.
  * $PosH: [M+H]+
  * $PosNa: [M+Na]+
  * $PosK: [M+K]+
  * $PosNH4: [M+NH4]+
  * $PosMeOH: [M+CH3OH+H]+
  * $NegH: [M-H]-
  * $NegFAH: [M+FA-H]-
  * $NegNaH2: [M+Na-2H]-
  * $NegKH2: [M+K-2H]-
  * $NegH3O: [M-H2O-H]-
  * $Unknown: used when the adduct is unknown and the user still wants to find MS2 matches
* saveFile - character string with the name of the output file, if the results should be saved

**Output**

* metaData - a dataframe containing the calculated adduct masses

**Example**

```
#Load a metaData file containing at least one column named after an adduct
metaLoc<-file.choose()
metaData<-read.csv(metaLoc)
metaData<-calcAdductMass(metaData, "Results.csv")
```

## matchMS()

matchMS() matches the exact masses of known compounds in MS1 ('.mzML') files with MS2 spectra in '.mgf' files. For example, after running a set of chemical standards, matchMS() can extract the relevant MS2 information from the .mgf files that contain them. It takes a dataframe containing a column of masses of interest (MoI), a column of MS1 ('.mzML') file names and a column of MS2 ('.mgf') file names in which each MoI appears, and finds matching MS1 peaks and MS2 spectra. The MS1 files are processed using XCMS and the MS2 files are read into memory.
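Each matching step boils down to comparing two masses within a parts-per-million tolerance (the dPPM argument) and, for MS2 spectra, comparing retention times within rtWindow seconds. The sketch below illustrates that idea only; it is not the matchMS() source. The helpers `ppmDiff()` and `findCandidateSpectra()` are hypothetical, and treating rtWindow as a symmetric +/- tolerance around the MS1 peak is an assumption that may differ from how matchMS() applies it.

```r
# Sketch (not the matchMS() source): ppm-based mass matching plus an RT filter.

# Absolute mass difference expressed in parts per million of the reference mass
ppmDiff <- function(observed, reference) {
  abs(observed - reference) / reference * 1e6
}

# Hypothetical helper: indices of MS2 spectra whose precursor m/z is within
# dPPM of the mass of interest (moi) and whose RT is within +/- rtWindow
# seconds of the MS1 peak RT (peakRT). Defaults mirror the documented ones.
findCandidateSpectra <- function(moi, peakRT, precursorMZ, precursorRT,
                                 dPPM = 5, rtWindow = 30) {
  massOK <- ppmDiff(precursorMZ, moi) <= dPPM
  rtOK <- abs(precursorRT - peakRT) <= rtWindow
  which(massOK & rtOK)
}

# Example: [M+H]+ of caffeine against three made-up MS2 precursors
precursorMZ <- c(195.0877, 195.0890, 195.0876)
precursorRT <- c(312, 318, 400)
candidates <- findCandidateSpectra(195.08765, peakRT = 310,
                                   precursorMZ, precursorRT)
print(candidates)  # [1] 1  (spectrum 2 is ~6.9 ppm off, spectrum 3 is 90 s off)
```

The same `ppmDiff()` check can be reused with a `2 * dPPM` tolerance when comparing an MS1 peak mass directly with an MS2 precursor mass.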
The column containing MS1 file names has to be called "MS1File" and the column containing MS2 file names has to be called "MS2File". Matches are made by:

1) Checking whether there are peaks in the MS1 file with masses within the dPPM range specified by the user
2) Checking whether there are spectra in the MS2 file with precursor masses within the dPPM range specified by the user
3) Checking whether the MS1 peak and MS2 precursor masses are within 2*dPPM of each other
4) Checking whether the MS2 spectra are within the RT window specified by the user

**Input**

* dPPM - Optional double containing the PPM window within which matches are made. Default = 5.
* rtWindow - Optional double containing the RT window, in seconds, within which matches are made. Default = 30.
* peakInfo - Optional XCMS chromPeaks object in which to look for the peaks. If not supplied, an MS1File column in the metaData dataframe becomes mandatory.
* metaData - Mandatory dataframe with at least one column corresponding to one of the adducts listed under "calcAdductMass()", plus the MS1 & MS2 file names in which the adduct mass might appear. It should have the following columns:
  * $'Insert any adduct name here' - Mandatory column of doubles. At least one column must match an adduct name.
  * $MS1File - Optional column of characters. Each MoI should correspond to a '.mzML' file in which the MoI is searched for. If an XCMS chromPeaks object has been supplied through the "peakInfo" argument, no MS1 files need to be specified.
  * $MS2File - Mandatory column of characters. Each MoI should correspond to a '.mgf' file in which the MoI is searched for.
* adduct - Mandatory character variable corresponding to one of the adduct columns in the metaData dataframe (see the 'calcAdductMass()' documentation)
* filePath - Optional character variable containing the path to the folder where the .mgf (and potentially .mzML) files to be analyzed are stored. If left empty, the user will be asked to choose a folder.

**Output**

* matchMSObj - A list containing two lists:
  1) A list of MS1 data containing m/z and RT for all peaks that were successfully matched with MS2 spectra
  2) A list of MS2 mgf objects, each containing a spectrum and the precursor m/z, RT, intensity and more, for all spectra that were successfully matched with MS1 peaks

**Example**

```
##Either read a file already containing masses into memory
fileName<-file.choose()
metaData<-read.csv(fileName)
##Or use a metaData dataframe output by calcAdductMass()
msMatchObj<-matchMS(metaData, dPPM=10, rtWindow=60)
```

**Template file**

## matchMSmulti()

Wrapper function for 'matchMS()' which automatically searches several columns of masses per MS1 & MS2 file combination (alternatively, chromPeaks & MS2 file combination). Used to search for several different adduct masses of the same compound in MS1 and MS2 combinations.

**Input**

* dPPM, rtWindow, peakInfo, metaData, adduct & filePath - See the "matchMS()" description

**Output**

* matchMSmultiObject - A list containing multiple matchMS objects, one for each adduct investigated

**Example**

```
msMatchMultiObj<-matchMSmulti(metaData, dPPM=10, rtWindow=60, filePath="C:/Folder/FolderContainingMSFiles/")
```

## genReport()

Takes a matchMSObj object, extracts the MS1 and MS2 information, formats it and outputs the MS2 spectra in separate worksheets, one for each feature that was matched to spectra.

**Work in progress**

## Full matchMS() workflow example

```
source(file = 'readMGF.R')
source(file = 'matchMS.R')
source(file = 'genReport.R')
source(file = 'calcAdductMass.R')
#Loading metaData
metaLoc<-file.choose()
metaData<-read.csv(metaLoc)
#Calculating adduct exact masses from all chemical formulas
metaData<-calcAdductMass(metaData, "Output.csv")
write.csv(metaData,"metaDataChemFormCalc.csv")
matchMSObj<-matchMS(dPPM=10, metaData=metaData, rtWindow=15)
combList<-genReport(matchMSObj, metaData)
```

## Full matchMSUnk() workflow example

```
source(file = 'matchMSUnk.R')
source(file = 'genReportUnk.R')
#Loading metaData
metaLoc<-file.choose()
metaData<-read.csv(metaLoc)
#Placeholder: set fileLoc to the folder containing the MS files before matching
fileLoc<-"C:/Folder/FolderContainingMSFiles/"
#Matching masses of interest to MS2s
matchMSUnkObj<-matchMSUnk(dPPM=10, metaData=metaData, rtWindow=30, fileLoc=fileLoc)
#Creates a report and formats the MS2 spectra into a more manageable format
combListUnk<-genReportUnk(matchMSUnkObj,metaData)
```
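Both workflow examples load metaData interactively with file.choose(). For scripted runs, a metaData dataframe can also be assembled directly in R. The sketch below is not the calcAdductMass() implementation: it assumes plain formulas without brackets, charges or isotope labels, a hard-coded table of monoisotopic masses, and that [M+H]+ and [M-H]- are the neutral mass plus or minus one proton mass (calcAdductMass() may use slightly different conventions). The helper `formulaMass()`, the `monoMass`/`protonMass` constants and the file names are all hypothetical.

```r
# Sketch (not the calcAdductMass() source): scripted metaData construction.

# Monoisotopic masses (u) of the most abundant isotopes; extend as needed
monoMass <- c(C = 12.0000000, H = 1.00782503, N = 14.00307401,
              O = 15.99491462, P = 30.97376200, S = 31.97207117)
protonMass <- 1.00727646  # proton mass (u)

# Hypothetical helper: neutral monoisotopic mass of a plain formula such as
# "C8H10N4O2" (no brackets, charges or isotope labels)
formulaMass <- function(chemForm) {
  tokens <- regmatches(chemForm, gregexpr("[A-Z][a-z]?[0-9]*", chemForm))[[1]]
  if (paste(tokens, collapse = "") != chemForm) stop("Cannot parse ", chemForm)
  elements <- sub("[0-9]+$", "", tokens)      # element symbols
  countStr <- sub("^[A-Z][a-z]?", "", tokens) # atom counts as text
  countStr[countStr == ""] <- "1"             # a missing count means one atom
  if (!all(elements %in% names(monoMass))) stop("Unsupported element in ", chemForm)
  sum(monoMass[elements] * as.numeric(countStr))
}

# Placeholder rows; the file names are made up
metaData <- data.frame(
  ChemForm = c("C8H10N4O2", "C6H12O6"),
  MS1File = c("sample1.mzML", "sample2.mzML"),
  MS2File = c("sample1.mgf", "sample2.mgf"),
  stringsAsFactors = FALSE
)
neutral <- vapply(metaData$ChemForm, formulaMass, numeric(1), USE.NAMES = FALSE)
metaData$PosH <- neutral + protonMass  # [M+H]+
metaData$NegH <- neutral - protonMass  # [M-H]-
print(metaData[, c("ChemForm", "PosH", "NegH")])
```

Because the column names follow the ones documented for calcAdductMass() and matchMS(), a dataframe built this way can be compared against calcAdductMass() output as a sanity check.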