22 Nov. 2023
# [MS search](https://ms-search.us) - tutorial
# 1. Introduction
MS search provides an online spectral database that can be searched with LC-MS/MS data. The website is optimized for search efficiency: a spectrum can be matched against a database of millions of entries within seconds.
In addition, we offer a batch search feature for spectra. To use it, provide an MS1 peak list (.csv) and an MS2 spectrum file (.mgf). The data preprocessing steps are described in Section 3 (Data preparation). We also provide demo data for testing.
Upon completion of the task, the website will provide a URL for you to download the matching results. We wish you a successful experiment!
# 2. Demo Data
## 2.1 LC-ToF-MS
### 2.1.1 HRMS (LC-ToF-MS), positive mode, DDA
MS1 peak list: [peakList_LCMS_POS](https://drive.google.com/file/d/1CGjf2h4Z91oPRn4nJzwtAw2YuSzXfOZG/view?usp=sharing)
MS2 spectrum file: [ms2Spectrum_LCMS_POS](https://drive.google.com/file/d/17CHI6EMnifdYnIvCTO27ZqYrYn5bk0iI/view?usp=sharing)
### 2.1.2 HRMS (LC-ToF-MS), negative mode, DDA
MS1 peak list: [peakList_LCMS_NEG](https://drive.google.com/file/d/1x3MX94WTjaJmqopevuhJreHRuiIPl6-u/view?usp=sharing)
MS2 spectrum file: [ms2Spectrum_LCMS_NEG](https://drive.google.com/file/d/1KOOKzkggA-UwHvYtEVd07dHZipFSmVYr/view?usp=drive_link)
# 3. Data preparation
## 3.1 Prepare MS1 peak table & MS2 spectrum file
### 3.1.1 Data acquisition
1. The data were acquired on an LC-HRMS system (Sciex TripleToF 6600) in data-dependent acquisition (DDA) mode.
### 3.1.2 XCMS v3.12.0 (XCMS3) + MSconvert
1. The raw files were converted to mzXML format with MSconvert 3.0.20226 and then preprocessed with the R package XCMS (v3.12.0).
2. The XCMS code used for LC-MS data preprocessing is as follows:
```r
#call library
library(xcms)
library(RColorBrewer)
library(pander)
library(magrittr)
library(pheatmap)
library(SummarizedExperiment)
# Restrict processing to a single core for this session
if (.Platform$OS.type == "unix") {
  register(bpstart(MulticoreParam(1)))
} else {
  register(bpstart(SnowParam(1)))
}
# To organize samples into groups, place the files in sub-folders such as
# "dda/GroupName1" and "dda/GroupName2".
# Read the LC-MS/MS files in mzXML format.
# Note: file names are listed from "fullscan/" and read from the matching paths under "dda/".
MSdata <- paste0("dda/", list.files(path = "fullscan/", pattern = "\\.mzXML$", recursive = TRUE))
pd <- data.frame(sample_name = sub(basename(MSdata),
                                   pattern = ".mzXML",
                                   replacement = "",
                                   fixed = TRUE),
                 sample_group = c(rep("gp1", 3),  # rep("group", number of replicates)
                                  rep("gp2", 3)),
                 class = c(rep("gp1", 3),
                           rep("gp2", 3)),
                 stringsAsFactors = FALSE)
rawData <- readMSData(MSdata, centroided. = TRUE, mode = "onDisk",
                      pdata = new("NAnnotatedDataFrame", pd))
#FilterEmptySpectra
rawData <- filterEmptySpectra(rawData)
rawData <- filterRt(rawData, c(0,2400))
#Register single core
if (.Platform$OS.type == "unix") {
register(bpstart(MulticoreParam(1)))
} else {
register(bpstart(SnowParam(1)))
}
#Peak picking
cwp <- CentWaveParam(snthresh = 5, noise = 100, peakwidth = c(5, 30), ppm = 20)
processedData_raw <- findChromPeaks(rawData, param = cwp)
##MergeNeighboringPeaks
mpp <- MergeNeighboringPeaksParam(expandRt = 3, expandMz = 0, ppm=20)
xdata_pp <- refineChromPeaks(processedData_raw,param=mpp)
processedData <- xdata_pp
#Retention time alignment - peakGroups
processedData$sample_type <- "study"
processedData$sample_type[c(1,2,3)] <- "QC"
processedData$sample_type
#PeakGroups-grouping variable
pdp_subs <- PeakDensityParam(sampleGroups = processedData$sample_type,
minFraction = 0.9)
processedData <- groupChromPeaks(processedData, param = pdp_subs)
#Set up QC for RT alignment and configure PeakGroups-subset-alignment options to perform the alignment.
pgp_subs <- PeakGroupsParam(minFraction = 0.9,
subset = which(processedData$sample_type == "QC"),
subsetAdjust = "average", span = 0.2)
processedData <- adjustRtime(processedData, param = pgp_subs)
#Peak grouping
pdp <- PeakDensityParam(sampleGroups = processedData$sample_group,
minFraction = 1, binSize = 0.02)
processedData <- groupChromPeaks(processedData, param = pdp)
#Gap filling (fill missing peaks)
medWidth <- median(chromPeaks(processedData)[, "rtmax"] -
chromPeaks(processedData)[, "rtmin"])
processed_Data <- fillChromPeaks(processedData, param = FillChromPeaksParam(fixedRt = medWidth))
#Export MS2 data to mgf (ms2 spectrum file)
source("https://raw.githubusercontent.com/jorainer/xcms-gnps-tools/master/customFunctions.R")
# Request a single return type explicitly instead of passing the full vector of choices
filteredMs2Spectra <- featureSpectra(processed_Data, return.type = "MSpectra")
filteredMs2Spectra <- clean(filteredMs2Spectra, all = TRUE)
filteredMs2Spectra <- formatSpectraForGNPS(filteredMs2Spectra)
writeMgfData(filteredMs2Spectra, "ms2spectra_all.mgf")
#Get MS1 data & generate MS1 data table
featuresDef <- featureDefinitions(processed_Data)
featuresIntensities <- featureValues(processed_Data, value = "into")
dataTable <- merge(featuresDef, featuresIntensities, by=0, all=TRUE)
dataTable <- dataTable[, !(names(dataTable) %in% c("peakidx"))]
head(dataTable)
write.table(dataTable, "xcms_all.txt", sep = "\t", quote = FALSE, row.names = FALSE)
```
3. Prepare the MS1 peak list
Please use the following format:
- **Column 1:** metabolite feature ID; the header must be "M_feature".
- **Column 2:** precursor ion *m/z*; the header must be "mz".
- **Column 3:** retention time (sec); the header must be "rt".
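
The three-column layout above can be produced from the XCMS feature table with a few lines of R. The following is a minimal sketch, not the authors' export script: it uses a small toy `dataTable` in place of the real one, and it assumes that the feature ID is taken from the `Row.names` column created by `merge(..., by = 0)`, and that `mzmed` and `rtmed` (the per-feature median *m/z* and retention time from `featureDefinitions`) are the values to report. The output file name `peakList.csv` is hypothetical.

```r
# Toy stand-in for the `dataTable` built by the XCMS script above.
# After merge(..., by = 0) the feature IDs sit in a column named "Row.names".
dataTable <- data.frame(
  Row.names = c("FT0001", "FT0002", "FT0003"),
  mzmed     = c(150.0583, 445.1605, 611.1612),
  rtmed     = c(62.4, 318.9, 402.1),
  stringsAsFactors = FALSE
)

# Keep only the three required columns, in the required order,
# and rename them to the required headers (assumed column mapping).
peakList <- data.frame(
  M_feature = dataTable$Row.names,
  mz        = dataTable$mzmed,
  rt        = dataTable$rtmed,
  stringsAsFactors = FALSE
)

# Write a comma-separated file without row names.
outFile <- file.path(tempdir(), "peakList.csv")
write.csv(peakList, outFile, row.names = FALSE, quote = FALSE)
```

With the real data, replace the toy `dataTable` with the one created in the preprocessing script and write to a location of your choice.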

# 4. MS/MS search
## 4.1 Fuzzy search
You can simply enter keywords in the input box for searching. Currently, it supports compound names, molecular formulas, InChI keys, and precursor *m/z*.
Press Enter or click on the search icon to initiate the search.

## 4.2 MS/MS spectrum search
This website provides a spectrum matching function that searches with a precursor *m/z* and an MS2 spectrum. Default parameters are provided for reference.
1. **Compound Name:** Please enter text; the search matches from the start of the name. For example, you can search for "tetracycline" using keywords like "tetra" or "tet".
2. **Formula**: Please enter text and numbers; the search matches from the start of the formula. For example, you can search for "C22H24N2O8" using keywords like "C22H24N2" or "C22H24".
3. **Exact mass**: Please enter a numeric value for the monoisotopic mass (Da). The minimum and maximum values of the search window are calculated from the MS tolerance.
4. **Precursor *m/z***: Please enter a numeric value for the precursor ion (*m/z*). The minimum and maximum values of the search window are calculated from the MS tolerance.
5. **MS tolerance**: Please enter a numeric value and choose ppm or Da as the unit.
6. **Charge:** Please choose from All / positive / negative.
7. **MS2 spectrum:** LC-MS/MS spectrum; each peak must be provided in the format m/z:abundance(%).
8. **MS/MS tolerance:** Tolerance applied in the MS2 spectrum similarity search between the experimental and library spectra.
9. **MS/MS similarity tolerance**: Please enter a numeric value between 0 and 1, where 1 represents a perfect match. The default value is 0.5.
10. **Similarity score weight (Forward & Reverse)**: Please enter values between 0 and 1. The two values must sum to 1. The default for both is 0.5.
11. **Similarity algorithm**: Select the spectrum similarity algorithm; Dot product (DP) and Intensity deviation are currently available.
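
To make the relationship between the MS tolerance and the minimum and maximum search values concrete, here is a minimal sketch of how such a window is commonly computed. The helper `toleranceWindow` is hypothetical and only illustrates the usual ppm and Da arithmetic; the website's own calculation may differ in detail.

```r
# Return the lower and upper search bounds around a target mass or m/z.
# `unit` is either "ppm" (relative tolerance) or "Da" (absolute tolerance).
toleranceWindow <- function(target, tolerance, unit = c("ppm", "Da")) {
  unit <- match.arg(unit)
  delta <- if (unit == "ppm") target * tolerance / 1e6 else tolerance
  c(min = target - delta, max = target + delta)
}

# Tetracycline (C22H24N2O8), monoisotopic mass about 444.1533 Da
toleranceWindow(444.1533, 10, "ppm")
toleranceWindow(444.1533, 0.01, "Da")
```

At 10 ppm the window around 444.1533 spans roughly 444.1489 to 444.1577, so a ppm tolerance widens with increasing mass while a Da tolerance stays constant.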

## 4.3 Multiple MS/MS search
### 4.3.1 Data preparation
Currently, this website supports MGF files exported from XCMS. Batch matching against the database uses an MS1 peak list file together with an MS2 spectrum file. Upon completion of the task, the website will provide a URL for you to download the matching results.
Please prepare your files as described in Section 3 (Data preparation), or use the demo data.
1. Upload files.
2. Input parameters and submit.
3. Go to the Task View page to check the task status.
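
Batch matching depends on associating each row of the MS1 peak list with MS2 spectra from the MGF file, which is what the MS1 & MS2 pair parameter in Section 4.3.3 controls. The sketch below shows one plausible way to do this pairing with an *m/z* tolerance in ppm and an RT tolerance in seconds. The function `pairMs1Ms2`, the toy tables, the column names `precursorMz` and `rt` for the spectra, and the default tolerances are all assumptions made for illustration; the server-side implementation is not documented here.

```r
# Hypothetical helper: for each MS1 peak, return the indices of the MS2 spectra
# whose precursor m/z is within `ppmTol` ppm and whose RT is within `rtTol` sec.
pairMs1Ms2 <- function(peaks, spectra, ppmTol = 10, rtTol = 15) {
  hits <- lapply(seq_len(nrow(peaks)), function(i) {
    mzDelta <- peaks$mz[i] * ppmTol / 1e6
    which(abs(spectra$precursorMz - peaks$mz[i]) <= mzDelta &
            abs(spectra$rt - peaks$rt[i]) <= rtTol)
  })
  names(hits) <- peaks$M_feature
  hits
}

# Toy MS1 peak list in the format required for upload
peaks <- data.frame(M_feature = c("FT0001", "FT0002"),
                    mz = c(445.1605, 611.1612),
                    rt = c(318.9, 402.1),
                    stringsAsFactors = FALSE)

# Toy MS2 precursor table (one row per spectrum in the MGF file)
spectra <- data.frame(precursorMz = c(445.1609, 445.1700, 611.1610, 611.1612),
                      rt = c(320.0, 319.0, 401.0, 500.0))

pairs <- pairMs1Ms2(peaks, spectra)
pairs
```

In this toy data, spectrum 2 is rejected for FT0001 because its precursor is outside the ppm window, and spectrum 4 is rejected for FT0002 because its retention time is too far away, which illustrates that both conditions must hold.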
### 4.3.2 File upload
1. **MS1 peak list** : required, file format must be CSV. Refer to Section 3 for the internal data format specifications.
2. **MS2 spectrum file**: required, file format must be MGF. Refer to Section 3 for the internal data format specifications.
3. **MS2 spectrum data source:** file preprocessing method and data format.
4. **Contact mail**: Notification mail will be sent after the task is completed.
### 4.3.3 Parameter list:
1. **MS1 & MS2 pair parameter**: Please enter numeric values. These are the tolerances for matching MS1 peaks with MS2 spectra: the *m/z* tolerance is in ppm and the RT tolerance is in seconds (sec). The maximum allowed value is 100 (ppm or sec).
2. **MS/MS search parameter**: Please enter numeric values for the MS1 peak and MS2 spectrum similarity search between the experimental data and the library.
3. **Similarity score weight (Forward & Reverse)**: Please enter values between 0 and 1. The two values must sum to 1. The default for both is 0.5.
4. **Similarity algorithm**: Select the spectrum similarity algorithm; Dot product (DP) and Intensity deviation are currently available.
5. **MS/MS similarity tolerance:** Please enter a numeric value between 0 and 1, where 1 represents a perfect match. The default value is 0.5.
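
To illustrate how forward and reverse weights could combine into one score, the following sketch computes a dot-product (cosine) similarity in two ways and averages them with the chosen weights. It rests on several assumptions that are not confirmed by this tutorial: peaks are aligned greedily within a fixed Da tolerance, the forward score uses all peaks of both spectra, and the reverse score ignores query peaks that have no counterpart in the library spectrum (a common convention in library searching). The function names are hypothetical, and the "Intensity deviation" algorithm is not covered.

```r
# Align two spectra (two-column matrices: m/z, intensity) by pairing each query
# peak with the closest unused reference peak within `tol` Da.
alignPeaks <- function(query, ref, tol = 0.01) {
  refInt <- numeric(nrow(query))
  used <- logical(nrow(ref))
  for (i in seq_len(nrow(query))) {
    d <- abs(ref[, 1] - query[i, 1])
    d[used] <- Inf
    j <- which.min(d)
    if (length(j) == 1 && d[j] <= tol) {
      refInt[i] <- ref[j, 2]
      used[j] <- TRUE
    }
  }
  # Reference peaks without a query partner get query intensity 0.
  data.frame(query = c(query[, 2], rep(0, sum(!used))),
             ref   = c(refInt, ref[!used, 2]))
}

# Cosine of the angle between two intensity vectors (0 if either is empty).
cosine <- function(a, b) {
  denom <- sqrt(sum(a^2)) * sqrt(sum(b^2))
  if (denom == 0) 0 else sum(a * b) / denom
}

# Weighted combination of forward and reverse dot-product scores.
weightedSimilarity <- function(query, ref, wForward = 0.5, wReverse = 0.5,
                               tol = 0.01) {
  stopifnot(isTRUE(all.equal(wForward + wReverse, 1)))
  al <- alignPeaks(query, ref, tol)
  forward <- cosine(al$query, al$ref)
  inRef <- al$ref > 0                      # keep only peaks present in the library
  reverse <- cosine(al$query[inRef], al$ref[inRef])
  c(forward = forward, reverse = reverse,
    score = wForward * forward + wReverse * reverse)
}

# The query has one extra peak (m/z 200) that is absent from the library spectrum.
query <- cbind(mz = c(100, 150, 200), intensity = c(100, 50, 20))
ref   <- cbind(mz = c(100, 150), intensity = c(100, 50))
weightedSimilarity(query, ref)
```

In this example the extra query peak lowers the forward score but leaves the reverse score at 1, so shifting weight toward the reverse score makes the result more tolerant of unmatched peaks in the experimental spectrum. A result would then be kept only if the combined score reaches the MS/MS similarity tolerance.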