2. XCMS RT Alignment

--- tags: SOP, WIP --- # 2. XCMS RT Alignment ## 2.1 Introduction: ### 2.1.1 Objective Chromatography tends to be unstable and the time at which analytes elute can vary between samples. Therefore, the retention time (RT) Alignment step (also known as RT correction), aims to adjust this by shifting signals along the retention time axis to align the signals between different samples within an experiment. For a visual representation of RT alignment, see below: * Base peak chromatogram before RT alignment: ![](https://i.imgur.com/jJ4VUJW.png) * Base peak chromatogram after RT alignment: ![](https://i.imgur.com/Y85hSkV.png) Original source: [Bioconductor Tutorial: LCMS data preprocessing and analysis with xcms](https://https://www.bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html#3_Initial_data_inspection) ### 2.1.2 Data requirements To perform this step you require the xcms object obtained from the xcms peak picking step (See [SOP 1.XCMS Peak Picking](https://hackmd.io/HLzixbGbQ3ylmUF3ZECwrA)). Usually, the results from peak picking are saved in an .rds file. If you are performing the xcms steps in different R scripts load the .rds file saved from peakpicking. `xdata <- readRDS("name_of_your_file.rds")` ## 2.2 Central Functions: The method to perform the alignment/RT correction in xcms is `adjustRtime` which is a function that: * Calculates adjusted retention times for each spectrum * Adjust the reported retention times of the identified chromatographic peaks. `adjustRtime`can use two different alignments algorithms. Both of the xcms correction algorithms are based on finding anchors or hook groups. Anchors or hook groups can be defined as reference features that are similar and should be present in all samples. Retention time (RT) correction is based on these anchors. The two alignment algorithms that be used are: * **PeakGroups:** * Performs RT alignment using local regression * **Obiwarp:** * Performs RT alignment using warp curve * Finds anchor groups using obiwarp. ## 2.3 Correction Methods There are two methods to correct RT: * **Full data set alignment:** * Perform the alignment and find anchors for the full data set. * **Subset-based alignment:** * Perform the alignment based only on a subset of samples (such as, sQCs) and use these to adjust the full data set. More infromation can be found at [https://www.bioconductor.org/packages/release/bioc/vignettes/xcms/inst/doc/xcms.html](https://) ## 2.4 Key Parameters- RT alignment using PeakGroups: Peakgroups is a good option if your instrument did not behave well throughout the analysis of the different batches. For this correction method you use the `PeakDensityParam` function that requires the following input: * `minfrac value:` * It should be set high. In this example, it is set to 0.95 , which means metabolites must be present in 95% of the samples. * `bw:` * The bandwidth should be optimized depending on your data set. Tip: Try manually testing several bw values and look at the alignment plots. * `binSize:` * Size of the m/z slices to find anchors With these settings, 'PeakDensityParam' finds peaks that are present in 95% of the samples within each sample_group. When so called hook groups are found, the actual retention time adjustment is performed with 'adjustRtime'. Additional parameters in the function adjustRtime * `smooth:` * LOESS= Fits a LOESS curve between retention times. Works on the peaktable (therefore fast) * `span:` * Degree of smooting from local polynomial regression fitting. 0= overfitting, 1= only captures very general trends. 0.4-0.7 is typically a good range. You might need to increase the span if for example you have regions with few peaks. This is a sign that the fit has overfitted heavily. ``` pdp_pregroup <- PeakDensityParam(sampleGroups = xdata$sample_group, minFraction = 0.95, bw = 3, binSize = 0.02) xdata <- groupChromPeaks(xdata, param = pdp_pregroup) pgp <- PeakGroupsParam(minFraction = 0.95, smooth = "LOESS", span = 0.4, family = "gaussian") xdata <- adjustRtime(xdata, param = pgp) ``` ## 2.5 Key Parameters- RT alignment using Obiwarp: For this correction method you use the `ObiwarpParam` function that requires the following input: * `binSize:` * Size of the m/z slices to find anchors * `Response:` * Sets the proportion of anchors that will be used. It will range between 1-20, lower mean less drift * `gapInit:` * Cost to start to deviate from the optimal solution (diagonal). * `gapInit:` * Cost to deviate even further from the from the optimal solution (diagonal). These parameters are optimized with IPO. As seen in SOP 1 IPO is optimizing some parameters for peak picking, but it also optimize parameters for retention time with obiwarp (not for the hook group method). In the code below peak picking has been performed on a subset of 5 random sQC from each batch. Parameters with a parenthesis and 2 values, (x,y) are optimized. One value means it is a set value. Thereafter, the output from peak picking is used for the setting of parameter for obiwarp. In the output of the code, the estimated best parameters for obiwarp will be presented. For the whole code, see data sorce 1.2. For more information about peak picking with IPO, see [SOP 1](https://hackmd.io/HLzixbGbQ3ylmUF3ZECwrA) ``` ## ----optimize_retcor_group, fig.height=7, fig.width=7, warning = FALSE-------- retcorGroupParameters <- getDefaultRetGroupStartingParams() retcorGroupParameters$minfrac <- 0.8 #a set value, minimum amount of samples peaks have to be found in to be considered a feature retcorGroupParameters$gapInit <- c(0.2, 0.7) retcorGroupParameters$gapExtend <- c(2,3) retcorGroupParameters$response <- c(1,10) retcorGroupParameters$profStep <- c(0.8,1.1) #In xcms3 XX retcorGroupParameters$mzwid <- c(0.015, 0.035) time.RetGroup <- system.time({ # measuring time resultRetcorGroup <- optimizeRetGroup(xset = optimizedXcmsSetObject, #object after peak picking params = retcorGroupParameters, #parameters nSlaves = nCore, subdir = NULL, plot = TRUE) }) ``` If your chromatography is good, then it is going to cost a lot to make RT adjustments. Therefore the value for `response` is going to be low, but the value for `gapInit` and `gapInit` is going to be high. On the other hand, if your chromatography is bad, it is going to cost less to make RT adjustments.Therefore the value for `response` is going to be higher, but the value for `gapInit` and `gapInit` must be low. Here is an example on how you will perfrom RT alignment using obiwarp: ``` pgp <- ObiwarpParam(binSize=0.9425, response=9.73, gapInit= 0.46, gapExtend=2.5) xdata <- adjustRtime(xdata, param = pgp) saveRDS(xdata, "name_of_your_file.rds") ``` For more information on how to use the ObiwarpParam funtion. Please see: [Bioconductor:adjustRtime-obiwarp ](https://rdrr.io/bioc/xcms/man/adjustRtime-obiwarp.html) ### 2.6 Output: Alignment plot To evaluate the quality of the alignment, plot the difference between the adjusted and raw retention times along the retention time axis. This plot can help you to evaluate the quality of the alignment. Observe if there are any systematic trends in time deviations. If the differences between adjusted and raw retention times are too large, then it is probable that either your alignment or your samples performed poorly. To create an alignment plot use `plotAdjustedRtime` function. Here is an example: ``` png("name_of_your_file.png") plotAdjustedRtime(xdata) dev.off() ``` The output obtained will be a plot similar to this: ![](https://i.imgur.com/unPp6O4.jpg) In the plot above, each curve (line) represents one sample and the anchors (hook) groups can be observed as dots . On the y axis the difference in retention time of raw vs adjusted retention time is presented. In addition, for multibatch experiments, it is possible to color the alignment plots according to each batch. This can be useful to identify outliers, for example. ``` #Load libraries------------------- library(RColorBrewer) library(scales) #Define Colors------------------- group_colors= paste0(hue_pal()(15)) #Loads 15 different color from hue_pal palette names(group_colors)= paste0("B",c(1:15)) #Assign a different color to each batch #Alignment --------------------------- pdp_pregroup <- PeakDensityParam(sampleGroups = xdata$sample_group, minFraction = 0.95, bw = 1.5, binSize = 0.02) xdata <- groupChromPeaks(xdata, param = pdp_pregroup) pgp <- PeakGroupsParam(minFraction = 0.95, smooth = "loess", span = 0.4, family = "gaussian") xdata <- adjustRtime(xdata, param = pgp) xdata@phenoData@data$batch<-sub(".*(B\\d+).*", "\\1", basename(xdata@phenoData@data$sample_name)) #Adds information of batch to xdata save(xdata, file= "name_of_your_file.rda")) #Create Alignment Plot-------------- jpeg("name_of_your_file.jpg")) plotAdjustedRtime(xdata, col=group_colors[xdata$batch], lwd = 2 #line width of curve ) dev.off() ``` ![](https://i.imgur.com/6v9BVn1.jpg) Once this is done, you have a retention time adjusted data set ready for correspondance and peak filling, which are gone through in SOP 3.