# Landmarks project
The aim of the landmarks project is to determine the quality of a newly analyzed sample based on historical data and on the samples currently being analyzed. To do this, so-called "landmarks" (LMs), features which are omnipresent in blood sample data, are used to decide whether the most recent sample is OK or whether it should be discarded and the whole analysis interrupted.
## Workflow
*Pre-QC package operations*
* Instrument creates new files during analysis
* An R script monitors the folder where the files are created
    * Checks whether each file is finalized (is it still growing in size?)
    * Once a file is finalized, extracts metadata from it and enters it into a data frame (DF)
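
The "still growing in size" check can be sketched in base R as below. This is only an illustration of the idea, not the package's monitor function: `isFinalized()`, `newFiles()`, and the `wait` and `seen` arguments are hypothetical names introduced here.

```r
# Hypothetical helper: TRUE when the file exists, is non-empty, and its size
# did not change between two checks that are `wait` seconds apart.
isFinalized <- function(path, wait = 2) {
  sizeBefore <- file.size(path)
  Sys.sleep(wait)
  sizeAfter <- file.size(path)
  !is.na(sizeBefore) && !is.na(sizeAfter) &&
    sizeBefore > 0 && sizeBefore == sizeAfter
}

# Hypothetical helper: .mzML files in `dir` that are not listed in `seen`.
newFiles <- function(dir, seen = character(0), pattern = "\\.mzML$") {
  found <- list.files(dir, pattern = pattern, full.names = TRUE)
  setdiff(found, seen)
}
```

A monitoring loop could call `newFiles()` at regular intervals and hand each file to the QC step once `isFinalized()` returns `TRUE`. The appropriate `wait` depends on how often the instrument flushes data to disk.
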
File names need to conform to the naming strategy:
* "PP_QQ_RR_SS_XX_YY.mzML"
    * PP: Date (e.g. 2021-04-09)
    * QQ: Batch, in the form of BNWNN (e.g. B1W42: Batch 1 week 42)
    * RR: Chromatography (i.e. RP or HILIC)
    * SS: Polarity (i.e. POS or NEG)
    * XX: Sample name
    * YY: Injection number
    * .mzML: File ending
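
As an illustration of how the naming strategy can be turned into metadata, the sketch below splits a file name into the six documented components. It assumes an underscore separator, as in the file name used in the code example at the end of this document, and it assumes the sample name itself contains no underscore. `parseFileName()` and the column names are hypothetical and not part of the package.

```r
# Hypothetical helper: split a file name into the six documented components.
# Assumes "_" as separator and no "_" inside the sample name.
parseFileName <- function(fileName, sep = "_") {
  base <- sub("\\.mzML$", "", basename(fileName))
  parts <- strsplit(base, sep, fixed = TRUE)[[1]]
  fields <- c("date", "batch", "chromatography",
              "polarity", "sampleName", "injection")
  if (length(parts) != length(fields)) {
    stop("File name does not follow the naming strategy: ", fileName)
  }
  meta <- as.data.frame(as.list(setNames(parts, fields)),
                        stringsAsFactors = FALSE)
  # Validate the components that have a documented set of allowed values
  if (!grepl("^B[0-9]+W[0-9]+$", meta$batch)) {
    stop("Batch must have the form BNWNN: ", meta$batch)
  }
  if (!meta$chromatography %in% c("RP", "HILIC")) {
    stop("Chromatography must be RP or HILIC: ", meta$chromatography)
  }
  if (!meta$polarity %in% c("POS", "NEG")) {
    stop("Polarity must be POS or NEG: ", meta$polarity)
  }
  meta
}

meta <- parseFileName("2020-11-25_B12W46_RP_POS_sQC_012.mzML")
```

All fields are kept as character strings so that leading zeros in the injection number are preserved.
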
*QC package starts here*
* checkLM() compares the LMs of the latest file against a DB of historical LMs and against previous samples in the same injection sequence. It produces a report on sample quality and sends a notification to a Slack channel.
### buildDB()
**Function overview**
The purpose of the function is to build an empty database containing all the structures needed to work with the code in this package. It is useful while testing the code, but should not be used once the workflow is fully established; at that point a designated DB file will be used instead.
* **Input**
    * dbName - character - Path to the DB file (.db) to be used for the analysis
* **Step-by-step procedure of function**
    * Sets up empty data frames corresponding to the structure of the DB
    * Creates a .db file with the name supplied by the user
    * Sets up a number of views which are used by various functions to query the DB
* **Output**
    * A .db file with the structure needed to interact with the code
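
The three steps above can be sketched as follows, assuming the .db file is an SQLite database accessed through the DBI and RSQLite packages. That is an assumption, as the document does not name the backend. The function name `buildEmptyDB()` and all table, column, and view names are purely illustrative and do not reflect the actual schema created by buildDB().

```r
library(DBI)

# Hypothetical stand-in for buildDB(); the schema below is illustrative only.
buildEmptyDB <- function(dbName) {
  con <- dbConnect(RSQLite::SQLite(), dbName)
  on.exit(dbDisconnect(con))

  # Empty data frames that define the table structure
  landmarks <- data.frame(lmID = integer(0), mz = numeric(0),
                          rt = numeric(0), polarity = character(0),
                          stringsAsFactors = FALSE)
  injections <- data.frame(injID = integer(0), fileName = character(0),
                           projectID = character(0),
                           sampMatrix = character(0),
                           stringsAsFactors = FALSE)
  lmPeaks <- data.frame(injID = integer(0), lmID = integer(0),
                        intensity = numeric(0), rt = numeric(0))

  # Writing zero-row data frames creates the empty tables
  dbWriteTable(con, "landmarks", landmarks)
  dbWriteTable(con, "injections", injections)
  dbWriteTable(con, "lmPeaks", lmPeaks)

  # A view that other functions could query instead of repeating the join
  dbExecute(con, "CREATE VIEW lmPeaksPerInjection AS
                  SELECT i.fileName, p.lmID, p.intensity, p.rt
                  FROM lmPeaks p JOIN injections i ON i.injID = p.injID")
  invisible(dbName)
}
```

In this sketch, calling the function on an existing database that already contains these tables stops with an error rather than overwriting them.
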
### checkLM()
**Function overview**
The purpose of the function is to compare the LMs of the latest sample against a DB of LMs and against the LMs from samples in the current analysis sequence. This is done using a set of statistical tests on the properties of the LMs. The function is integrated into a monitor function which keeps track of any new file in a folder and sends it to checkLM().
* **Input**
    * "filePath" - character - The path to the sample to be analyzed
    * "dbName" - character - The name of the DB to be used
    * "instrument" - character - The type of instrument used to analyze the data
    * "projectID" - character - The name of the project the samples belong to
    * "sampMatrix" - character - The sample matrix of the current injection
    * "dPPM" - integer - The dPPM window for matching LMs in the sample to LMs in the DB
    * "rtWin" - integer - The RT window (in seconds) for matching LMs in the sample to LMs in the DB
    * "alpha" - double - The level of significance to use for statistical tests
    * "no_check" - character vector - Strings identifying the files which are not to be quality controlled
    * "cwp" - CentWaveParam object (XCMS) - Sets all parameters for peak picking in the sample to be analyzed
    * "reportPath" - character - The path to where reports will be saved
    * "reportName" - character - Name to be used for the report. Should include the ".csv" file ending
    * "firstSamp" - boolean - States whether this is the first sample, in which case the sample is not analyzed but is still added to the DB
* **Step-by-step procedure of function**
    * Checks whether the sample type is to be quality checked ('sQC' = QC samples in sequence; 'samples' = previous samples in sequence)
    * Determines what type of previous samples to collect from the DB (e.g. 'sQC' or 'sample')
    * Reads the file and processes it with XCMS to interrogate the feature data
    * Queries the DB to find the matching LMs based on m/z and RT
    * Collects LM data from the current sample and the DB and formats it for statistical testing
    * Performs statistical tests of the current sample vs. the DB data
        * t.test of intensity
        * t.test of RT
        * pnorm of number of LMs
        * pnorm of IPO score
        * pnorm of peakNumber
    * Checks the significance of all tests
        * If there is no significant outcome, no action is taken
        * If there are significant outcome(s), all the test results are printed in an abnormality report
* **Output**
    * Writes, and subsequently updates, a .xlsx 'Report' containing all the statistical information gathered for every sample analyzed during the current analysis.
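
The matching and testing steps can be illustrated with base R as below. This is a sketch under stated assumptions, not the implementation inside checkLM(): `matchLM()` and `testSample()` are hypothetical helpers, the default values for `dPPM` and `rtWin` are arbitrary, the t-test is R's default Welch two-sample test, and the `pnorm` check is assumed to be two-sided against a normal model of the historical values. Only two of the five listed tests are shown; the others would follow the same pattern.

```r
# Hypothetical helper: row indices of DB landmarks that lie within the dPPM
# and rtWin (seconds) windows of a single sample feature.
matchLM <- function(mz, rt, dbLM, dPPM = 10, rtWin = 30) {
  ppmDiff <- abs(dbLM$mz - mz) / dbLM$mz * 1e6
  which(ppmDiff <= dPPM & abs(dbLM$rt - rt) <= rtWin)
}

# Hypothetical helper: test LM intensities and the number of LMs found in the
# current sample against historical values.
testSample <- function(sampInt, histInt, sampNLM, histNLM, alpha = 0.05) {
  # Welch two-sample t-test of LM intensities (assumed test design)
  pInt <- t.test(sampInt, histInt)$p.value
  # Two-sided p-value of the LM count under a normal model of past counts
  z <- (sampNLM - mean(histNLM)) / sd(histNLM)
  pNLM <- 2 * pnorm(-abs(z))
  p <- c(intensity = pInt, nLM = pNLM)
  list(pValues = p, abnormal = any(p < alpha))
}
```

A sample would be flagged for the abnormality report when `abnormal` is `TRUE`, that is, when at least one p-value falls below `alpha`.
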
### findLM()
**Function overview**
The purpose of the function is to find landmarks in a given XCMS object and to return the metadata pertaining to those landmarks. It is used to build up the database against which new samples are later compared.
* **Input**
    * "mode" - character - The ionization mode of the analysis
    * "XCMSObj" - XCMS object - The XCMS object created from XCMS peak picking
    * "time_interval" - double - The time intervals within which findLM() will incrementally look for landmarks
    * "mzdif" - double - The allowed difference in m/z between different features; depends on instrument accuracy
    * "rtdif" - double - The allowed difference in RT between features; depends on the chromatography of the analysis
* **Step-by-step procedure of function**
    * Extracts m/z and RT information from the XCMS object
    * Checks whether any features are present; if so, they are stored in a matrix
    * Formats the information collected and creates a plot with x=RT and y=m/z
    * Returns the formatted information
* **Output**
    * Returns a matrix containing all the features extracted from the XCMS object
### getFiles() (obsolete)
**The getFiles() dependency has been removed; the information below is no longer used.**
**Function overview**
The purpose of the function is to list all .mzML files in a folder and to collect metadata from them.
* **Input**
    * path - character - Path to the folder containing the files to be checked
    * pattern - character - File ending of the files to be checked (e.g. ".mzML")
    * filesize - boolean - Whether the size of each file should be reported
* **Step-by-step procedure of function**
    * Prompts the user for a file path if no path is supplied
    * Lists all the files and extracts information from the file names
    * Checks whether the file naming strategy has been followed and aborts if not
    * Formats the information and builds a matrix to return to the user
    * Returns the matrix containing the metadata
* **Output**
    * Returns a matrix containing all metadata gathered from the file names of all the files to be analyzed
### Code example ###
```
# Build an empty .db file
buildDB(dbName = "NameOfDB.db")

# Read RP and RN landmarks from .csv files
# (RP = reversed phase positive, RN = reversed phase negative)
LM.RPs <- CSVtoLM()
submitLMToDB(dbName = "NameOfDB.db", LMsToSub = LM.RPs)
LM.RNs <- CSVtoLM()
submitLMToDB(dbName = "NameOfDB.db", LMsToSub = LM.RNs)

# Submit a batch of files to the DB
# (choose.dir() opens a folder picker and is only available on Windows)
submitBatchToDB(dir = choose.dir(), projectID = "MedGICarb", matrix = "Blood", dbName = "NameOfDB.db")

# After setup, run checkLM() on a single file
checkLM(filePath = "2020-11-25_B12W46_RP_POS_sQC_012.mzML", dbName = "NameOfDB.db", projectID = "MedGICarb", sampMatrix = "Blood", firstSamp = FALSE)
```