# Chapter 7

## [Chapter 7 Testing and Experiment]{.center}{.page_break_before}

### 7.1 Testing and Experiment Scope

To test our project, we will evaluate our model on test datasets using classic machine learning metrics: precision, recall, and F1 score. These metrics will show how well the model performs. BioBERT provides relation extraction datasets, and we will use them to evaluate our model. We will also evaluate the user interface of our Gene Knowledge Graph by assessing three things: the time it takes to display results, the relevancy of the results, and how readable the graph is.

### 7.2 Testing and Experiment Approach

The PubMed data is unlabeled, so we cannot evaluate the model on it directly. Instead, we will use a semi-supervised model that learns from both labeled and unlabeled data, and we will measure its performance on the labeled datasets. The metric used to assess the quality of the model is the F1 score. It is the harmonic mean of the precision and recall values:

$$\text{F1 Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$${#eq:f1-equation}

Precision is the number of true positives divided by the total number of values predicted as positive. A true positive is a value predicted as positive whose actual value is positive. A false positive is a value predicted as positive whose actual value is negative.

$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}$${#eq:precision-equation}

Recall is the number of true positives divided by the total number of values that should have been predicted as positive, meaning those whose actual value is positive. A false negative is a value predicted as negative whose actual value is positive.

$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$${#eq:recall-equation}

The F1 score ranges from 0 to 1. The closer it is to 1, the better the model performs on both precision and recall. The model with the highest F1 score will be used to generate the knowledge graph.
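The precision, recall, and F1 definitions can be sketched in Python as follows. This is a minimal illustration and not our evaluation code. It assumes binary labels, where `1` means "relation present" and `0` means "no relation". It uses the standard F1 definition, 2 · Precision · Recall / (Precision + Recall). The function name `precision_recall_f1` is hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for binary labels (1 = positive).

    Hypothetical helper for illustration; returns 0.0 for any ratio whose
    denominator is zero instead of raising ZeroDivisionError.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count outcomes by comparing each actual label with its prediction.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Precision: true positives over everything predicted positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: true positives over everything that is actually positive.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


p, r, f = precision_recall_f1([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 1, 0])
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
# → precision=0.500 recall=0.667 f1=0.571
```

In this example there are 2 true positives, 2 false positives, and 1 false negative. Precision is 2/4 and recall is 2/3, which gives F1 = 4/7. For binary labels, scikit-learn's `sklearn.metrics.precision_recall_fscore_support` should give the same values and can serve as a cross-check.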
To evaluate the Gene Knowledge Graph, we will observe three characteristics:

- the clarity of the graph
- the time it takes for the graph to display the results for a specified gene
- the relevancy of the results

We will select 10 random genes and record these characteristics for each one. We will measure the time to display the results in seconds. For graph clarity, we will analyze whether the nodes and edges are easy to interpret. For relevancy, we will choose one relationship from each gene's graph and fact-check it for accuracy.

### 7.3 Testing and Experiment Results and Analysis

[Describe testing and experiment results and analysis. For example, test execution and test result summary, performance test result analysis, test coverage, bug distribution report, and so on. This section must include textual description accompanied with figures and/or tables.]

Here we will analyze

#### Relation Extraction Model Results

| Model | F1 Score | Precision | Recall |
|-------|----------|-----------|--------|
|       |          |           |        |

#### Knowledge Graph Results

| Gene | Result Relevancy | Time to Display (sec) | Graph Clarity |
|------|------------------|-----------------------|---------------|
|      | The results were relevant/not relevant | xx.xx | The graph displayed clear nodes and edges showing the relationships |
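The "Time to Display (sec)" column for 10 random genes could be filled using a timing harness like the one below. This is a minimal sketch and not part of our implementation. The name `time_gene_queries` is hypothetical. `display_results` is a hypothetical callback that stands in for whatever function queries the graph and renders the results for one gene. `genes` is assumed to be a list of the gene names available in the graph.

```python
import random
import time
from typing import Callable, Dict, Optional, Sequence


def time_gene_queries(
    genes: Sequence[str],
    display_results: Callable[[str], object],
    sample_size: int = 10,
    seed: Optional[int] = None,
) -> Dict[str, float]:
    """Time display_results for a random sample of genes, in seconds.

    Hypothetical harness: display_results is an assumed callback that
    queries the knowledge graph and renders the results for one gene.
    """
    rng = random.Random(seed)  # seeded so the same genes can be re-tested
    # Sample without replacement; use every gene if fewer than sample_size exist.
    sample = rng.sample(list(genes), k=min(sample_size, len(genes)))

    timings: Dict[str, float] = {}
    for gene in sample:
        start = time.perf_counter()  # monotonic, high-resolution clock
        display_results(gene)
        timings[gene] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Stand-in callback that simulates a 50 ms query, for demonstration only.
    demo_genes = [f"GENE{i}" for i in range(100)]
    results = time_gene_queries(demo_genes, lambda g: time.sleep(0.05), seed=42)
    for gene, seconds in results.items():
        print(f"{gene}: {seconds:.2f} sec")
```

This harness only measures the time until `display_results` returns. If the graph is drawn asynchronously in a browser, the call may return before rendering finishes, and the time would need to be measured on the browser side instead.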