## Abstract 林楷鈞 秦文峰
Medical imaging plays an important role in both medical practice and research trials, supporting the diagnosis and treatment of various medical conditions. However, producing imaging reports can be time-consuming and error-prone, especially for radiologists who lack experience. Automatic generation of radiology reports has therefore been introduced to address these problems, and the essential step is to integrate artificial intelligence into the medical field.
The R2Gen model was proposed to generate radiology reports automatically, and experiments have shown that it achieves strong performance in producing accurate image reports. In this project, we aim to optimize the R2Gen model at both ends: we use unsharp masking and highboost filtering to preprocess the input images before feeding them into R2Gen, and then use text embedding to reconstruct the output report. In two experiments on the IU_xray and MIMIC-CXR datasets, our approach achieved higher scores on NLP metrics such as BLEU-3, BLEU-4, ROUGE, and CIDEr.
## Introduction 林紹恩
Radiology report generation, which aims to automatically generate a free-text description for a clinical radiograph (e.g., a chest X-ray), has emerged as an attractive research direction in both artificial intelligence and clinical medicine. It can greatly expedite workflow automation and improve the quality and standardization of health care. Many methods have recently been proposed in this area, including retrieval-based methods and the memory-driven Transformer. Although both can obtain promising results, the former is limited by the preparation of large databases, while the latter may produce inaccurate content in the generated report.
In this report, we propose to refine the reports generated by the memory-driven Transformer model via unsharp masking and highboost filtering (UMHF) and a text embedding model. In detail, UMHF adds a mask, computed as the difference between the original image and a blurred copy of it, back onto the original image. As a result, the processed image has higher contrast and reveals more detail. The text embedding model computes the similarity between each sentence of the generated report and a set of reference sentences, so that the most similar reference sentence can replace inaccurate content. Experimental results on two benchmark datasets confirm the validity and effectiveness of our approach, where UMHF and the text embedding model achieve better performance on certain evaluation metrics. To summarize, the contributions of this paper are three-fold:
* We propose to generate radiology reports with images processed by UMHF via memory-driven Transformer model.
* We propose to refine radiology reports via text embedding model with reports generated by memory-driven Transformer model.
* Extensive experiments show that our proposed model outperforms the original model on certain evaluation metrics.
## Methods 林楷鈞 秦文峰
### Unsharp masking and highboost filtering
(already in https://drive.google.com/drive/folders/1ZPEzKEd93-2zzSmq_jQ5AEa99q-zPOk-)
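Although the full write-up lives in the linked folder, the core idea can be sketched in a few lines. The following is a minimal 1-D illustration of unsharp masking and highboost filtering, not our actual preprocessing code: the function names and the simple box blur are our own simplifications (our pipeline operates on 2-D X-ray images). With `k = 1` this is plain unsharp masking; `k > 1` gives highboost filtering.

```python
def box_blur_1d(signal, radius=1):
    """Blur a 1-D signal with a simple moving-average (box) filter."""
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def highboost_sharpen(signal, k=2.0, radius=1):
    """Unsharp masking / highboost filtering on a 1-D signal.

    mask = original - blurred; result = original + k * mask.
    """
    blurred = box_blur_1d(signal, radius)
    mask = [o - b for o, b in zip(signal, blurred)]  # the unsharp mask
    return [o + k * m for o, m in zip(signal, mask)]
```

On a step edge, flat regions pass through unchanged while values on either side of the edge overshoot, which is exactly the contrast enhancement we exploit before feeding images to R2Gen.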
### R2Gen
After preprocessing the input images with unsharp masking and highboost filtering, we feed them into the R2Gen model to produce image reports. R2Gen relies on a memory-driven Transformer as its framework for producing radiology reports. In this approach, a specialized relational memory captures essential details during report generation, and a memory-driven conditional layer normalization integrates this memory into the decoding phase of the Transformer.
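The memory-driven conditional layer normalization can be illustrated with a minimal sketch: instead of fixed scale and shift parameters, the layer adds offsets predicted linearly from the relational memory state. This plain-Python version is a simplified illustration of the idea only, not R2Gen's actual implementation (which operates on batched tensors); the weight layout (`w_gamma`, `w_beta` as one column of weights per feature) is our own choice.

```python
def mcln(x, memory, gamma, beta, w_gamma, w_beta):
    """Layer-normalize x, with gamma/beta offset by linear projections of memory."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    eps = 1e-6
    # offsets predicted from the memory state (one weight column per feature)
    d_gamma = [sum(m * w for m, w in zip(memory, col)) for col in w_gamma]
    d_beta = [sum(m * w for m, w in zip(memory, col)) for col in w_beta]
    return [
        (gamma[i] + d_gamma[i]) * (x[i] - mean) / (var + eps) ** 0.5
        + (beta[i] + d_beta[i])
        for i in range(n)
    ]
```

With zero projection weights this reduces to ordinary layer normalization; non-zero weights let the memory condition how each decoding step is normalized.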
### Text Embedding
Using the text embedding model from OpenAI, we reconstruct the report sentences generated by R2Gen. For each sentence in the image report, we follow five steps:
1. Create the reference sentence list (split the input dataset on the period character ".")
    * We first obtain the training data, which contains a set of sentences from radiology image reports. We split each report on periods and collect the sentences into a reference sentence list.
2. Append the report sentence to the list, and input the list to the embedding model
    * We append the report sentence to the reference sentence list and feed the list into the text embedding model, obtaining one embedding vector per sentence.
3. Pop the report's embedding vector off the returned list
    * From the embedding vectors returned by the model, we extract the last element, which is the embedding of the report sentence.
4. Calculate the dot product between the report's embedding vector and each reference sentence's embedding vector
    * We compute the dot product of the report sentence's embedding vector with each of the remaining embedding vectors in the reference sentence list.
5. Return the reference sentence with the largest dot product (highest similarity)
    * Finally, we find the reference sentence whose embedding has the largest dot product with the report sentence's embedding, i.e., the highest similarity, and replace the report sentence with this reference sentence.
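The five steps above can be sketched as follows. The `embed` callable stands in for the OpenAI embeddings endpoint, which returns one vector per input string (OpenAI embeddings are unit-length, so the dot product equals cosine similarity); the helper names are ours, and the toy keyword-count embedding in the usage example below is purely for illustration.

```python
def most_similar(report_vec, reference_vecs, reference_sentences):
    """Step 5: pick the reference sentence whose embedding has the largest dot product."""
    best = max(range(len(reference_vecs)),
               key=lambda i: sum(a * b for a, b in zip(report_vec, reference_vecs[i])))
    return reference_sentences[best]

def refine_report(report, reference_sentences, embed):
    """Replace each generated sentence with its most similar reference sentence.

    embed: callable mapping a list of sentences to a list of embedding vectors
    (in our project this wraps OpenAI's text embedding model).
    """
    refined = []
    for sentence in (s.strip() for s in report.split(".") if s.strip()):
        vecs = embed(reference_sentences + [sentence])  # step 2
        report_vec = vecs.pop()                          # step 3
        refined.append(most_similar(report_vec, vecs, reference_sentences))
    return ". ".join(refined) + "."
```

For example, with references `["the heart is normal", "the lungs are clear"]` and a toy embedding that counts the keywords "heart" and "lung", `refine_report("heart size is normal. lung fields clear.", refs, toy_embed)` maps each generated sentence onto its nearest reference sentence.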

## Results 楊琮熹


To illustrate the effectiveness of our proposed method, we conducted two experiments on the IU_xray dataset, namely R2Gen + Embedding (RE) and UMHF + R2Gen (UR). The results are reported in **Figure 1**.
There are several observations. First, in terms of Natural Language Generation (NLG) metrics, RE does not achieve superior scores compared to existing studies. One significant reason is that OpenAI imposes restrictions on the embedding model's input size, resulting in insufficient variability in the output and consequently lower NLG scores. However, as discussed below, RE can still be used to correct shortcomings in existing models.
Secondly, UR obtains higher scores on BLEU-3, BLEU-4, ROUGE, and CIDEr metrics, which are widely used in evaluating machine translation or similar tasks. Therefore, UR's superior performance in these aspects demonstrates its effectiveness. It confirms that preprocessing images with UMHF, highlighting edges and details in the image, and sharpening the image through high-pass filtering are effective processes.
To further investigate the effectiveness of our model, we performed qualitative analysis on some cases, including their ground truth data and generated reports from different models. **Figure 2** shows an example from IU_xray with a frontal chest X-ray image, and corresponding reports, where colors on the text indicate areas with errors. In this example, it can be observed that using RE can correct sentences with errors, repetitions, or even incomplete sentences.
## Discussion/Conclusion 許祐銓
Discussion:
The implementation of the Embedding Method has demonstrated tangible enhancements in the generation of R2Gen reports.
After thoroughly preprocessing the original reports, we found that integrating the Embedding Method led to significant improvements in the content of the sentences. However, an intriguing finding emerged during this process: despite the evident improvement in sentence quality, the new reference sentence list had a detrimental effect on the overall accuracy scores.
Conclusion:
Our preprocessing of the original reports noticeably improved the quality of the sentences. Nevertheless, the unexpected negative impact of the revised reference sentence list on the accuracy scores prompts further investigation into how the Embedding Method influences the R2Gen reports.
## Data and Code Availability 鄭敦謙
https://hackmd.io/Iu4XNch1S6KaRQyRqr-lhg?view
https://github.com/JDChian/R2Gen/tree/main?tab=readme-ov-file
## Author Contribution Statements
* 秦文峰 (presenter: 12%, R&D: 7.5%, Programmer: 8%)
    * (R&D) researched the Unsharp Masking and Highboost Filtering methods
    * (Programmer, tester) implemented OpenCXR, histogram equalization, and the Unsharp Masking and Highboost Filtering methods
    * (presenter) presented Unsharp Masking and Highboost Filtering in the final presentation
    * (presenter) wrote the Unsharp Masking and Highboost Filtering section and its references in the final report
* 鄭敦謙 (37%): study design, programming, model testing, final presentation
* 林楷鈞 (presenter: 12%, R&D: 35%, Programmer: 26%, contribution: 20.7%): study design, programming, final presentation
* 林紹恩 (presenter: 40%, R&D: 7.5%, Programmer: 8%, contribution: 13.875%)
    * data preprocessing, final and proposal presentations, study design
* 許祐銓 (presenter: 12%, R&D: 7.5%)
* 楊琮熹 (17.05%): final presentation, data collection, figure design and writing, data analysis
## References