# Extra Credit Project - Report 2

Tai Wei Wu (twu474)

https://github.com/Crysta1ightning/TokenSmith

In this project, I focused on performance and reliability.

1. I implemented model caching, which reduced response latency by about five seconds per request.
2. I introduced a confidence-based fallback that prompts the model to answer “I don’t know” when uncertainty is high.

## Caching

* The original setup reloaded the model for every query, adding 2–5 s of delay. I added a global `_LLM_CACHE` so the model loads only once during runtime.
* This is an easy change with a great impact.

## Confidence

* To minimize hallucinations, I aimed to have the model respond only when its retrieval confidence was high.
* The original behavior was to output an answer no matter what. This is bad because the model might answer with something that is totally wrong.
* Since we are not aiming for a general model, only one that can answer database-related questions, it is acceptable to say "I don't know" when the user asks something that cannot be answered from the chunked textbook.
* Since we already use FAISS and BM25 to rank the chunks, my first thought was to use a fused score based on these two. (This is what I did in my first submission.)
* However, this introduces several problems:
    * First, it is hard to normalize these values: neither has a fixed range, so I would have to normalize the FAISS distance as 1/(1+dist) and the BM25 score as s/(s+k).
    * Second, even after normalizing and curving them into [0, 1], the confidence level does not always reflect whether the question can actually be answered.
* A cross-encoder can be used instead. This model takes the question and a chunk and determines whether the chunk can be used to answer the question, which solves both problems above:
    * It outputs a number that passes through a sigmoid into a confidence in [0, 1] nicely.
    * It directly reflects whether the question can be answered.
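The caching change described above can be sketched as a module-level dictionary. Note that `load_model` and `get_model` here are illustrative stand-ins for the project's actual loader, not its real API:

```python
# Minimal sketch of the _LLM_CACHE idea: load the model once, then
# reuse the cached instance for every later query.
_LLM_CACHE = {}

def load_model(name):
    # Stand-in for the real, expensive 2-5 s model load.
    return f"model:{name}"

def get_model(name="default"):
    # First call pays the load cost; every later call is a dict lookup.
    if name not in _LLM_CACHE:
        _LLM_CACHE[name] = load_model(name)
    return _LLM_CACHE[name]
```

Because the cache is module-global, it survives across requests for the lifetime of the process, which is exactly what removes the per-request reload delay.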
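The fused-scoring approach from my first submission can be sketched with the two normalizations mentioned above; the BM25 saturation constant `k` and the fusion weight `alpha` are illustrative parameters, not tuned values:

```python
def normalize_faiss(dist):
    # FAISS returns an unbounded distance; 1/(1+dist) maps it into
    # (0, 1], with larger values meaning closer.
    return 1.0 / (1.0 + dist)

def normalize_bm25(s, k=1.0):
    # BM25 scores are unbounded and non-negative; s/(s+k) maps them
    # into [0, 1), saturating for large scores.
    return s / (s + k)

def fused_score(faiss_dist, bm25_score, alpha=0.5):
    # Simple convex combination of the two normalized scores.
    return alpha * normalize_faiss(faiss_dist) + (1 - alpha) * normalize_bm25(bm25_score)
```

Even with both terms squeezed into [0, 1], the fused value measures similarity to the corpus, not answerability, which is the second problem noted above.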
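The cross-encoder gate can be sketched as follows. Here `score_fn` stands in for a real cross-encoder's raw logit on a (question, chunk) pair (e.g. `CrossEncoder.predict` from the sentence-transformers library, assumed rather than taken from the project code), and the 0.5 threshold is illustrative:

```python
import math

def sigmoid(x):
    # Squash the cross-encoder's raw logit into a (0, 1) confidence.
    return 1.0 / (1.0 + math.exp(-x))

def answer_or_abstain(question, chunks, score_fn, threshold=0.5):
    # Score every chunk against the question and rank by confidence.
    scored = [(chunk, sigmoid(score_fn(question, chunk))) for chunk in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    best_chunk, best_conf = scored[0]
    # Abstain when even the best chunk looks unable to answer.
    if best_conf < threshold:
        return "I don't know"
    return best_chunk
```

In the real pipeline the selected chunk would be handed to the generator; the key point is that the gate sits between retrieval and generation.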
* Before:
  ![image](https://hackmd.io/_uploads/SyMmnPol-g.png)
* After:
  ![image](https://hackmd.io/_uploads/HJQNhvilZe.png)
* Examples:
  * Without the cross-encoder, the model knows how to answer.
    ![image](https://hackmd.io/_uploads/H1IX4xOx-e.png)
  * With the cross-encoder, you can see the ranking is different.
    ![image](https://hackmd.io/_uploads/rkAEWgOxZx.png)
  * Without the cross-encoder, the model doesn't know how to answer.
    ![image](https://hackmd.io/_uploads/Skk24xOl-e.png)
  * With the cross-encoder, it answers "I don't know".
    ![image](https://hackmd.io/_uploads/HyQ_-g_lWl.png)