# Thompson Sampling on a Search Engine

[TOC]

## Overview

In the previous article, [**Embedding Searching Articles and Corresponding Thompson Sampling**](https://medium.com/@ar851060/embedding-searching-articles-ca27291c2951), I explained how to build an embedding-vector search engine with OpenAI embedding models. However, OpenAI offers three embedding models: **text-embedding-3-small**, **text-embedding-3-large**, and **ada v2**. Although I could rule out **text-embedding-3-large** because of its poor performance on Mandarin, I could not quickly tell the difference between **ada v2** and **text-embedding-3-small**. Therefore, I decided to run a Bayesian A/B test, more precisely **Thompson Sampling**, to decide which embedding model is more suitable for this project.

## Thompson Sampling

Since we only record clicks, we use the Bernoulli-Beta model for Thompson Sampling:

$$ p \sim \text{Beta}(\alpha+x,\beta+(1-x)), $$

where $p$ is the click probability, $x\in\{0,1\}$ indicates whether the result was clicked, and $\alpha$ and $\beta$ are the parameters of the Beta distribution. The experiment proceeds as follows:

1. Set a $\text{Beta}(1,1)$ prior for each embedding model: $\text{Beta}_{ada}$ and $\text{Beta}_{small}$.
2. When a user asks a question, sample a probability from each Beta distribution, $p_{ada}$ and $p_{small}$, and assign the model whose sampled probability is higher.
3. After sending the search results back to the user, increase the $\beta$ parameter of the assigned model's Beta distribution by one. That is,
   $$\beta_{assigned} = \beta_{assigned} + 1$$
4. Once the user clicks a search result, the **Record** function does two things: it decreases $\beta_{assigned}$ by one and increases $\alpha_{assigned}$ by one.

In other words, every served search is first counted as a failure, and that failure is converted into a success if the user clicks. The following diagram shows the flow of the experiment from serving search results to recording clicks.

```mermaid
graph LR
Search["Search"]
Record["Record"]
User(("User"))
Test[("Testing\nParameters")]
Target["Target Url"]
User -- Query --> Search -- Search Results --> User -- Click --> Record -- Redirect --> Target
Search -- beta+1 --> Test -- Assign Embedding Models --> Search
Record -- beta-1\nalpha+1 --> Test
```

## Record

The Record component is only responsible for recording the results of the experiment.

```mermaid
graph LR
User(("User"))
Test[("Testing\nParameters")]
UserDB[("User\nDatabase")]
TA["Target Article"]
Result["Article Urls"]
subgraph Record
UserDB
Test
end
User -- Click --> Result -- Check if Search Again --> UserDB -- Update --> Test
Record -- Redirect --> TA
```

### Clicking

When a user clicks a URL in the search results, that URL sends an HTTP request to a Google Cloud Function.

### User Database

In the Google Cloud Function, we check whether the click happened right after a search. Since we only want to know which embedding model produced results that users found suitable, we only record clicks that happen right after a search.

### Update Parameters and Redirect

Once the click is verified, we update the testing parameters. After recording, we redirect the user to the actual article.

## Experiment Results

I ended the experiment on July 3, 2024. The final parameters of the two Beta distributions are:

| Model / Parameters | $\alpha$ | $\beta$ | Successes | Number of Trials |
| -------- | -------- | -------- | -------- | -------- |
| **ada v2** | 63 | 35 | 62 | 96 |
| **text-embedding-3-small** | 160 | 80 | 159 | 238 |
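As a quick check on these numbers, here is a minimal sketch (assuming `numpy` and `scipy` are available; this was not part of the original pipeline) of how the posterior mean, a 95% credible interval, and the posterior probability that one model beats the other could be computed from the final parameters above:

```python
import numpy as np
from scipy import stats

# Final Beta posterior parameters taken from the table above
posteriors = {
    "ada v2": stats.beta(63, 35),
    "text-embedding-3-small": stats.beta(160, 80),
}

for name, dist in posteriors.items():
    low, high = dist.interval(0.95)  # central 95% credible interval
    print(f"{name}: mean={dist.mean():.3f}, 95% CI=({low:.3f}, {high:.3f})")

# Monte Carlo estimate of P(p_small > p_ada) under the two posteriors
rng = np.random.default_rng(0)
samples_ada = posteriors["ada v2"].rvs(100_000, random_state=rng)
samples_small = posteriors["text-embedding-3-small"].rvs(100_000, random_state=rng)
print("P(small beats ada) ≈", (samples_small > samples_ada).mean())
```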
The following figure shows the mean probability and the corresponding credible interval for each model; the x-axis is the number of clicks and the y-axis is the probability.

![credible](https://hackmd.io/_uploads/B1VQjgb_A.png)

You can see that **Small** is above **Ada** most of the time, and both curves become stable after around 300 clicks. You can see how the two Beta distributions evolve in the following GIF.

![click rate animation](https://hackmd.io/_uploads/rkh-SGG_0.gif)

Given that the experiment period had ended and both models had stabilized, I chose **text-embedding-3-small** as the embedding model for the article search engine.

## Conclusion

This time, I used a different method, **Thompson Sampling**, to run the A/B test. Its advantage is that I did not need to worry about the user experience during the experiment, since the algorithm automatically routes more traffic to the better-performing model. However, post-experiment analysis is harder with this method. I cannot make a straightforward, quantified claim such as "A is better than B because its click rate is significantly higher by xx%", so if the results of the experiment need to be reported to others, it might not be the best choice.
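For readers who want to experiment with the mechanics themselves, here is a minimal, self-contained sketch of the assign/record loop described in the Thompson Sampling section (in-memory parameters and hypothetical function names; the production version keeps the parameters in a database behind Google Cloud Functions):

```python
import random

# Step 1: Beta(1, 1) prior for each embedding model
params = {
    "ada-v2": {"alpha": 1, "beta": 1},
    "text-embedding-3-small": {"alpha": 1, "beta": 1},
}

def assign_model() -> str:
    """Steps 2-3: sample from each Beta posterior, assign the model with the
    larger sample, and count the served search as a provisional failure (beta + 1)."""
    sampled = {name: random.betavariate(p["alpha"], p["beta"]) for name, p in params.items()}
    chosen = max(sampled, key=sampled.get)
    params[chosen]["beta"] += 1
    return chosen

def record_click(chosen: str) -> None:
    """Step 4: convert the provisional failure into a success when the user clicks."""
    params[chosen]["beta"] -= 1
    params[chosen]["alpha"] += 1

# Example: one search that results in a click
model = assign_model()
record_click(model)
print(model, params[model])
```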