Recommender evaluation

## How to evaluate a recommendation system? > Lie Cheater owns an online store > One day, a customer bought a dried Mango, and his system recommended the buyer a Toad. How to assess the recommendation result? ### Cumulative Gain, CG Given a query, assign relevant scores to each item based on similarity: | Items | Brazilian eggs | dread | music | toad | ballot | future | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | rel | 1 | 4 | 2 | 3 | 5 | 0 | * item: goods for sell * rel: relevance (represented by score) $CG_{\mathrm{p}} = \sum_{i=1}^{p} rel_i$ * $CG_{\mathrm{p}}$: Cumulative Gain * p: number of recommended items * $rel_i$: $i^{th}$ item's relevant score Nothing fancy, just sum them up --- Notice: **CG** only grades items, having nothing to do with recommendation results Therefore, a rank-related score is required ### Discounted Cumulative Gain, DCG _Idea: the most revelant item ought to be put at first place_ Given a descending recommendation result, discounting each item's relevant score according to its position I.e. > Rank: top -> last > discount: least -> most $\text{DCG}_{\mathrm{p}} = \sum_{i=1}^{p} \frac{\text{rel}_i}{\log_2(i+1)} = \text{rel}_1 + \sum_{i=2}^{p} \frac{\text{rel}_i}{\log_2(i+1)}$ or $\text{DCG}_{\mathrm{p}} = \sum_{i=1}^{p} \frac{2^{\text{rel}_i}-1}{\log_2(i+1)}$ * $\text{DCG}_{\mathrm{p}}$: Discounted Cumulative Gain * p: number of recommended items * $rel_i$: $i^{th}$ item's relevant score Looks nice, doesn't it? --- Nonetheless, recommendation results under different conditions may vary in length **Longer list brings about larger DCG, even without a necessary better output** => Normalisation can help! ### Normalised Discounted Cumulative Gain, NDCG 1. To perform normalisation, we shall define a baseline beforehand #### Ideal Discounted Cumulative Gain, IDCG _ad hoc_ perfect rank (maximum achievable DCG) at fixed length $\text{IDCG}_{\mathrm{p}} = \sum_{i=1}^{p} \frac{\text{rel}_i}{\log_2(i+1)}$ or $\text{IDCG}_{\mathrm{p}} = \sum_{i=1}^{p} \frac{2^{\text{rel}_i}-1}{\log_2(i+1)}$ * $\text{IDCG}_{\mathrm{p}}$: Ideal Discounted Cumulative Gain * p: number of recommended items * $rel_i$: $i^{th}$ item's relevant score Also, only retrieving a constant number of results across all recommendation 2. Divide the net model gain by baseline -> normalisation Namely, how far the model-predicted rank deviates from ideal (perfect) one $\text{NDCG}_{\mathrm{p}} = \frac{\text{DCG}_p}{\text{IDCG}_p}$ * $\text{NDCG}_{\mathrm{p}}$: Normalised Discounted Cumulative Gain * $\text{DCG}_{\mathrm{p}}$: Discounted Cumulative Gain * $\text{IDCG}_{\mathrm{p}}$: Ideal Discounted Cumulative Gain * p: number of recommended items * $rel_i$: $i^{th}$ item's relevant score reference: * [redis blog](https://redis.io/blog/evaluating-information-retrieval-with-ndcgk-redis/) * [On (Normalised) Discounted Cumulative Gain as an Off-Policy Evaluation Metric for Top-n Recommendation](https://arxiv.org/html/2307.15053v3) * [wikipedia](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) * [Ithome](https://ithelp.ithome.com.tw/articles/10299050)