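DeepFM
===

> [name=ida][time=20220820]
> - Next up: the first article I read mentioned a "savior of sparse matrices," and I naively believed it would really come and save me. No such luck...
> - I have too many browser tabs open. They hold me back and slow everything down; I am stuck in a sea of tabs and cannot move forward.
> - Chrome can group tabs now, but it is still exhausting. There is too much information to digest, so I plan to close them all!
> - Closing them outright does not feel right, though, so I am excerpting the key points first to ease my conscience (as if I had one?).
> - We have read so many introductory pieces on model-based and memory-based recommendation, yet reading about DeepFM feels like stepping into a parallel universe. What is going on?

[toc]

---

### [Before FM, understand MF first]

#### From MF (matrix factorization) to FM! 《推荐系统学习笔记-4 FM,FFM,Wide & Deep,DeepFM》 (Recommender System Study Notes 4) [[src-知乎]](https://zhuanlan.zhihu.com/p/268776484)
- The article covers FM, FFM, Wide & Deep, and DeepFM,
- but the part I most want to note down right now is MF (huh?).
- MF (Matrix Factorization) is a long-established collaborative filtering model in recommender systems. The core idea is to approximate the large, sparse user-item interaction matrix (clicks or ratings) with the product of two small low-dimensional matrices: a user embedding matrix and an item embedding matrix. In essence it is a dimensionality-reduction model that encodes user-item collaborative information.
- In its most basic form, MF decomposes the rating matrix R into a user matrix U and an item matrix S, and trains iteratively so that the product of U and S gets closer and closer to the observed matrix. The decomposition is shown below:
  - ![](https://i.imgur.com/ErhCeT6.png =300x)
  - ![](https://i.imgur.com/zjfcTI6.png =300x)(src = https://blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/88265583)
- Making the predictions close to the true values means minimizing their difference; that is the objective. U and S are then computed iteratively by gradient descent, and the values they converge to are the factor matrices. The error is expressed with a loss function (equivalent to the objective):
  - ![](https://i.imgur.com/QIObYOJ.png =300x)
- R_ij is an observed rating in the rating matrix; U_i and S_j are the unknowns. Minimizing the loss means minimizing a function of the two variables U and S (strictly speaking, finding a local minimum). Gradient descent is the usual approach:
  - ![](https://i.imgur.com/inp3xPj.png =300x)
- Converting MF to FM
  - ![](https://i.imgur.com/tSqWhH9.png =300x)
- In essence MF is a special case of FM: MF can be seen as an FM model with only two feature fields, User ID and Item ID. MF uses matrix factorization to express those two kinds of features as embeddings. Gradient descent on the loss between the embedding inner products and the observed rating matrix repeatedly refines the parameters of < >, that is, the predicted score. Beyond User ID and Item ID, FM also adds second-order feature interactions and can be seen as a further extension of MF. It maps every feature to a low-dimensional embedding and takes the inner product of any two feature embeddings as the weight of that feature combination. If FM uses only User ID and Item ID, it is equivalent to MF.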
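
The two ideas above can be sketched in a few lines of NumPy. This is a rough illustration, not the source article's code: it assumes full-batch gradient descent and an L2 penalty (`reg`), neither of which is confirmed by the loss image above. The helper names `train_mf`, `observed_rmse`, and `fm_pairwise` are hypothetical, and the toy rating matrix is made up. The last part checks that the FM second-order term, given only a one-hot user and a one-hot item, reduces to the MF inner product.

```python
import numpy as np

def train_mf(R, mask, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Fit R ~ U @ S.T on observed entries with full-batch gradient descent.

    Assumed loss (the L2 term is an assumption, not taken from the source):
        0.5 * sum_observed (R_ij - U_i . S_j)^2 + 0.5 * reg * (|U|^2 + |S|^2)
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = 0.1 * rng.standard_normal((n_users, k))   # user embeddings
    S = 0.1 * rng.standard_normal((n_items, k))   # item embeddings
    for _ in range(epochs):
        E = mask * (R - U @ S.T)                  # errors, zero where unrated
        grad_U = -E @ S + reg * U
        grad_S = -E.T @ U + reg * S
        U -= lr * grad_U
        S -= lr * grad_S
    return U, S

def observed_rmse(R, mask, U, S):
    """Root-mean-square error over the rated entries only."""
    err = (R - U @ S.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))

def fm_pairwise(x, V):
    """FM second-order term: sum over i<j of <V[i], V[j]> * x_i * x_j."""
    xv = x @ V                                    # shape (k,)
    return 0.5 * float(np.sum(xv ** 2 - (x ** 2) @ (V ** 2)))

# Toy ratings, 0 = not rated (values are made up for illustration).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0
U, S = train_mf(R, mask)
print("RMSE on observed ratings:", observed_rmse(R, mask, U, S))

# FM with only a User ID field and an Item ID field:
# stack both embedding tables and feed a vector with exactly two ones.
n_users, n_items = R.shape
V = np.vstack([U, S])
x = np.zeros(n_users + n_items)
x[1] = 1.0                  # user 1
x[n_users + 2] = 1.0        # item 2
print("FM pairwise term:", fm_pairwise(x, V), "MF inner product:", U[1] @ S[2])
```

With exactly two active features, the pairwise sum has a single term, which is why the two printed numbers agree. The bias and first-order terms of a full FM are left out here to keep the comparison with plain MF direct.
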
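#### From MF to the FM model [[src]](https://blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/88265583)
- MF basics
  - ![](https://i.imgur.com/ow1I4EO.jpg =400x)
  - ![](https://i.imgur.com/4QDptRF.jpg =400x)
  - MF (Matrix Factorization) is a collaborative filtering model used in recommender systems.
  - It approximates the large, sparse user-item interaction matrix (clicks or ratings) with the product of two small low-dimensional matrices (a user embedding and an item embedding). In essence it is a ==dimensionality-reduction model== that encodes user-item collaborative information.
  - After training, every user and item has a low-dimensional embedding. To predict the rating for a given user-item pair, take the inner product of the two embeddings; that score is the predicted rating. Does this remind you of anything?
- MF & FM
  - ![](https://i.imgur.com/keEEaOZ.jpg =300x)
  - MF is a special case of FM: MF can be seen as an FM model with only two features, User ID and Item ID. MF uses matrix factorization to express those two kinds of features as embeddings.
  - FM can be seen as a further extension of MF. Besides User ID and Item ID, many other kinds of features can be folded into FM. It maps all of them to low-dimensional embeddings and takes the inner product of any two feature embeddings as the weight of that feature pair. If FM uses only User ID and Item ID, plug them into the FM formula and check: is its prediction the same as MF's?
  - In terms of who used embedding representations of features first, MF is clearly the real predecessor of FM; it just handles fewer feature types. FM inherits MF's embedding representation and also brings in side information as features, embedding it into the model. FM is therefore more flexible and fits a wider range of applications.
  - First: when you are tempted to use MF for collaborative filtering, hold that impulse for a moment and consider FM before traditional MF. FM gives you equivalent functionality and makes it easy to add any other features you want, so you can do more with the task at hand.
  - Second: in real large-scale applications, most models that use only ID information are impractical at the ranking stage. A model that leaves out side information, meaning the many usable features beyond User ID / Item ID, has little practical value. The reason is simple: in most real scenarios a lot of information about users and items is available, and collaborative data is only one kind of it, so adding more features clearly helps make personalized recommendation more accurate. A model that cannot easily take in more features is severely limited and hard to put to real use. This is why matrix factorization methods are rarely seen in the ranking stage and usually serve as one of several recall (candidate retrieval) channels.
  - ....... (read the rest on the original site)

---

### [FM]

#### Factorization Machines — 稀疏資料的救星 (the savior of sparse data) [[src]](https://medium.com/@jimmywu0621/factorization-machines-%E7%A8%80%E7%96%8F%E8%B3%87%E6%96%99%E7%9A%84%E6%95%91%E6%98%9F-732153700d10)
- Paper: https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf
- This one is about FM, not DeepFM!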
- But it does describe the difficulty we are facing.
- preview:
  - ![](https://i.imgur.com/r4pQICy.jpg =300x)
  - ![](https://i.imgur.com/7lquNFt.png =300x)
  - ![](https://i.imgur.com/mrrhVsy.png =300x)

---

### [DeepFM]

#### 輕讀論文(二):DeepFM: A Factorization-Machine based Neural Network for CTR Prediction (Light Paper Reading, part 2) [[src]](https://lufor129.medium.com/%E8%BC%95%E8%AE%80%E8%AB%96%E6%96%87-%E4%BA%8C-deepfm-a-factorization-machine-based-neural-network-for-ctr-prediction-9de74b8772ab)
- Paper: https://arxiv.org/pdf/1703.04247.pdf
- DeepFM grew out of the Wide & Deep paper, so the article introduces Wide & Deep first. Its main aim is to use deep learning to improve recommender systems.
- preview
  - ![](https://i.imgur.com/X7xSzf0.jpg =300x)

#### [d2l.ai] 17.10. Deep Factorization Machines (DeepFM sample code at the end of the page) [[src]](https://d2l.ai/chapter_recommender-systems/deepfm.html)
- Learning effective feature combinations is critical to the success of click-through rate prediction task. Factorization machines model feature interactions in a linear paradigm (e.g., bilinear interactions). This is often insufficient for real-world data where inherent feature crossing structures are usually very complex and nonlinear. What’s worse, second-order feature interactions are generally used in factorization machines in practice. Modeling higher degrees of feature combinations with factorization machines is possible theoretically but it is usually not adopted due to numerical instability and high computational complexity.
- One effective solution is using deep neural networks. Deep neural networks are powerful in feature representation learning and have the potential to learn sophisticated feature interactions. As such, it is natural to integrate deep neural networks to factorization machines. Adding nonlinear transformation layers to factorization machines gives it the capability to model both low-order feature combinations and high-order feature combinations.
  Moreover, non-linear inherent structures from inputs can also be captured with deep neural networks. In this section, we will introduce a representative model named deep factorization machines (DeepFM) (Guo et al., 2017) which combine FM and deep neural networks.
- DeepFM consists of an FM component and a deep component which are integrated in a parallel structure. The FM component is the same as the 2-way factorization machines which is used to model the low-order feature interactions. The deep component is an MLP that is used to capture high-order feature interactions and nonlinearities. These two components share the same inputs/embeddings and their outputs are summed up as the final prediction. It is worth pointing out that the spirit of DeepFM resembles that of the Wide & Deep architecture which can capture both memorization and generalization. The advantages of DeepFM over the Wide & Deep model is that it reduces the effort of hand-crafted feature engineering by identifying feature combinations automatically.
- We omit the description of the FM component for brevity and denote the output as $\hat{y}^{(FM)}$. Readers are referred to the last section for more details. Let $e_{i} \in \mathbb{R}^{k}$ denote the latent feature vector of the $i^{th}$ field.
  The input of the deep component is the concatenation of the dense embeddings of all fields that are looked up with the sparse categorical feature input, denoted as $z^{(0)} = [e_{1}, e_{2}, \ldots, e_{f}]$, where $f$ is the number of fields.
- Colab example: deepfm.ipynb
  - https://colab.research.google.com/github/d2l-ai/d2l-en-colab/blob/master/chapter_recommender-systems/deepfm.ipynb#scrollTo=7a441bf7

#### 【推荐算法实战】DeepFM模型（tensorflow2.0版）(Recommendation Algorithms in Practice: the DeepFM model, TensorFlow 2.0 version) [[src]](https://bbs.huaweicloud.com/blogs/343721)

---

### [Online recommendation]

#### Facebook 如何給你下廣告 (How Facebook serves you ads) [[src]](https://medium.com/@jimmywu0621/facebook-%E5%A6%82%E4%BD%95%E7%B5%A6%E4%BD%A0%E4%B8%8B%E5%BB%A3%E5%91%8A-5b440c601213)
- This one has no obvious connection to FM / DeepFM,
- but its focus is the Online Recommendation System. As our 姿勢雲 recommendations move from offline to online, this diagram may put us on the right line of thinking.
  - ![](https://i.imgur.com/mRFcEEn.png =300x)
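
Going back to the d2l.ai description of DeepFM earlier in these notes: the parallel structure (an FM part and an MLP part that share one embedding table, with their outputs summed) can be sketched as a plain NumPy forward pass. This is a rough sketch under my own assumptions, not the d2l.ai or Colab code: `init_deepfm` and `deepfm_forward` are hypothetical names, the hidden sizes and field sizes are arbitrary, the weights are random and untrained, and the final sigmoid is there because the task is click-through rate prediction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_deepfm(field_dims, k=4, hidden=(16, 8), seed=0):
    """Random, untrained parameters; all sizes here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n = sum(field_dims)                            # total number of categories
    sizes = [len(field_dims) * k, *hidden, 1]      # MLP layer widths
    return {
        # start position of each field inside the shared tables
        "offsets": np.concatenate(([0], np.cumsum(field_dims)[:-1])),
        "embed": 0.1 * rng.standard_normal((n, k)),    # shared embeddings e_i
        "w": np.zeros(n),                              # FM first-order weights
        "b": 0.0,                                      # FM bias
        "mlp": [(0.1 * rng.standard_normal((i, o)), np.zeros(o))
                for i, o in zip(sizes[:-1], sizes[1:])],
    }

def deepfm_forward(params, x):
    """x: int array of shape (batch, num_fields), one category id per field."""
    idx = x + params["offsets"]                    # ids in the shared tables
    e = params["embed"][idx]                       # (batch, fields, k)

    # FM component: bias + first-order + second-order interactions.
    linear = params["b"] + params["w"][idx].sum(axis=1)
    square_of_sum = e.sum(axis=1) ** 2
    sum_of_square = (e ** 2).sum(axis=1)
    y_fm = linear + 0.5 * (square_of_sum - sum_of_square).sum(axis=1)

    # Deep component: concatenate the same embeddings, then an MLP.
    h = e.reshape(len(x), -1)                      # z^(0) = [e_1, ..., e_f]
    last = len(params["mlp"]) - 1
    for layer, (W, b) in enumerate(params["mlp"]):
        h = h @ W + b
        if layer < last:
            h = np.maximum(h, 0.0)                 # ReLU on hidden layers
    y_deep = h[:, 0]

    # Outputs of the two components are summed, then squashed to (0, 1).
    return sigmoid(y_fm + y_deep)

field_dims = [3, 4, 2]                             # made-up field sizes
params = init_deepfm(field_dims)
x = np.array([[0, 2, 1],
              [2, 0, 0]])
p = deepfm_forward(params, x)
print(p)
```

Both components read the same `e`, which is the point of the design: the embeddings get learned from low-order and high-order signals together, and no hand-crafted cross features are needed on the FM side.
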