---
layout: global
title: SystemML Algorithms Reference - Factorization Machines
displayTitle: <a href="algorithms-reference.html">SystemML Algorithms Reference</a>
---
<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements.  See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->

# 7. Factorization Machines

### Description

The Factorization Machine (FM) is a general predictor, like SVMs, but it is also able to estimate reliable parameters under very high sparsity. The FM models all nested variable interactions (comparable to a polynomial kernel in SVM), but uses a factorized parameterization instead of a dense parameterization as in SVMs.

## Core Model

##### 1.
Model Equation:

<img src="https://tex.s2cms.ru/svg/%20%5Chat%7By%7D(x)%20%3D%0Aw_0%20%2B%0A%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%20w_i%20x_i%20%2B%0A%5Csum_%7Bi%3D1%7D%5E%7Bn%7D%20%5Csum_%7Bj%3Di%2B1%7D%5E%7Bn%7D%20%5Cleft%20%3Cv_i%2C%20v_j%20%5Cright%20%3E%20x_i%20x_j%0A" alt=" \hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \left &lt;v_i, v_j \right &gt; x_i x_j " />

where the model parameters that have to be estimated are:

<img src="https://tex.s2cms.ru/svg/%0A%20w_0%20%5Cin%20R%2C%0A%20W%20%20%20%5Cin%20R%5En%2C%0A%20V%20%20%20%5Cin%20R%5E%7Bn%20%5Ctimes%20k%7D%0A%20" alt=" w_0 \in R, W \in R^n, V \in R^{n \times k} " />

and <img src="https://tex.s2cms.ru/svg/%0A%20%20%20%5Cleft%20%3C%5Ccdot%2C%20%5Ccdot%20%5Cright%20%3E%0A%20" alt=" \left &lt;\cdot, \cdot \right &gt; " /> is the dot product of two vectors of size <img src="https://tex.s2cms.ru/svg/k" alt="k" />:

<img src="https://tex.s2cms.ru/svg/%0A%20%5Cleft%20%3Cv_i%2C%20v_j%20%5Cright%20%3E%20%3D%20%5Csum_%7Bf%3D1%7D%5E%7Bk%7D%20v_%7Bi%2Cf%7D%20%5Ccdot%20v_%7Bj%2Cf%7D%0A%20" alt=" \left &lt;v_i, v_j \right &gt; = \sum_{f=1}^{k} v_{i,f} \cdot v_{j,f} " />

A row <img src="https://tex.s2cms.ru/svg/%20v_i%20" alt=" v_i " /> in <img src="https://tex.s2cms.ru/svg/%20V%20" alt=" V " /> describes the <img src="https://tex.s2cms.ru/svg/i" alt="i" />th variable with <img src="https://tex.s2cms.ru/svg/k%20%5Cin%20N_0%5E%2B" alt="k \in N_0^+" /> factors. <img src="https://tex.s2cms.ru/svg/k" alt="k" /> is a hyperparameter that defines the dimensionality of the factorization.
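To make the model equation concrete, here is a small NumPy sketch (illustrative only; the function names are hypothetical and not part of the SystemML scripts). It evaluates the prediction both with the naive double sum over pairwise interactions and with the linear-time factorized form derived in the Computation subsection below, and checks that the two agree:

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    """FM model equation, evaluated term by term: O(k * n^2)."""
    n = len(x)
    y = w0 + w @ x                       # global bias + linear terms
    for i in range(n):
        for j in range(i + 1, n):        # pairwise interactions <v_i, v_j> x_i x_j
            y += (V[i] @ V[j]) * x[i] * x[j]
    return y

def fm_predict_linear(x, w0, w, V):
    """Equivalent O(k * n) form: 1/2 * sum_f ((sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2)."""
    s = V.T @ x                          # shape (k,): sum_i v_{i,f} x_i
    s2 = (V ** 2).T @ (x ** 2)           # shape (k,): sum_i v_{i,f}^2 x_i^2
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s2)

rng = np.random.default_rng(0)
n, k = 6, 3
x = rng.normal(size=n)
w0 = 0.5
w = rng.normal(size=n)
V = rng.normal(size=(n, k))

# Both formulations compute the same prediction.
assert np.isclose(fm_predict_naive(x, w0, w, V), fm_predict_linear(x, w0, w, V))
```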
* <img src="https://tex.s2cms.ru/svg/%20w_0%20" alt=" w_0 " /> : global bias
* <img src="https://tex.s2cms.ru/svg/%20w_j%20" alt=" w_j " /> : models the strength of the <img src="https://tex.s2cms.ru/svg/j" alt="j" />th variable
* <img src="https://tex.s2cms.ru/svg/%20w_%7Bi%2Cj%7D%20%3D%20%5Cleft%20%3Cv_i%2C%20v_j%20%5Cright%3E%20" alt=" w_{i,j} = \left &lt;v_i, v_j \right&gt; " /> : models the interaction between the <img src="https://tex.s2cms.ru/svg/i" alt="i" />th & <img src="https://tex.s2cms.ru/svg/j" alt="j" />th variables.

Instead of using its own model parameter <img src="https://tex.s2cms.ru/svg/%20w_%7Bi%2Cj%7D%20%5Cin%20R%20" alt=" w_{i,j} \in R " /> for each interaction, the FM models the interaction by factorizing it.

##### 2. Expressiveness:

It is well known that for any positive definite matrix <img src="https://tex.s2cms.ru/svg/W" alt="W" />, there exists a matrix <img src="https://tex.s2cms.ru/svg/V" alt="V" /> such that <img src="https://tex.s2cms.ru/svg/W%20%3D%20V%20%5Ccdot%20V%5Et" alt="W = V \cdot V^t" />, provided that <img src="https://tex.s2cms.ru/svg/k" alt="k" /> is sufficiently large. This shows that an FM can express any interaction matrix <img src="https://tex.s2cms.ru/svg/W" alt="W" /> if <img src="https://tex.s2cms.ru/svg/k" alt="k" /> is chosen large enough.

##### 3. Parameter Estimation Under Sparsity:

In sparse settings, there is usually not enough data to estimate the interactions between variables directly and independently. FMs can estimate interactions well even in these settings, because they break the independence of the interaction parameters by factorizing them.

##### 4. Computation:

Due to the factorization of pairwise interactions, there is no model parameter that directly depends on two variables (e.g., a parameter with an index <img src="https://tex.s2cms.ru/svg/(i%2Cj)" alt="(i,j)" />). So, the pairwise interactions can be reformulated as shown below.
$$
\sum_{i=1}^n \sum_{j=i+1}^n \langle v_i, v_j \rangle x_i x_j
= \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \langle v_i, v_j \rangle x_i x_j - \frac{1}{2} \sum_{i=1}^n \langle v_i, v_i \rangle x_i x_i
$$

$$
= \frac{1}{2} \left( \sum_{i=1}^n \sum_{j=1}^n \sum_{f=1}^k v_{i,f} v_{j,f} x_i x_j - \sum_{i=1}^n \sum_{f=1}^k v_{i,f} v_{i,f} x_i x_i \right)
$$

$$
= \frac{1}{2} \sum_{f=1}^k \left( \left( \sum_{i=1}^n v_{i,f} x_i \right) \left( \sum_{j=1}^n v_{j,f} x_j \right) - \sum_{i=1}^n v_{i,f}^2 x_i^2 \right)
$$

$$
= \frac{1}{2} \sum_{f=1}^k \left( \left( \sum_{i=1}^n v_{i,f} x_i \right)^2 - \sum_{i=1}^n v_{i,f}^2 x_i^2 \right)
$$

With this reformulation, the pairwise interactions can be computed in linear time O(kn) instead of O(kn&sup2;).

##### 5.
Learning Factorization Machines:

The model parameters are learned with a gradient-based optimizer (the scripts below use Adam), using the following partial derivatives of the model equation:

$$
\frac{\partial}{\partial \theta} \hat{y}(x) =
\begin{cases}
1 & \text{if } \theta \text{ is } w_0 \\
x_i & \text{if } \theta \text{ is } w_i \\
x_i \sum_{j=1}^{n} v_{j,f} \cdot x_j - v_{i,f} \cdot x_i^2 & \text{if } \theta \text{ is } v_{i,f}
\end{cases}
$$

##### 6. Factorization Models as Predictors:

* **Regression:** <img src="https://tex.s2cms.ru/svg/%5Chat%7By%7D(x)" alt="\hat{y}(x)" /> can be used directly as the predictor, and the optimization criterion is the minimal least-squares error on <img src="https://tex.s2cms.ru/svg/D" alt="D" />.

_**Usage:**_ The `train` function in the [fm-regression.dml](https://github.com/apache/systemml/blob/master/scripts/staging/fm-regression.dml) script takes the input feature matrix and the corresponding target vector, with some of the input held out for validation during training.

```
train = function(matrix[double] X, matrix[double] y,
                 matrix[double] X_val, matrix[double] y_val)
    return (matrix[double] w0, matrix[double] W, matrix[double] V) {
  /*
   * Trains the FM model.
   *
   * Inputs:
   *  - X     : n examples with d features, of shape (n, d)
   *  - y     : Target matrix, of shape (n, 1)
   *  - X_val : Input validation data matrix, of shape (n, d)
   *  - y_val : Target validation matrix, of shape (n, 1)
   *
   * Outputs:
   *  - w0, W, V : updated model parameters.
   *
   * Network Architecture:
   *
   * X --> [model] --> out --> l2_loss::backward(out, y) --> dout
   */
  ...

  # 7. Call adam::update for all parameters
  [w0, mw0, vw0] = adam::update(w0, dw0, lr, beta1, beta2, epsilon, t, mw0, vw0);
  [W, mW, vW] = adam::update(W, dW, lr, beta1, beta2, epsilon, t, mW, vW);
  [V, mV, vV] = adam::update(V, dV, lr, beta1, beta2, epsilon, t, mV, vV);
}
```

Once the `train` function returns the weights of the FM model, these are passed to the `predict` function.

```
predict = function(matrix[double] X, matrix[double] w0,
                   matrix[double] W, matrix[double] V)
    return (matrix[double] out) {
  /*
   * Computes the predictions for the given inputs.
   *
   * Inputs:
   *  - X        : n examples with d features, of shape (n, d).
   *  - w0, W, V : trained model parameters.
   *
   * Outputs:
   *  - out : target vector, y.
   */
  out = fm::forward(X, w0, W, V);
}
```

_**Running with dummy data:**_ The [fm-regression-dummy-data.dml](https://github.com/apache/systemml/blob/master/scripts/nn/examples/fm-regression-dummy-data.dml) file can serve as a template to extend.

```
> spark-submit ./SystemML.jar -f ./scripts/nn/examples/fm-regression-dummy-data.dml
```

* **Binary Classification:** The sign of <img src="https://tex.s2cms.ru/svg/%5Chat%7By%7D(x)" alt="\hat{y}(x)" /> is used as the prediction, and the parameters are optimized for the hinge loss or logit loss.

_**Usage:**_ The `train` function in the [fm-binclass.dml](https://github.com/apache/systemml/blob/master/scripts/staging/fm-binclass.dml) script takes the input feature matrix and the corresponding target vector, with some of the input held out for validation during training. This script also contains `train()` and `predict()` functions, as in the regression case.

_**Running with dummy data:**_ The [fm-regression-dummy-data.dml](https://github.com/apache/systemml/blob/master/scripts/nn/examples/fm-regression-dummy-data.dml) file can serve as a template to extend.
* **Ranking:** The instance vectors are ordered by the score of <img src="https://tex.s2cms.ru/svg/%5Chat%7By%7D(x)" alt="\hat{y}(x)" />, and optimization is done over pairs of instance vectors <img src="https://tex.s2cms.ru/svg/%20(x(a)%2C%20x(b))%20%5Cin%20D%20" alt=" (x(a), x(b)) \in D " /> with a pairwise classification loss. Regularization terms such as <img src="https://tex.s2cms.ru/svg/L2" alt="L2" /> are usually applied to the optimization objective to prevent overfitting.
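Whichever predictor is used, training relies on the per-parameter gradients given in the Learning Factorization Machines subsection, and those closed forms can be sanity-checked numerically. The following NumPy sketch (hypothetical helper names, not part of the SystemML scripts) compares the closed-form gradient for a factor v_{i,f} against a central finite difference of the prediction:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """O(k*n) FM prediction from the model equation."""
    s = V.T @ x                          # sum_i v_{i,f} x_i, per factor f
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - (V ** 2).T @ (x ** 2))

def grad_v(x, V, i, f):
    """Closed-form gradient w.r.t. v_{i,f}: x_i * sum_j v_{j,f} x_j - v_{i,f} * x_i^2."""
    return x[i] * (V[:, f] @ x) - V[i, f] * x[i] ** 2

rng = np.random.default_rng(1)
n, k = 5, 4
x = rng.normal(size=n)
w0 = 0.3
w = rng.normal(size=n)
V = rng.normal(size=(n, k))

# Central finite difference with respect to a single factor entry V[i, f].
i, f, eps = 2, 1, 1e-6
Vp = V.copy(); Vp[i, f] += eps
Vm = V.copy(); Vm[i, f] -= eps
numeric = (fm_predict(x, w0, w, Vp) - fm_predict(x, w0, w, Vm)) / (2 * eps)

assert np.isclose(grad_v(x, V, i, f), numeric, atol=1e-6)
```

Because the prediction is quadratic in each v_{i,f}, the central difference matches the analytic gradient up to floating-point error.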