---
tags: Noah
---

:::info
Noah Nübling
Machine Learning with MATLAB
WS 2020/21
:::

# 03

## Problem 1

Underfitting occurs when the model cannot learn the properties of the training set with enough fidelity to extract the underlying pattern of the data. This can happen, for example, because the model's hypothesis function is not complex enough, or because it doesn't have enough features to work with.

Overfitting occurs when the model is so powerful that it learns the training set too well: it learns the properties of the specific dataset instead of an underlying pattern. This prevents the model from generalizing and performing well on new data.

Underfitting can be addressed by adding more (artificial) features. Overfitting can be addressed by adding a regularization term, which decreases the variance of the model.

## Problem 2

$W$ and $b$ have one component per class, so we need to choose a class index with respect to which we differentiate. We call this index $l$.

The derivative with respect to $W^l$ is:

$$\frac{\partial L(W,b)}{\partial W^l} = \frac{1}{m} \sum_{i=1}^{m} x^i \left( \sum_{k=1, k\neq l}^{K} 1(y^i = k)\, \frac{e^{W^l x^i + b^l}}{\sum_{j=1}^{K} e^{W^j x^i + b^j}} \;-\; 1(y^i = l) \left( 1 - \frac{e^{W^l x^i + b^l}}{\sum_{j=1}^{K} e^{W^j x^i + b^j}} \right) \right)$$

Since the indicators $1(y^i = k)$ sum to $1$ over all $k$, the terms collapse to

$$= \frac{1}{m} \sum_{i=1}^{m} x^i \left( \frac{e^{W^l x^i + b^l}}{\sum_{j=1}^{K} e^{W^j x^i + b^j}} - 1(y^i = l) \right)$$

We can quickly see that the derivative with respect to $b^l$ must be the same, but with the leftmost $x^i$ replaced by a $1$:

$$\frac{\partial L(W,b)}{\partial b^l} = \frac{1}{m} \sum_{i=1}^{m} \left( \frac{e^{W^l x^i + b^l}}{\sum_{j=1}^{K} e^{W^j x^i + b^j}} - 1(y^i = l) \right)$$

Mathematical proofs:

![](https://i.imgur.com/AEC3FDa.jpg)
![](https://i.imgur.com/wCjRZnt.jpg)
![](https://i.imgur.com/f9yCIw4.jpg)

(At the end of the handwritten proof we pulled $x^i$ out of the sum that defines $i$, which of course doesn't make sense. $x^i$ should stay inside the outermost sum, as we've written in LaTeX above.)
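The gradient formulas above can be sanity-checked numerically. The following is a minimal sketch in Python/NumPy (the course uses MATLAB, so this is for illustration only; the function name `softmax_loss_and_grads` and all variable shapes are my own choices, not from the assignment). It implements the derived expressions $\frac{1}{m}\sum_i x^i (p_l - 1(y^i=l))$ and compares $\partial L / \partial W$ against central finite differences:

```python
import numpy as np

def softmax_loss_and_grads(W, b, X, y):
    """Softmax cross-entropy loss and its gradients.

    W: (K, n) weights, b: (K,) biases, X: (m, n) samples, y: (m,) labels in 0..K-1.
    Returns loss, dW (K, n), db (K,) per the formulas derived above.
    """
    m = X.shape[0]
    scores = X @ W.T + b                         # (m, K): entries W^k x^i + b^k
    scores -= scores.max(axis=1, keepdims=True)  # for numerical stability
    exps = np.exp(scores)
    p = exps / exps.sum(axis=1, keepdims=True)   # softmax probabilities
    loss = -np.log(p[np.arange(m), y]).mean()
    # dL/dW^l = (1/m) sum_i x^i (p_l(x^i) - 1(y^i = l))
    err = p.copy()
    err[np.arange(m), y] -= 1.0                  # p_l - 1(y^i = l)
    dW = err.T @ X / m
    db = err.mean(axis=0)                        # same, with x^i replaced by 1
    return loss, dW, db

# Compare the analytic gradient with central finite differences on random data.
rng = np.random.default_rng(0)
m, n, K = 20, 4, 3
X = rng.normal(size=(m, n))
y = rng.integers(0, K, size=m)
W = rng.normal(size=(K, n))
b = rng.normal(size=K)

loss, dW, db = softmax_loss_and_grads(W, b, X, y)
eps = 1e-6
num_dW = np.zeros_like(W)
for l in range(K):
    for j in range(n):
        Wp = W.copy(); Wp[l, j] += eps
        Wm = W.copy(); Wm[l, j] -= eps
        num_dW[l, j] = (softmax_loss_and_grads(Wp, b, X, y)[0]
                        - softmax_loss_and_grads(Wm, b, X, y)[0]) / (2 * eps)
print(np.max(np.abs(dW - num_dW)))  # maximum discrepancy; tiny if the formula is correct
```

If the derivation is right, the printed discrepancy is on the order of floating-point noise; a large value would indicate a sign or index error in the formula.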