# Kernel Methods for Unsupervised Learning
###### tags: `Machine Learning II`
## Novelty detection
> We should consider changing our models when the data they use changes and is no longer stationary.
- Adapt the existing model as the data drifts: **Adaptive learning**
- Detect that new data no longer fits the model, then throw it out and create a new one: **Novelty detection**
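A toy sketch of the novelty-detection idea, flagging a point as novel when its RBF similarity to every training point is low. The threshold, `gamma`, and data are arbitrary illustrative choices, not from the course:

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 2))  # "stationary" training data

def rbf_similarity(a, X, gamma=0.5):
    """RBF similarity between a single point a and every row of X."""
    return np.exp(-gamma * np.sum((X - a) ** 2, axis=1))

def is_novel(x, X, threshold=0.05, gamma=0.5):
    """Flag x as novel if it is not similar enough to any training point."""
    return np.max(rbf_similarity(x, X, gamma)) < threshold

print(is_novel(np.array([0.0, 0.0]), X_train))    # inside the training cloud
print(is_novel(np.array([10.0, 10.0]), X_train))  # far away, flagged as novel
```

If many incoming points trigger the flag, that is the signal to discard the model and retrain.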
## Kernel clustering
### Similarity functions and distances
The goal of clustering is to group similar observations and to separate those that are different. But how do we define similarity?
Similarity functions provide an analytical way to measure how similar two data points are.
>The ideal similarity function would then be 1 for observations in the same cluster and 0 for observations in different clusters.
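For instance, a pairwise RBF similarity matrix behaves this way: identical points score exactly 1 and distant points score near 0 (the choice `gamma=1.0` and the sample points are illustrative):

```python
import numpy as np

def similarity_matrix(X, gamma=1.0):
    """Pairwise RBF similarities S[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
S = similarity_matrix(X)
# Diagonal is exactly 1; the two nearby points score ~0.99,
# while similarities to the far point are essentially 0.
```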
### Why does K-Means fail?
K-means is very much centered on the definition of a centroid: it implicitly assumes convex, roughly spherical clusters, and there are cases in which this is not optimal, e.g. non-convex shapes like concentric rings (see slides 37 and 39).
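A minimal demonstration of the failure mode, assuming concentric-ring data and a from-scratch Lloyd's k-means (not the slides' example): the resulting labels agree with the true rings at roughly chance level.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two concentric rings: a centroid-based method cannot separate these,
# because both rings have (almost) the same centroid.
def ring(n, radius):
    angles = rng.uniform(0, 2 * np.pi, n)
    pts = radius * np.c_[np.cos(angles), np.sin(angles)]
    return pts + rng.normal(0, 0.05, pts.shape)

X = np.vstack([ring(100, 1.0), ring(100, 3.0)])
y_true = np.array([0] * 100 + [1] * 100)

def kmeans(X, k=2, iters=50):
    """Plain Lloyd's algorithm: assign to nearest centroid, recompute means."""
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

labels = kmeans(X)
# Best agreement over the two possible label permutations: close to 0.5.
acc = max(np.mean(labels == y_true), np.mean(labels != y_true))
```

K-means ends up splitting the plane through the shared center, cutting both rings in half.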
### The alternative: spectral clustering
It is possible to see that the adjacency matrix contains all the information needed to separate observations into different groups. In kernel methods the kernel matrix plays this role, which lets us train the algorithm directly on the input data.
Pros:
- No need to fix the number of clusters
Cons:
- Optimization takes much longer: diagonalizing the $n \times n$ similarity matrix is expensive for large datasets.
The algorithm diagonalizes the ==**similarity matrix**==, which contains the **results of the similarity function**.
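A minimal sketch of this pipeline, assuming an RBF similarity matrix and a two-cluster split on the sign of the second eigenvector of the normalized Laplacian; `gamma=2.0` and the ring data are illustrative choices, not from the slides:

```python
import numpy as np

# Two concentric rings, deterministic for reproducibility.
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
X = np.vstack([np.c_[np.cos(t), np.sin(t)],        # inner ring, radius 1
               3 * np.c_[np.cos(t), np.sin(t)]])   # outer ring, radius 3
y_true = np.array([0] * 100 + [1] * 100)

def spectral_bipartition(X, gamma=2.0):
    """Diagonalize the normalized Laplacian of the RBF similarity graph
    and split on the sign of the second-smallest eigenvector."""
    S = np.exp(-gamma * ((X[:, None] - X[None]) ** 2).sum(-1))  # similarity matrix
    d = S.sum(axis=1)                                           # node degrees
    L_sym = np.eye(len(X)) - S / np.sqrt(np.outer(d, d))        # normalized Laplacian
    _, vecs = np.linalg.eigh(L_sym)                             # ascending eigenvalues
    return (vecs[:, 1] > 0).astype(int)

labels = spectral_bipartition(X)
acc = max(np.mean(labels == y_true), np.mean(labels != y_true))
```

Because the similarity graph is nearly disconnected between the two rings, the second eigenvector is approximately constant on each ring with opposite signs, so the rings that defeat k-means are separated cleanly.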
>Take a look at density-based clustering
#### Normalized cut formulation
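The slides' exact formulation is not reproduced in these notes; a standard statement (following Shi and Malik) is: partition the vertices $V$ of the similarity graph into $A$ and $B$ so as to minimize

$\mathrm{Ncut}(A,B) = \dfrac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \dfrac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}$

where $\mathrm{cut}(A,B)=\sum_{i\in A,\, j\in B} S_{ij}$ and $\mathrm{assoc}(A,V)=\sum_{i\in A,\, j\in V} S_{ij}$. Relaxing the discrete problem leads to the generalized eigenproblem

$(D-S)\,y = \lambda D y$

with degree matrix $D_{ii}=\sum_j S_{ij}$; the eigenvector with the second-smallest eigenvalue gives the (relaxed) split.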
## Appendix
### RBF kernel
All the data points have length one in the RBF kernel's feature space, because of the kernel transformation:
- **RBF kernel**: $\exp(-\gamma \ ||x_1-x_2||^2)$
where $h$ is the feature map induced by the kernel, so the kernel computes an inner product in feature space:
$K(x_1,x_2) = h(x_1)^Th(x_2)$
and the Euclidean norm of a vector $x \in \mathbb{R}^d$ is
$||x|| = \sqrt{x^Tx} = \sqrt{x_1^2+x_2^2+\dots+x_d^2}$
Setting $x_1 = x_2$ gives $||h(x_1)||^2 = K(x_1,x_1) = \exp(0) = 1$, so every mapped point has length one.
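A quick numerical check of this property; `gamma` and the sample point are arbitrary:

```python
import numpy as np

def rbf(x1, x2, gamma=1.0):
    """RBF kernel K(x1, x2) = exp(-gamma * ||x1 - x2||^2)."""
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

x = np.array([2.0, -1.0, 0.5])
# ||h(x)||^2 = h(x)^T h(x) = K(x, x) = exp(0) = 1 for any x and any gamma.
print(rbf(x, x))  # 1.0
```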