School: National Chiao Tung University
Name: Shao-Wei ( Willy ) Chiu
Tags: NCTU, Machine Learning, Gaussian Process, SVM, GitHub
There is a function $f$ that transfers each input $x_n$ into its corresponding target $y_n$ (i.e., $y_n = f(x_n) + \epsilon_n$).
Assume that $\epsilon_n \sim \mathcal{N}(0, \beta^{-1})$ and $\mathbf{y} \sim \mathcal{N}(\mathbf{0}, C)$
(i.e., $C(x_n, x_m) = k(x_n, x_m) + \beta^{-1}\delta_{nm}$).
To estimate a new point $x^*$, we start from the joint distribution of $\mathbf{y}$ and $y^*$. After the derivation of the conditional probability, we get the predictive distribution

$$
\mu(x^*) = \mathbf{k}^\top C^{-1}\mathbf{y}, \qquad
\sigma^2(x^*) = k(x^*, x^*) + \beta^{-1} - \mathbf{k}^\top C^{-1}\mathbf{k},
$$

where $\mathbf{k} = \left[k(x_1, x^*), \dots, k(x_N, x^*)\right]^\top$.
This form is almost the same as the formula mentioned by Prof. Chiu in class.
The tiny difference is that here the noise term $\beta^{-1}$ is taken out of the matrix $C$, while the course slide takes it into the matrix.
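For concreteness, here is a minimal NumPy sketch of these predictive equations (the function and argument names are illustrative, not necessarily the ones used in my code):

```python
import numpy as np

def gp_predict(x_train, y_train, x_test, kernel, beta):
    """Predictive mean and variance of Gaussian process regression."""
    N = len(x_train)
    # C = K + beta^-1 * I : covariance matrix of the observed targets
    C = kernel(x_train, x_train) + np.eye(N) / beta
    C_inv = np.linalg.inv(C)

    # k: kernel values between training points and test points, shape (N, M)
    k = kernel(x_train, x_test)
    # c: prior variance at each test point plus the observation noise
    c = np.diag(kernel(x_test, x_test)) + 1.0 / beta

    mu = k.T @ C_inv @ y_train                 # predictive mean
    var = c - np.sum(k * (C_inv @ k), axis=0)  # predictive variance
    return mu, var
```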
Thus, we could apply the kernel method to describe our model.
We notice that the kernel is determined by a few hyperparameters, so we need to find the parameter values that maximize the (marginal) likelihood.
In my implementation, I choose random initial values between 0 and 10 for all parameters and call `scipy.optimize.minimize` to optimize them.
The relevant formula, the negative log marginal likelihood to be minimized, is shown below:

$$
-\ln p(\mathbf{y} \mid \theta) = \frac{1}{2}\ln|C_\theta| + \frac{1}{2}\mathbf{y}^\top C_\theta^{-1}\mathbf{y} + \frac{N}{2}\ln(2\pi)
$$
Use `get_K` to compute the covariance matrix $C$ of the training data, and use `get_k_test` to compute the kernel vector $\mathbf{k}$ between the training points and the test points.
Initialize the values of the parameters.
Generate test points over the range $[-60, 60]$, and apply the formulas derived above.
Use `scipy.optimize.minimize` to find the optimized parameters and re-run the Gaussian process (a rough sketch of these steps is given below).
I found that when random values are used as the initial kernel parameters, the optimized result can be bad for some extreme initial values.
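A rough sketch of these steps, assuming an illustrative squared-exponential kernel with two hyperparameters (the actual kernel, data, and helper names in my code may differ):

```python
import numpy as np
from scipy.optimize import minimize

def get_K(x, theta, beta):
    """Covariance matrix C = K + beta^-1 * I for an illustrative RBF kernel."""
    amp, length = theta
    sq_dist = (x[:, None] - x[None, :]) ** 2
    K = amp ** 2 * np.exp(-sq_dist / (2.0 * length ** 2))
    return K + np.eye(len(x)) / beta

def neg_log_likelihood(theta, x, y, beta):
    """Negative log marginal likelihood -ln p(y | theta)."""
    C = get_K(x, theta, beta)
    _, logdet = np.linalg.slogdet(C)
    return 0.5 * (logdet + y @ np.linalg.solve(C, y) + len(x) * np.log(2.0 * np.pi))

# toy data standing in for the homework data set
x_train = np.linspace(-60, 60, 30)
y_train = np.sin(0.1 * x_train) + np.random.normal(0, 0.3, size=30)
beta = 5.0  # assumed noise precision

# random initial values between 0 and 10, as described above
theta0 = np.random.uniform(0, 10, size=2)
res = minimize(neg_log_likelihood, theta0, args=(x_train, y_train, beta))
opt_theta = res.x  # re-run the Gaussian process with these optimized parameters
```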
The precomputed (user-defined) kernel is available in `libsvm`: compute the user-defined kernel and convert the given training data into `libsvm` format, for example as in the sketch below. For the precomputed kernel case, `isKernel` must be set to `True`.
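A minimal sketch of building the precomputed-kernel input for `libsvm`'s Python interface (a plain linear kernel is used here only for illustration; column 0 must hold the 1-based serial number of each instance):

```python
import numpy as np
from svmutil import svm_problem, svm_parameter, svm_train

# toy data standing in for the homework data set
X_train = np.random.rand(10, 5)
y_train = [0, 1] * 5

K = X_train @ X_train.T  # user-defined kernel matrix (linear kernel, for illustration)

# libsvm's precomputed format: prepend the 1-based serial number as column 0
serial = np.arange(1, K.shape[0] + 1).reshape(-1, 1)
K_with_id = np.hstack((serial, K))

prob = svm_problem(y_train, K_with_id.tolist(), isKernel=True)  # isKernel must be True
param = svm_parameter('-q -t 4 -c 1')                           # -t 4: precomputed kernel
model = svm_train(prob, param)
```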
Referencing the `libsvm/svm.cpp` file, I use the default RBF kernel in `libsvm/svm.cpp` as a template and construct the user-defined `linear_RBF` kernel in the same way (i.e., transforming the C-style code into Python-style code).
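For example, the user-defined linear+RBF kernel could be written in Python in the same spirit as the C implementation of the default RBF kernel (the unweighted sum of the two terms below is my assumption for illustration):

```python
import numpy as np

def linear_rbf_kernel(Xa, Xb, gamma):
    """Linear + RBF kernel: K(u, v) = u.v + exp(-gamma * ||u - v||^2)."""
    linear = Xa @ Xb.T
    # squared Euclidean distance between every pair of rows of Xa and Xb
    sq_dist = (np.sum(Xa ** 2, axis=1)[:, None]
               + np.sum(Xb ** 2, axis=1)[None, :]
               - 2.0 * Xa @ Xb.T)
    rbf = np.exp(-gamma * sq_dist)
    return linear + rbf
```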
Grid-search each kernel (including the precomputed kernel) and find the best parameters for each of them.
`libsvm/tools/grid.py` provides an API to grid-search for the RBF kernel. I revised it into `./grid.py` so that it can find the best parameters for every kernel type supported by `libsvm` (except the sigmoid kernel). Like `libsvm/tools/grid.py`, `./grid.py` searches over a log-scale range.
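A simplified sketch of such a log-scale grid search with 5-fold cross validation (the exponent ranges below are illustrative; the actual ranges in `./grid.py` may differ):

```python
from svmutil import svm_train

def grid_search(y, x, kernel_type=2, c_exps=range(-5, 16, 2), g_exps=range(-5, 16, 2)):
    """Log2-scale grid search; returns (best CV accuracy, best option string)."""
    best_rate, best_options = 0.0, None
    for ce in c_exps:
        for ge in g_exps:
            options = '-q -t %d -v 5 -c %g -g %g' % (kernel_type, 2.0 ** ce, 2.0 ** ge)
            rate = svm_train(y, x, options)  # with -v, svm_train returns the CV accuracy
            if rate > best_rate:
                best_rate, best_options = rate, options
    return best_rate, best_options
```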
According to the parameters from Step 2, do the prediction.
To save time, I store the model after the first training with a new set of options and reuse the best parameters found by the grid search.
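A minimal sketch of this train-once / reuse-later flow (the model path is hypothetical; the options shown are the best RBF parameters from the table below):

```python
import os
from svmutil import svm_train, svm_predict, svm_save_model, svm_load_model

# x_train/y_train/x_test/y_test: data in libsvm Python format (assumed already loaded)
model_path = 'models/rbf_best.model'            # hypothetical cache location
best_options = '-q -t 2 -c 2048.0 -g 0.03125'   # best RBF parameters from grid search

if os.path.exists(model_path):
    model = svm_load_model(model_path)          # reuse the stored model to save time
else:
    model = svm_train(y_train, x_train, best_options)
    svm_save_model(model_path, model)

# p_acc contains (ACC, MSE, SCC)
pred_labels, p_acc, pred_values = svm_predict(y_test, x_test, model)
```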
Output
Besides the ACC rate and MSE, I also record the training time, prediction time, and number of support vectors for each kernel.
In general, we get a higher ACC rate when the value of `C` is smaller. The reason is that a larger value of `C` forces the slack variables to be smaller, so overfitting tends to occur.
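For reference, `C` is the weight of the slack variables in the standard soft-margin objective:

$$
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \ \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{N}\xi_i
\quad \text{s.t.} \quad y_i\left(\mathbf{w}^\top\phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0,
$$

so a larger `C` penalizes the slack terms more heavily, drives the margin violations toward zero, and can overfit the training data.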
Best parameters of each kernel
Linear
c | options | CV accuracy (%) |
---|---|---|
0.03125 | -q -t 0 -v 5 -c 0.03125 | 97.2 |
0.125 | -q -t 0 -v 5 -c 0.125 | 96.98 |
0.5 | -q -t 0 -v 5 -c 0.5 | 96.22 |
2.0 | -q -t 0 -v 5 -c 2.0 | 96.28 |
8.0 | -q -t 0 -v 5 -c 8.0 | 96.32 |
32.0 | -q -t 0 -v 5 -c 32.0 | 96.2 |
128.0 | -q -t 0 -v 5 -c 128.0 | 96.34 |
512.0 | -q -t 0 -v 5 -c 512.0 | 96.34 |
2048.0 | -q -t 0 -v 5 -c 2048.0 | 96.16 |
8192.0 | -q -t 0 -v 5 -c 8192.0 | 96.16 |
32768.0 | -q -t 0 -v 5 -c 32768.0 | 96.48 |
Polynomial
c | g | r | d | options | CV accuracy (%) |
---|---|---|---|---|---|
0.125 | 8192.0 | 32.0 | 2 | -q -t 1 -v 5 -c 0.125 -g 8192.0 -r 32.0 -d 2 | 98.4 |
0.125 | 512.0 | 32.0 | 2 | -q -t 1 -v 5 -c 0.125 -g 512.0 -r 32.0 -d 2 | 98.32 |
32768.0 | 2.0 | 2.0 | 2 | -q -t 1 -v 5 -c 32768.0 -g 2.0 -r 2.0 -d 2 | 98.32 |
… | |||||
0.03125 | 0.03125 | 8.0 | 1 | -q -t 1 -v 5 -c 0.03125 -g 0.03125 -r 8.0 -d 1 | 95.44 |
0.03125 | 0.03125 | 32.0 | 1 | -q -t 1 -v 5 -c 0.03125 -g 0.03125 -r 32.0 -d 1 | 95.40 |
0.03125 | 0.03125 | 0.5 | 1 | -q -t 1 -v 5 -c 0.03125 -g 0.03125 -r 0.5 -d 1 | 95.36 |
RBF
c | g | options | CV accuracy (%) |
---|---|---|---|
2048.0 | 0.03125 | -q -t 2 -v 5 -c 2048.0 -g 0.03125 | 98.74 |
2.0 | 0.03125 | -q -t 2 -v 5 -c 2.0 -g 0.03125 | 98.62 |
8.0 | 0.03125 | -q -t 2 -v 5 -c 8.0 -g 0.03125 | 98.6 |
… | |||
32768.0 | 2048.0 | -q -t 2 -v 5 -c 32768.0 -g 2048.0 | 20.0 |
32768.0 | 8192.0 | -q -t 2 -v 5 -c 32768.0 -g 8192.0 | 20.0 |
32768.0 | 32768.0 | -q -t 2 -v 5 -c 32768.0 -g 32768.0 | 20.0 |
Kernel function: $k(\mathbf{x}, \mathbf{x}') = \exp(-\gamma\|\mathbf{x} - \mathbf{x}'\|^2)$

According to the experimental results, the ACC rate with a smaller $\gamma$ is much better than with a larger $\gamma$.
Because the RBF kernel's formula is similar to (directly proportional to) a Gaussian distribution, and $\gamma$ is equal to $\frac{1}{2\sigma^2}$, a larger $\gamma$ implies a smaller $\sigma$. A smaller $\sigma$ means the PDF of the Gaussian distribution is sharper, so overfitting may occur.
Linear + RBF
c | options | CV accuracy (%) |
---|---|---|
0.03125 | -q -t 4 -v 5 -c 0.03125 | 97.18 |
0.125 | -q -t 4 -v 5 -c 0.125 | 97.0 |
32768.0 | -q -t 4 -v 5 -c 32768.0 | 96.66 |
8.0 | -q -t 4 -v 5 -c 8.0 | 96.64 |
128.0 | -q -t 4 -v 5 -c 128.0 | 96.64 |
512.0 | -q -t 4 -v 5 -c 512.0 | 96.62 |
2.0 | -q -t 4 -v 5 -c 2.0 | 96.56 |
8192.0 | -q -t 4 -v 5 -c 8192.0 | 96.5 |
0.5 | -q -t 4 -v 5 -c 0.5 | 96.46 |
32.0 | -q -t 4 -v 5 -c 32.0 | 96.42 |
2048.0 | -q -t 4 -v 5 -c 2048.0 | 96.42 |
The linear+RBF kernel seems to work well during cross validation, but I got a very poor result when predicting.
At first, I thought the precomputed format might be wrong, so I tried replacing the default RBF kernel function in `libsvm` with the linear+RBF function and re-ran the RBF kernel (which is now the linear+RBF kernel) to check whether the prediction result was still bad.
But it was still bad… I don't even know why. My guess is that the RBF part is non-linear, so combining it with a linear kernel makes the kernel function work badly.
Execution results of each kernel