# HEIG-VD MLG - Practical work 3
Authors: Bécaud Arthur and Egremy Bruno.
<style type="text/css">
.t .c{border-color:#ffffff}
.c-center{text-align:center;vertical-align:middle}
.c-param{width:285px}
</style>
## Hold-out validation
1. The parameters are set as constants:
```python
import numpy as np

N_INITS = 2; N_SPLITS = 10; DATASET_SIZE = 200
EPOCHS = 100; N_NEURONS = 2; LEARNING_RATE = 0.001
MOMENTUM = 0.7; TRAIN_TEST_RATIO = 0.8; DATA_PARAMS = np.arange(0.4, 0.71, 0.1)
```
The dataset is split with a custom function, *split_dataset(dataset, train_test_ratio)*, which divides it according to the given ratio, here 0.8 = 80% of the samples for training; a possible implementation is sketched below.
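The report does not show the implementation of this function; a minimal sketch of what it could look like, assuming the dataset is a NumPy array with one sample per row:
```python
import numpy as np

def split_dataset(dataset, train_test_ratio):
    """Shuffle the rows, then split them according to the given ratio."""
    indices = np.random.permutation(len(dataset))
    n_train = int(len(dataset) * train_test_ratio)
    return dataset[indices[:n_train]], dataset[indices[n_train:]]

# With TRAIN_TEST_RATIO = 0.8 and DATASET_SIZE = 200: 160 training rows, 40 test rows.
```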
1. The cyan curves represent the training runs, while the red curves represent the testing runs.
The training curves are far more chaotic because the MLP is learning, using gradient descent to minimize the error function. During the testing phases it does not learn, so the curve is more stable and reflects the model's understanding of the classification problem at that point.
1. The greater the spread, the greater the mean MSE. Increasing the spread makes the problem more complex and more difficult to separate with a linear function.
1. The random split may give the MLP a training set whose distribution differs from that of the corresponding test set, which results in a higher error on the test set.
1. The more spread out the dataset, the more variable the results. And, as said before, the MSE is higher with a more spread-out dataset.
## Cross-validation
1. The parameters are set as constants:
```python
import numpy as np

N_SPLITS = 10
DATASET_SIZE = 200
EPOCHS = 20
N_NEURONS = 2
K = 5
LEARNING_RATE = 0.001
MOMENTUM = 0.7
DATA_PARAMS = np.arange(0.4, 0.71, 0.1)
```
The dataset is split into `K` parts with a custom function, *split_dataset(dataset, n_parts)*; a possible implementation is sketched below.
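Again, the implementation is not shown in the report; a minimal sketch of what this k-way split could look like:
```python
import numpy as np

def split_dataset(dataset, n_parts):
    """Shuffle the rows, then cut them into n_parts folds of (almost) equal size."""
    indices = np.random.permutation(len(dataset))
    return [dataset[fold] for fold in np.array_split(indices, n_parts)]
```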
1. Hold-out validation only guarantees the ratio of training vs. testing data. Which data is used for training or testing is chosen randomly, though over repeated splits the whole dataset will eventually be used.
Cross-validation also uses a train/test ratio, but it always tests the whole dataset, in k folds: each fold's test set is a different subset of the dataset, disjoint from the ones already tested. By design, cross-validation therefore trains and tests k times, covering the whole dataset, instead of only once as in hold-out validation (see the sketch after this list).
1. First of all, the overall MSE is far better with cross-validation. Furthermore, the boxes are much more compact, which means there is less variability in the results for each dataset.
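A minimal sketch of the k-fold loop described above, reusing the `split_dataset` function sketched earlier; `train_and_test` is a hypothetical callback standing in for the MLP training and evaluation code:
```python
import numpy as np

def k_fold_cross_validation(dataset, k, train_and_test):
    """Each fold serves exactly once as the test set; the other k-1 folds
    form the training set. Returns the mean test MSE over the k runs."""
    folds = split_dataset(dataset, k)  # the k-way split sketched above
    mse_per_fold = []
    for i in range(k):
        test_set = folds[i]
        train_set = np.concatenate(folds[:i] + folds[i + 1:])
        mse_per_fold.append(train_and_test(train_set, test_set))
    return np.mean(mse_per_fold)
```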
## Voice recognition
### Man vs Woman - natural
The dataset was prepared by averaging each of the 13 Mel-Frequency Cepstrum Coefficients (MFCC) of the 72 natural voice samples of men and women (evenly distributed). Then the number of iterations of the backpropagation algorithm (number of epochs) was estimated by fixing the `learning rate` to `0.001` and the `momentum` to `0.5`.
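The report does not name the MFCC extraction library; a minimal sketch of the averaging step, assuming `librosa` and a hypothetical list of file paths:
```python
import numpy as np
import librosa

def voice_features(path, n_mfcc=13):
    """Average each MFCC coefficient over time: one 13-value vector per sample."""
    signal, sample_rate = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)  # (13, n_frames)
    return mfcc.mean(axis=1)

# Hypothetical usage: one 13-value feature vector per recording.
# features = np.array([voice_features(p) for p in sample_paths])
```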
<center>
<img src="./assets/n_men_women_epochs.png" alt="epochs for natural man vs woman model" width="560"/>
</center>
At around 100 epochs, the MSE stabilizes for every neuron count, so the sweet spot should be around 100. Knowing the number of epochs, the model was then tested with different numbers of hidden neurons to compare performance (MSE), as in the scan sketched below.
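A minimal sketch of such a scan, using scikit-learn's `MLPClassifier` as a stand-in for the course MLP and random placeholder data in place of the real feature vectors:
```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPClassifier

# Placeholder data standing in for the averaged-MFCC vectors and gender labels.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(100, 13)), rng.integers(0, 2, size=100)
X_test, y_test = rng.normal(size=(44, 13)), rng.integers(0, 2, size=44)

# Scan several hidden-layer sizes with the number of epochs fixed at 100.
for n_neurons in (2, 4, 6, 8, 10):
    mlp = MLPClassifier(hidden_layer_sizes=(n_neurons,), solver="sgd",
                        learning_rate_init=0.001, momentum=0.5,
                        max_iter=100, random_state=0)
    mlp.fit(X_train, y_train)
    mse = mean_squared_error(y_test, mlp.predict(X_test))
    print(f"{n_neurons} hidden neurons -> test MSE {mse:.4f}")
```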
<center>
<img src="./assets/n_men_women_neurons.png" alt="neurons for natural man vs woman model" width="560"/>
</center>
<table class="t">
<tbody>
<tr>
<td class="c">
At 6 neurons, the testing curve stops improving; only the thickness of the curves varies, in an unstable manner. So, with the resource and computation trade-off in mind, 6 neurons should be good enough.
We can also see that the curves tend to flatten out, meaning the MSE no longer decreases, which confirms 100 epochs as a good choice.
We concluded that the final model can solve the problem with the following parameters, using the cross-validation method from the `k_fold_cross_validation.py` file.
</td>
<td class="c c-param">
<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Number of features</td>
<td class="c-center">13</td>
</tr>
<tr>
<td>k</td>
<td class="c-center">5</td>
</tr>
<tr>
<td>Learning rate</td>
<td class="c-center">0.001</td>
</tr>
<tr>
<td>Momentum</td>
<td class="c-center">0.5</td>
</tr>
<tr>
<td>Epochs</td>
<td class="c-center">100</td>
</tr>
<tr>
<td>Number of hidden neurons</td>
<td class="c-center">6</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
The model evaluation shows that the problem could be solved with the chosen parameters:
<center>
<table class="t">
<tbody>
<tr>
<td class="c">
<table>
<thead>
<tr>
<th>Evaluation</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td>MSE training</td>
<td>0.0299</td>
</tr>
<tr>
<td>MSE test</td>
<td>0.1390</td>
</tr>
<tr>
<td>Precision</td>
<td>0.9722</td>
</tr>
<tr>
<td>Recall</td>
<td>0.9722</td>
</tr>
<tr>
<td>F-Score</td>
<td>0.9722</td>
</tr>
</tbody>
</table>
</td>
<td class="c">
<table>
<thead>
<tr>
<th class="c-center" colspan="2" rowspan="2">Confusion matrix</th>
<th class="c-center"" colspan="2">True/Actual</th>
</tr>
<tr>
<td class="c-center">Man</td>
<td class="c-center">Woman</td>
</tr>
</thead>
<tbody>
<tr>
<td class="c-center" rowspan="2">Predicted</td>
<td class="c-center">Man</td>
<td class="c-center">70</td>
<td class="c-center">2</td>
</tr>
<tr>
<td class="c-center">Woman</td>
<td class="c-center">2</td>
<td class="c-center">70</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</center>
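As a sanity check, the reported metrics can be recomputed from the confusion matrix above, taking `Man` as the positive class:
```python
# From the confusion matrix: 70 men correctly predicted (TP),
# 2 women predicted as men (FP), 2 men predicted as women (FN).
tp, fp, fn = 70, 2, 2

precision = tp / (tp + fp)                               # 70/72 = 0.9722
recall = tp / (tp + fn)                                  # 70/72 = 0.9722
f_score = 2 * precision * recall / (precision + recall)  # 0.9722
print(f"{precision:.4f} {recall:.4f} {f_score:.4f}")
```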
### Man vs Woman
The dataset was prepared by averaging each of the 13 MFCC coefficients of the 144 natural and synthetic voice samples of men and women (evenly distributed).
Then the same procedure as in the `Man vs Woman - natural` section is applied to estimate `epochs` and `hidden neurons`.
<center>
<img src="./assets/ns_men_women_epochs.png" alt="epochs for man vs woman model" width="560"/>
</center>
At around 50 epochs, the MSE stabilizes for every neuron count (some extreme spikes can be seen, but they are rare), so the sweet spot should be around 50.
<center>
<img src="./assets/ns_men_women_neurons.png" alt="neurons for man vs woman model" width="560"/>
</center>
<table class="t">
<tbody>
<tr>
<td class="c">
At 6 neurons, the testing curve stops improving; only the thickness of the curves varies, in an unstable manner. So, with the resource and computation trade-off in mind, 6 neurons should be good enough.
We can also see that the curves tend to flatten out, meaning the MSE no longer decreases, which confirms 50 epochs as a good choice.
We concluded that the final model can solve the problem with the following parameters, using the cross-validation method from the `k_fold_cross_validation.py` file.
</td>
<td class="c c-param">
<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Number of features</td>
<td class="c-center">13</td>
</tr>
<tr>
<td>k</td>
<td class="c-center">5</td>
</tr>
<tr>
<td>Learning rate</td>
<td class="c-center">0.001</td>
</tr>
<tr>
<td>Momentum</td>
<td class="c-center">0.5</td>
</tr>
<tr>
<td>Epochs</td>
<td class="c-center">50</td>
</tr>
<tr>
<td>Number of hidden neurons</td>
<td class="c-center">6</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
The model evaluation shows that the problem could be solved with the chosen parameters:
<center>
<table class="t">
<tbody>
<tr>
<td class="c">
<table>
<thead>
<tr>
<th>Evaluation</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td>MSE training</td>
<td>0.0347</td>
</tr>
<tr>
<td>MSE test</td>
<td>0.1508</td>
</tr>
<tr>
<td>Precision</td>
<td>0.9853</td>
</tr>
<tr>
<td>Recall</td>
<td>0.9306</td>
</tr>
<tr>
<td>F-Score</td>
<td>0.9571</td>
</tr>
</tbody>
</table>
</td>
<td class="c">
<table>
<thead>
<tr>
<th class="c-center" colspan="2" rowspan="2">Confusion matrix</th>
<th class="c-center"" colspan="2">True/Actual</th>
</tr>
<tr>
<td class="c-center">Man</td>
<td class="c-center">Woman</td>
</tr>
</thead>
<tbody>
<tr>
<td class="c-center" rowspan="2">Predicted</td>
<td class="c-center">Man</td>
<td class="c-center">67</td>
<td class="c-center">5</td>
</tr>
<tr>
<td class="c-center">Woman</td>
<td class="c-center">1</td>
<td class="c-center">71</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</center>
### Man vs Woman vs Children
The dataset was prepared by averaging each of the 13 MFCC coefficients of the 360 natural and synthetic voice samples of men, women and kids (not evenly distributed: 1/5 men, 1/5 women and 3/5 kids aged 3, 5 and 7).
Then the same procedure as in the previous sections is applied to estimate `epochs` and `hidden neurons`.
<center>
<img src="./assets/ns_men_women_kid_epochs.png" alt="epochs for man vs woman vs children model" width="560"/>
</center>
At around 125 epochs, the MSE stabilizes for every neuron count, so the sweet spot should be around 125.
<center>
<img src="./assets/ns_men_women_kid_neurons.png" alt="neurons for man vs woman model" width="560"/>
</center>
<table class="t">
<tbody>
<tr>
<td class="c">
At 10 neurons, the testing curve stops improving; only the thickness of the curves varies, in an unstable manner. So, with the resource and computation trade-off in mind, 10 neurons should be good enough. We can also see that the curves tend to flatten out, meaning the MSE no longer decreases; about 80 epochs would already have been sufficient, so 125 is more than needed.
We concluded that the final model can solve the problem with the following parameters, using the cross-validation method from the `k_fold_cross_validation.py` file.
</td>
<td class="c c-param">
<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Number of features</td>
<td class="c-center">13</td>
</tr>
<tr>
<td>k</td>
<td class="c-center">5</td>
</tr>
<tr>
<td>Learning rate</td>
<td class="c-center">0.001</td>
</tr>
<tr>
<td>Momentum</td>
<td class="c-center">0.5</td>
</tr>
<tr>
<td>Epochs</td>
<td class="c-center">125</td>
</tr>
<tr>
<td>Number of hidden neurons</td>
<td class="c-center">10</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
The model evaluation shows that the problem could be solved with the chosen parameters:
<center>
<table class="t">
<tbody>
<tr>
<td class="c">
<table>
<thead>
<tr>
<th>Evaluation</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td>MSE training</td>
<td>0.1265</td>
</tr>
<tr>
<td>MSE test</td>
<td>0.3414</td>
</tr>
<tr>
<td>Precision</td>
<td>0.9857</td>
</tr>
<tr>
<td>Recall</td>
<td>0.9452</td>
</tr>
<tr>
<td>F-Score</td>
<td>0.9650</td>
</tr>
</tbody>
</table>
</td>
<td class="c">
<table class="">
<thead>
<tr>
<th class="c-center" colspan="2" rowspan="2">Confusion matrix</th>
<th class="c-center" colspan="3">True/Actual</th>
</tr>
<tr>
<td class="c-center">Man</td>
<td class="c-center">Woman</td>
<td class="c-center">Kid</td>
</tr>
</thead>
<tbody>
<tr>
<td class="c-center" rowspan="3">Predicted</td>
<td class="c-center">Man</td>
<td class="c-center">69</td>
<td class="c-center">4</td>
<td class="c-center">2</td>
</tr>
<tr>
<td class="c-center">Woman</td>
<td class="c-center">1</td>
<td class="c-center">38</td>
<td class="c-center">31</td>
</tr>
<tr>
<td class="c-center">Kid</td>
<td class="c-center">1</td>
<td class="c-center">24</td>
<td class="c-center">189</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</center>
### Man vs Children
The dataset was prepared by averaging each of the 13 MFCC coefficients of the 288 natural and synthetic voice samples of men and kids (not evenly distributed: 1/4 men and 3/4 kids aged 3, 5 and 7). Then the number of iterations of the backpropagation algorithm (number of epochs) was estimated by fixing the `learning rate` to `0.001` and the `momentum` to `0.5`.
<center>
<img src="./assets/ns_men_kids_epochs.png" alt="epochs for natural man vs kid model" width="540"/>
</center>
At around 40 epochs, the MSE stabilizes for every neuron count, so the sweet spot should be around 40. Knowing the number of epochs, the model was then tested with different numbers of hidden neurons to compare performance (MSE).
<center>
<img src="./assets/ns_men_kids_neurons.png" alt="neurons for natural man vs kid model" width="540"/>
</center>
<table class="t">
<tbody>
<tr>
<td class="c">
At 10 neurons, the testing curve stops improving; only the thickness of the curves varies, in an unstable manner. So, with the resource and computation trade-off in mind, 10 neurons should be good enough.
We can also see that the curves tend to flatten out, meaning the MSE no longer decreases, which confirms 40 epochs as a good choice.
We concluded that the final model can solve the problem with the following parameters, using the cross-validation method from the `k_fold_cross_validation.py` file.
</td>
<td class="c c-param">
<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Number of features</td>
<td class="c-center">13</td>
</tr>
<tr>
<td>k</td>
<td class="c-center">5</td>
</tr>
<tr>
<td>Learning rate</td>
<td class="c-center">0.001</td>
</tr>
<tr>
<td>Momentum</td>
<td class="c-center">0.5</td>
</tr>
<tr>
<td>Epochs</td>
<td class="c-center">40</td>
</tr>
<tr>
<td>Number of hidden neurons</td>
<td class="c-center">10</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
The model evaluation shows that the problem could be solved with the chosen parameters:
<center>
<table class="t">
<tbody>
<tr>
<td class="c">
<table>
<thead>
<tr>
<th>Evaluation</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td>MSE training</td>
<td>0.0137</td>
</tr>
<tr>
<td>MSE test</td>
<td>0.0504</td>
</tr>
</tbody>
</table>
</td>
<td class="c">
<table>
<thead>
<tr>
<th>Evaluation</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td>Precision</td>
<td>1.0000</td>
</tr>
<tr>
<td>Recall</td>
<td>0.9722</td>
</tr>
<tr>
<td>F-Score</td>
<td>0.9859</td>
</tr>
</tbody>
</table>
</td>
<td class="c">
<table>
<thead>
<tr>
<th class="c-center" colspan="2" rowspan="2">Confusion matrix</th>
<th class="c-center" colspan="2">True/Actual</th>
</tr>
<tr>
<td class="c-center">Man</td>
<td class="c-center">Kid</td>
</tr>
</thead>
<tbody>
<tr>
<td class="c-center" rowspan="2">Predicted</td>
<td class="c-center">Man</td>
<td class="c-center">70</td>
<td class="c-center">2</td>
</tr>
<tr>
<td class="c-center">Kid</td>
<td class="c-center">0</td>
<td class="c-center">216</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</center>