# Semi-supervised Learning
**Goal:** Use both labeled and unlabeled data to build a better learner than either kind of data yields alone.
## Traditional Methods<sup>1</sup>
* Self-Training (a minimal sketch follows after this list):
    * Assumption: One's own high-confidence predictions are correct.
    * Disadvantages:
        * Early mistakes can reinforce themselves.
        * Provides few guarantees about convergence.
* Generative Methods (an EM sketch follows after this list):
    * Assumption: The data come from an identifiable mixture model, so the joint distribution p(x, y) can be modeled directly.
    * Examples:
        * Mixture of Gaussian distributions (GMM)
        * Mixture of multinomial distributions (Naive Bayes)
        * Hidden Markov Models (HMM)
    * Disadvantages:
        * Often difficult to verify the correctness of the model
        * Model identifiability issues
        * EM gets stuck in local optima
        * Unlabeled data may hurt if the generative model is wrong
* Semi-Supervised Support Vector Machines (S3VM)
    * Assumption: Unlabeled data from different classes are separated by a large margin.
    * Disadvantages:
        * The optimization problem is non-convex and difficult to solve
        * Can be trapped in bad local optima
        * The margin assumption is more modest than those of generative or graph-based methods, so the potential gain is also smaller
* Graph-Based Algorithms (a label-propagation sketch follows after this list):
    * Assumption: A graph is given over the labeled and unlabeled instances; instances connected by a heavy edge tend to have the same label.
    * Disadvantages:
        * Performance suffers if the graph is poorly constructed
        * Sensitive to graph structure and edge weights
* Multiview Algorithms (a co-training sketch follows after this list):
    * Assumptions:
        * A feature split x = [x<sup>(1)</sup>; x<sup>(2)</sup>] exists
        * x<sup>(1)</sup> or x<sup>(2)</sup> alone suffices to train a good classifier
        * x<sup>(1)</sup> and x<sup>(2)</sup> are conditionally independent given the class
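A minimal self-training sketch (illustrative, not taken from the cited tutorial), assuming NumPy arrays and a scikit-learn base classifier; the `threshold` and `max_rounds` values are arbitrary choices for the example:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_l, y_l, X_u, threshold=0.95, max_rounds=10):
    """Repeatedly retrain on labeled data plus the model's own
    high-confidence pseudo-labels for the unlabeled pool."""
    clf = LogisticRegression(max_iter=1000)
    for _ in range(max_rounds):
        clf.fit(X_l, y_l)
        if len(X_u) == 0:
            break
        proba = clf.predict_proba(X_u)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # nothing above the confidence threshold; stop
        pseudo = clf.classes_[proba[confident].argmax(axis=1)]
        # Confident predictions are treated as ground truth from now on;
        # this is exactly how early mistakes reinforce themselves.
        X_l = np.vstack([X_l, X_u[confident]])
        y_l = np.concatenate([y_l, pseudo])
        X_u = X_u[~confident]
    return clf
```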
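A sketch of semi-supervised EM for a mixture of Gaussians (one spherical Gaussian per class), assuming NumPy arrays, integer labels 0..K-1, and every class appearing at least once in the labeled set. Labeled responsibilities are clamped to the known class; unlabeled ones are inferred in the E-step:

```python
import numpy as np

def gmm_em_ssl(X_l, y_l, X_u, n_classes, n_iter=50, eps=1e-9):
    """EM for a mixture of spherical Gaussians, one component per
    class, fit on labeled (X_l, y_l) plus unlabeled X_u data."""
    X = np.vstack([X_l, X_u])
    n_l, d = X_l.shape
    # Initialize the parameters from the labeled data alone.
    pi = np.array([(y_l == c).mean() for c in range(n_classes)])
    mu = np.array([X_l[y_l == c].mean(axis=0) for c in range(n_classes)])
    var = np.array([X_l[y_l == c].var() + eps for c in range(n_classes)])
    for _ in range(n_iter):
        # E-step: class responsibilities under each spherical Gaussian.
        log_p = np.stack([
            np.log(pi[c] + eps)
            - 0.5 * d * np.log(2 * np.pi * var[c])
            - 0.5 * ((X - mu[c]) ** 2).sum(axis=1) / var[c]
            for c in range(n_classes)], axis=1)
        resp = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # Labeled points keep their known class (clamped responsibilities).
        resp[:n_l] = np.eye(n_classes)[y_l]
        # M-step: responsibility-weighted priors, means, and variances.
        Nk = resp.sum(axis=0) + eps
        pi = Nk / len(X)
        mu = (resp.T @ X) / Nk[:, None]
        var = np.array([
            (resp[:, c] * ((X - mu[c]) ** 2).sum(axis=1)).sum() / (d * Nk[c])
            for c in range(n_classes)]) + eps
    return pi, mu, var  # class priors, means, spherical variances
```

If the one-Gaussian-per-class assumption is wrong, these updates can confidently pull the decision boundary away from the truth, which is the "unlabeled data may hurt" failure mode listed above.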
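Graph-based methods are commonly run via label propagation or spreading. A sketch using scikit-learn's `LabelSpreading` on toy data, where the RBF kernel supplies the edge weights and `-1` marks unlabeled points:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

X, y_true = make_moons(n_samples=200, noise=0.1, random_state=0)

# scikit-learn convention: -1 marks an unlabeled instance.
y = np.full(len(X), -1)
y[:10] = y_true[:10]  # keep only a handful of labels

# The RBF kernel defines the graph's edge weights; results are
# sensitive to gamma, so this value is only a placeholder.
model = LabelSpreading(kernel='rbf', gamma=20)
model.fit(X, y)
accuracy = (model.transduction_ == y_true).mean()
```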
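For the multiview setting, a simplified co-training sketch in the spirit of the classic Blum and Mitchell scheme: one classifier per view, each round nominating its most confident pseudo-labels into the shared labeled pool. The function names and the per-round budget `k` are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(X1_l, X2_l, y_l, X1_u, X2_u, k=5, rounds=10):
    """Train one classifier per view; each round, every view adds its
    k most confident pseudo-labeled points to the shared labeled pool."""
    c1 = LogisticRegression(max_iter=1000)
    c2 = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        c1.fit(X1_l, y_l)
        c2.fit(X2_l, y_l)
        if len(X1_u) == 0:
            break
        picks = {}  # unlabeled index -> pseudo-label (second view wins ties)
        for clf, X_view in ((c1, X1_u), (c2, X2_u)):
            proba = clf.predict_proba(X_view)
            for i in np.argsort(proba.max(axis=1))[-k:]:
                picks[int(i)] = int(clf.classes_[proba[i].argmax()])
        idx = np.array(sorted(picks))
        labels = np.array([picks[i] for i in idx])
        # Move the nominated points into the labeled pool (both views).
        X1_l = np.vstack([X1_l, X1_u[idx]])
        X2_l = np.vstack([X2_l, X2_u[idx]])
        y_l = np.concatenate([y_l, labels])
        mask = np.ones(len(X1_u), dtype=bool)
        mask[idx] = False
        X1_u, X2_u = X1_u[mask], X2_u[mask]
    return c1, c2
```

The conditional-independence assumption is what makes this work: each view's confident mistakes look like noise to the other view instead of being systematically confirmed.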
## Deep Learning Methods<sup>2</sup>
See the cited review<sup>2</sup> for semi-supervised models that combine deep learning with memory-based approaches.
## References
1. [Semi-Supervised Learning Tutorial](https://pdfs.semanticscholar.org/55e9/52ecc655320446478ec0db941f3a8cd16919.pdf)
2. Bagherzadeh, J. and Asil, H., 2019. A review of various semi-supervised learning models with a deep learning and memory approach. Iran Journal of Computer Science, 2(2), pp. 65–80. [Available online](https://link.springer.com/article/10.1007/s42044-018-00027-6)