# Semi-supervised Learning

**Goal:** Use both labeled and unlabeled data to build better learners than either kind of data would yield alone.

## Traditional Methods<sup>1</sup>

Short illustrative sketches of these methods are collected after the references at the end of this note.

* Self-Training
  * Assumption: One's own high-confidence predictions are correct.
  * Disadvantages:
    * Early mistakes can reinforce themselves.
    * Little can be guaranteed about convergence.
* Generative Methods
  * Examples:
    * Mixture of Gaussian distributions (GMM)
    * Mixture of multinomial distributions (Naive Bayes)
    * Hidden Markov Models (HMM)
  * Disadvantages:
    * Often difficult to verify the correctness of the model
    * Model identifiability
    * EM local optima
    * Unlabeled data may hurt if the generative model is wrong
* Semi-Supervised Support Vector Machines (S3VM)
  * Assumption: Unlabeled data from different classes are separated by a large margin.
  * Disadvantages:
    * Optimization is difficult
    * Can be trapped in bad local optima
    * A more modest assumption than generative or graph-based methods, so the potential gain is smaller
* Graph-Based Algorithms
  * Assumption: A graph is given over the labeled and unlabeled data; instances connected by a heavy (high-weight) edge tend to have the same label.
  * Disadvantages:
    * Performance is poor if the graph is poorly constructed
    * Sensitive to graph structure and edge weights
* Multiview Algorithms
  * Assumptions:
    * A feature split x = [x<sup>(1)</sup>; x<sup>(2)</sup>] exists
    * x<sup>(1)</sup> or x<sup>(2)</sup> alone is sufficient to train a good classifier
    * x<sup>(1)</sup> and x<sup>(2)</sup> are conditionally independent given the class

## Deep Learning Methods<sup>2</sup>

See the review in reference 2 for a survey of semi-supervised models with a deep learning and memory approach.

1. [Semi-Supervised Learning Tutorial](https://pdfs.semanticscholar.org/55e9/52ecc655320446478ec0db941f3a8cd16919.pdf)
2. Bagherzadeh, J. and Asil, H., 2019. A review of various semi-supervised learning models with a deep learning and memory approach. *Iran Journal of Computer Science*, 2(2), pp. 65-80. [Available online](https://link.springer.com/article/10.1007/s42044-018-00027-6)
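## Code Sketches

The sketches below are minimal illustrations, not reference implementations; all function names, thresholds, and iteration counts are assumptions chosen for clarity.

Self-training turns the "trust your own confident predictions" assumption into a loop: train on the labeled set, pseudo-label the unlabeled points the model is most sure about, and retrain. The `threshold` and `max_iter` parameters below are hypothetical choices; the early-mistake problem from the list above shows up exactly when `threshold` is too permissive.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.95, max_iter=10):
    """Grow the training set with high-confidence pseudo-labels (sketch)."""
    X, y = X_labeled.copy(), y_labeled.copy()
    pool = X_unlabeled.copy()
    clf = LogisticRegression(max_iter=1000)
    for _ in range(max_iter):
        clf.fit(X, y)
        if len(pool) == 0:
            break
        proba = clf.predict_proba(pool)
        confident = proba.max(axis=1) >= threshold  # trust only confident predictions
        if not confident.any():                     # nothing left to pseudo-label
            break
        pseudo = clf.classes_[proba[confident].argmax(axis=1)]
        X = np.vstack([X, pool[confident]])         # early mistakes, once added,
        y = np.concatenate([y, pseudo])             # are never revisited
        pool = pool[~confident]
    return clf.fit(X, y)
```

scikit-learn also ships `sklearn.semi_supervised.SelfTrainingClassifier`, which wraps the same idea around any estimator that exposes `predict_proba`.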
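For generative methods, a simplified sketch is to fit a mixture model on labeled and unlabeled points together and then map each component to a class using the labeled points. A full semi-supervised EM would also clamp the responsibilities of labeled points to their known classes; this shortcut nonetheless makes the identifiability caveat visible, since the mapping step fails when components do not align with classes. It assumes integer class labels `0..n_classes-1`.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_ssl(X_labeled, y_labeled, X_unlabeled, n_classes, seed=0):
    """Mixture-model SSL sketch: unlabeled data shapes the density estimate."""
    X_all = np.vstack([X_labeled, X_unlabeled])
    gmm = GaussianMixture(n_components=n_classes, random_state=seed).fit(X_all)
    # Identifiability in practice: decide which component means which class
    # by majority vote of the labeled points each component claims.
    comp = gmm.predict(X_labeled)
    comp_to_class = {}
    for c in range(n_classes):
        votes = y_labeled[comp == c]          # y_labeled: ints in 0..n_classes-1
        comp_to_class[c] = int(np.bincount(votes).argmax()) if len(votes) else c
    def predict(X):
        return np.array([comp_to_class[c] for c in gmm.predict(X)])
    return predict
```

If the one-Gaussian-per-class assumption is wrong, the unlabeled data pulls the components toward the wrong density, which is the "unlabeled data may hurt if the generative model is wrong" caveat above.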
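S3VM does not lend itself to a short correct implementation (the objective is non-convex, which is exactly the "bad local optima" caveat), but the standard objective makes the large-margin assumption explicit. With $\ell$ labeled points $(x_i, y_i)$, $y_i \in \{-1, +1\}$, and $u$ unlabeled points $x_j$:

$$
\min_{w,\, b} \; \frac{1}{2}\lVert w \rVert^2
+ C_\ell \sum_{i=1}^{\ell} \max\bigl(0,\; 1 - y_i \,(w^\top x_i + b)\bigr)
+ C_u \sum_{j=\ell+1}^{\ell+u} \max\bigl(0,\; 1 - \lvert w^\top x_j + b \rvert\bigr)
$$

The last term (the "hat loss") penalizes unlabeled points that fall inside the margin, pushing the decision boundary through low-density regions between the classes.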
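Graph-based methods are directly available in scikit-learn. The sketch below uses `LabelSpreading`, where the RBF kernel defines the edge weights and `-1` marks unlabeled points (scikit-learn's convention); the toy data, `gamma` value, and number of labeled points are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y_true = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hide most labels: -1 means "unlabeled".
y = np.full(200, -1)
labeled_idx = rng.choice(200, size=10, replace=False)
y[labeled_idx] = y_true[labeled_idx]

# The RBF kernel defines edge weights; labels diffuse along heavy edges.
model = LabelSpreading(kernel="rbf", gamma=20).fit(X, y)
print((model.transduction_ == y_true).mean())  # transductive accuracy
```

The graph-sensitivity caveat shows up here as sensitivity to `gamma` (or to `n_neighbors` with the kNN kernel): a poorly chosen kernel is a poorly built graph.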
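The classic multiview algorithm is co-training (Blum and Mitchell, 1998): train one classifier per view and let each pseudo-label its most confident unlabeled points for the shared training pool. The choice of base classifier, `n_rounds`, and `k` below are illustrative assumptions; `-1` again marks unlabeled points.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X_view1, X_view2, y, n_rounds=10, k=5):
    """Co-training sketch: X_view1/X_view2 are the two feature views."""
    y = y.copy()
    clf1, clf2 = GaussianNB(), GaussianNB()
    for _ in range(n_rounds):
        labeled = y != -1
        clf1.fit(X_view1[labeled], y[labeled])
        clf2.fit(X_view2[labeled], y[labeled])
        # Each view pseudo-labels its k most confident unlabeled points
        # for the shared training pool.
        for clf, X_view in ((clf1, X_view1), (clf2, X_view2)):
            pool = np.flatnonzero(y == -1)
            if len(pool) == 0:
                break
            proba = clf.predict_proba(X_view[pool])
            best = np.argsort(proba.max(axis=1))[-k:]
            y[pool[best]] = clf.classes_[proba[best].argmax(axis=1)]
        if not (y == -1).any():   # everything pseudo-labeled
            break
    return clf1, clf2
```

The conditional-independence assumption is what lets each classifier treat the other view's confident labels as nearly independent evidence; when the two views are redundant copies, co-training degenerates into plain self-training.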