
###### tags: `Linear Algebra` `LA01`

#### L07 Unsupervised Machine Learning and Clustering

---

#### From week 5 and week 6

- Angles and Cosine Similarity
- Application of Cosine Similarity
- Supervised Machine Learning: KNN

---

## This week

- Unsupervised Machine Learning
- Clustering and K-means Clustering
- Application of Clustering Algorithms

---

### 1. Unsupervised Machine Learning

![](https://drive.google.com/uc?export=view&id=1YfgiPkBkFb02ZdrR5S7lHn0T2nakwZsX)

----

### 1.1 Definition of UL

> Unsupervised learning is a type of algorithm that learns patterns from untagged data. The hope is that through mimicry, which is an important mode of learning in people, the machine is forced to build a compact internal representation of its world and then generate imaginative content from it. -- Wikipedia

----

![](https://drive.google.com/uc?export=view&id=1ea6FLqiTHTIWITLnjd1BMuiXJkU9llBi)

---

### 2. Clustering

![](https://drive.google.com/uc?export=view&id=1wnpqmBUADOqGIESm6txxLHb-55XKEgXj)

----

### 2.1 Clustering Definition in Wiki:

> Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). -- Wikipedia

----

### 2.2 Types of Clustering:

- Connectivity-based clustering (hierarchical clustering)
- Centroid-based clustering (k-means clustering)
- Distribution-based clustering
- Density-based clustering
- Grid-based clustering

This module covers only centroid-based clustering.

----

### 2.3 K-means Clustering:

![](https://drive.google.com/uc?export=view&id=1D6SKWk5kNmJsYy3WEL4rl_smN6EGV1cG)

----

### 2.3 K-means Clustering:

Given a list of $N$ vectors $x_1, x_2, \dots, x_N$ and an initial list of $k$ group representative vectors $z_1, \dots, z_k$, repeat until convergence:

1. Partition the vectors into $k$ groups. For each vector $i = 1, \dots, N$, assign $x_i$ to the group associated with the nearest representative.
2. Update the representatives. For each group $j = 1, \dots, k$, set $z_j$ to be the mean of the vectors in group $j$.

But what is convergence?

----

### 2.4 Convergence:

$$J^{clust} = \frac{1}{N}\left(\|x_1 - z_{c_1}\|^2 + \dots + \|x_N - z_{c_N}\|^2\right)$$

where:
- $x_1, \dots, x_N$ are the actual data points;
- $c_i$ is the index of the group that $x_i$ is assigned to, so $z_{c_1}, \dots, z_{c_N}$ are the centroids assigned to each data point in the current round of iteration.

The algorithm stops when $J^{clust}$ decreases by no more than a predefined threshold $C$ from one round to the next:

$$J_{\text{round } n-1}^{clust} - J_{\text{round } n}^{clust} \le C$$

Neither the partition step nor the update step can increase $J^{clust}$, so this difference is never negative. We then say the algorithm has converged at threshold $C$.

---

### 3. Application of Clustering:

- Topic discovery
- Customer market segmentation
- Recommendation engines

Python Time!!

---
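To try this in Python, here is a minimal NumPy sketch of the k-means loop from section 2.3 combined with the stopping rule from section 2.4. It assumes the data vectors are the rows of a NumPy array `X` and that the caller supplies the initial representatives `Z`. The names `kmeans`, `j_clust` and `max_rounds` are our own illustrative choices, not from the lecture or any library. Keeping a representative unchanged when its group is empty is also an assumption, since the slides do not cover that case.

```python
import numpy as np


def j_clust(X, Z, labels):
    """J^clust: mean of ||x_i - z_{c_i}||^2 over all N data points."""
    return np.mean(np.sum((X - Z[labels]) ** 2, axis=1))


def kmeans(X, Z, C=1e-6, max_rounds=100):
    """Sketch of k-means: alternate the partition and update steps.

    X: (N, d) array, rows are the data vectors x_1..x_N
    Z: (k, d) array, rows are the initial representatives z_1..z_k
    C: stop once J^clust decreases by no more than C in one round
    max_rounds: safety cap on the number of rounds (our own addition)
    """
    X = np.asarray(X, dtype=float)
    Z = np.array(Z, dtype=float)  # copy, so the caller's array is not modified
    J_prev = np.inf
    for _ in range(max_rounds):
        # Step 1 (partition): dists[i, j] = ||x_i - z_j||^2,
        # and each x_i joins the group of its nearest representative
        dists = np.sum((X[:, None, :] - Z[None, :, :]) ** 2, axis=2)
        labels = np.argmin(dists, axis=1)

        # Step 2 (update): move each z_j to the mean of its group
        for j in range(len(Z)):
            members = X[labels == j]
            if len(members) > 0:  # assumption: an empty group keeps its old z_j
                Z[j] = members.mean(axis=0)

        # Stopping rule: J_{n-1} - J_n <= C (J never increases)
        J = j_clust(X, Z, labels)
        if J_prev - J <= C:
            break
        J_prev = J
    return Z, labels, J


# Usage on a tiny hand-made dataset with two obvious groups
X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]])
Z0 = np.array([[0, 0], [10, 10]])
Z, labels, J = kmeans(X, Z0)
print(labels)  # [0 0 1 1]
print(J)       # 0.25
```

On this data, the first round already reaches the final groups, with representatives $(0, 0.5)$ and $(10, 10.5)$. The second round leaves $J^{clust}$ at $0.25$, so the decrease is $0 \le C$ and the loop stops. Here the stopping rule checks the decrease in $J^{clust}$ rather than the raw difference $J_n - J_{n-1}$, which is never positive and would otherwise stop the loop too early.

---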
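Here is a small hand-worked example of $J^{clust}$ and the stopping rule. The numbers are made up purely for illustration. Take $N = 3$ points $x_1 = (0, 0)$, $x_2 = (2, 0)$, $x_3 = (10, 0)$ and $k = 2$ representatives $z_1 = (1, 0)$, $z_2 = (10, 0)$. The nearest representative of $x_1$ and $x_2$ is $z_1$, and that of $x_3$ is $z_2$, so $c_1 = c_2 = 1$ and $c_3 = 2$:

$$J^{clust} = \frac{1}{3}\left(\|x_1 - z_1\|^2 + \|x_2 - z_1\|^2 + \|x_3 - z_2\|^2\right) = \frac{1}{3}\left(1 + 1 + 0\right) = \frac{2}{3}$$

Here $z_1$ is already the mean of $x_1$ and $x_2$, and $z_2$ is the mean of $x_3$. Another round therefore reproduces the same groups and representatives, so $J^{clust}$ stays at $\frac{2}{3}$. The decrease is $0 \le C$, and the algorithm stops.

---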