###### tags: `Linear Algebra` `LA01`

#### L07 Unsupervised Machine Learning and Clustering

---

#### From week 5 and week 6

- Angles and Cosine Similarity
- Applications of Cosine Similarity
- Supervised Machine Learning: k-NN

---

## This week

- Unsupervised Machine Learning
- Clustering and K-means Clustering
- Applications of Clustering Algorithms

---

### 1. Unsupervised Machine Learning

![](https://drive.google.com/uc?export=view&id=1YfgiPkBkFb02ZdrR5S7lHn0T2nakwZsX)

----

### 1.1 Definition of Unsupervised Learning

> Unsupervised learning is a type of algorithm that learns patterns from untagged data. The hope is that through mimicry, which is an important mode of learning in people, the machine is forced to build a compact internal representation of its world and then generate imaginative content from it. -- Wikipedia

----

![](https://drive.google.com/uc?export=view&id=1ea6FLqiTHTIWITLnjd1BMuiXJkU9llBi)

---

### 2. Clustering

![](https://drive.google.com/uc?export=view&id=1wnpqmBUADOqGIESm6txxLHb-55XKEgXj)

----

### 2.1 Clustering Definition in Wikipedia:

> Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). -- Wikipedia

----

### 2.2 Types of Clustering:

- <font size = "6">Connectivity-based clustering (hierarchical clustering)</font>
- <font size = "6">Centroid-based clustering (k-means clustering)</font>
- <font size = "6">Distribution-based clustering</font>
- <font size = "6">Density-based clustering</font>
- <font size = "6">Grid-based clustering</font>

<font size = "4">Our module covers only centroid-based clustering.</font>

----

### 2.3 K-means Clustering:

![](https://drive.google.com/uc?export=view&id=1D6SKWk5kNmJsYy3WEL4rl_smN6EGV1cG)

----

### 2.3 K-means Clustering:

<font size = "4">Given a list of $N$ vectors $x_1, x_2, \dots, x_N$ and an initial list of $k$ group representative vectors $z_1, \dots, z_k$, repeat until convergence:

1. Partition the vectors into $k$ groups. For each vector $i = 1, \dots, N$, assign $x_i$ to the group associated with the nearest representative.
2. Update representatives. For each group $j = 1, \dots, k$, set $z_j$ to be the mean of the vectors in group $j$.

But what is convergence?</font>

----

### 2.4 Convergence:

<font size="5">$J^{clust} = (\|x_1 - z_{c_1}\|^2 + \dots + \|x_N - z_{c_N}\|^2) / N$</font>

where:

<font size="3">$x_1, \dots, x_N$ are the actual data points</font>

<font size="3">$z_{c_1}, \dots, z_{c_N}$ are the representatives (centroids) assigned to the data points in the current round of iteration.</font>

Since $J^{clust}$ never increases from one round to the next, the algorithm stops when

$$J_{round\ n-1}^{clust} - J_{round\ n}^{clust} \le C$$

where $C$ is a predefined threshold. We say the algorithm converges at threshold $C$.

---

### 3. Applications of Clustering:

- Topic Discovery
- Customer Market Segmentation
- Recommendation Engines

Python Time!!

---
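A minimal NumPy sketch of the two-step loop and the $J^{clust}$ stopping rule described above (the function name `k_means` and its parameters are illustrative, not the course's official code):

```python
import numpy as np

def k_means(x, k, C=1e-6, max_iter=100, seed=0):
    """Sketch of k-means: alternate the assignment and update steps
    until J^clust drops by no more than the threshold C."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Initialise the k representatives z_1..z_k as random data points.
    z = x[rng.choice(n, size=k, replace=False)].astype(float)
    prev_J = np.inf
    for _ in range(max_iter):
        # Step 1: assign each x_i to its nearest representative.
        dists = np.linalg.norm(x[:, None, :] - z[None, :, :], axis=2)
        c = dists.argmin(axis=1)
        # Step 2: set each z_j to the mean of its group (if non-empty).
        for j in range(k):
            if np.any(c == j):
                z[j] = x[c == j].mean(axis=0)
        # J^clust = mean squared distance to the assigned representative.
        J = ((x - z[c]) ** 2).sum(axis=1).mean()
        if prev_J - J <= C:  # converged: J stopped decreasing by more than C
            break
        prev_J = J
    return c, z, J
```

The empty-group guard in step 2 keeps a representative in place if no points were assigned to it, which can happen with an unlucky initialisation.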