###### tags: `Linear Algebra` `LA01`
#### L07
Unsupervised Machine Learning
and Clustering
---
####
From week 5 and week 6
- Angles and Cosine Similarity
- Application of Cosine Similarity
- Supervised Machine Learning: kNN
---
## This week
- Unsupervised Machine Learning
- Clustering and K-means Clustering
- Application of Clustering Algorithm
---
###
1. Unsupervised Machine Learning

----
###
1.1 Definition of UL
> Unsupervised learning is a type of algorithm that learns patterns from untagged data. The hope is that through mimicry, which is an important mode of learning in people, the machine is forced to build a compact internal representation of its world and then generate imaginative content from it. -- Wikipedia
---
###
2. Clustering

----
###
2.1 Definition of Clustering
> clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). -- Wikipedia
----
###
2.2 Types of Clustering:
- <font size = "6">Connectivity-based clustering (hierarchical clustering)</font>
- <font size = "6">Centroid-based clustering (k-means clustering)</font>
- <font size = "6">Distribution-based clustering</font>
- <font size = "6">Density-based clustering</font>
- <font size = "6">Grid-based clustering</font>
<font size = "4"> Our module only covers centroid-based clustering.
----
###
2.3 K-means Clustering:
<font size = "4">Given a list of N vectors $x_1, x_2, \dots, x_N$ and an initial list of k group representative vectors $z_1, \dots, z_k$,
repeat until convergence (?):
1. Partition the vectors into k groups. For each vector $i = 1, \dots, N$, assign $x_i$ to the group associated with the nearest representative.
2. Update the representatives. For each group $j = 1, \dots, k$, set $z_j$ to be the mean of the vectors in group j.
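
The two steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not a reference implementation: the function name `k_means`, the `max_iter` cap, and the choice to leave an empty group's representative unchanged are assumptions made here. As a simple stand-in for the convergence test, the loop stops when the group assignments no longer change.

```python
import numpy as np

def k_means(x, z_init, max_iter=100):
    """Sketch of k-means on the rows of x, starting from representatives z_init."""
    x = np.asarray(x, dtype=float)
    z = np.array(z_init, dtype=float)  # copy, so the caller's list is not modified
    c = None                           # c[i] = index of the group that x_i belongs to
    for _ in range(max_iter):
        # Step 1: partition. Distance from every x_i to every z_j, shape (N, k).
        dists = np.linalg.norm(x[:, None, :] - z[None, :, :], axis=2)
        new_c = dists.argmin(axis=1)   # nearest representative for each vector
        # Assumed stopping rule: assignments did not change since the last round.
        if c is not None and np.array_equal(new_c, c):
            break
        c = new_c
        # Step 2: update. Each representative becomes the mean of its group.
        for j in range(len(z)):
            members = x[c == j]
            if len(members) > 0:       # assumption: keep z_j as-is if group j is empty
                z[j] = members.mean(axis=0)
    return c, z

# Toy data: two well-separated groups of 2-vectors.
data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, reps = k_means(data, z_init=[[0, 0], [10, 10]])
print(labels)
print(reps)
```

On this toy data the first three vectors end up in one group and the last three in the other, and each representative is the mean of its group.
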
But what is convergence?
----
###
2.4 Convergence:
<font size="5">$J^{clust} = (||x_1 - z_{c_1}||^2 + \dots + ||x_N - z_{c_N}||^2) / N$
where:
<font size="3">$x_1, \dots, x_N$ are the actual data points
<font size="3">$z_{c_1}, \dots, z_{c_N}$ are the centroids assigned to each data point in the current round of iteration ($c_i$ is the index of the group that $x_i$ belongs to).
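The algorithm stops when
$$\left| J_{round\ n}^{clust} - J_{round\ n-1}^{clust} \right| \le C $$
C is a predefined threshold. Since $J^{clust}$ never increases from one round to the next, we compare the size of the change with C. Once the change is at most C, we say the algorithm has converged.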
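
As a rough sketch of how this objective and stopping test could be computed, the snippet below evaluates $J^{clust}$ for a given assignment and compares two rounds. The names `j_clust` and `has_converged` are hypothetical, and the test uses the absolute change in $J^{clust}$ between rounds, which is an assumption about how the threshold C is applied.

```python
import numpy as np

def j_clust(x, z, c):
    """Mean squared distance from each x_i to its assigned centroid z_{c_i}."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    c = np.asarray(c, dtype=int)
    # z[c] picks, for every data point, the centroid of the group it belongs to.
    return np.mean(np.sum((x - z[c]) ** 2, axis=1))

def has_converged(j_prev, j_curr, threshold):
    """Assumed test: the change in J_clust between two rounds is at most C."""
    return abs(j_curr - j_prev) <= threshold

# Three data points, two centroids, and the group index c_i of each point.
points = [[0, 0], [0, 2], [10, 10]]
centroids = [[0, 1], [10, 10]]
groups = [0, 0, 1]

j = j_clust(points, centroids, groups)
print(j)  # squared distances are 1, 1 and 0, so J_clust = 2/3
print(has_converged(j_prev=0.70, j_curr=j, threshold=0.05))
```

A smaller C means more rounds before stopping; a larger C stops earlier with a coarser result.
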
---
###
3. Application of Clustering:
- Topic Discovery
- Customer Market segmentation
- Recommendation Engine
Python Time!!
---