<style>
H2 {color:#E87765;}
H3 {color:#89A85A;}
H4 {color:#4EA5C1;}
li li, p {color:Black; font-weight:normal !important;}
table, th {color:Black; font-weight:normal; font-size:3px !important;}
</style>
# hw6
## Code
### Part1 Clustering algorithm and visualization
#### Part1 - Kernel K-means
* Overview
```python
class kmeans():
    def train(self):
        img_records = []
        converge_iter = 0
        converge = 5
        # 1. Initialization
        n, d = self.data.shape
        # (1) initialize centers
        centers = self.init_centers()
        # (2) initialize dist
        dist = cdist(self.data, centers, 'sqeuclidean')
        # (3) initialize clusters randomly (recorded as iteration 0)
        clusters = np.zeros((n, self.k))
        clusters[np.arange(n), np.random.randint(self.k, size=n)] = 1
        img_records.append(clusters)
        # assign each data point to its closest cluster
        clusters = np.zeros((n, self.k))
        for i in range(n):
            clusters[i, np.argmin(dist[i])] = 1
        img_records.append(clusters)
        # 2. Lloyd's algorithm
        for iter in range(100):
            # E-step : classify all samples according to the closest μk
            if self.kernal_mode:
                dist = self.__kernal_dist(self.data, clusters)
            else:
                dist = cdist(self.data, centers, 'sqeuclidean')
            # M-step : re-compute the centers μk of each cluster Ck
            new_clusters = np.zeros_like(dist)
            for i in range(n):
                new_clusters[i, np.argmin(dist[i])] = 1
            new_centers = (new_clusters.T @ self.data) / (np.sum(new_clusters, axis=0, keepdims=True).T + 1e-9)
            img_records.append(new_clusters)
            # Stop upon convergence (centers unchanged for 5 consecutive iterations)
            delta = np.linalg.norm(new_centers - centers)
            centers = new_centers
            clusters = new_clusters
            if delta < 1e-10:
                converge -= 1
                if converge == 0:
                    converge_iter = iter + 1
                    break
        print(f'k-means : (k = {self.k}, {self.method}) ',
              bashc.GREEN + f'converged at iter [{converge_iter}]' + bashc.ENDC)
        return img_records, converge_iter
```
Steps :
1. Read the images and transform them into the Gram matrix using the new RBF kernel
    * $k(x, x') = e^{-\gamma_s||S(x)-S(x')||^2} \cdot e^{-\gamma_c||C(x)-C(x')||^2}$
```python
def new_kernal(x, gamma_s=1e-3, gamma_c=1e-3):
    # dist_s = ||S(x) - S(x')||^2 : squared distance between pixel coordinates
    grid = np.indices((100, 100)).reshape((2, 10000, 1))
    sx = np.hstack((grid[0], grid[1]))
    ds = cdist(sx, sx, 'sqeuclidean')
    # dist_c = ||C(x) - C(x')||^2 : squared distance between RGB values
    dc = cdist(x, x, 'sqeuclidean')
    # kernel = e^(-γs * ds) * e^(-γc * dc)
    kernel = np.exp(-gamma_s * ds) * np.exp(-gamma_c * dc)
    return kernel
```
2. Initialize centers and clusters.
    * Check Part3 for the details of init_centers()
3. Apply ==Lloyd's algorithm== to perform k-means clustering, which iteratively runs the following EM steps.
* E-step : classify each data point to the closest $\mu_k$
    * Calculate the distance from every data point to every center, stored as $dist_{n\times k}$
    * Each entry $s_{jk}$ of $dist_{n\times k}$ is :
    $\begin{split}
    s_{jk} &= \Vert \phi(x_j) - \mu_k^\phi\Vert^2 \\
    &= k(x_j, x_j) - {2\over |C_k|}\sum_{n}\alpha_{kn}k(x_j, x_n) + { 1\over |C_k|^2}\sum_{p}\sum_{q}\alpha_{kp}\alpha_{kq}k(x_p, x_q)
    \end{split}$
```python
def __kernal_dist(self, gram, ck):
    n, cnt_k = ck.shape
    dist = np.zeros_like(ck)
    # Ck = |Ck| : number of points in each cluster
    Ck = np.sum(ck, axis=0)
    # Sjk = -2/|Ck| * ∑K(xj, xn) with xn∈Ck
    Sjk = -2 * (gram @ ck) / (Ck + 1e-9)
    # Spq[k] = 1/|Ck|^2 * ∑∑K(xp,xq) with xp, xq∈Ck (one scalar per cluster)
    Spq = np.diag(ck.T @ gram @ ck) / (Ck ** 2 + 1e-9)
    # distance_jk = K(xj, xj) - 2∑K(xj, xn)/|Ck| + ∑∑K(xp,xq)/|Ck|^2
    for j in range(n):
        for k in range(cnt_k):
            dist[j, k] = gram[j, j] + Sjk[j, k] + Spq[k]
    return dist
```
* M-step : Re-compute the center $\mu_k$ of each cluster $C_k$
```python
new_clusters = np.zeros_like(dist)
for i in range(n):
    new_clusters[i, np.argmin(dist[i])] = 1
new_centers = (new_clusters.T @ self.data) / (np.sum(new_clusters, axis=0, keepdims=True).T + 1e-9)
img_records.append(new_clusters)
```
#### Part1 - spectral clustering
* Overview
```python
class spectral_clustering():
    def train(self):
        # 1. Calculate the graph Laplacian L
        self.__cal_laplacian()
        # 2. Do eigendecomposition to get eig_vals and eig_vecs
        k_lambs, U = self.__get_U(self.L, self.k)
        # 3. For i=1 to n, cluster yi ∈ R^k with the k-means algorithm
        km_model = kmeans(U, k=self.k, method=self.kmeans_method, kernal_mode=False)
        img_records, iter = km_model.train()
        return img_records, k_lambs, U
```
* The Gram matrix K serves as the similarity graph W.
* To obtain a more balanced partition of the graph, we use the ratio cut / normalized cut as the objective function.
* Relaxing the cut-minimization problems, we get the following solutions:
    * Note : let U be the matrix of the first k normalized eigenvectors
    * Ratio cut :
        * $\min\limits_{H \in \mathbb{R}^{n \times k}} Tr(H'LH)$ subject to $H'H=I$ with ==unnormalized Laplacian $L = D - W$==
    * Normalized cut :
        * $\min\limits_{T \in \mathbb{R}^{n \times k}} Tr(T'L_{sym}T)$ subject to $T'T=I$ with ==normalized Laplacian $L_{sym} = D^{-1/2}LD^{-1/2}$==
* Compute the unnormalized Laplacian $L = D - W$ :
```python
def __cal_laplacian(self):
    # L = D - W
    self.D = np.diag(np.sum(self.W, axis=1))
    self.L = self.D - self.W
```
* Do the eigendecomposition on L (or $L_{sym}$) :
```python
def __eig(self, L):
    # Define file paths
    ...
    # Calculate eig_vals and eig_vecs
    if self.train_mode:
        # RatioCut
        L = self.L
        # NormalizedCut
        if self.normalize:
            # Lsym = D^(-1/2) L D^(-1/2)
            _D = np.zeros_like(self.D)
            for i in range(len(self.D)):
                _D[i, i] = self.D[i, i] ** -0.5
            L = _D @ self.L @ _D
        eig_vals, eig_vecs = np.linalg.eigh(L)
        np.save(val_path, eig_vals)
        np.save(vec_path, eig_vecs)
        return eig_vals, eig_vecs
    # Load eig_vals and eig_vecs from files
    else:
        ...
```
* Important details :
    * We should pick the first k eigenvectors with ==positive eigenvalues==
    * Both the ratio-cut and normalized-cut methods normalize the rows of the eigenvector matrix
```python
def __get_U(self, L, k):
    # 1. Get the eig_vals and eig_vecs
    eig_vals, eig_vecs = self.__eig(L)
    cut = 'normalized' if self.normalize else 'ratio'
    # 2. Get the first k non-zero eigenvalues λ
    sort_idx = np.argsort(eig_vals)
    start = 0
    while eig_vals[sort_idx[start]] <= 1e-9:
        start += 1
    k_idx = sort_idx[start : start + self.k]
    # U ∈ R^(n x k) containing the k corresponding eigenvectors
    U = eig_vecs[:, k_idx]
    lambs = eig_vals[k_idx]
    # Normalize the rows of U (since H'H / T'T = I)
    U /= np.linalg.norm(U, axis=1).reshape(-1, 1)
    # (additional experiment : visualize the eigenvalues λ)
    # plot_dot(np.arange(len(eig_vals)), eig_vals, f'{cut}_with_k={self.k}')
    return lambs, U
```
* According to the following algorithm, we should cluster $y_i \in \mathbb{R}^k$ with the k-means algorithm in the last step.
<img src='https://i.imgur.com/1tpB2Hb.png' width=500>
```python
class spectral_clustering():
    def train(self):
        # 1. Calculate the graph Laplacian L
        self.__cal_laplacian()
        # 2. Do eigendecomposition to get eig_vals and eig_vecs
        k_lambs, U = self.__get_U(self.L, self.k)
        # 3. For i=1 to n, cluster yi ∈ R^k with the k-means algorithm
        km_model = kmeans(U, k=self.k, method=self.kmeans_method, kernal_mode=False)
        img_records, iter = km_model.train()
        return img_records, k_lambs, U
```
#### Part1 - visualization
* Visualize the clusters at each iteration, coloring each cluster with a different color
```python
k_color = colors.to_rgba_array(['skyblue', 'teal', 'cornflowerblue', 'yellowgreen', 'orange', 'palevioletred', 'plum', 'gold', 'pink', 'tan'])

# colorize each cluster with a different color and save all iterations as a gif
def visualize_clusters(img_records, save_path, figsize=(100, 100, 4)):
    gif = []
    # for each iteration
    for i in range(len(img_records)):
        # Get the cluster of each data point in the current img
        c_id = np.argmax(img_records[i], axis=1)
        n = c_id.shape[0]
        # Colorize each cluster with a different color
        img = np.zeros(figsize, dtype=np.uint8)
        for q in range(100):
            for r in range(100):
                cluster = c_id[q * 100 + r]
                img[q, r] = k_color[cluster] * 255
        gif.append(img)
    imageio.mimsave(save_path, gif)
    return
```
### Part2
#### Part2 - Try more clusters
* Initialize the kmeans and spectral_clustering classes with k from 2 to 10.
* kmeans
```python
# main() of k-means
for i, kernal in enumerate(kernals):
    # k from 2 to 10
    for k in range(2, 11):
        # kmeans
        model = kmeans(kernal, k, method='default')
        record, _ = model.train()
        visualize_clusters(record, f'output_kmeans/img{i+1}_kmeans(k={k}).gif')
        # kmeans++
        model_kpp = kmeans(kernal, k, method='kmeans++')
        record_kpp, _ = model_kpp.train()
        visualize_clusters(record_kpp, f'output_kmeans/img{i+1}_kmeans++(k={k}).gif')
        print('-----------------------------------------------------------------------------')
```
* spectral clustering
```python
# main() of spectral_clustering
for i, similarity in enumerate(kernals):
    for k in range(2, 11):
        params = {'root': f'img{i+1}',
                  'normalized': False,
                  'train_mode': train_mode,
                  'kmeans_method': kmeans_method}
        for cut_method in ['normalized', 'ratio']:
            print(f'Spectral_clustering {cut_method}cut with k = {k}')
            # set the parameters
            params['normalized'] = (cut_method == 'normalized')
            filename = f'output_spectral/img{i+1}_{cut_method}({kmeans_method}={k})'
            # train the model
            model = spectral_clustering(similarity, k, params)
            records, eig_vals, eig_vecs = model.train()
            # visualization
            visualize_clusters(records, filename + '.gif')
            visualize_eigvecs(eig_vecs, records[-1], k, filename + '.png')
            print('-----------------------------------------------------------------------------')
```
### Part3
* Try 2 methods to initialize the cluster centers
```python
class kmeans():
    def __init__(self, data, k=2, method='default', kernal_mode=True):
        self.init_centers = self.__kmeanspp if method == 'kmeans++' else self.__random
```
* random
    * randomly pick k data points as the initial centers
```python
def __random(self):
    idx = list(random.sample(range(0, self.data.shape[0]), self.k))
    return np.array(self.data[idx])
```
* kmeans++
    * pick one center at random, then repeatedly pick the point farthest from its nearest existing center as the next center.
```python
def __kmeanspp(self):
    n, d = self.data.shape
    centers = np.zeros((self.k, d))
    centers[0] = self.data[np.random.randint(n)]
    for i in range(1, self.k):
        # distance to each existing center
        dist = cdist(self.data, centers[:i], 'sqeuclidean')
        # keep the distance to the closest center
        dist = np.min(dist, axis=1)
        # pick the farthest point as the next center
        centers[i] = self.data[np.argmax(dist, axis=0)]
    return centers
```
### Part4
* Visualize the eigenspace coordinates as points
```python
# plot eigenvectors
def visualize_eigvecs(vecs, clusters, k, fig_path):
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    n, d = vecs.shape
    # translate clusters from n x k one-hot to n x 1 cluster ids
    clusters_id = np.argmax(clusters, axis=1)
    for ki in range(k):
        # k_idx : select data within cluster ki
        k_idx = clusters_id == ki
        # visualize using the first 3 axes
        x = vecs[k_idx, 0]
        y = vecs[k_idx, 1]
        if d < 3:
            ax.scatter(x, y, marker='.', c=[k_color[ki]])
        else:
            z = vecs[k_idx, 2]
            ax.scatter(x, y, z, marker='.', c=[k_color[ki]])
    plt.savefig(fig_path)
```
## Experiments & Discussion
### Part1
#### Show the clustering procedure
* Image1 : k=2
|iter| 0 | 10 |last iter|gif|
|----|---|----|---------|---|
|Kmeans iter=43|||||
|Ratio iter=14|||||
|Normalized iter=15|||||
* Image2
|iter| 0 | 10 |last iter|gif|
|----|---|----|---------|---|
|Kmeans iter=71|||||
|Ratio iter=18|||||
|Normalized iter=20|||||
#### Discussion
* Every model can separate 2 clusters easily.
* Compared with spectral clustering, k-means starts from a more random state.
    * I guess this might be because k-means randomly chooses 10000-dim points as initial centers, while the k-means step in spectral clustering chooses k-dim points.
### Part2 & Part3
#### Experiments : Try more clusters and different initialization methods
* Naming rule
    * Ratio : RatioCut + kmeans
    * Ratio++ : RatioCut + kmeans++
* Color :
* 
* Order
|Kmeans |Kmeans++|Ratio|Ratio++|Normalized|Normalized++|
|-|-|-|-|-|-|
* ==image1==, $\gamma_s=1e-3, \gamma_c=1e-3$
* 
* | |Observation|
|--|----------|
||1. RatioCut can only identify a few clusters in this picture. <br> 2. Except for RatioCut, all the other models produce similar clusterings. <br> 3. Comparing Kmeans++ with Normalized++ at k=10, Kmeans++ tends to separate the clusters further apart from each other.|
* |Part2|Comparison between different k|
|-|--------------------|
|-----|1. Regions can be clearly identified at k<=4. <br> 2. As k increases, more regions are identified, such as plain, forest, beach, and island.|
* |Part3|Comparison between initialization methods|
|--|----------|
|-----|1. Kmeans initializes centers randomly, while kmeans++ initializes centers far from each other. The difference is most visible at k=10. <br> 2. Otherwise there seems to be no difference between the 2 initialization methods, since the iteration limit I set (100) is large enough for convergence.|
* ==image2==, $\gamma_s=1e-3, \gamma_c=1e-3$
* 
* | |Observation|
|--|-----------|
||1. Frankly speaking, I can't tell a significant difference between those models. <br> 2. The noisiest model is RatioCut with random initialization. <br> 3. k<=4 separates the objects and the background clearly. <br> 4. k=6 separates the background in a reasonable way, but at k=8 some clusters become fragmented.|
* |Part2|Comparison between different k|
|-----|---------------------|
|-----|1. Regions can be clearly identified at k<=4. <br> 2. As k increases, more regions are identified, but the model becomes more sensitive. <br> E.g., ambiguous parts such as the colorful tree become noisy.|
* |Part3|Comparison between initialization methods|
|-----|---------------------|
|-----|1. There seems to be no difference between the 2 initialization methods, since the iteration limit is large enough for convergence. <br> 2. However, changing the initialization method from random to kmeans++ improves both the RatioCut and NormalizedCut models a little.|
* Other experiment : ==image2==, $\gamma_s=1e-4, \gamma_c=1e-3$ (a reproduction sketch follows the table below)
* 
* | |Observation|
|--|----------|
||1. According to the [tutorial](https://arxiv.org/pdf/0711.0189.pdf), ==spectral clustering can be quite sensitive to changes in the similarity graph== and to the choice of its parameters. <br> Thus, in order to identify the rabbit, I ==decreased γs== (to emphasize color) in the kernel function, and the rabbit is finally ==identified by the Normalized models with k=10==. <br> 2. Though the model can identify the rabbit, it is a disaster at detecting the tree.|
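* Below is a minimal sketch of how this parameter change can be reproduced with the code above; the image-loading line, the file paths, and the `params` dict are assumptions patterned after the `main()` shown in Part2 :
```python
import imageio
import numpy as np

# hypothetical path; the 100x100 RGB image is flattened to a (10000, 3) array
img2 = np.asarray(imageio.imread('image2.png')).reshape(-1, 3)
# decrease γs from 1e-3 to 1e-4 to put more weight on color similarity
similarity = new_kernal(img2, gamma_s=1e-4, gamma_c=1e-3)

params = {'root': 'img2', 'normalized': True,
          'train_mode': True, 'kmeans_method': 'kmeans++'}
model = spectral_clustering(similarity, 10, params)   # NormalizedCut with k=10
records, eig_vals, eig_vecs = model.train()
visualize_clusters(records, 'output_spectral/img2_normalized(kmeans++=10).gif')
```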
#### Discussion
* Part2 : What is the best cluster number?
    * According to the experiments, I will take k<=4 to identify the background.
    * Yet, I will take k=8 when $\gamma_s=\gamma_c=1e-3$, and k=10 when $\gamma_s=1e-4, \gamma_c=1e-3$, to identify more objects.
    * According to the [tutorial](https://arxiv.org/pdf/0711.0189.pdf), the eigengap may indicate the number of cuts. Thus, let's examine the eigenvalues (λs) ; a small sketch for locating the eigengap follows the table below :
| |Ratio |Normalized |
| -------- | -------- | -------- |
||||
|Discussion| There are 2 λs close to 0, <br> so we conclude that the image can be classified into 2 classes easily.| In addition to the 2 λs, there is a gap at k=8|
||||
|Discussion|There is a gap at k=2.|There is a gap at k=3.|
|Conclusion|k=2 |k=3|
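* A minimal sketch of locating the eigengap from the eigenvalues; `eig_vals` is assumed to be the array returned by `__eig()` above :
```python
import numpy as np

def suggest_k_by_eigengap(eig_vals, max_k=10):
    # sort the eigenvalues ascending and keep only the first max_k of them
    lambs = np.sort(eig_vals)[:max_k]
    # gaps[i] = λ(i+2) - λ(i+1) ; the largest gap after the first i+1 eigenvalues suggests k = i + 1
    gaps = np.diff(lambs)
    return int(np.argmax(gaps)) + 1

# e.g. with eigenvalues [0, 0, 0.8, 0.9, ...] the largest gap is after the 2nd one, suggesting k = 2
```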
### Part4
#### Observation
* The data points within the same cluster have different but close coordinates in the eigenspace of the graph Laplacian.
#### Experiment
* image1
|K|Ratio|Normalized|
|-|-----|----------|
|2|||
|3|||
|10|||
* image2
|K|Ratio|Normalized|
|-|-----|----------|
|2|||
|3|||
|10|||
## Observations & Discussion
#### Performance
* Kmeans requires more iterations and becomes slower as k increases.
* Spectral clustering spends a lot of time on the eigendecomposition.
    * However, if we save the eigenvalues/eigenvectors of the Laplacian matrix, it is not time-consuming (a small caching sketch follows this list).
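* A minimal sketch of the caching idea; the file names here are hypothetical (the real paths are defined inside `__eig()`) :
```python
import os
import numpy as np

def cached_eigh(L, val_path='eig_vals.npy', vec_path='eig_vecs.npy'):
    # reuse the stored decomposition if it exists, otherwise compute and save it
    if os.path.exists(val_path) and os.path.exists(vec_path):
        return np.load(val_path), np.load(vec_path)
    eig_vals, eig_vecs = np.linalg.eigh(L)
    np.save(val_path, eig_vals)
    np.save(vec_path, eig_vecs)
    return eig_vals, eig_vecs
```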
#### Observation
* Based on the results of Part2 & Part3 :
    * Only 2 dimensions of the eigenvectors in RatioCut are useful in most cases.
    * In contrast, at least 3 dimensions are useful in NormalizedCut.
    * This may explain why RatioCut can't segment the pictures with large k.
* According to my experiments, NormalizedCut performs better than RatioCut on the 2 pictures.
#### Problem
* I encountered [this error](https://stackoverflow.com/questions/8765310/scipy-linalg-eig-return-complex-eigenvalues-for-covariance-matrix) using scipy.linalg.eig, and solved it by using eigh() instead.
* The first eigenvalue should be 0, but it comes out as roughly 1e-13~1e-15. I guess this is a floating-point [issue](https://docs.python.org/3/tutorial/floatingpoint.html), so I treat such values as 0 in my code (the small check below reproduces this).
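* A minimal check of this behavior on a toy 3-node complete graph, whose Laplacian should have eigenvalues exactly 0, 3, 3 :
```python
import numpy as np

# complete graph on 3 nodes : the smallest eigenvalue of L = D - W should be exactly 0
W = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
D = np.diag(W.sum(axis=1))
L = D - W
eig_vals, _ = np.linalg.eigh(L)
print(eig_vals)   # prints something like [1e-16  3.  3.] -- hence the <= 1e-9 threshold in __get_U()
```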
###### tags: `Machine Learning`