School: National Chiao Tung University
Name: Shao-Wei Chiu
Tags: NCTU, Machine Learning, Kernel K-means, Spectral Clustering, GitHub
Note: This report was made on HackMD.io. Because of export restrictions, the .gif animations may not display here; please view it at https://hackmd.io/@swchiu/BJwOjuc3U, thanks.
Kernel k-means is a variant of the k-means algorithm that maps the data into a higher-dimensional feature space before clustering. The mapping is defined implicitly by a kernel function.
The k-means algorithm compares the similarity between data points and clusters the more similar points into the same group; the k means that there are k groups to cluster into. For regular k-means, we use the following formula to compare and cluster the data.
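A reference version of that formula, in the standard form: minimize the within-cluster sum of squared distances, with each center set to the mean of its cluster.

$$
\min_{C_1,\dots,C_k} \sum_{i=1}^{k} \sum_{x_j \in C_i} \lVert x_j - \mu_i \rVert^2,
\qquad
\mu_i = \frac{1}{|C_i|}\sum_{x_j \in C_i} x_j
$$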
How about kernel k-means? We transform the data into a higher-dimensional feature space and run the same k-means procedure there. Instead of updating explicit centroids, every distance is computed through the kernel (gram) matrix.
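The kernel used on the image data is the one implemented in `rbf_img` / `rbf` below: a product of an RBF kernel over the spatial coordinates and an RBF kernel over the RGB values. Writing $S(x)$ for the pixel coordinates of a point and $C(x)$ for its color (these symbols are my notation; both bandwidths are $\gamma = 10^{-4}$ in the code):

$$
k(x, x') = e^{-\gamma_s \lVert S(x) - S(x') \rVert^2} \times e^{-\gamma_c \lVert C(x) - C(x') \rVert^2}
$$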
Prepare the image data for precomputing the gram matrix (kernel).
```python
import numpy as np
import imageio

def img_formater(img):
    # Flatten an (H, W, 3) image into spatial coordinates and RGB values
    spatial_data = []
    color_data = []
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            spatial_data.append([i, j])
            color_data.append(img[i][j])
    return np.array(spatial_data), np.array(color_data, dtype=int)

img = imageio.imread(img_path)
spatial_data, color_data = img_formater(img)
```
Compute the gram matrix.
At first, I implemented this part with a for-loop, which is the trivial way. But I found that it is too time-consuming: there are 10,000 data points per input image (100×100 pixels), so we need to calculate a 10000×10000 gram matrix. So I replaced the for-loop implementation with matrix computation for efficiency.
The pairwise squared distance can be expanded as $\lVert u - v \rVert^2 = \lVert u \rVert^2 - 2u^\top v + \lVert v \rVert^2$, which is written for a single pair of vectors. In this work, however, the vectors are stored as rows of a data matrix, so for performance I stack all vectors into one matrix and evaluate every pairwise distance at once with three matrix products, as in `euclidean` below.
```python
import math
import scipy.spatial.distance

# Original trivial implementation
def rbf_img(u, v, g=0.0001):
    s_dis = scipy.spatial.distance.euclidean(u[0], v[0])
    c_dis = scipy.spatial.distance.euclidean(u[1], v[1])
    return math.exp(-1*g*s_dis**2 - g*c_dis**2)

def gram_matrix(data, path, kernel=rbf_img):
    gram = np.ones((len(data), len(data)))
    for i in range(len(data)):
        for j in range(i, len(data)):
            gram[i][j] = kernel(data[i], data[j])
            gram[j][i] = gram[i][j]
    return gram
#---------------------------------------------------------------------#
# New implementation using matrix computation
def euclidean(self, u, v):
    # This method is defined in kmeans.euclidean:
    # pairwise squared distances via ||u-v||^2 = ||u||^2 - 2*u.v + ||v||^2
    return np.matmul(u**2, np.ones((u.shape[1], v.shape[0]))) \
           - 2*np.matmul(u, v.T) \
           + np.matmul(np.ones((u.shape[0], v.shape[1])), (v.T)**2)

def rbf(u, v, g=10**-4):
    return np.exp(-1*g*kmeans.euclidean(kmeans, u, v))

gram = rbf(spatial_data, spatial_data) \
       * rbf(color_data, color_data)
```
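As a quick sanity check (not part of the original report), the vectorized distance can be compared against SciPy's `cdist` on a small random sample:

```python
# Sanity check: the vectorized pairwise squared distances should match
# scipy.spatial.distance.cdist with the 'sqeuclidean' metric.
import numpy as np
from scipy.spatial.distance import cdist

u = np.random.rand(5, 3)
v = np.random.rand(4, 3)
vectorized = np.matmul(u**2, np.ones((u.shape[1], v.shape[0]))) \
             - 2*np.matmul(u, v.T) \
             + np.matmul(np.ones((u.shape[0], v.shape[1])), (v.T)**2)
assert np.allclose(vectorized, cdist(u, v, 'sqeuclidean'))
```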
Run Kernel K-means. Recall the formula: in feature space, the distance from a point $x_j$ to the center of cluster $C_c$ can be written purely in terms of the kernel (gram) matrix,

$$
\mathrm{dist}(x_j, C_c) = -\frac{2}{|C_c|}\sum_{n \in C_c} k(x_j, x_n) + \frac{1}{|C_c|^2}\sum_{p,q \in C_c} k(x_p, x_q),
$$

where the constant term $k(x_j, x_j)$ is dropped because it does not change which cluster is nearest. For each data point, we compare this distance for every cluster and assign the point to the closest one (the variable `dist` in the code segment below). I want to do a matrix computation instead of looping over points and clusters. For the second term, we only need the diagonal elements of $C^\top G C$ (one within-cluster sum per cluster), so in the Python code I multiply by an identity matrix `np.eye()`.
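A minimal sketch (not from the original code) showing that the matrix form above agrees with a naive per-point, per-cluster computation on a tiny gram matrix:

```python
# Minimal sketch: compare the matrix-form kernel-trick distance with a naive
# per-point, per-cluster computation (constant k(x_j, x_j) term omitted).
import numpy as np

n, k = 6, 2
X = np.random.rand(n, 2)
diff = X[:, None, :] - X[None, :, :]
gram = np.exp(-np.sum(diff**2, axis=2))        # a simple RBF gram matrix

ck = np.zeros((n, k))
ck[np.arange(n), np.arange(n) % k] = 1         # some one-hot cluster assignment
c_count = np.sum(ck, axis=0)

matrix_form = -2*np.matmul(gram, ck)/c_count \
              + np.matmul(np.ones(ck.shape),
                          np.matmul(ck.T, np.matmul(gram, ck))*np.eye(k))/(c_count**2)

naive = np.zeros((n, k))
for j in range(n):
    for c in range(k):
        idx = np.where(ck[:, c] == 1)[0]
        naive[j, c] = -2*gram[j, idx].sum()/len(idx) \
                      + gram[np.ix_(idx, idx)].sum()/len(idx)**2

assert np.allclose(matrix_form, naive)
```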
There are many variants of the initialization method, e.g. k-means++, which picks the initial centers by maximizing their mutual distances. I compare k-means++ and the traditional (random) initialization in the Result part.
```python
# Methods of the kmeans class
def __get_init(self, method):
    # Select the initialization method
    if method == 'kmeans++':
        return self.__kmeanspp
    elif method == 'default':
        return self.__traditional
    else:
        print('ERROR: \'{}\' is not a pre-defined initialize method'.format(method))
        return exit(0)

def __kmeanspp(self):
    # k-means++ seeding: pick the next center as the point farthest
    # from its nearest already-chosen center
    n, d = self.data.shape
    centers = np.array([self.data[np.random.randint(n), :d]])
    for i in range(self.k-1):
        dist = self.euclidean(self.data, centers)
        dist = np.min(dist, axis=1)
        next_center = np.argmax(dist, axis=0)
        centers = np.vstack((centers, self.data[next_center, :]))
    return centers

def __traditional(self):
    # Traditional initialization: pick k distinct points at random
    return np.array(self.data[np.random.choice(self.data.shape[0], size=self.k, replace=False), :])

def __kernel_trick(self, gram, ck):
    # Distance of every point to every cluster in feature space,
    # dropping the constant k(x_j, x_j) term
    c_count = np.sum(ck, axis=0)
    dist = -2*np.matmul(gram, ck)/c_count + \
           np.matmul(np.ones(ck.shape), (np.matmul(ck.T, np.matmul(gram, ck)))*np.eye(ck.shape[1]))/(c_count**2)
    return dist

def run(self):
    # initial cluster
    centers = self.init()
    dist = self.euclidean(self.data, centers)
    ck = np.zeros((self.data.shape[0], self.k))
    ck[np.arange(dist.shape[0]), np.argmin(dist, axis=1)] = 1
    record = []
    record.append(ck)
    record_iter = 0
    for i in range(self.max_iter):
        # E-step: compute distances to every cluster
        if self.is_kernel:
            dist = self.__kernel_trick(self.data, ck)
        else:
            dist = self.euclidean(self.data, centers)
        # M-step: update assignments (and centers in the non-kernel case)
        update_ck = np.zeros(dist.shape)
        update_ck[np.arange(dist.shape[0]), np.argmin(dist, axis=1)] = 1
        delta_ck = np.count_nonzero(np.abs(update_ck - ck))
        update_centers = np.matmul(update_ck.T, self.data)/np.sum(update_ck, axis=0, keepdims=True).T
        record.append(update_ck)
        if delta_ck == 0:
            self.converge -= 1
            if self.converge == 0:
                record_iter = i+1
                if self.keep_log == False:
                    break
        ck = update_ck
        centers = update_centers
    return record, record_iter
```
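For reference, a hypothetical call of the class. The constructor is not shown in this report, so the argument names `init`, `is_kernel` are assumptions inferred from the attributes used in `run()`:

```python
# Hypothetical usage; the kmeans constructor is not shown above, so these
# argument names are assumptions inferred from the attributes used in run()
km = kmeans(gram, k=4, init='kmeans++', is_kernel=True)
record, record_iter = km.run()
```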
Visualization
```python
from matplotlib import colors

k_visual = colors.to_rgba_array(['tab:blue',
                                 'tab:orange',
                                 'tab:green',
                                 'tab:red',
                                 'tab:purple',
                                 'tab:brown'])

def visualizer(record, save_path, figsize=(100, 100, 4)):
    # Turn each recorded cluster assignment into one colored frame of a .gif
    gif = []
    for i in range(len(record)):
        c_id = np.argmax(record[i], axis=1)
        img = np.zeros(figsize, dtype=np.uint8)
        for j in range(c_id.shape[0]):
            m, n = (int(j/100), int(j%100))
            img[m][n] = 255*k_visual[c_id[j]]
        gif.append(img)
    imageio.mimsave(save_path, gif)

def merge_gifs(gifs, max_fram, id, method):
    # Read several gifs and stack their frames side by side into one gif
    gif = []
    for i in range(len(gifs)):
        gif.append(imageio.get_reader('output/'+gifs[i]))
    new_gif = imageio.get_writer('output/image'+str(id)+'_'+method+'.gif')
    if max_fram + 5 < 100:
        max_fram += 5
    for frame_number in range(max_fram):
        img = []
        for i in range(len(gif)):
            img.append(gif[i].get_next_data())
        new_image = np.hstack(img)
        new_gif.append_data(new_image)
    for i in range(len(gif)):
        gif[i].close()
    new_gif.close()
```
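A hypothetical usage of the two helpers above, reusing `record` and `record_iter` from the k-means run (the file names are made up for illustration):

```python
# Hypothetical usage; file names are only for illustration
visualizer(record, 'output/image1_k4_kmeanspp.gif')
merge_gifs(['image1_k2_kmeanspp.gif', 'image1_k3_kmeanspp.gif', 'image1_k4_kmeanspp.gif'],
           max_fram=record_iter, id=1, method='kmeanspp')
```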
image1 (k=2, 3, 4, 5, 6)
Original image:
image2 (k=2, 3, 4, 5, 6)
For the value of k, the segmentation becomes finer as k grows. The reason, I thought, is that a lower k forces visually different regions to be merged into the same cluster.
Traditional vs. K-means++
Both images are compared below under the two initialization methods:
Traditional:
K-means++:
Traditional:
K-means++:
At first, I thought that k-means++ was merely faster than the traditional initialization, judging from the number of iterations (shown in the screenshot part). But once the result of image2 was produced, we could see that the sky area and the rabbit are clustered into different groups when k=6!
So I believe that k-means++ is more powerful in this work.
Spectral clustering groups the data through a similarity graph, which can be computed via a kernel function.
In the similarity graph, the more similar two data points are, the larger the weight of the edge between them. For example, if we compute the similarity via a Gaussian function (i.e., an RBF kernel), the edge weight indicates how alike the two points are.
But how do we group the data through the similarity graph?
With rich theoretical support, we can use the graph Laplacian matrix and the Rayleigh quotient, and minimize the cut cost formed from the similarity graph.
The cut cost is defined as follows:

$$
\mathrm{cut}(A_1, \dots, A_k) = \frac{1}{2}\sum_{i=1}^{k} W(A_i, \bar{A_i}),
\qquad
W(A, B) = \sum_{u \in A,\; v \in B} w_{uv}
$$
But minimizing this cut directly might result in very unbalanced cluster sizes, so the cut is usually normalized by the size or volume of each cluster.
The names of the corresponding spectral clustering algorithms also differ: one is called unnormalized spectral clustering and the other is called normalized spectral clustering.
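The two balanced cut criteria behind these names are, in the standard formulation, RatioCut and NormalizedCut, which divide each cut by the cluster size or volume:

$$
\mathrm{RatioCut}(A_1,\dots,A_k) = \sum_{i=1}^{k}\frac{\mathrm{cut}(A_i,\bar{A_i})}{|A_i|},
\qquad
\mathrm{Ncut}(A_1,\dots,A_k) = \sum_{i=1}^{k}\frac{\mathrm{cut}(A_i,\bar{A_i})}{\mathrm{vol}(A_i)}
$$

Relaxing RatioCut leads to the unnormalized algorithm, while relaxing Ncut leads to the normalized one.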
The difference relates not only to the cut criterion but also to the graph Laplacian matrix (denoted as $L$): the unnormalized case uses $L = D - W$, while the normalized case uses a degree-normalized version of $L$, as in `__eig` below. Then we compute the eigenvectors of the Laplacian and perform the k-means algorithm on the matrix whose columns are the first $k$ eigenvectors.
Because computing the eigenvalues and eigenvectors of a 10000×10000 similarity matrix is too time-consuming, I only show the results for k = 3 and k = 4.
Prepare the image data for precomputing the similarity matrix (kernel) and compute the similarity matrix.
This step is the same as Step 1 and Step 2 of the Kernel K-means part, so I skip it here.
Compute the eigenvalues and eigenvectors, and keep the eigenvectors corresponding to the first k (smallest) eigenvalues.
```python
class spectral_clustering():
    def __init__(self, data_similarity, k=2, normalize=False, keep_log=False):
        self.k = k
        self.W = data_similarity
        # Degree matrix D: row sums of W placed on the diagonal
        self.D = np.sum(data_similarity, axis=1, keepdims=True) * np.eye(data_similarity.shape[0])
        self.normalize = normalize
        self.keep_log = keep_log

    def __eig(self, A):
        if self.normalize:
            # Normalized case: degree-normalize A before the eigen-decomposition
            sqrt_D = np.sqrt(self.D)
            neg_sqrt_D = np.linalg.inv(sqrt_D)
            N = np.matmul(np.matmul(neg_sqrt_D, A), sqrt_D)
            eigenvals, eigenvecs = np.linalg.eig(N)
            eigenvecs = np.matmul(neg_sqrt_D, eigenvecs.real)
            return eigenvals, eigenvecs
        else:
            # Unnormalized case: ordinary eigen-decomposition
            return np.linalg.eig(A)

    def __get_sorted_k_eigen(self, A, k):
        # Keep the eigenvectors belonging to the k smallest eigenvalues
        eigenvalues, eigenvectors = self.__eig(A)
        sorted_idx = np.argsort(eigenvalues)
        sorted_eigenvalues = []
        sorted_eigenvectors = []
        for i in range(k):
            vector = eigenvectors[:, sorted_idx[i]]
            sorted_eigenvectors.append(vector[:, None])
            sorted_eigenvalues.append(eigenvalues[sorted_idx[i]])
        sorted_eigenvalues = np.array(sorted_eigenvalues)
        sorted_eigenvectors = np.concatenate(sorted_eigenvectors, axis=1)
        return sorted_eigenvalues, sorted_eigenvectors

    def run(self):
        # Graph Laplacian L = D - W, then k-means on the first k eigenvectors
        self.L = self.D - self.W
        k_eigenvalues, k_eigenvectors = self.__get_sorted_k_eigen(self.L, self.k)
        km = kmeans(k_eigenvectors, k=self.k, keep_log=self.keep_log)
        return km.run(), k_eigenvalues, k_eigenvectors
```
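A short usage example, assuming `gram` is the precomputed similarity matrix from the earlier step:

```python
# gram: the precomputed similarity matrix from the earlier step
sc = spectral_clustering(gram, k=4, normalize=True)
(record, record_iter), k_eigenvalues, k_eigenvectors = sc.run()
```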
Perform the K-means algorithm.
The K-means algorithm and its methods are discussed above; I just call the same Python code I wrote.
```python
def run(self):
    self.L = self.D - self.W
    k_eigenvalues, k_eigenvectors = self.__get_sorted_k_eigen(self.L, self.k)
    km = kmeans(k_eigenvectors, k=self.k, keep_log=self.keep_log)
    return km.run(), k_eigenvalues, k_eigenvectors
```
Examine whether the data points within the same cluster have the same coordinates in the eigenspace of the graph Laplacian or not.
```python
import matplotlib.pyplot as plt

def reorder_by_cluster(c):
    # Concatenate the indices of the points belonging to each cluster
    new_order = np.array([])
    for i in range(c.shape[1]):
        new_order = np.append(new_order, np.where(c[:, i] == 1)[0])
    new_order = new_order.astype('int32')
    return new_order

def show_vectors_by_clusters(vectors, clusters, fig_path):
    # Plot the eigenspace coordinates of every point, one subplot per cluster
    reorder_idx = reorder_by_cluster(clusters)
    num_cluster = np.sum(clusters, axis=0, dtype=int)
    iter_idx = 0
    for k in range(clusters.shape[1]):
        plt.subplot(1, clusters.shape[1], k+1)
        for j in range(num_cluster[k]):
            plt.plot(vectors[reorder_idx[iter_idx], :])
            plt.title('cluster '+str(k+1))
            iter_idx += 1
    plt.savefig(fig_path)
    plt.show(block=False)
    plt.pause(1)
    plt.close()
```
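A hypothetical call, reusing the outputs of the spectral clustering run above (the figure path is made up):

```python
# record[-1] is the final one-hot assignment from the k-means run inside spectral_clustering
show_vectors_by_clusters(k_eigenvectors, record[-1], 'output/image1_eigenspace.png')
```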
Visualize the result of spectral clustering.
This step is also the same as Step 4 of the Kernel K-means part.
The output log is too long, just as in the Kernel K-means part, so I only upload part of the output.
image1 ( k = 4, 5 )
image2 ( k = 4, 5 )
Normalized:
eigenvectors:
Unnormalized:
eigenvectors:
According to the results shown above, we cannot say that the data points within the same cluster have exactly the same coordinates in the eigenspace of the graph Laplacian matrix, but K-means still groups the similar eigenvector coordinates into the same cluster.
Judging from the results, the unnormalized version seems to perform better than the normalized one.