# October 15, 2020

Last day of RNA seq in Julia

Practical Ten
- Use kmeans or hclust (however many clusters you want)
- Distance matrix from practical 4, use MDS from MultivariateStats on this matrix
- Use same matrix with clustering, color according to cluster

---

https://juliastats.org/Clustering.jl/stable/kmeans.html

add Clustering package

```julia=
using Clustering

# each column of X is one observation: 1000 points with 5 features
X = rand(5, 1000)
R = kmeans(X, 20; maxiter=200, display=:iter)

@assert nclusters(R) == 20 # verify the number of clusters

a = assignments(R) # get the assignments of points to clusters
c = counts(R)      # get the cluster sizes
M = R.centers      # get the cluster centers

# try with iris dataset
using RDatasets, Clustering, Plots
iris = dataset("datasets", "iris");          # load the data
features = collect(Matrix(iris[:, 1:4])');   # features to use for clustering (one column per flower)
result = kmeans(features, 3);                # run K-means for the 3 clusters

# plot with the point color mapped to the assigned cluster index
# CRC disabled display functions, so the below code won't work on the CRC
scatter(iris.PetalLength, iris.PetalWidth, marker_z=result.assignments,
        color=:lightrainbow, legend=false)
```

## hclust

```julia=
using Clustering

D = rand(1000, 1000);
D += D'; # symmetric distance matrix (optional)
result = hclust(D, linkage=:single)
cutree(result, k=3) # cut the tree into 3 clusters, returns one label per point

# plotting
# add StatsPlots package
using StatsPlots
plot(result) # dendrogram
```
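
## MDS colored by cluster (sketch)

The three steps of the practical (distance matrix, MDS on it, clustering on the same matrix, color by cluster) might be strung together roughly like this. It is only a sketch under assumptions: the toy matrix `X` stands in for the practical 4 data, which is not in these notes; the Euclidean distance is a guess at what practical 4 used; and `fit(MDS, D; distances=true, ...)` with `predict` assumes a recent MultivariateStats release (older versions may need `transform` instead of `predict`).

```julia
using Clustering, MultivariateStats, LinearAlgebra, Random

Random.seed!(1)

# ASSUMPTION: toy stand-in for the real data, 4 features x 60 samples
# drawn as 3 shifted groups (one column per sample)
X = hcat(randn(4, 20), randn(4, 20) .+ 5, randn(4, 20) .- 5)
n = size(X, 2)

# pairwise Euclidean distances between samples (columns); symmetric n x n
D = [norm(X[:, i] - X[:, j]) for i in 1:n, j in 1:n]

# classical MDS on the precomputed distance matrix, keep 2 dimensions
mds = fit(MDS, D; distances=true, maxoutdim=2)
Y = predict(mds) # 2 x n matrix of coordinates

# hierarchical clustering on the same distance matrix, cut into 3 clusters
hc = hclust(D, linkage=:average)
labels = cutree(hc, k=3) # one cluster label per sample

# color the MDS points by cluster (needs a working display, so not on the CRC)
# using Plots
# scatter(Y[1, :], Y[2, :], marker_z=labels, color=:lightrainbow, legend=false)
```

To use kmeans instead of hclust, cluster the original features rather than the distance matrix, e.g. `labels = assignments(kmeans(X, 3))`, and keep the same `scatter` call.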