# October 15, 2020
Last day of RNA-seq in Julia.

Practical Ten
- Use kmeans or hclust (with however many clusters you want).
- Take the distance matrix from Practical 4 and apply MDS from MultivariateStats to it.
- Cluster the same matrix and color the MDS points according to cluster.
---
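The MDS step can be sketched as follows. The Practical 4 distance matrix isn't reproduced in these notes, so a hypothetical stand-in built from random points is used instead. This also assumes the MultivariateStats v0.10+ API (`fit(MDS, D; distances=true)` followed by `predict`); older versions used a different function name.

```julia=
using MultivariateStats, Random

Random.seed!(1)
# Hypothetical stand-in for the Practical 4 distance matrix:
# Euclidean distances between 50 random samples with 10 features each
X = rand(10, 50)
n = size(X, 2)
D = [sqrt(sum(abs2, X[:, i] .- X[:, j])) for i in 1:n, j in 1:n]

# Classical MDS; distances=true tells fit that D holds pairwise distances,
# not raw features (columns = samples)
M = fit(MDS, D; maxoutdim=2, distances=true)
Y = predict(M)  # 2 × n matrix: one column of 2-D coordinates per sample
```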
## kmeans
https://juliastats.org/Clustering.jl/stable/kmeans.html

Add the Clustering package first (`] add Clustering` in the Julia REPL).
```julia=
using Clustering
X = rand(5, 1000) # 5 features × 1000 points (each column is one observation)
R = kmeans(X, 20; maxiter=200, display=:iter)
@assert nclusters(R) == 20 # verify the number of clusters
a = assignments(R) # get the assignments of points to clusters
c = counts(R) # get the cluster sizes
M = R.centers # get the cluster centers

# Try it with the iris dataset (also needs RDatasets and Plots)
using RDatasets, Clustering, Plots
iris = dataset("datasets", "iris"); # load the data
features = collect(Matrix(iris[:, 1:4])'); # transpose so each column is one flower
result = kmeans(features, 3); # run K-means for 3 clusters
# plot with the point color mapped to the assigned cluster index
# Display functions are disabled on the CRC, so this plot won't render there
scatter(iris.PetalLength, iris.PetalWidth, marker_z=result.assignments,
        color=:lightrainbow, legend=false)
```
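Since the practical leaves the number of clusters open, one common heuristic (the "elbow" method, not part of the original practical) is to run kmeans for several values of k and compare `totalcost`, the within-cluster sum of squared distances. The data below are hypothetical synthetic blobs standing in for your own feature matrix.

```julia=
using Clustering, Random

Random.seed!(42)
# Hypothetical data: three blobs, 2 features × 150 points (columns = points)
X = hcat(randn(2, 50), randn(2, 50) .+ 6, randn(2, 50) .- 6)

ks = 2:6
# totalcost = within-cluster sum of squared distances for each fit
costs = [kmeans(X, k; maxiter=200).totalcost for k in ks]
for (k, cost) in zip(ks, costs)
    println("k = $k  total cost = $(round(cost, digits=1))")
end
```

The cost usually drops sharply up to the "natural" number of groups and flattens after that; kmeans starts from random seeds, so individual runs can vary.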
## hclust (hierarchical clustering)
```julia=
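using Clustering
D = rand(1000, 1000);
D += D'; # hclust needs a symmetric matrix (or pass uplo=:U / uplo=:L to use one triangle)
result = hclust(D, linkage=:single) # other linkages: :average, :complete, :ward
clusters = cutree(result, k=3) # cut the tree into 3 clusters, one label per sample
# plotting
# add the StatsPlots package first (] add StatsPlots)
using StatsPlots
plot(result) # StatsPlots draws the Hclust result as a dendrogram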
```
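Putting the practical together (cluster the distance matrix, embed it with MDS, color by cluster) might look like the sketch below. The distance matrix is again a hypothetical stand-in for the Practical 4 one (three shifted groups of random points), `:average` linkage is just one choice, and the MDS call assumes the MultivariateStats v0.10+ API.

```julia=
using Clustering, MultivariateStats, Plots, Random

Random.seed!(2)
# Hypothetical stand-in for the Practical 4 distance matrix:
# three groups of 20 samples in 10 dimensions, shifted apart
X = hcat(rand(10, 20), rand(10, 20) .+ 2, rand(10, 20) .+ 4)
n = size(X, 2)
D = [sqrt(sum(abs2, X[:, i] .- X[:, j])) for i in 1:n, j in 1:n]

# Hierarchical clustering directly on the distances
hc = hclust(D, linkage=:average)
clusters = cutree(hc, k=3)  # integer cluster label per sample

# 2-D classical MDS embedding of the same distance matrix
Y = predict(fit(MDS, D; maxoutdim=2, distances=true))

# Color each MDS point by its cluster label
p = scatter(Y[1, :], Y[2, :], marker_z=clusters, color=:lightrainbow,
            legend=false, xlabel="MDS 1", ylabel="MDS 2")
# savefig(p, "mds_clusters.png")  # write to a file when no display is available
```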