---
title: Project work submission
tags: report
description: Introduction to Big Data Science.
---

# Project work submission

## Team

- Takuya Sukegawa (s1260220)
- Shihomi Hashimoto (m5251107)
- Aoshi Suzuki (s1260241)

## Purpose

Implement Hadoop/Spark-based K-means and K-means++ algorithms in PAMI.

## Source

- kmeans.py
- answer.txt
- README.md (this file)

## Run

```
python3 kmeans.py
```

## Result

- Each row and its assigned cluster ID <br> ![](https://i.imgur.com/ELq5xUB.png)
- Cluster centers <br> ![](https://i.imgur.com/TIFjgdC.png)

## Reference

- https://spark.apache.org/docs/latest/ml-clustering.html (MLlib, DataFrame-based API)
- https://spark.apache.org/docs/latest/mllib-clustering.html (MLlib, RDD-based API)
- https://rsandstroem.github.io/sparkkmeans.html
- https://github.com/seraogianluca/k-means-mapreduce
- https://blog.imind.jp/entry/2019/09/14/141742
- https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.clustering.KMeans.html (PySpark `KMeans` API reference)
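## Algorithm sketch

For readers new to the algorithm named in the Purpose section, here is a minimal, dependency-free sketch of K-means with K-means++ seeding. It is an illustration only: it is not the project's `kmeans.py` and uses neither Spark nor PAMI, and the function names `sq_dist`, `kmeans_pp_init`, `assign`, and `kmeans` are hypothetical. K-means++ differs from plain K-means only in how the initial centers are picked: each new center is sampled with probability proportional to its squared distance from the nearest center chosen so far. In a Hadoop/Spark version, the assignment step is typically the per-row map, and the center update is a per-cluster sum and count that a reduce can combine. For comparison, Spark's `pyspark.ml.clustering.KMeans` defaults to `initMode="k-means||"`, a parallel variant of K-means++.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """K-means++ seeding: the first center is drawn uniformly; each further
    center is drawn with probability proportional to D(x)^2, the squared
    distance from x to its nearest already-chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0.0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Assignment step: index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Lloyd's iterations starting from K-means++ seeds.
    Returns (cluster label per point, cluster centers)."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = assign(points, centers)
    for _ in range(max_iter):
        # Update step: each center becomes the mean of its members
        # (a per-cluster sum and count, which a reduce could combine).
        new_centers: List[Point] = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                         for d in range(dim)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        centers = new_centers
        new_labels = assign(points, centers)
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
    return labels, centers
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)], k=2)` places the first three points and the last three points in separate clusters, with centers (1/3, 1/3) and (31/3, 31/3).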