CS1951A - HackMD

Assignment 6: Data Visualization
(Image: meme about fancy data visualization) Overview: This assignment consists of two parts: one in which you create various visualizations of two datasets, and one focusing on ethical questions around data visualization. Your work for both parts should go into writeup.md. Part 0: Set Up. Getting the stencil: You can click this link to get the stencil for this assignment. Important: Please view Appendix A for info regarding the structure of the stencil code.
lorenzods changed 2 months ago
Assignment 5: Machine Learning
Overview: In an alternate universe, boomers have decided to explore machine learning because they are VERY tech-savvy and willing to learn. In this assignment, you will help them accomplish various tasks to stay one step ahead of their mortal enemies: Zoomers. The goal of this assignment is to introduce basic machine learning concepts and provide a foundation for how to cluster data. We will begin to explore the power data has in informing machine learning decisions. The boomers require an effective way to manipulate data attributes. You will use the knowledge gained to help these boomers implement a K-means clustering algorithm. You will run this algorithm on two different datasets and also use sklearn's clustering algorithm on both datasets. Finally, you will make an elbow curve plot to determine the optimal number of clusters. Part 0: Set Up. Getting the Stencil: You can click here to get the stencil code for this homework. Reference this guide for more information about GitHub and GitHub Classroom.
lorenzods changed 2 months ago
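The K-means and elbow-curve steps mentioned in the Assignment 5 overview can be illustrated with a minimal, standard-library-only sketch. This is not the assignment's stencil or reference solution: the function names `kmeans` and `elbow_inertias`, the random initialization, and the convergence rule are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, max_iters=100, seed=0):
    """Hypothetical helper: Lloyd's algorithm on a list of tuples.

    Returns (centroids, labels, inertia), where inertia is the sum of
    squared distances from each point to its assigned centroid.
    """
    rng = random.Random(seed)
    # Assumption: initialize centroids by sampling k of the input points.
    centroids = rng.sample(points, k)
    for _ in range(max_iters):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                # Mean of the members, computed per dimension.
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Centroids stopped moving: converged.
        centroids = new_centroids
    labels = assign(points, centroids)
    inertia = sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, inertia


def elbow_inertias(points, ks):
    """Inertia for each candidate k; plotting these against k gives the elbow curve."""
    return [kmeans(points, k)[2] for k in ks]
```

Plotting the values returned by `elbow_inertias` against the candidate values of k, for example with matplotlib, and looking for the point where the curve flattens is one common way to read off a reasonable number of clusters.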
Assignment 4: Statistics
(Image: statistics meme) Due date: 03/21/2025. Overview: This assignment consists of two parts, one on statistical tests and another on regression models. Accept the handout, complete the TODOs in the files regression.py, stats_tests.py, and run_tests.py, and write answers to the written questions in writeup.md. When you submit your files on Gradescope, the autograder will run on regression.py and stats_tests.py. To ensure compatibility with the autograder, you should not modify the stencil unless instructed otherwise. For this assignment, please write your solutions in the respective .py files and writeup.md. Failing to do so may interfere with the autograder and result in a low grade. Part 1: Regression
Mason Zhang changed 3 months ago
Untitled
lorenzods changed 3 months ago
Assignment 3: MapReduce
Out: February 26, 2025. Due: March 12, 2025. In this assignment, you will design and implement MapReduce algorithms for a variety of common data processing tasks. Note that you may not use Spark functions such as distinct or join, as these would allow you to bypass much of the assignment. The purpose of this assignment is for you to gain a better understanding of how these functions work under the hood. Unless otherwise indicated, please only use map, flatMap, reduceByKey, sortByKey, and filter. In Part 0 of this assignment, you will set up PySpark using one of the following options: Windows, MacOS, or Department Machine. In Part 1, you will solve two simple problems on small datasets by building the MapReduce pipelines and implementing your mappers and reducers. In Part 2, you will implement a movie recommendation system. Part of the MapReduce pipeline is provided; you will design the remaining part and implement the mappers and reducers. There are two datasets in Part 2: small and big. For both datasets, you can run your program directly on the department machine or on your own device with PySpark set up.
SasMaj changed 3 months ago
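To see why the Assignment 3 overview says functions like distinct can be rebuilt from map and reduceByKey, here is a small plain-Python sketch. It does not use PySpark: `reduce_by_key` is a hypothetical stand-in that only mimics the per-key behavior of Spark's reduceByKey on an in-memory list, and `distinct_via_mapreduce` is an illustrative name, not part of the assignment.

```python
from collections import defaultdict
from functools import reduce


def reduce_by_key(pairs, fn):
    """Hypothetical stand-in for reduceByKey on an in-memory list of (key, value) pairs.

    Groups values by key, then folds each group with the binary function fn.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return [(key, reduce(fn, values)) for key, values in grouped.items()]


def distinct_via_mapreduce(items):
    """Emulate distinct using only a map step and a reduce-by-key step."""
    # Map: turn each item into a (key, value) pair with a dummy value.
    keyed = map(lambda x: (x, None), items)
    # Reduce: collapse duplicates of each key, keeping any one dummy value.
    reduced = reduce_by_key(keyed, lambda a, b: a)
    # Map: drop the dummy value; sort only to make the output order stable.
    return sorted(key for key, _ in reduced)
```

The same `reduce_by_key` helper covers the usual word-count pattern: map each word to `(word, 1)` and reduce with addition.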
Assignment 2: Webscraping
![](https://i.imgur.com/rThSnIi.png =500x) Out: February 12, 2025. Due: February 26, 2025. Overview: For this assignment, you'll collect some stock data. We'll use investing.com to collect information on the most active stocks in the market through web scraping. We'll supplement this with historical data about these stocks gathered through API requests. You'll then be responsible for cleaning the data, creating a database from it, and analyzing stocks by querying your database. Part 0: Set Up
lorenzods changed 4 months ago
Assignment 1: SQL
(Image: Twitter meme about databases) Overview: You can click here to get the stencil code for Homework 1. Reference this guide for more information about GitHub and GitHub Classroom. The data is located in the /data folder. To ensure compatibility with the autograder, you should not modify the stencil unless instructed otherwise. For this assignment, please write each of your queries in its corresponding SQL file. Failing to do so may break the autograder and result in a low grade. Part 0: Setup. Python Virtual Environment. Option 1: Department Machine
Mason Zhang changed 4 months ago