---
title: Final Project
layout: 'post'
geometry: margin=2cm
tags: project
---
# Final Project
### Due: December 12, 2022 at 10 pm
# Overview
The final project in CS 100 is an opportunity to apply any and all of the skills you have learned during the course of the semester (e.g., statistics, machine learning, visualization) to tell a compelling story, backed by a data set of your choosing. You should pick a topic you are genuinely interested in, and should aim to produce something you are proud of. Working on this project should be fun, engaging, and rewarding.
## Groups
You should complete this project in a group of either 2 or 3. Groups of size 2 will only be required to use one data set, but groups of size 3 will need to use *at least* two related data sets to tell a compelling *and cohesive* story.
There is a thread on [EdStem](https://edstem.org/us/courses/28268/discussion/2097157) dedicated to searching for partners. Note that the project proposals are due on Tuesday, November 22, so if you don't already have a partner lined up, you should begin looking for one soon.
## Project Components
Your project will consist of three components: data, model and analysis, and visualization. We describe our expectations for each in turn.
### Data
The freedom in searching for data is as liberating as it can be time-consuming. Our advice is to sit down with your group, look at a few interesting data sets, and discuss what you might be able to do with them. One (long) list of potential data sources is available [here](https://www.data-is-plural.com/), but there are many others to choose from as well. Be sure to choose a sizable enough data set, typically one with hundreds of observations and tens of features. On the other hand, don’t choose a data set that is too large. CS 100 is not a *big* data class; neither you nor R is particularly well equipped to handle big data.
### Model and Analysis
Your initial study of your data should be exploratory and visual, as in the mini-project, and should assist you in formulating hypotheses about your data.
Your project should then go beyond an EDA with a more in-depth analysis. In it, you should demonstrate proficiency in modeling your data and in applying some of the analysis techniques (e.g., statistical inference and machine learning) you were exposed to over the course of the semester. We do not require a specific minimum number of techniques, but we expect your analysis to exhibit some degree of depth: you should critique your initial models and refine them as necessary.
In designing your models and conducting your analysis, you should not feel constrained by what we have covered in this class. Feel free to incorporate your own domain knowledge.
### Visualization
In addition to your model and analysis, an equally important part of your project is visualization, i.e., how you present your findings. Again, you should use the tools taught in class but should not feel constrained by them. We encourage you to use any visualizations (e.g., word clouds) supported by R libraries, and to search the web to learn about the libraries available.
## Mentor TA
Soon after you hand in your project proposal, you will be matched with a TA who will act as your mentor during your project. Their goal is to help you surmount any hurdles you encounter along the way. Once you have been matched, feel free to reach out to your mentor TA directly, attend their office hours, etc.
Note that your mentor will *not* be grading you. Your final project should therefore be written so that it can be understood by someone who has no context for your work.
## Project Road Map
Once you have identified your group and data set(s), we suggest that you organize your project ideas into a list of goals and subgoals, along with corresponding estimates of how long you expect each task to take.
Here’s a suggested rule of thumb: *double your estimates*. Not only are you more likely to generate realistic estimates that way, but it’s always better to overestimate time required than underestimate. It’s likely that none of us would object to finding ourselves with a bit of unexpected free time.
### Project Proposal
Your first milestone is your project proposal. In your proposal, you should outline your plans for your project. Articulating your plans will allow both you and the course staff to understand what it is you intend to achieve, and how you will achieve it.
Concretely, please ensure that your proposal covers the following:
- The names of the members of your group
- A preliminary title for your project (for organizational purposes only)
- The topic and data set(s) that you will be using, including the source for your data set
- A driving question, or set of questions, that you plan to investigate based on your data set(s), as well as one or more hypotheses
- A rough description of a basic model, and the sorts of analyses you are imagining carrying out
- Some ideas for visualizations
- A list of tasks that (at first glance) seem necessary to conduct the analysis and create the visualizations
- A backup plan (or two) if your original ideas do not pan out
There is no need to have built a model or performed any analysis when you hand in your proposal, but ideally you should carry out some preliminary (e.g., visual) analyses to help you outline your goals for a more in-depth analysis.
##### Deliverables
You should list three sets of deliverables in your proposal:
- 75%: This is the core of your project, the minimum set of deliverables that will enable you to say that you accomplished what you set out to do.
- 100%: This is what you hope to do for a good, solid project.
- 125%: These are stretch goals, what you hope to someday be able to do (and might complete over winter break).
##### Backup plan
In your project proposal, you should describe a backup plan, in case your original idea falls through. *This step is very important.* It may turn out that you are unable to achieve what you set out to do, and that is perfectly acceptable. We are not looking for a publication; a negative result is just as good as a positive result. However, it is not acceptable to simply say "our idea didn’t pan out, so we can’t present anything." If you run into issues, you will need to either work around them, or switch to "safety mode". *Aim for success, but plan for failure.*
### Handin
Ultimately, you’ll be turning in a write-up that summarizes all the work you have done for this project. You should create this write-up in R Markdown and make sure it renders properly to HTML, so you can show it off to your friends and family!
In addition to the write-up, you will also need to hand in all your code and your data set. Your code should include at least one runnable R script or R Markdown file from which we can recreate all your results, including any exploratory data analyses you performed and any visualizations you created.
You will present your work to TAs for interactive grading. These presentations should be about 8-10 minutes.
Finally, you and your partner(s) should prepare a 4-5 minute presentation summarizing your project. (Groups of 3 should prepare 7-8 minute presentations.) You should use Google Slides or similar as visual aids. Be sure the audience can walk away with 1-3 key takeaway messages after watching your presentation. Gradescope does not allow PowerPoint uploads, so instead you will upload a PDF that contains a link to your Google Slides presentation.
### Example Projects
Here are some student projects from 2017:
- [Trending videos on YouTube](http://cs.brown.edu/courses/cs100/students/project7/)
- [Asian data disaggregation](http://cs.brown.edu/courses/cs100/students/project1/)
- [Speed dating](http://cs.brown.edu/courses/cs100/students/project3/)
Here are some student projects from 2019:
- [2000s Billboard Top 100](https://cs.brown.edu/courses/cs100/projects/billboard_hot_100.pdf)
- [Kickstarter Campaign Success](https://cs.brown.edu/courses/cs100/projects/kickstarter_success.html)
## Tentative Rubric
<table class="table">
<thead>
<tr>
<th>Milestone</th>
<th>Description</th>
<th>Percentage</th>
</tr>
</thead>
<tbody>
<tr>
<td>Overall work</td>
<td>A measure of the overall effort put into the project and the quality of the work.</td>
<td>20%</td>
</tr>
<tr>
<td>Data and preparation</td>
<td>An evaluation of the data set chosen and the steps taken to prepare it.</td>
<td>5%</td>
</tr>
<tr>
<td>Clearly stated hypotheses</td>
<td>How clearly the driving questions and hypotheses are stated, as well as how well suited your choice of data set is to testing these hypotheses.</td>
<td>10%</td>
</tr>
<tr>
<td>Methodologies</td>
<td>An evaluation of the model and the analysis, based on how well suited they are to testing your hypotheses.</td>
<td>15%</td>
</tr>
<tr>
<td>Execution</td>
<td>How well the methodologies are executed.</td>
<td>30%</td>
</tr>
<tr>
<td>Presentation</td>
<td>How well results are presented, including visualizations. Negative results are acceptable, but all modeling assumptions and conclusions must be fully explained.</td>
<td>20%</td>
</tr>
</tbody>
</table>
## Project Timeline
<table class="table">
<thead>
<tr>
<th>Due Date</th>
<th>Milestone</th>
</tr>
</thead>
<tbody>
<tr>
<td>Tuesday, November 22 (10 pm)</td>
<td>Project Proposals Due</td>
</tr>
<tr>
<td>Friday, December 2 (10 pm)</td>
<td>Meet with Mentor TA for the First Time</td>
</tr>
<tr>
<td>Friday, December 9 (10 pm)</td>
<td>Meet with Mentor TA for the Second Time</td>
</tr>
<tr>
<td>Monday, December 12 (10 pm)</td>
<td>Final Projects Due on Gradescope</td>
</tr>
<tr>
<td>Tuesday, December 13 to Thursday, December 15</td>
<td>Interactive Grading</td>
</tr>
<tr>
<td>Thursday, December 15 (10 pm)</td>
<td>Final Project Presentations due on Gradescope</td>
</tr>
<tr>
<td>Friday, December 16 (9 am - 12 pm)</td>
<td>Final Project Presentations</td>
</tr>
<tr>
<td>Friday, December 16 (5 pm)</td>
<td>(Optional) Final Projects with Revisions Due on Gradescope</td>
</tr>
</tbody>
</table>
Your first project-related deadline is 10 pm on Tuesday, November 22, by which time you should have submitted your <a href="#Project-Proposal">project proposal</a>. Please note that only one person from each group needs to submit on Gradescope.
We will use your proposal to assign you a mentor TA who will guide you throughout the project. Your entire group must meet with your mentor TA twice: once by 10 pm on Friday, December 2, and again by 10 pm on Friday, December 9. They will help you clean your data, advise you on your model and analysis, and critique your visualizations. The more you seek their help (or any outside help, for that matter, such as bouncing your ideas off other people), the better your project will be.
While we only require you to meet with your mentor TA twice, feel free to visit them during their office hours as often as you like!
You must submit your final projects on Gradescope by 10 pm on Monday, December 12. **Please note that late submissions will not be accepted; likewise, you may not use any late days on this project.**
Interactive grading will take place between Tuesday, December 13 and Thursday, December 15; you will sign up for a grading slot in that window. Your project will be graded by two TAs other than your mentor TA. Up to half of any points lost during interactive grading may be earned back if you correct the issues in time for your final presentation.
Finally, you must appear in person to present your final project during our final exam slot, namely Friday morning, December 16, from 9 am to 12 pm.