# Presentation Script
Hi, I'm Kahvi, here with Nolan, Jayden and Ahnaf. We worked on an IMDb rating-prediction system.
## Intro
Imagine you're a studio executive who'd like to know how much people will love or hate your Dark Knight sequel. With our predictor, you just submit the movie's attributes (like budget, duration and actors) and it gives you a predicted audience rating before the movie is ever released.
<!-- Basically, our goal was to predict the audience rating of a movie. -->
This idea was based on the Internet Movie Database (IMDb), which contains over 85,000 movies released between 1906 and 2020. Our label vector (the audience ratings) contained values from 0 to 10 in steps of 0.1.
To measure our results, we used Mean Absolute Error, or MAE.
For example, a model with an MAE of 1.5 means its predictions are off from the actual audience rating by 1.5 points on average.
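As a quick illustration of the metric, here is MAE computed by hand on a few made-up ratings (the numbers below are hypothetical, not from our dataset):

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between predicted and actual ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical ratings: errors of 0.5, 1.0 and 1.5 average out to an MAE of 1.0.
actual = [7.0, 6.5, 8.0]
predicted = [7.5, 5.5, 6.5]
print(mean_absolute_error(actual, predicted))  # 1.0
```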
## Models
Right off the bat, Nolan constructed a simple three-layer regression neural network that achieved an *average* MAE of 0.74.
It used Trenn's formula to pick the optimal number of neurons in the hidden layer. The *best* MAE it achieved was 0.72 after validation.
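The three-layer shape can be sketched with scikit-learn's `MLPRegressor` on synthetic data. This is not Nolan's actual code, and the features below are stand-ins, not the real IMDb columns:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
# Stand-in features (think budget, duration, year) and ratings near 5-8.
X = rng.random((200, 3))
y = 5.0 + 3.0 * X[:, 0] + rng.normal(0, 0.3, 200)

# A single hidden layer between input and output gives the three-layer shape;
# 8 neurons is an arbitrary choice here, not the value Trenn's formula gives.
model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
model.fit(X, y)
mae = mean_absolute_error(y, model.predict(X))
print(round(mae, 2))
```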
Nolan also converted this model into a classifier by spreading the rating range over 100 classes. This gave low accuracy, overfitting, and fluctuating results (as you can see in this plot).
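One plausible way to spread 0.1-step ratings over 100 classes is to multiply by 10 and clip the top value into the last bucket. This mapping is an assumption for illustration, not necessarily the one Nolan used:

```python
def rating_to_class(rating):
    """Map a 0.1-step rating in [0, 10] onto one of 100 integer classes;
    10.0 is clipped into the top bucket (class 99)."""
    return min(int(round(rating * 10)), 99)

def class_to_rating(cls):
    """Recover the representative rating for a class."""
    return cls / 10.0

print(rating_to_class(7.4))  # 74
print(class_to_rating(74))   # 7.4
```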
Ahnaf worked on a deep learning model.
It consisted of 5 activation layers, chosen through extensive cross-validation. But likewise, it couldn't break the 0.7 barrier we'd started with.
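Selecting a depth by cross-validation might look like the sketch below: score each candidate layer configuration by cross-validated MAE and keep the best. The data and the candidate shapes are made up for illustration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.random((200, 4))                      # stand-in features
y = 5.0 + 2.0 * X[:, 0] + rng.normal(0, 0.4, 200)

# Compare candidate depths by cross-validated MAE and keep the best.
candidates = [(16,), (16, 8), (16, 8, 4)]
scores = {}
for layers in candidates:
    model = MLPRegressor(hidden_layer_sizes=layers, max_iter=2000,
                         random_state=0)
    mae = -cross_val_score(model, X, y, cv=3,
                           scoring="neg_mean_absolute_error").mean()
    scores[layers] = mae

best = min(scores, key=scores.get)
print(best)
```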
Jayden refactored his Assignment 1 code to write a decision tree that supported multiple classes instead of just true/false values. He achieved an MAE of 0.95 using only three columns
<!-- ; `divorces`, `height` and `actor_gender`. -->
Jayden also used sklearn to construct a random forest model. In total, he tested over 500 combinations of features with 50 to 100 trees. This gave a similar final MAE of 0.96, still above our starting point.
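A minimal sketch of that setup with sklearn's `RandomForestRegressor` follows; the data is synthetic and the tree count of 75 is just an arbitrary point in the 50-100 range Jayden swept:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
X = rng.random((300, 3))          # stand-in for the three chosen columns
y = 5.0 + 2.0 * X[:, 1] + rng.normal(0, 0.5, 300)

# n_estimators sets the number of trees in the forest.
forest = RandomForestRegressor(n_estimators=75, random_state=0)
forest.fit(X, y)
mae = mean_absolute_error(y, forest.predict(X))
print(round(mae, 2))
```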
## Conclusion
At the end of all this, we've arrived at a few conclusions:
First, training a neural network or decision tree on so many categorical columns is difficult. In the dataset, columns like `writers` contained thousands of unique names that made constructing a decision tree infeasible.
And given that *most* of the columns in the dataset were categorical, it made training a neural network difficult. Even with encoding, the categorical columns didn't make a significant difference in the resulting MAE.
Second, this prediction is fundamentally about human behaviour. The audience rating of a movie depends as much on people's emotions and psychology as it does on the movie's actual attributes. People are difficult to predict.
For future development, we would like to focus on different ways to encode categorical columns, such as binary encoding or hash encoding.
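The two techniques can be sketched in a few lines. Both compress a high-cardinality column (like the thousands of names in `writers`) into a handful of numeric features; the bucket and bit counts below are illustrative choices:

```python
import hashlib

def hash_encode(value, n_buckets=16):
    """Hash trick: map any string (e.g. a writer's name) to one of
    n_buckets columns, so thousands of names need only 16 features."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def binary_encode(index, n_bits=12):
    """Binary encoding: a category's integer index becomes a list of bits,
    so 2**12 = 4096 categories fit in 12 columns instead of 4096 one-hot."""
    return [(index >> b) & 1 for b in range(n_bits)]

print(binary_encode(5, n_bits=4))  # [1, 0, 1, 0]
```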
Thanks for listening.