rkunani - HackMD

Decision Trees and Random Forests
The goal of this guide is to introduce the high-level concepts behind decision trees and random forests. This guide is NOT a replacement for lecture; it is meant to be a supplement to lecture.
Motivating Decision Trees
Nonlinear Decision Boundaries
The decision boundary of a classifier is the "line" at which the classifier changes its prediction from one class to another.
Editorial Team changed 2 years ago
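The decision-boundary idea from the Decision Trees preview can be made concrete with a minimal Python sketch of a hand-written depth-2 tree. The function names `tree_predict` and `boundary_crossings` and the thresholds 2.0, 3.0 and 1.0 are hypothetical choices for illustration and are not taken from the guide. Each split compares one feature to a threshold, so the boundary is built from axis-aligned pieces. Here it is the staircase x2 = 3 for x1 < 2, then x2 = 1 for x1 >= 2, joined by the segment x1 = 2 for 1 <= x2 < 3.

```python
def tree_predict(x1: float, x2: float) -> int:
    """Hypothetical depth-2 decision tree with hand-picked thresholds.

    The root splits on x1; each child then splits on x2 at a different
    threshold, so class 1 is predicted in an L-shaped region.
    """
    if x1 < 2.0:
        # Left branch: class 1 below the horizontal line x2 = 3.
        return 1 if x2 < 3.0 else 0
    # Right branch: class 1 below the horizontal line x2 = 1.
    return 1 if x2 < 1.0 else 0


def boundary_crossings(x2: float, start: float, stop: float, steps: int) -> list:
    """Scan x1 from start to stop at a fixed x2 and return the x1 values
    where the tree's prediction differs from the previous grid point."""
    crossings = []
    prev = tree_predict(start, x2)
    for i in range(1, steps + 1):
        x1 = start + (stop - start) * i / steps
        pred = tree_predict(x1, x2)
        if pred != prev:
            crossings.append(x1)
        prev = pred
    return crossings


for x2 in (0.5, 2.0, 4.0):
    print(x2, boundary_crossings(x2, 0.0, 4.0, 40))
```

Scanning at x2 = 2.0 finds a single flip at x1 = 2.0. Scans at x2 = 0.5 and x2 = 4.0 find none, because both branches agree there. The class-1 region is an L shape, which is not convex, so no single straight line can carve it out.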
Feature Engineering
This guide aims to motivate and introduce the basic concepts of feature engineering. This guide is not comprehensive; it is meant to supplement lecture, not replace it.
What is a feature?
A feature is an attribute of the data that a model uses to make predictions. For example, a person's age might be a feature in a model that predicts salary. In a table of data, the features are the columns of the table that the model uses as inputs. Mathematically, when we propose a model
$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_p x_p$$
the values $x_1, \dots, x_p$ are the features of the model.
rkunani changed 5 years ago
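The model in the Feature Engineering preview can be evaluated directly once the parameters are known. The minimal Python sketch below assumes the θ values are already given. The helper name `predict`, the column names `age` and `years_experience`, and the θ values are all hypothetical, and the θ values are made up rather than fit to any data. Each row of the table supplies one value per feature column.

```python
from typing import Sequence


def predict(theta: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate y_hat = theta_0 + theta_1*x_1 + ... + theta_p*x_p.

    theta holds p + 1 parameters (intercept theta_0 first);
    x holds the p feature values for a single row.
    """
    if len(theta) != len(x) + 1:
        raise ValueError("theta must have exactly one more entry than x")
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


# Hypothetical table: each dict is a row, each key is a column (feature).
rows = [
    {"age": 30, "years_experience": 5},
    {"age": 45, "years_experience": 20},
]
feature_names = ["age", "years_experience"]  # x_1 = age, x_2 = years_experience
theta = [20000.0, 500.0, 1500.0]             # made-up theta_0, theta_1, theta_2

predictions = [predict(theta, [row[name] for name in feature_names]) for row in rows]
print(predictions)  # [42500.0, 72500.0]
```

Choosing a different set of columns, or adding a transformed column such as age squared, changes the features while leaving the form of the model unchanged.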
Hypothesis Testing
This guide aims to give a high-level overview of the concepts behind hypothesis testing and walk through an example hypothesis test. At the end, I give some tips for how to approach Problem 3 on Homework 7.
Motivation for Hypothesis Testing
I think understanding what a hypothesis test is includes understanding the scenarios in which hypothesis tests are used. Suppose you're in a class of 20 students, none of whom have studied for the upcoming multiple-choice midterm, so every student guesses randomly on every question. After the exam, you learn that 16 of the students scored higher than 50%. Naturally, you suspect that this number is surprisingly high if everyone truly guessed randomly. In this scenario, you would run a hypothesis test of the hypothesis that everyone guessed randomly. The goal of the hypothesis test is to measure how likely it is to see a number as extreme as 16. Our go-to tool for measuring likelihood is probability, so the hypothesis test outputs the probability, computed assuming everyone guessed randomly, that the number of students scoring above 50% is at least as extreme as 16. We call this probability the $p$-value of the hypothesis test.
rkunani changed 5 years ago
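The p-value for the classroom example in the Hypothesis Testing preview can be computed exactly once the hypothesis "everyone guessed randomly" is pinned down. The preview does not say how many questions the midterm has or how many choices each question offers. The Python sketch below therefore assumes a hypothetical exam of 20 questions with 4 choices each, and it assumes students guess independently of one another. Under those assumptions, one student's number of correct answers is Binomial(20, 1/4). The number of students scoring above 50% is then Binomial(20, q), where q is the chance that a single guesser scores above 50%. The helper name `binom_tail` is hypothetical.

```python
from math import comb


def binom_tail(n: int, p: float, k: int) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p) by summing the exact pmf."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


NUM_STUDENTS = 20   # from the example
OBSERVED = 16       # students who scored above 50%
NUM_QUESTIONS = 20  # hypothetical: not given in the example
NUM_CHOICES = 4     # hypothetical: not given in the example

# "Higher than 50%" means strictly more than half of the questions correct.
need_correct = NUM_QUESTIONS // 2 + 1

# q: probability that a single random guesser scores above 50%.
q = binom_tail(NUM_QUESTIONS, 1 / NUM_CHOICES, need_correct)

# p-value: probability, if everyone guessed randomly, that at least
# OBSERVED of the NUM_STUDENTS students score above 50%.
p_value = binom_tail(NUM_STUDENTS, q, OBSERVED)

print(f"q = {q:.4f}")
print(f"p-value = {p_value:.3e}")
```

Under these assumed exam settings q is roughly 0.004, so the p-value is astronomically small. A different exam format would change the numbers but not the procedure. The same probability could also be estimated by simulating many classes of random guessers; the exact binomial sum is used here only because it is short.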