# AI Crash Course note
###### tags: `Machine Learning`
## Introduction to Machine Learning
* Learning Objectives
* Recognize the practical benefits of mastering machine learning
* Understand the philosophy behind machine learning
* A tool to reduce the time you spend programming
    * There's no need to hand-code lots of examples and rules with logic
    * Machine learning can produce a more reliable program in a small fraction of the time
* It makes it easy to customize products
    * Suppose someone wrote an English spelling corrector by hand, and now a version for a different language is needed
    * With machine learning, you just collect data in that language and feed it into the exact same machine learning model
* It can solve problems you have no idea how to solve by hand
    * As human beings, we can recognize our friends, but programming that recognition by hand is completely baffling
    * Machine learning algorithms do very well on these problems
## Framing
* Learning Objectives
* Refresh the fundamental machine learning terms.
    * Explore various uses of machine learning.
* What is (Supervised) Machine Learning?
* ML systems learn how to combine input to produce useful predictions on never-before-seen data
* Label is the variable we're predicting
* Typically represented by the variable y
* Features are input variables describing our data
        * Typically represented by the variables $\{x_1, x_2, ..., x_n\}$
* Terminology: Examples and Models (illustrated in the sketch after this list)
* **Example** is a particular instance of data x
* **Labeled example** has {features, label}:(x, y)
* Used to train the model
* **Unlabeled example** has {features, ?}:(x, ?)
* Used for making predictions on new data
* **Model** maps examples to predicted labels: y'
* Defined by internal parameters, which are learned
* [Framing: Key ML Terminology](https://developers.google.com/machine-learning/crash-course/framing/ml-terminology)
* [Framing: Check Your Understanding](https://developers.google.com/machine-learning/crash-course/framing/check-your-understanding)
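To make these terms concrete, here is a minimal sketch; the spam-detection features and values are invented for illustration:

```python
# A labeled example: features plus the label we want to predict.
labeled_example = {
    "features": {"email_length": 120, "num_links": 7},  # x (hypothetical features)
    "label": "spam",                                     # y
}

# An unlabeled example: same features, unknown label.
unlabeled_example = {
    "features": {"email_length": 45, "num_links": 0},   # x
    "label": None,                                       # ? -- what the model predicts
}

# A trivial, hand-written "model" mapping an example's features to a
# predicted label y'. A real model learns its internal parameters from
# labeled examples instead of using a hard-coded rule like this one.
def model(features):
    return "spam" if features["num_links"] > 3 else "not spam"

print(model(unlabeled_example["features"]))  # -> "not spam"
```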
### Regression vs. classification
A regression model predicts continuous values. For example, regression models make predictions that answer questions like the following:
* What is the value of a house in California?
* What is the probability that a user will click on this ad?
A classification model predicts discrete values. For example, classification models make predictions that answer questions like the following:
* Is a given email message spam or not spam?
* Is this an image of a dog, a cat, or a hamster?
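A hedged sketch of the contrast, using scikit-learn (which these notes mention later); the toy data and numbers are invented:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # one feature per example (toy data)

# Regression: predict a continuous value (e.g., a house price).
reg = LinearRegression().fit(X, np.array([150.0, 200.0, 260.0, 310.0]))
print(reg.predict([[2.5]]))   # some continuous number

# Classification: predict a discrete class (e.g., spam / not spam).
clf = LogisticRegression().fit(X, np.array([0, 0, 1, 1]))
print(clf.predict([[2.5]]))   # one of the discrete classes: 0 or 1
```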
## Descending into ML
Linear regression is a method for finding the straight line or hyperplane that best fits a set of points. This module explores linear regression intuitively before laying the groundwork for a machine learning approach to linear regression.
**Learning Objectives**
* Refresh your memory on line fitting.
* Relate weights and biases in machine learning to slope and offset in line fitting.
* Understand "loss" in general and squared loss in particular
[Descending into ML: Linear Regression](https://developers.google.com/machine-learning/crash-course/descending-into-ml/linear-regression)
[Descending into ML: Training and Loss](https://developers.google.com/machine-learning/crash-course/descending-into-ml/training-and-loss)
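The crash course's simple linear model is $y' = b + w_1x_1$, and the squared loss ($L_2$ loss) for one example is $(y - y')^2$; mean squared error (MSE) averages it over the data set. A minimal NumPy sketch of the model and its loss (the toy data is made up):

```python
import numpy as np

def predict(x, w, b):
    """Linear model: y' = b + w * x."""
    return b + w * x

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared losses (y - y')^2."""
    return np.mean((y_true - y_pred) ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 3.9, 6.1])   # toy data, roughly y = 2x

print(mse(y, predict(x, w=2.0, b=0.0)))   # small loss: good fit
print(mse(y, predict(x, w=0.5, b=1.0)))   # larger loss: worse fit
```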
## Reducing Loss
To train a model, we need a good way to reduce the model’s loss. An iterative approach is one widely used method for reducing loss, and is as easy and efficient as walking down a hill.
**Learning Objectives**
* Discover how to train a model using an iterative approach.
* Understand full gradient descent and some variants, including:
* mini-batch gradient descent
* stochastic gradient descent
* Experiment with learning rate.
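A minimal sketch of that iterative approach for a one-feature linear model: compute the gradient of the MSE with respect to $w$ and $b$, step in the opposite direction, and repeat. The toy data and learning rate are made up:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.0])   # toy data, roughly y = 2x
w, b = 0.0, 0.0                      # convex problem: any start works
learning_rate = 0.03

for step in range(200):
    error = (w * x + b) - y
    # Gradients of mean squared error with respect to w and b.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Take a small step downhill.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)   # w approaches ~2, b approaches ~0
```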
**Weight Initialization**
* For convex problems, weights can start anywhere (say, all 0s)
* Convex: think of a bowl shape
* Just one minimum
* Foreshadowing: not true for neural nets
* Non-convex: think of an egg crate
* More than one minimum
* Strong dependency on initial values
**SGD & Mini-Batch Gradient Descent**
* Could compute gradient over entire data set on each step, but this turns out to be unnecessary
* Computing gradient on small data samples works well
* On every step, get a new random sample
* Stochastic Gradient Descent: one example at a time
* Mini-Batch Gradient Descent: batches of 10-1000
* Loss & gradients are averaged over the batch
    * This is a compromise: unlike SGD, which uses only one example at a time, and unlike full-batch gradient descent, which uses the entire data set, it uses batches of 10-1000 examples per step
**Reducing Loss: Stochastic Gradient Descent**
batch: the set of examples used in a single iteration
At Google, data sets often contain hundreds of millions of examples with huge numbers of features, so if every batch is correspondingly large, a single iteration can take an enormous amount of time to compute.
Large data sets also tend to contain redundant examples, and redundancy becomes more likely as the batch size grows. Some redundancy can be useful to smooth out noisy gradients, but enormous batches tend not to carry much more predictive value than large ones.
By choosing examples at random from the data set, we can estimate a meaningful average from a much smaller sample. SGD takes this to the extreme: it uses only a single example per iteration, which works given enough iterations but makes the process very noisy (see the sketch below).
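A sketch of mini-batch gradient descent under the same toy setup as before: each step draws a new random batch and the gradient is averaged over that batch. Setting `batch_size = 1` recovers SGD; setting it to the full data-set size recovers batch gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=10_000)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)  # noisy toy line

w, b = 0.0, 0.0
learning_rate = 0.01
batch_size = 32          # SGD would be 1; full batch would be x.size

for step in range(2_000):
    # On every step, get a new random sample of the data.
    idx = rng.integers(0, x.size, size=batch_size)
    xb, yb = x[idx], y[idx]
    error = (w * xb + b) - yb
    # Loss and gradients are averaged over the batch.
    w -= learning_rate * 2 * np.mean(error * xb)
    b -= learning_rate * 2 * np.mean(error)

print(w, b)   # noisy estimates near w=2, b=1
```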
## First Steps with TensorFlow: Toolkit
**tf.estimator API**
We'll use tf.estimator for most of the exercises in the Machine Learning Crash Course. Everything done in the exercises could also be done in lower-level (raw) TensorFlow, but using tf.estimator dramatically reduces the number of lines of code.
tf.estimator is compatible with the scikit-learn API. scikit-learn is an extremely popular open-source machine learning library in Python, with over 100k users, including many Googlers.
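A minimal sketch of the tf.estimator workflow, assuming the TensorFlow 1.x API that the crash course targets; the toy feature `x` and its data are invented:

```python
import numpy as np
import tensorflow as tf  # assumes TensorFlow 1.x, where these APIs live

# Toy data: one numeric feature "x" and a binary label.
features = {"x": np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)}
labels = np.array([0, 0, 1, 1], dtype=np.int32)

# Declare the features the model should expect.
feature_columns = [tf.feature_column.numeric_column("x")]

# Set up a linear classifier.
classifier = tf.estimator.LinearClassifier(feature_columns=feature_columns)

# Wrap the NumPy arrays in an input function and train.
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x=features, y=labels, batch_size=2, num_epochs=None, shuffle=True)
classifier.train(input_fn=train_input_fn, steps=100)
```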
## First Steps with TensorFlow: Programming Exercises
* [pandas](https://colab.research.google.com/notebooks/mlcc/intro_to_pandas.ipynb?utm_source=mlcc&utm_campaign=colab-external&utm_medium=referral&utm_content=pandas-colab&hl=zh-cn)
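The linked Colab introduces pandas; as a quick taste, here is a sketch of its two core data structures, `Series` and `DataFrame` (the city data follows the Colab's own example):

```python
import pandas as pd

# A Series is a single column; a DataFrame is a table of named columns.
city_names = pd.Series(["San Francisco", "San Jose", "Sacramento"])
population = pd.Series([852469, 1015785, 485199])

cities = pd.DataFrame({"City name": city_names, "Population": population})
print(cities.describe())                        # summary stats per numeric column
print(cities[cities["Population"] > 500_000])   # boolean-indexed selection
```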
## Generalization: Peril of Overfitting
Low loss, but still a bad model?

* Three basic assumptions guide generalization:
    * We draw examples independently and identically distributed (i.i.d.) at random from the distribution. In other words, examples don't influence one another. (Put another way: i.i.d. is one way of describing the randomness of variables.)
    * The distribution is stationary; that is, it doesn't change within the data set.
    * We draw examples from partitions of the same distribution.
* In practice, we sometimes violate these assumptions. For example:
    * Consider a model that chooses which ads to display. The i.i.d. assumption would be violated if the model chose ads based, in part, on the ads the user had previously seen.
    * Consider a data set containing a year of retail sales information. Users' purchases change seasonally, which violates stationarity.

If any of these three basic assumptions is violated, we must pay careful attention to the metrics.
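The third assumption above is why data is split randomly: training and test sets should be random partitions drawn from the same distribution. A minimal NumPy sketch of such a split (the 80/20 ratio is a common choice, not mandated by the course; the data is invented):

```python
import numpy as np

rng = np.random.default_rng(42)
n_examples = 1_000
X = rng.normal(size=(n_examples, 3))       # toy feature matrix
y = rng.integers(0, 2, size=n_examples)    # toy binary labels

# Shuffle, then partition: both splits come from the same distribution.
idx = rng.permutation(n_examples)
split = int(0.8 * n_examples)
train_idx, test_idx = idx[:split], idx[split:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
print(X_train.shape, X_test.shape)   # (800, 3) (200, 3)
```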