# AI Crash Course note

###### tags: `Machine Learning`

## Introduction to Machine Learning

* Learning Objectives
    * Recognize the practical benefits of mastering machine learning
    * Understand the philosophy behind machine learning
* ML is a tool to reduce the time you spend programming
    * You don't need to hand-craft lots of examples and logical rules
    * Machine learning can produce a more reliable program in a small fraction of the time
* ML makes it simple to customize products
    * Suppose someone wrote an English spelling corrector by hand, and now a version for another language is needed
    * With ML, you just collect data in that language and feed it into the exact same machine learning model
* ML can solve problems we have no idea how to program by hand
    * As human beings, we recognize our friends effortlessly, but doing this with hand-written code is completely baffling
    * ML algorithms do very well on these problems

## Framing

* Learning Objectives
    * Refresh the fundamental machine learning terms.
    * Explore various uses of machine learning.
* What is (Supervised) Machine Learning?
    * ML systems learn how to combine input to produce useful predictions on never-before-seen data
    * A **label** is the variable we're predicting
        * Typically represented by the variable $y$
    * **Features** are input variables describing our data
        * Typically represented by the variables $\{x_1, x_2, ..., x_n\}$
* Terminology: Examples and Models
    * An **example** is a particular instance of data, $x$
    * A **labeled example** has {features, label}: $(x, y)$
        * Used to train the model
    * An **unlabeled example** has {features, ?}: $(x, ?)$
        * Used for making predictions on new data
    * A **model** maps examples to predicted labels: $y'$
        * Defined by internal parameters, which are learned
* [Framing: Key ML Terminology](https://developers.google.com/machine-learning/crash-course/framing/ml-terminology)
* [Framing: Check Your Understanding](https://developers.google.com/machine-learning/crash-course/framing/check-your-understanding)

### Regression vs. classification

A regression model predicts continuous values. For example, regression models make predictions that answer questions like the following:

* What is the value of a house in California?
* What is the probability that a user will click on this ad?

A classification model predicts discrete values. For example, classification models make predictions that answer questions like the following:

* Is a given email message spam or not spam?
* Is this an image of a dog, a cat, or a hamster?

## Descending into ML

Linear regression is a method for finding the straight line or hyperplane that best fits a set of points. This module explores linear regression intuitively before laying the groundwork for a machine learning approach to linear regression.

**Learning Objectives**

* Refresh your memory on line fitting.
* Relate weights and biases in machine learning to slope and offset in line fitting.
* Understand "loss" in general and squared loss in particular.

[Descending into ML: Linear Regression](https://developers.google.com/machine-learning/crash-course/descending-into-ml/linear-regression)
[Descending into ML: Training and Loss](https://developers.google.com/machine-learning/crash-course/descending-into-ml/training-and-loss)

## Reducing Loss

To train a model, we need a good way to reduce the model's loss. An iterative approach is one widely used method for reducing loss, and is as easy and efficient as walking down a hill.

**Learning Objectives**

* Discover how to train a model using an iterative approach.
* Understand full gradient descent and some variants, including:
    * mini-batch gradient descent
    * stochastic gradient descent
* Experiment with learning rate.

**Weight Initialization**

* For convex problems, weights can start anywhere (say, all 0s)
    * Convex: think of a bowl shape
    * Just one minimum
* Foreshadowing: not true for neural nets
    * Non-convex: think of an egg crate
    * More than one minimum
    * Strong dependency on initial values

**SGD & Mini-Batch Gradient Descent**

* Could compute gradient over entire data set on each step, but this turns out to be unnecessary
* Computing gradient on small data samples works well
    * On every step, get a new random sample
* Stochastic Gradient Descent: one example at a time
* Mini-Batch Gradient Descent: batches of 10-1000
    * Loss & gradients are averaged over the batch
    * This is a compromise: unlike SGD, which uses only one example per step, and unlike full-batch gradient descent, which uses the entire data set, it uses batches of 10-1000 examples per step

**Reducing Loss: Stochastic Gradient Descent**

A batch is the set of examples used in a single iteration. At Google, data sets often contain hundreds of millions of examples and a huge number of features, so every batch is enormous and a single iteration takes a very long time to compute.

Large data sets often contain redundant data. Redundancy inflates the batch size; some redundancy can help smooth out noisy gradients, but an enormous batch does not necessarily yield better predictions than a merely large one.

By choosing examples at random from the data set, we can estimate a meaningful average from a much smaller sample. SGD takes this idea to the extreme by using only one example per iteration, at the cost of a very noisy process. The variants are compared in the sketches below.
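To make the squared-loss idea from the Descending into ML module concrete, here is a minimal NumPy sketch, assuming toy data and made-up weight values: the model is a line $y' = wx + b$, and the loss is the average of the squared errors over all examples.

```python
import numpy as np

def predict(x, w, b):
    """Linear model: y' = w * x + b."""
    return w * x + b

def mse_loss(y_true, y_pred):
    """Squared loss averaged over the examples: mean of (y - y')^2."""
    return np.mean((y_true - y_pred) ** 2)

# Toy data that roughly follows y = 2x + 1 (values are illustrative only).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# A bad guess for (w, b) gives a high loss; a guess near the true line gives a low one.
print(mse_loss(y, predict(x, w=0.0, b=0.0)))  # high loss
print(mse_loss(y, predict(x, w=2.0, b=1.0)))  # low loss
```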
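Building on that loss, the next sketch contrasts the three gradient-descent variants by changing nothing but the batch size: the full data set, a single example (SGD), and a small random batch (mini-batch). The data, learning rate, and step count are arbitrary illustration choices, not values from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data roughly following y = 2x + 1 (illustrative only).
x = rng.uniform(0.0, 5.0, size=1000)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

def gradient_descent(batch_size, steps=2000, lr=0.01):
    """Minimize squared loss; batch_size selects the variant."""
    w, b = 0.0, 0.0  # convex problem, so the starting point barely matters
    for _ in range(steps):
        # On every step, get a new random sample of examples.
        idx = rng.integers(0, x.size, size=batch_size)
        err = y[idx] - (w * x[idx] + b)           # residuals on the batch
        w -= lr * (-2.0 * np.mean(x[idx] * err))  # gradients are averaged
        b -= lr * (-2.0 * np.mean(err))           # over the batch
    return w, b

print(gradient_descent(batch_size=x.size))  # ~full-batch gradient descent
print(gradient_descent(batch_size=1))       # SGD: one example at a time (noisy)
print(gradient_descent(batch_size=32))      # mini-batch: the compromise
```

All three calls should land near the true parameters $(w, b) \approx (2, 1)$, with the single-example run wandering the most from step to step.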
## First Steps with TensorFlow: Toolkit

**tf.estimator API**

We'll use tf.estimator for most of the exercises in the Machine Learning Crash Course. Everything done in the exercises could also be done in lower-level (raw) TensorFlow, but using tf.estimator dramatically reduces the number of lines of code.

tf.estimator is compatible with the scikit-learn API. scikit-learn is an extremely popular open-source machine learning library for Python, with over 100,000 users, including many Google employees.

## First Steps with TensorFlow: Programming Exercises

* [pandas](https://colab.research.google.com/notebooks/mlcc/intro_to_pandas.ipynb?utm_source=mlcc&utm_campaign=colab-external&utm_medium=referral&utm_content=pandas-colab&hl=zh-cn)

## Generalization: Peril of Overfitting

Low loss, but still a bad model?

![](https://i.imgur.com/R3unQaj.png)

* The following three basic assumptions guide generalization:
    * We draw examples independently and identically distributed (i.i.d.) at random from the distribution. In other words, examples don't influence each other. (Alternate explanation: i.i.d. is a way of referring to the randomness of variables.)
    * The distribution is stationary; that is, the distribution doesn't change within the data set.
    * We draw examples from partitions of the same distribution.
* In practice, we sometimes violate these assumptions. For example:
    * Consider a model that chooses which ads to display. The i.i.d. assumption would be violated if the model chooses ads based, in part, on which ads the user has previously seen.
    * Consider a data set containing a year of retail sales information. Users' purchasing behavior changes seasonally, which violates stationarity.

If any of the three basic assumptions above is violated, we must pay careful attention to the metrics. A small demonstration follows below.
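As a demonstration of "low loss, but still a bad model", the sketch below (a hypothetical example, not from the course) fits a degree-1 and a degree-9 polynomial to a handful of noisy training points, then evaluates both on a test set drawn i.i.d. from the same stationary distribution. The degrees, sample sizes, and noise level are made-up illustration choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    """Noisy samples of an underlying linear relationship y = 3x - 1."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 3.0 * x - 1.0 + rng.normal(scale=0.3, size=n)
    return x, y

# Both splits are sampled randomly from the same distribution (i.i.d.),
# and that distribution does not change between the splits (stationary).
x_train, y_train = make_data(10)
x_test, y_test = make_data(100)

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

The degree-9 model can drive its training loss toward zero by memorizing the noise in the ten training points, yet it typically generalizes worse than the simple degree-1 model; that gap between training loss and test loss is the peril of overfitting.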