feel free to collaborate on this presentation
.gitignore
Software deployment is all of the activities that make a software system available for use.
notebooks are just for exploration
further reading:
cons
Data visualization is a key part of communicating your work to others
A color can be defined using three components (aka RGB channels)
A sequential palette ranges between two colours, from a lighter shade to a darker one. The same or a similar hue is used while saturation varies.
A diverging palette can be created by combining two sequential palettes (e.g. join them at the light colours and let them diverge to different dark colours). A sketch comparing the two kinds of palette follows below.
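A minimal sketch (assuming matplotlib is the plotting library in use; the data is made up) comparing a sequential and a diverging colormap:

```python
# Minimal sketch: a sequential vs. a diverging colormap on the same toy data.
import numpy as np
import matplotlib.pyplot as plt

data = np.random.randn(20, 20)          # toy 2-D data centred around 0

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].imshow(data, cmap="Blues")      # sequential: one hue, light -> dark
axes[0].set_title("Sequential (Blues)")
axes[1].imshow(data, cmap="RdBu")       # diverging: two sequential palettes joined at a light midpoint
axes[1].set_title("Diverging (RdBu)")
plt.show()
```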
Process of solving a practical problem by gathering a dataset and algorithmically building a statistical model based on that dataset
Machines don't learn
A learning machine finds a mathematical formula which, when applied to a collection of inputs, produces the desired output.
If you distort your data inputs, the output is very likely to become completely wrong
Arthur Lee Samuel was an American pioneer in the field of computer gaming and artificial intelligence.
He popularized the term "machine learning" in 1959 at IBM.
…Marketing reason…
The dataset is a collection of labeled examples
\(\{(x_{i}, y_{i})\}_{i=1}^{N}\)
\(x_{i},\; i=1, \dots, N\), is called the feature vector
\(y_{i},\; i=1, \dots, N\), is called the label or target
Goal: use a dataset to produce a model that takes a feature vector as input and outputs information that allows deducing the label for that feature vector
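A toy illustration of this notation in code (the values are made up for the example):

```python
# Minimal sketch: N labeled examples, each a D-dimensional feature vector
# stacked as a row of X, with one label per row in y.
import numpy as np

X = np.array([[5.1, 3.5],     # x_1: feature vector of sample 1
              [4.9, 3.0],     # x_2
              [6.3, 3.3]])    # x_3
y = np.array([0, 0, 1])       # y_i: label of sample i

N, D = X.shape                # N = 3 samples, D = 2 features
```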
The dataset is a collection of unlabeled examples
\(\{x_{i}\}_{i=1}^{N}\)
Goal: create a model that takes a feature vector as input and either transforms it into another vector or into a value that can be used to solve a practical problem
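A minimal sketch, assuming scikit-learn: an unsupervised model (here k-means clustering, as one example) maps each unlabeled feature vector to a cluster index.

```python
# Minimal sketch: cluster unlabeled feature vectors with k-means.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)                            # 100 unlabeled 2-D feature vectors
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(model.labels_[:10])                             # cluster index for the first 10 samples
```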
Classification predictive modeling is the task of approximating a mapping function from input variables to discrete output variables.
A discrete output variable is a category, such as a boolean variable.
Example: Spam detection
Regression predictive modeling is the task of approximating a mapping function from input variables to a continuous output variable.
A continuous output variable is a real value, such as an integer or floating-point value.
Example: House price prediction
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
\(\{(\textbf{x}_{i}, y_{i})\}_{i=1}^{N}\)
\(\textbf{x}_{i}\) D-dimensional feature vector of sample \(i=1, \dots, N\)
\(y_{i}\in \mathbb{R}\) \(i=1, \dots, N\), \(x_{i}^{(j)}\in \mathbb{R},\; j=1,\dots, D\)
Model:
\[f_{\textbf{w}, b}(\textbf{x})= \textbf{wx} +b, \; \\ \textbf{w}\;\mbox{is a D-dimensional vector of parameters}, \; b\in \mathbb{R}\]
Goal:
predict the unknown \(y\) for a given \(\textbf{x}\)
\[y = f_{\textbf{w}, b}(\textbf{x})\] find the best set of parameters \((\textbf{w}^{*}, b^{*})\)
How:
Minimize the objective function
\[\dfrac{1}{N}\sum_{i=1}^{N}(f_{\textbf{w}, b}(\textbf{x}_{i})-y_{i})^{2}\]
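A minimal sketch, assuming scikit-learn and synthetic data made up for the example: ordinary least squares minimizes exactly this mean squared error and returns \((\textbf{w}^{*}, b^{*})\).

```python
# Minimal sketch: fit f_{w,b}(x) = wx + b by minimizing the MSE objective.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                          # N=200 samples, D=3 features
true_w, true_b = np.array([2.0, -1.0, 0.5]), 3.0
y = X @ true_w + true_b + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)                   # finds (w*, b*)
print(model.coef_, model.intercept_)                   # estimated w and b
```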
\(\{(\textbf{x}_{i}, y_{i})\}_{i=1}^{N}\)
\(\textbf{x}_{i}\) D-dimensional feature vector of sample \(i=1, \dots, N\)
\(y_{i}\in \{0,1\}\) \(i=1, \dots, N\), \(x_{i}^{(j)}\in \mathbb{R},\; j=1,\dots, D\)
Model:
\[f_{\textbf{w}, b}(\textbf{x})= \dfrac{1}{1+\exp(-(\textbf{wx}+b))}\]
where \(\textbf{w}\) is a D-dimensional vector of parameters; \(b\in \mathbb{R}\)
Goal: maximize the likelihood of the training set
\[L_{\textbf{w}, b}= \prod_{i=1}^{N} f_{\textbf{w}, b}(\textbf{x}_{i})^{y_{i}} (1- f_{\textbf{w}, b}(\textbf{x}_{i}))^{(1-y_{i})}\]
When \(y_{i}=1\), the \(i\)-th factor reduces to \(f_{\textbf{w}, b}(\textbf{x}_{i})\)
When \(y_{i}=0\), it reduces to \((1- f_{\textbf{w}, b}(\textbf{x}_{i}))\)
No closed-form solution; use numerical optimization via gradient descent
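A minimal sketch of that numerical optimization in NumPy (synthetic data; plain batch gradient descent on the negative log-likelihood, not any particular library implementation):

```python
# Minimal sketch: maximize the likelihood of logistic regression by
# minimizing the mean negative log-likelihood with gradient descent.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                         # N=500, D=2
y = (X[:, 0] - 2 * X[:, 1] > 0).astype(float)         # synthetic 0/1 labels

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))            # f_{w,b}(x_i) for all i
    grad_w = X.T @ (p - y) / len(y)                   # gradient of the mean NLL w.r.t. w
    grad_b = np.mean(p - y)                           # gradient w.r.t. b
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)                                           # learned parameters
```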
The problem of transforming raw data into a dataset
Everything measurable can be used as a feature
Define features with high predictive power
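One possible sketch of such a transformation (assuming pandas and scikit-learn; the raw table is made up): a categorical column becomes one-hot features and a measurement is standardized.

```python
# Minimal sketch: turn raw columns into numeric features.
import pandas as pd
from sklearn.preprocessing import StandardScaler

raw = pd.DataFrame({"city": ["Rome", "Milan", "Rome"],
                    "size_m2": [55, 80, 120]})         # made-up raw data

features = pd.get_dummies(raw, columns=["city"])       # one-hot encode the category
features[["size_m2"]] = StandardScaler().fit_transform(features[["size_m2"]])
print(features)
```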
The rule of thumb
70% training set, 15% validation set, 15% test set
On Big Data: 95% training set, 2.5% validation set, 2.5% test set
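A minimal sketch, assuming scikit-learn and random toy data: a 70/15/15 split obtained by calling train_test_split twice.

```python
# Minimal sketch: 70% train, 15% validation, 15% test.
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(1000, 5), np.random.rand(1000)

X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=0)

print(len(X_train), len(X_val), len(X_test))           # 700 150 150
```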
When:
How to solve:
Question: How good is my model on unseen data?
Mean squared error
\(MSE=\dfrac{1}{N}\sum_{i=1}^{N}(y_{i}- \hat{y}_{i})^{2}\)
Coefficient of determination
\(R^{2}=1-\dfrac{\sum_{i=1}^{N}(y_{i}- \hat{y}_{i})^{2}}{\sum_{i=1}^{N}(y_{i}- \bar{y})^{2}}\), where \(\bar{y}\) is the mean of the \(y_{i}\)
indication of the goodness of fit of a set of predictions to the actual values
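A minimal sketch, assuming scikit-learn (the true and predicted values are made up): computing both metrics on held-out data.

```python
# Minimal sketch: MSE and R^2 on a handful of toy predictions.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # made-up actual values
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # made-up predictions

print(mean_squared_error(y_true, y_pred))   # average squared residual
print(r2_score(y_true, y_pred))             # 1 - SS_res / SS_tot
```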
Question: How good is my model on unseen data?
hyperparameter tuning
model configuration argument specified by the developer to guide the learning process for a specific dataset
cross validation
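A minimal sketch, assuming scikit-learn and a synthetic dataset: tuning one hyperparameter (here the regularization strength C of logistic regression, chosen as an example) with 5-fold cross-validation.

```python
# Minimal sketch: hyperparameter tuning via k-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                    cv=5)                              # 5-fold cross-validation
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)             # best C and its mean CV accuracy
```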