Hello
Patricio Reyes
You ?
Your Name here
Dadaist approach
feel free to collaborate on this presentation
Share your roadmap
Wise Apple Bowl 2020
A different approach
Let's collaborate
Next steps
Pre-requisites
Tips for data scientists
How to structure a data science project
collaboration
1. "I work alone. I don't care"
you always need to collaborate
2. "I work on a team"
Reproducibility
Data Science Template
Install the template
Directory Structure
Data is immutable
.gitignore
Data version control
Template for Workflows
AI Ethics
Data Analytics Tools
jupyter notebook
Deployment
papermill + nbconvert
parameters
papermill + nbconvert
further reading:
cons
webapps / Dashboarding
Maria Teresa Grifa
Data Science Steps
Data Preparation
Raw Data
Data Cleaning
Data Consolidation
Exploratory Data Analysis
Problem identification
Basic Statistics
Visualization
Quintessential rules
Data visualization is a key part of communicating your work to others
with a graph you are able to tell a story
Tips
A color can be defined using three components (aka RGB channels)
Sequential palettes
A sequential palette runs between two colours, from a lighter shade to a darker one. The hue stays the same or similar while the saturation varies.
Viridis palette
Diverging palettes
A diverging palette can be created by combining two sequential palettes (e.g. join them at the light colors and then let them diverge to different dark colors)
Icefire palette
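The two palette types can be sketched with the standard library alone. This is a minimal illustration, not how Viridis or Icefire are actually built: the helper names, the hue values and the choice to vary lightness at fixed saturation are all assumptions made for the example.

```python
import colorsys

def sequential_palette(hue, n=5, light=0.9, dark=0.25, saturation=0.6):
    """Return n RGB triples of one hue, ordered from a light to a dark shade (n >= 2)."""
    colors = []
    for k in range(n):
        # Interpolate lightness linearly from `light` down to `dark`.
        lightness = light + (dark - light) * k / (n - 1)
        # Each colour is a triple of R, G, B components in [0, 1].
        colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
    return colors

def diverging_palette(hue_low, hue_high, n_side=4):
    """Join two sequential palettes at their light ends."""
    low = sequential_palette(hue_low, n_side)    # light -> dark
    high = sequential_palette(hue_high, n_side)  # light -> dark
    # Reverse the first half so the result runs dark -> light -> dark.
    return low[::-1] + high

def to_hex(rgb):
    """Format an RGB triple with components in [0, 1] as a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(*(round(255 * c) for c in rgb))

blues = sequential_palette(hue=0.6)                       # hypothetical blue hue
blue_red = diverging_palette(hue_low=0.6, hue_high=0.0)   # blue on one side, red on the other
hex_codes = [to_hex(c) for c in blue_red]
```

The hex codes can be passed to most plotting libraries as a list of colours.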
Visualization packages
Hands-on
Machine Learning Intro
ML WorkFlow
Process of solving a practical problem by
Why the name Machine Learning?
Arthur Lee Samuel was an American pioneer in the field of computer gaming and artificial intelligence.
He popularized the term "machine learning" in 1959 at IBM.
…Marketing reason…
Two Types of Learning
Supervised Learning
The dataset is a collection of labeled examples
\(\{(x_{i}, y_{i})\}_{i=1}^{N}\)
\(x_{i}, i=1, \dots, N\) is called feature vector
\(y_{i}, i=1, \dots, N\) is called label or target
Unsupervised Learning
The dataset is a collection of unlabeled examples
\(\{(x_{i})\}_{i=1}^{N}\)
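As a small illustration of the two dataset shapes, the sketch below stores the feature vectors as rows of an array and the labels as a separate vector. The numbers are made-up toy values, not data from the presentation.

```python
import numpy as np

# Hypothetical toy data: N = 4 examples, each with D = 2 features.
X = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [6.2, 2.9],
              [5.9, 3.0]])

# Supervised learning: every feature vector x_i comes with a label y_i.
y = np.array([0, 0, 1, 1])
labeled = list(zip(X.tolist(), y.tolist()))   # the pairs (x_i, y_i)

# Unsupervised learning: only the feature vectors x_i are available.
unlabeled = X.tolist()
```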
Classification Problem
Classification predictive modeling is the task of approximating a mapping function from input variables to discrete output variables.
A discrete output variable is a category, such as a boolean variable.
Example: Spam detection
Regression Problem
Regression predictive modeling is the task of approximating a mapping function from input variables to a continuous output variable.
A continuous output variable is a real value, such as an integer or floating point value.
Example: House price prediction
Machine Learning Map
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
Scikit-Learn
https://scikit-learn.org
Linear Regression
\(\{(\textbf{x}_{i}, y_{i})\}_{i=1}^{N}\)
\(\textbf{x}_{i}\) D-dimensional feature vector of sample \(i=1, \dots, N\)
\(y_{i}\in \mathbb{R}\) \(i=1, \dots, N\), \(x_{i}^{(j)}\in \mathbb{R},\; j=1,\dots, D\)
Model:
\[f_{\textbf{w}, b}(\textbf{x})= \textbf{wx} +b, \; \\ \textbf{w}\;\mbox{is a D-dimensional vector of parameters}, b\in \mathbb{R}\]
Goal:
predict the unknown \(y\) for a given \(\textbf{x}\)
\[y = f_{\textbf{w}, b}(\textbf{x})\] find the best set of parameters \((\textbf{w}^{*}, b^{*})\)
How:
Minimize the objective function
\[\dfrac{1}{N}\sum_{i=1}^{N}(f_{\textbf{w}, b}(\textbf{x}_{i})-y_{i})^{2}\]
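A minimal sketch of this minimization, assuming NumPy and synthetic data. The names `true_w`, `true_b` and the noise level are illustrative assumptions. The objective is solved here with a least-squares routine, which is one of several possible ways to minimize it.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 3

# Synthetic data generated from hypothetical "true" parameters plus noise.
X = rng.normal(size=(N, D))
true_w = np.array([2.0, -1.0, 0.5])
true_b = 4.0
y = X @ true_w + true_b + rng.normal(scale=0.1, size=N)

# Append a column of ones so the bias b is estimated together with w.
X_aug = np.hstack([X, np.ones((N, 1))])

# Least squares minimizes sum_i (f(x_i) - y_i)^2, and therefore also its mean.
theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w_star, b_star = theta[:D], theta[D]

def f(x, w=w_star, b=b_star):
    """Linear model f_{w,b}(x) = w x + b, applied row-wise."""
    return x @ w + b

# Value of the objective function at the fitted parameters.
mse = np.mean((f(X) - y) ** 2)
```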
Logistic Regression
\(\{(\textbf{x}_{i}, y_{i})\}_{i=1}^{N}\)
\(\textbf{x}_{i}\) D-dimensional feature vector of sample \(i=1, \dots, N\)
\(y_{i}\in \{0,1\}\) \(i=1, \dots, N\), \(x_{i}^{(j)}\in \mathbb{R},\; j=1,\dots, D\)
Model:
\[f_{\textbf{w}, b}(\textbf{x})= \dfrac{1}{1+\exp(-(\textbf{wx}+b))}\]
where \(\textbf{w}\) is a D-dimensional vector of parameters; \(b\in \mathbb{R}\)
Goal: maximize the likelihood of the training set
\[L_{\textbf{w}, b}= \prod_{i=1}^{N} f_{\textbf{w}, b}(\textbf{x}_{i})^{y_{i}} (1- f_{\textbf{w}, b}(\textbf{x}_{i}))^{(1-y_{i})}\]
When \(y_{i}=1\), the \(i\)-th factor of the likelihood reduces to \(f_{\textbf{w}, b}(\textbf{x}_{i})\)
When \(y_{i}=0\), it reduces to \((1- f_{\textbf{w}, b}(\textbf{x}_{i}))\)
There is no closed-form solution, so use numerical optimization such as gradient descent
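The optimization can be sketched as follows, assuming NumPy and a synthetic two-class dataset. The sketch runs gradient descent on the average negative log-likelihood, which is equivalent to maximizing the likelihood above. The learning rate, step count and function names are illustrative choices, not values from the presentation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_likelihood(X, y, w, b):
    """Average negative log of the likelihood L_{w,b}."""
    p = np.clip(sigmoid(X @ w + b), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_logistic(X, y, lr=0.1, n_steps=2000):
    """Plain batch gradient descent, starting from w = 0, b = 0."""
    n_samples, n_features = X.shape
    w, b = np.zeros(n_features), 0.0
    for _ in range(n_steps):
        p = sigmoid(X @ w + b)
        # Gradient of the average negative log-likelihood.
        grad_w = X.T @ (p - y) / n_samples
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data: two Gaussian blobs, one per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 2)),
               rng.normal(1.0, 1.0, size=(100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = fit_logistic(X, y)
# Predict class 1 when the modelled probability is at least 0.5.
accuracy = np.mean((sigmoid(X @ w + b) >= 0.5) == y)
```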
Basic Practice
Feature Engineering
Problem of transforming raw data into a dataset
Everything measurable can be used as a feature
Define features with high predictive power
Feature creation
Feature Validation
Choose the right algo for your problem
Splitting Techniques
The rule of thumb
70% training set, 15% validation set, 15% test set
On Big Data: 95% training set, 2.5% validation set, 2.5% test set
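The rule of thumb above can be sketched with a shuffled index split, assuming NumPy. The function name and its defaults are hypothetical. Passing `val_frac=0.025, test_frac=0.025` would give the big-data variant.

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle the sample indices once, then cut them into three disjoint parts."""
    rng = np.random.default_rng(seed)
    n_samples = len(X)
    idx = rng.permutation(n_samples)
    n_test = int(round(test_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]   # whatever remains, 70% by default
    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))

# Hypothetical toy data: 100 samples with 2 features each.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)
train, val, test = train_val_test_split(X, y)
```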
Model Performance Visualization
Model Performance
Overfitting
When:
How to solve:
Underfitting
Model Performance Metrics i
Qst: How good is my model on unseen data?
Linear regression metrics examples
Mean squared error
\(MSE=\dfrac{1}{N}\sum_{i=1}^{N}(y_{i}- \hat{y}_{i})^{2}\)
Coefficient of determination
\(R^{2}=1-\dfrac{\sum_{i=1}^{N}(y_{i}- \hat{y}_{i})^{2}}{\sum_{i=1}^{N}(y_{i}- \bar{y})^{2}}\), where \(\bar{y}\) is the mean of the \(y_{i}\)
indication of the goodness of fit of a set of predictions to the actual values
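A short worked example of both metrics, assuming NumPy and made-up values. The coefficient of determination is computed with its standard definition, dividing the residual sum of squares by the total sum of squares around the mean of the actual values.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical actual values and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

print(mse(y_true, y_pred))  # 0.125
print(r2(y_true, y_pred))   # 0.975
```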
Model Performance Metrics ii
Qst: How good is my model on unseen data?
Classification Metrics Example
ratio of the number of correct predictions to all predictions made
table that presents the predictions on the x-axis and the actual outcomes on the y-axis
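The two classification metrics described above are commonly called accuracy and the confusion matrix; those names, the labels and the data in this plain-Python sketch are assumptions added for illustration.

```python
def accuracy(y_true, y_pred):
    """Ratio of correct predictions to all predictions made."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes (y-axis), columns are predicted classes (x-axis)."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

# Hypothetical spam-detection outcomes.
y_true = ["spam", "ham", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]

acc = accuracy(y_true, y_pred)                             # 4 of 6 correct
cm = confusion_matrix(y_true, y_pred, ["ham", "spam"])     # [[2, 1], [1, 2]]
```

The diagonal of the matrix counts the correct predictions, so its sum divided by the total number of samples equals the accuracy.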
Improve Model Performance
hyperparameter tuning
model configuration argument specified by the developer to guide the learning process for a specific dataset
define a search space as a grid of hyperparameter values and evaluate every position in the grid
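A grid search can be sketched as a loop over every combination of hyperparameter values, scoring each one on a held-out validation set. This assumes NumPy. The model (a ridge-regularized polynomial fit), the grid values and the data are hypothetical choices made to keep the example self-contained.

```python
import itertools
import numpy as np

def fit_ridge_poly(x, y, degree, alpha):
    """Polynomial fit with an L2 penalty (the penalty also covers the constant term)."""
    A = np.vander(x, degree + 1)   # columns x^degree, ..., x^0
    # Closed-form ridge solution: (A^T A + alpha I)^(-1) A^T y
    return np.linalg.solve(A.T @ A + alpha * np.eye(degree + 1), A.T @ y)

def val_mse(coef, x, y):
    """Mean squared error of the fitted polynomial on held-out data."""
    return np.mean((np.polyval(coef, x) - y) ** 2)

# Hypothetical noisy data, split into training and validation parts.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=120)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=120)
x_train, y_train = x[:80], y[:80]
x_val, y_val = x[80:], y[80:]

# The search space: a grid over two hyperparameters.
grid = {"degree": [1, 3, 5, 9], "alpha": [1e-3, 1e-1, 10.0]}

results = {}
for degree, alpha in itertools.product(grid["degree"], grid["alpha"]):
    coef = fit_ridge_poly(x_train, y_train, degree, alpha)
    results[(degree, alpha)] = val_mse(coef, x_val, y_val)

# Keep the grid position with the lowest validation error.
best_degree, best_alpha = min(results, key=results.get)
```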
cross validation
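One common form of cross validation is k-fold: the data is cut into k parts, and each part serves once as the validation set while the model is trained on the rest. The sketch below assumes NumPy and a least-squares linear model. The function names and the data are illustrative assumptions.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices and cut them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_val_mse(X, y, k=5):
    """Validation MSE of a linear model on each of the k folds."""
    folds = k_fold_indices(len(X), k)
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except the i-th one.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        A_train = np.hstack([X[train_idx], np.ones((len(train_idx), 1))])
        theta, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Evaluate on the held-out fold.
        A_val = np.hstack([X[val_idx], np.ones((len(val_idx), 1))])
        scores.append(np.mean((A_val @ theta - y[val_idx]) ** 2))
    return np.array(scores)

# Hypothetical linear data with a little noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.5, -2.0]) + 0.3 + rng.normal(scale=0.1, size=100)

scores = cross_val_mse(X, y, k=5)
mean_score = scores.mean()
```

Averaging the per-fold scores gives a steadier estimate of performance on unseen data than a single train/validation split.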
Hands-on
1. exploratory data analysis
colab
2. scikit-learn (Colab)
colab
colab
project template
Thanks!
Tips
Acknowledgements
Thanks to all the contributors
References