# What is Deep Learning?
Deep learning can be considered a process or method for building a model (a function), and a deep learning model can be regarded as a function used for prediction.
>Function
>Let $X$ and $Y$ be two sets. A function $f : X \rightarrow Y$ is a rule that assigns to each element $x \in X$ exactly one element $f(x) \in Y$. A function can be one-to-one or many-to-one, but never one-to-many. For example, $f(x) = x^{2}$ sends both $-2$ and $2$ to $4$, so it is many-to-one.
```mermaid
graph LR;
input(input)-->model(model)-->result(target)
```
Before embarking on this deep learning journey, let's consider a few fundamental scenarios. For instance, we might want a model to classify whether an image contains a cat or a dog when we give it an arbitrary picture. Another example is a model that summarizes a piece of speech, and so on.
```mermaid
graph LR;
input(image)-->model(model)-->result(cat or dog)
input2(Audio)-->model2(model)-->result2(summary)
```
---
# Start Journey
In the previous paragraph, we mentioned that deep learning is a process of building a model, and the model can be regarded as a prediction function $f : X \rightarrow \hat{Y}$, where $X$ is the input and $\hat{Y}$ is the prediction. The primary objective is for the prediction $\hat{Y}$ to be almost equal to the ground truth $Y$.
>Primary objective
>$$\lVert \hat{Y} - Y \rVert \leq \varepsilon$$
$\lVert \cdot \rVert$ represents a loss function, which measures the distance between $\hat{Y}$ and $Y$, and $\varepsilon \geq 0$ is a small tolerance we are willing to accept.
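To make the idea of $\lVert \cdot \rVert$ concrete, here is a minimal sketch that measures the distance between $\hat{Y}$ and $Y$ with the sum of squared errors; the array values are made up purely for illustration:
```python
import numpy as np

# Made-up ground truth and predictions, just for illustration
Y = np.array([1.0, 2.0, 3.0])
Y_hat = np.array([1.1, 1.9, 3.2])

# Sum of squared errors: one common choice of "distance"
loss = np.sum((Y_hat - Y) ** 2)
print(loss)  # ≈ 0.06
```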
## A Classic Case
Define the training data as $S = \{ (x_{i},y_{i}) \mid i = 1,\dots,N\}$, the model function as $f:=\beta_{1}x + \beta_{0}=\hat{Y}$, and the loss function $\lVert \hat{Y} - Y \rVert$ as $L:=\sum_{i=1}^{N}(y_{i}-\hat{y}_{i})^2$, where $y_{i} \in Y$ and $\hat{y}_{i} \in \hat{Y}$ for all $i$.
- Target $L \leq \varepsilon$
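Before stepping through the algorithm, it helps to have concrete data on hand. Below is a minimal NumPy sketch that generates a synthetic training set $S$; the true line $y = 2x + 1$ and the noise level are assumptions made purely for this demo:
```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic training set S = {(x_i, y_i)}: a hypothetical true line plus noise
N = 100
x = rng.uniform(-1.0, 1.0, size=N)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(N)
```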
### Step 1: Initialize parameters $\beta_{1}$ and $\beta_{0}$
> $$\beta_{1},\beta_{0} \sim N(0,1)$$
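In code, this initialization might look as follows (a sketch, assuming NumPy's random generator):
```python
import numpy as np

rng = np.random.default_rng(0)

# Draw beta1 and beta0 independently from the standard normal N(0, 1)
beta1, beta0 = rng.standard_normal(2)
```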
### Step 2: Calculate $\hat{Y}$ from $S$ with $\beta_{1}$ and $\beta_{0}$
> $$\hat{y}_{i} = \beta_{1}x_{i}+\beta_{0}\quad \forall i \in \{1,\dots,N\}$$
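With NumPy this is one vectorized line; the inputs and parameter values below are placeholders for illustration:
```python
import numpy as np

x = np.array([0.0, 0.5, 1.0])  # example inputs x_i from S
beta1, beta0 = 0.3, -0.1       # example current parameter values

# y_hat_i = beta1 * x_i + beta0, computed for every i at once
y_hat = beta1 * x + beta0
print(y_hat)  # [-0.1   0.05  0.2 ]
```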
### Step 3: Calculate the loss function $L$
> $$L(\hat{Y},Y)=\sum_{i=1}^{N}(y_i-\hat{y}_i)^2$$
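The corresponding computation, continuing the made-up values from Step 2:
```python
import numpy as np

y = np.array([1.0, 1.5, 2.0])        # ground truth y_i
y_hat = np.array([-0.1, 0.05, 0.2])  # predictions from Step 2

# L = sum over i of (y_i - y_hat_i)^2
L = np.sum((y - y_hat) ** 2)
print(L)  # ≈ 6.55
```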
### Step 4: Calculate the gradient
> $$\frac{\partial L(\hat{Y},Y)}{\partial \beta_i}, \quad i \in \{0,1\}$$
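For this particular $L$, both partial derivatives can be worked out by hand with the chain rule. Since $\hat{y}_i = \beta_{1}x_{i}+\beta_{0}$, we have $\partial \hat{y}_i / \partial \beta_1 = x_i$ and $\partial \hat{y}_i / \partial \beta_0 = 1$, so
$$\frac{\partial L}{\partial \beta_1} = -2\sum_{i=1}^{N}(y_i-\hat{y}_i)\,x_i \qquad \frac{\partial L}{\partial \beta_0} = -2\sum_{i=1}^{N}(y_i-\hat{y}_i)$$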
### Step 5: Update $\beta_{1}$ and $\beta_{0}$ (gradient descent)
>$$\beta_{i}^{(t+1)} \leftarrow \beta_{i}^{(t)} -\alpha \frac{\partial L(\hat{Y},Y)}{\partial \beta_i}$$
>where $\alpha$ represents the learning rate, usually an extremely small number, e.g. 0.00001.
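A sketch of a single update, reusing the gradients derived in Step 4 (the values are again placeholders):
```python
import numpy as np

x = np.array([0.0, 0.5, 1.0])
y = np.array([1.0, 1.5, 2.0])
beta1, beta0 = 0.3, -0.1
alpha = 0.00001  # learning rate

y_hat = beta1 * x + beta0

# Gradients of L = sum((y - y_hat)^2) with respect to beta1 and beta0
grad_beta1 = -2.0 * np.sum((y - y_hat) * x)
grad_beta0 = -2.0 * np.sum(y - y_hat)

# Gradient descent step
beta1 -= alpha * grad_beta1
beta0 -= alpha * grad_beta0
```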
### Step 6: Repeat the process
> repeat Steps 2-5 until $L(\hat{Y},Y) \leq \varepsilon$
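Putting Steps 1-6 together, an end-to-end sketch might look like this. The data is synthetic, and the learning rate is larger than 0.00001 so this tiny demo converges in a reasonable number of iterations; all names and values are illustrative:
```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic training set S: hypothetical true line y = 2x + 1 plus noise
N = 100
x = rng.uniform(-1.0, 1.0, size=N)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(N)

# Step 1: initialize parameters from N(0, 1)
beta1, beta0 = rng.standard_normal(2)

alpha = 0.001   # learning rate, chosen larger than 0.00001 for a fast demo
epsilon = 1.5   # loss tolerance; the noise floor here is roughly N * 0.1^2 = 1.0

for step in range(10_000):
    y_hat = beta1 * x + beta0                    # Step 2: predict
    L = np.sum((y - y_hat) ** 2)                 # Step 3: loss
    if L <= epsilon:                             # Step 6: stop when close enough
        break
    grad_beta1 = -2.0 * np.sum((y - y_hat) * x)  # Step 4: gradients
    grad_beta0 = -2.0 * np.sum(y - y_hat)
    beta1 -= alpha * grad_beta1                  # Step 5: gradient descent update
    beta0 -= alpha * grad_beta0

print(step, L, beta1, beta0)  # beta1 ≈ 2, beta0 ≈ 1
```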
# What does 'deep' mean in 'deep learning'?
In "A Classic Case"; we define the model as $f:=\beta_{1}x + \beta_{0}=\hat{Y}$, but $f$ can actually be more complicated like $f\circ f\cdots f$. It's easy to understand that the world "deep" is describes the complexity of model.
# Other
- [computational graph](/2ytOCBRrRdSt9Rx7pkDWDw)