# What is Deep Learning?
Deep learning can be considered a process or method for building a model (a function), and a deep learning model can be regarded as a function used for prediction.
>Function
>Let $X$ and $Y$ be two sets. A function $f : X \rightarrow Y$ is a rule that assigns to each element $x \in X$ exactly one element $f(x) \in Y$. A function can be one-to-one or many-to-one, but never one-to-many. For example, $f(x) = x^{2}$ sends both $-2$ and $2$ to $4$, so it is many-to-one.
```mermaid
graph LR;
input(input)-->model(model)-->result(target)
```
Before embarking on this deep learning journey, let's consider a few fundamental scenarios. For instance, we might want a model to classify whether an image contains a cat or a dog when we give it an arbitrary picture. Another example is a model that summarizes a piece of speech, and so on.
```mermaid
graph LR;
input(image)-->model(model)-->result(cat or dog)
input2(Audio)-->model2(model)-->result2(summary)
```
---
# Start Journey
In the previous paragraph, we mentioned that deep learning is a process of building a model, and the model can be regarded as a prediction function $f : X \rightarrow \hat{Y}$, where $X$ is the input and $\hat{Y}$ is the prediction. The primary objective is for the prediction $\hat{Y}$ to be almost equal to the ground truth $Y$.
>Primary objective
>$$\lVert \hat{Y} - Y \rVert \leq \varepsilon$$
$\lVert \cdot \rVert$ represents a loss function, which measures the distance between $\hat{Y}$ and $Y$, and $\varepsilon \geq 0$ is a small tolerance we are willing to accept.
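To make the idea of $\lVert \cdot \rVert$ concrete, here is a minimal sketch that measures the distance between $\hat{Y}$ and $Y$ with the sum of squared errors; the array values are made up purely for illustration:
```python
import numpy as np

# Made-up ground truth and predictions, just for illustration
Y = np.array([1.0, 2.0, 3.0])
Y_hat = np.array([1.1, 1.9, 3.2])

# Sum of squared errors: one common choice of "distance"
loss = np.sum((Y_hat - Y) ** 2)
print(loss)  # ≈ 0.06
```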
## A Classic Case
Define the training data as $S = \{ (x_{i},y_{i}) \mid i = 1,\dots,N\}$, the model function as $f:=\beta_{1}x + \beta_{0}=\hat{Y}$, and the loss function $\lVert \hat{Y} - Y \rVert$ as $L:=\sum_{i=1}^{N}(y_{i}-\hat{y}_{i})^2$, where $y_{i} \in Y$ and $\hat{y}_{i} \in \hat{Y}$ for all $i$.
- Target $L \leq \varepsilon$
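Before stepping through the algorithm, it helps to have concrete data on hand. Below is a minimal NumPy sketch that generates a synthetic training set $S$; the true line $y = 2x + 1$ and the noise level are assumptions made purely for this demo:
```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic training set S = {(x_i, y_i)}: a hypothetical true line plus noise
N = 100
x = rng.uniform(-1.0, 1.0, size=N)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(N)
```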
### Step 1: Initialize parameters $\beta_{1}$ and $\beta_{0}$
> $$\beta_{1},\beta_{0} \sim N(0,1)$$
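In code, this initialization might look as follows (a sketch, assuming NumPy's random generator):
```python
import numpy as np

rng = np.random.default_rng(0)

# Draw beta1 and beta0 independently from the standard normal N(0, 1)
beta1, beta0 = rng.standard_normal(2)
```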
### Step 2: Calculate $\hat{Y}$ from $S$ with $\beta_{1}$ and $\beta_{0}$
> $$\hat{y}_{i} = \beta_{1}x_{i}+\beta_{0}\quad \forall i \in \{1,\dots,N\}$$
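With NumPy this is one vectorized line; the inputs and parameter values below are placeholders for illustration:
```python
import numpy as np

x = np.array([0.0, 0.5, 1.0])  # example inputs x_i from S
beta1, beta0 = 0.3, -0.1       # example current parameter values

# y_hat_i = beta1 * x_i + beta0, computed for every i at once
y_hat = beta1 * x + beta0
print(y_hat)  # [-0.1   0.05  0.2 ]
```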
### Step 3: Calculate the loss function $L$
> $$L(\hat{Y},Y)=\sum_{i=1}^{N}(y_i-\hat{y}_i)^2$$
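The corresponding computation, continuing the made-up values from Step 2:
```python
import numpy as np

y = np.array([1.0, 1.5, 2.0])        # ground truth y_i
y_hat = np.array([-0.1, 0.05, 0.2])  # predictions from Step 2

# L = sum over i of (y_i - y_hat_i)^2
L = np.sum((y - y_hat) ** 2)
print(L)  # ≈ 6.55
```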
### Step 4: Calculate the gradient
> $$\frac{\partial L(\hat{Y},Y)}{\partial \beta_i}, \quad i \in \{0,1\}$$
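For this particular $L$, both partial derivatives can be worked out by hand with the chain rule. Since $\hat{y}_i = \beta_{1}x_{i}+\beta_{0}$, we have $\partial \hat{y}_i / \partial \beta_1 = x_i$ and $\partial \hat{y}_i / \partial \beta_0 = 1$, so
$$\frac{\partial L}{\partial \beta_1} = -2\sum_{i=1}^{N}(y_i-\hat{y}_i)\,x_i \qquad \frac{\partial L}{\partial \beta_0} = -2\sum_{i=1}^{N}(y_i-\hat{y}_i)$$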
### Step 5: Update $\beta_{1}$ and $\beta_{0}$ (gradient descent)
>$$\beta_{i}^{(t+1)} \leftarrow \beta_{i}^{(t)} -\alpha \frac{\partial L(\hat{Y},Y)}{\partial \beta_i}$$
>where $\alpha$ represents the learning rate, usually an extremely small number, e.g. 0.00001.
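A sketch of a single update, reusing the gradients derived in Step 4 (the values are again placeholders):
```python
import numpy as np

x = np.array([0.0, 0.5, 1.0])
y = np.array([1.0, 1.5, 2.0])
beta1, beta0 = 0.3, -0.1
alpha = 0.00001  # learning rate

y_hat = beta1 * x + beta0

# Gradients of L = sum((y - y_hat)^2) with respect to beta1 and beta0
grad_beta1 = -2.0 * np.sum((y - y_hat) * x)
grad_beta0 = -2.0 * np.sum(y - y_hat)

# Gradient descent step
beta1 -= alpha * grad_beta1
beta0 -= alpha * grad_beta0
```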
### Step 6: Repeat the process
> repeat Steps 2-5 until $L(\hat{Y},Y) \leq \varepsilon$
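Putting Steps 1-6 together, an end-to-end sketch might look like this. The data is synthetic, and the learning rate is larger than 0.00001 so this tiny demo converges in a reasonable number of iterations; all names and values are illustrative:
```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic training set S: hypothetical true line y = 2x + 1 plus noise
N = 100
x = rng.uniform(-1.0, 1.0, size=N)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(N)

# Step 1: initialize parameters from N(0, 1)
beta1, beta0 = rng.standard_normal(2)

alpha = 0.001   # learning rate, chosen larger than 0.00001 for a fast demo
epsilon = 1.5   # loss tolerance; the noise floor here is roughly N * 0.1^2 = 1.0

for step in range(10_000):
    y_hat = beta1 * x + beta0                    # Step 2: predict
    L = np.sum((y - y_hat) ** 2)                 # Step 3: loss
    if L <= epsilon:                             # Step 6: stop when close enough
        break
    grad_beta1 = -2.0 * np.sum((y - y_hat) * x)  # Step 4: gradients
    grad_beta0 = -2.0 * np.sum(y - y_hat)
    beta1 -= alpha * grad_beta1                  # Step 5: gradient descent update
    beta0 -= alpha * grad_beta0

print(step, L, beta1, beta0)  # beta1 ≈ 2, beta0 ≈ 1
```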
# What does 'deep' mean in 'deep learning'?
In "A Classic Case"; we define the model as $f:=\beta_{1}x + \beta_{0}=\hat{Y}$, but $f$ can actually be more complicated like $f\circ f\cdots f$. It's easy to understand that the world "deep" is describes the complexity of model.
# Other
- [computational graph](/2ytOCBRrRdSt9Rx7pkDWDw)