---
title: Deep Learning for everyone
tags: APAC HPC-AI competition
---

[TOC]

[Python Module (includes Numpy and Pandas)](https://hackmd.io/le4IndLnQqSPfzQldPJ8iw?view)

## Machine Learning

Data -> Linear Algebra -> Answer

## How the Brain Works

- Each neuron receives input from other neurons
- Weights can be `+ | -`
- Different areas perform different functions using the same structure (**Modularity**)

Our brain is very modular!! It is able to adapt to a wide variety of inputs.

**Motivation: we want to build an artificial brain!**

## Some Terms

Terms for the input
- Features
- Predictors
- Attributes
- Independent Variables

Terms for the machine
- Algorithm
- Technique
- Model

Terms for the output
- Classes
- Responses
- Targets
- Dependent Variables

## Perceptron

inputs -> multiply by the respective weights -> output

## Training Procedure

- If the output is **correct**, do nothing.
- If it **incorrectly** outputs 0, **add** the inputs to the weight vector.
- If it **incorrectly** outputs 1, **subtract** the inputs from the weight vector.

**Guaranteed to converge**, if a correct set of weights exists. (A minimal sketch of this rule appears under Code Sketches below.)

Finding and choosing the right features -> ==MAGIC==!!

---
### Resources reference

[Github stuff](https://github.com/DataForScience/DeepLearning)

#### 1. Perceptron

**Logic functions**

---
#### Q&A

Q: **Why does XOR require 2 different layers?**
A: XOR(X, Y) = (X AND NOT Y) OR (NOT X AND Y) -> it takes 2 steps (layers) to simulate XOR. (See the two-layer sketch under Code Sketches below.)

---
## Optimization Problem

> Keep calm and start optimizing

An optimization problem has three distinct pieces:
1. The constraints
2. The function to optimize
3. The optimization algorithm

(p. 25)

## Linear Regression

What is linear regression?
> Linear Regression is a predictive algorithm that provides a linear relationship between the prediction (call it 'Y') and the input (call it 'X').
> (p. 27)

**(to be studied)** lasso vs. ridge vs. linear regression

:::danger
The line for linear regression must be straight!! (The model is linear in its parameters.)
:::

(A gradient-descent sketch appears under Code Sketches below.)

## Learning Procedure

![](https://i.imgur.com/k7C6jcJ.png)

## Logistic Regression (Classification)

- Predict the probability of an instance belonging to the given class
- Error function
- Plug in the sigmoid function as $\phi()$
- Gradient

**Comparison:** Linear Regression vs. Logistic Regression (see the sketches under Code Sketches below)

Activation function
- Sigmoid

[Why We Use the Sigmoid Function in Neural Networks for Binary Classification](https://www.youtube.com/watch?v=WsFasV46KgQ)
[video](https://www.youtube.com/watch?v=aircAruvnKk)

Q: What are we doing in the hidden layer?
A: It creates different representations of the input, transforming it into something more abstract that still contains the information you need, so that adding processing layers improves your results.

### HW

- [ ] Benchmark
- [ ] [Prove Cross Entropy](https://towardsdatascience.com/understanding-binary-cross-entropy-log-loss-a-visual-explanation-a3ac6025181a)
- [ ] Gradient Descent (Time 156 min)

---
Neurons: each holds a number, usually in the range 0 to 1.

(Data) --input--> function --output--> (number)

What parameters should exist? (A one-neuron sketch appears under Code Sketches below.)
- Weights
    - $w_1 a_1 + w_2 a_2 + \dots + w_n a_n$
    - Can be any real number
    - Use an activation function to squash the sum into a fixed range, e.g., the sigmoid function -> [0, 1]
- Bias
    - Sets the threshold the weighted sum must exceed before it becomes meaningful
    - Acts like a switch

Neurons that fire together wire together.
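
---
## Code Sketches

The sketches below are my own minimal NumPy illustrations of the notes above, not the course's reference code; function names, toy data, and hyperparameters are all assumptions.

### Perceptron training rule

A sketch of the Training Procedure: add the inputs to the weight vector on a wrong 0, subtract them on a wrong 1. Folding the bias in as an always-on input is my choice, not something the notes specify.

```python
import numpy as np

def train_perceptron(X, y, epochs=10):
    """Perceptron rule from the Training Procedure notes (hypothetical helper)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # fold the bias in as an always-on input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            pred = 1 if xi @ w > 0 else 0
            if pred == 0 and yi == 1:
                w += xi  # incorrectly outputs 0: add inputs to the weight vector
            elif pred == 1 and yi == 0:
                w -= xi  # incorrectly outputs 1: subtract inputs from the weight vector
    return w

# AND is linearly separable, so convergence is guaranteed here.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w = train_perceptron(X, y)
print([1 if np.append(x, 1) @ w > 0 else 0 for x in X])  # -> [0, 0, 0, 1]
```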
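
### XOR with two layers

To make the Q&A answer concrete, here is a two-layer network with hand-picked (assumed, not derived) weights: the hidden layer computes (X AND NOT Y) and (NOT X AND Y), and the output layer ORs them together. A single perceptron cannot do this because XOR is not linearly separable.

```python
import numpy as np

def step(z):
    # Hard threshold activation, as in the basic perceptron.
    return (z > 0).astype(int)

def xor_two_layer(x):
    # Hidden layer: unit 0 fires for (1, 0), unit 1 fires for (0, 1).
    W1 = np.array([[1, -1],
                   [-1, 1]])
    b1 = np.array([-0.5, -0.5])
    h = step(W1 @ x + b1)
    # Output layer: OR of the two hidden units.
    w2 = np.array([1, 1])
    b2 = -0.5
    return step(w2 @ h + b2)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_two_layer(np.array(x)))  # -> 0, 1, 1, 0
```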
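
### Linear regression by gradient descent

This ties the Optimization Problem pieces to the Linear Regression section: the function to optimize is the mean squared error, and the algorithm is gradient descent. The toy data (y = 2x + 1 plus noise) and the learning rate are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 2 * x + 1 + 0.1 * rng.normal(size=100)  # hypothetical data: slope 2, intercept 1

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    err = w * x + b - y
    # Gradients of MSE = mean(err**2) with respect to w and b.
    w -= lr * 2 * np.mean(err * x)
    b -= lr * 2 * np.mean(err)

print(w, b)  # should land near (2, 1)
```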
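
### Logistic regression

The same gradient-descent loop with the sigmoid plugged in as $\phi()$ and cross-entropy as the error function, per the Logistic Regression bullets. The gradient has the same form as in linear regression, $X^\top(p - y)/n$. Toy labels and hyperparameters are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(p, y, eps=1e-12):
    # Binary cross-entropy (log loss); see the "Prove Cross Entropy" HW link.
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fit_logistic(X, y, lr=0.5, epochs=1000):
    Xb = np.hstack([X, np.ones((len(X), 1))])  # bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)            # predicted probability of class 1
        w -= lr * Xb.T @ (p - y) / len(y)  # cross-entropy gradient step
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # hypothetical toy labels
w = fit_logistic(X, y)
p = sigmoid(np.hstack([X, np.ones((200, 1))]) @ w)
print(cross_entropy(p, y), np.mean((p > 0.5) == (y == 1)))  # loss and accuracy
```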
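
### A single neuron

A one-neuron version of the parameters listed at the end of the notes: the weighted sum $w_1 a_1 + \dots + w_n a_n$ can be any real number, the bias shifts the threshold, and the sigmoid squashes the result into [0, 1]. The activations and weights here are made up.

```python
import numpy as np

def neuron(a, w, bias):
    # Weighted sum of activations, shifted by the bias, squashed by sigmoid.
    z = np.dot(w, a) + bias
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical activations and weights.
print(neuron(a=np.array([0.2, 0.9]), w=np.array([1.5, -2.0]), bias=0.5))  # ~0.269
```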