# <center><i class="fa fa-edit"></i> Understanding RNN, LSTM </center>

###### tags: `Internship`

:::info
**Goal:** To gain a basic understanding of RNN and LSTM techniques. Focus on vocabulary and a systems overview.
- [x] Overview
- [x] Easy implementation

**Resources:**
[Towards Data Science Page](https://towardsdatascience.com/understanding-lstm-and-its-quick-implementation-in-keras-for-sentiment-analysis-af410fd85b47)
[Neural Networks, RNN](https://hackmd.io/@j-chen/H1_3Hf_LP)
[Machine Learning](https://hackmd.io/@Derni/HJQkjlnIP)
:::

### Overview
- RNN: recurrent neural network
    - Conventional NNs don't learn from previous events: information cannot be passed forward to the next step
    - An RNN learns from the **immediate** previous step
        - No long-term dependency: it can't learn from steps other than the immediate previous one
        - Not practical on long sequences: error gradients accumulate across the many loops, producing very large weight updates and an unstable network

![](https://i.imgur.com/l2vFip8.png)

- LSTM: long short-term memory
    - Applications:
        - Speech recognition
        - Language modeling
        - Sentiment analysis
        - Text prediction
    - Variant of RNN
    - Uses gates to control the memorizing process
    - In the diagram below:
        - *X*: scaling of information
        - *+*: adding information
        - *sigma*: sigmoid layer
            - Outputs values between 0 and 1 (0 = let nothing through, 1 = let everything through)
        - *tanh*: tanh layer
            - Helps overcome the vanishing gradient problem: its second derivative can sustain for a long range before going to zero
        - *h(t-1)*: output of the last LSTM unit
        - *c(t-1)*: memory from the last LSTM unit
        - *X(t)*: current input
        - *c(t)*: new updated memory
        - *h(t)*: current output

![](https://i.imgur.com/dF3ryie.png)

- Three main components (written out as equations at the end of this note):
    - Forget unnecessary information:
        - A sigmoid layer, the forget gate f(t), takes the inputs X(t) and h(t-1)
        - It removes parts of the old output by outputting 0 for them
        - The gate's result is f(t) · c(t-1)
    - Store information:
        - Takes the new input X(t) and stores it in the cell state
        - Steps:
            - A sigmoid layer decides what to update or ignore
            - A tanh layer creates a vector of all possible values from the new input
            - Multiply the sigmoid and tanh outputs to get the cell-state update
            - Add the new memory to the old memory c(t-1) to give c(t)
    - Decide output:
        - Controlled by a sigmoid layer
        - Put the cell state through tanh to generate all possible values, then multiply by the sigmoid gate's output
        - Wherever the sigmoid outputs 0, the multiplication yields 0 and nothing is output

### Simple implementation
- Tokenizer to vectorize the text and convert it to sequences of integers
- pad_sequences to convert the sequences into a 2D numpy array
- LSTM network (sketched below):
    - Hyperparameters:
        - embed_dim: the embedding layer encodes the input sequence into a sequence of dense vectors of dimension embed_dim
        - lstm_out: the LSTM transforms the vector sequence into a single vector of size lstm_out, containing information about the entire sequence
        - dropout
        - batch_size
    - softmax: activation function of the final dense layer
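A minimal end-to-end sketch of this pipeline in Keras, loosely following the linked Towards Data Science article. The toy `texts`/`labels` data and the specific hyperparameter values are illustrative assumptions, not taken from the article:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SpatialDropout1D, LSTM, Dense
from tensorflow.keras.utils import to_categorical

# Toy data so the sketch runs as-is; swap in a real labeled dataset.
texts = ["I loved this movie", "worst film ever", "great acting", "really boring"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

max_features = 2000  # vocabulary size kept by the tokenizer
embed_dim = 128      # dimension of each dense word vector
lstm_out = 196       # size of the single vector summarizing the whole sequence
batch_size = 32

# Tokenizer vectorizes the text into sequences of integers.
tokenizer = Tokenizer(num_words=max_features, split=' ')
tokenizer.fit_on_texts(texts)
# pad_sequences turns the ragged integer lists into one 2D numpy array.
X = pad_sequences(tokenizer.texts_to_sequences(texts))
y = to_categorical(labels, num_classes=2)  # one column per softmax unit

model = Sequential([
    Embedding(max_features, embed_dim),
    SpatialDropout1D(0.4),                        # dropout over the embedded sequence
    LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2),
    Dense(2, activation='softmax'),               # one probability per sentiment class
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, batch_size=batch_size, epochs=5, verbose=2)
```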
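Note on the design: two softmax units with `categorical_crossentropy` is one common setup for binary sentiment; a single sigmoid unit with `binary_crossentropy` is equivalent and slightly simpler. The dropout layers exist purely to reduce overfitting and can be tuned alongside batch_size.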
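For reference, the three gate operations described in the overview can be written compactly. This is the standard textbook formulation rather than anything taken verbatim from the article; the W's and b's are learned weights and biases, and $\odot$ is element-wise multiplication (the *X* in the diagram):

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{forget gate}\\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{store: what to update}\\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) && \text{store: candidate values}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{new updated memory}\\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{output gate}\\
h_t &= o_t \odot \tanh(c_t) && \text{current output}
\end{aligned}
$$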