# Attention mechanism
Attention is one of the most influential ideas in deep learning. This mechanism is used in various problems such as machine translation and image captioning.
## What is Attention?
When we think about the English word “Attention”, we know that it means directing your focus at something and taking greater notice.
A neural network can be viewed as a simplified attempt to mimic the actions of the human brain. The attention mechanism is likewise an attempt to implement, in deep neural networks, the same action of selectively concentrating on a few relevant things while ignoring others.
Attention in deep learning can be broadly interpreted as a vector of importance weights: to predict or infer one element, such as a pixel in an image or a word in a sentence, we use the attention vector to estimate how strongly it is correlated with the other elements, and we take the sum of their values, weighted by the attention vector, as the approximation of the target.
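As a minimal, framework-free sketch of this idea (dot-product similarity is used here purely as one possible scoring choice; the array names and sizes are illustrative assumptions, not part of any specific paper):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical example: one query element attends over four other elements.
rng = np.random.default_rng(0)
query = rng.normal(size=8)            # the element we want to predict / infer
values = rng.normal(size=(4, 8))      # the other elements it may correlate with

scores = values @ query               # similarity scores (dot product as one choice)
weights = softmax(scores)             # the attention vector: importance weights summing to 1
context = weights @ values            # weighted sum of values approximating the target
print(weights, context.shape)
```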
There are two major types of Attention:
* Bahdanau Attention
* Luong Attention
### What’s Wrong with the Seq2Seq Model?
The seq2seq model aims to transform an input sequence (source) into a new one (target), and both sequences can be of arbitrary lengths. Examples of transformation tasks include machine translation between multiple languages (in either text or audio) and question-answer dialog generation.
In this design, the encoder compresses the entire source sequence into a single fixed-length context vector, which the decoder then uses to generate the target. The drawback of this fixed-length context vector design is its incapability of remembering long sentences: often the model has forgotten the first part of the input by the time it finishes processing the whole sequence. The attention mechanism resolves this problem by letting the decoder look back at the encoder hidden states at every step, as the following sections show. The sketch below illustrates the bottleneck.
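To make the bottleneck concrete, here is a minimal NumPy sketch of a vanilla RNN encoder (all names and dimensions are illustrative assumptions): no matter how long the source sentence is, the decoder only ever sees the single final hidden state.

```python
import numpy as np

def encode(src_embeddings, W_h, W_x):
    """Vanilla seq2seq encoder: a simple RNN that compresses the whole source
    sequence into one fixed-length context vector (its final hidden state)."""
    h = np.zeros(W_h.shape[0])
    for x in src_embeddings:            # read the source left to right
        h = np.tanh(h @ W_h + x @ W_x)  # overwrite h at every step
    return h                            # the decoder sees only this single vector

# Illustrative sizes; names and dimensions are assumptions for this sketch.
rng = np.random.default_rng(0)
src_len, emb_dim, hid_dim = 20, 8, 16
context = encode(rng.normal(size=(src_len, emb_dim)),
                 rng.normal(size=(hid_dim, hid_dim)),
                 rng.normal(size=(emb_dim, hid_dim)))
print(context.shape)  # (16,) regardless of how long the source sentence is
```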
## Bahdanau Attention
The first type of Attention improves the sequence-to-sequence model in machine translation by aligning the decoder with the relevant parts of the input sentence. The entire step-by-step process of applying Attention (a minimal sketch follows the list):
* Producing the Encoder Hidden States
* Calculating Alignment Scores
* Softmaxing the Alignment Scores
* Calculating the Context Vector
* Decoding the Output
* The process (steps 2-5) repeats itself for each time step of the decoder until an end-of-sequence token is produced or the output is past the specified maximum length
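Below is a minimal NumPy sketch of one decoder time step of Bahdanau-style (additive) attention, covering steps 2-4; the scoring uses the decoder hidden state from the previous step, and all weight names and dimensions are assumptions for illustration only.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def bahdanau_step(enc_states, dec_prev_state, W_a, U_a, v_a):
    """One decoder time step of additive (Bahdanau-style) attention.

    enc_states:     (src_len, enc_dim)  encoder hidden states (step 1)
    dec_prev_state: (dec_dim,)          decoder hidden state from the previous step
    """
    # Step 2: alignment scores  e_i = v_a . tanh(W_a s_{t-1} + U_a h_i)
    scores = np.tanh(dec_prev_state @ W_a + enc_states @ U_a) @ v_a
    # Step 3: softmax the alignment scores into attention weights
    weights = softmax(scores)
    # Step 4: context vector = weighted sum of the encoder hidden states
    context = weights @ enc_states
    # Step 5 (decoding) would feed `context`, together with the previous output
    # embedding, into the decoder RNN to produce the next word; omitted here.
    return weights, context

# Illustrative shapes; all names and sizes are assumptions for this sketch.
rng = np.random.default_rng(0)
src_len, enc_dim, dec_dim, attn_dim = 5, 6, 6, 4
weights, context = bahdanau_step(
    rng.normal(size=(src_len, enc_dim)),
    rng.normal(size=dec_dim),
    rng.normal(size=(dec_dim, attn_dim)),
    rng.normal(size=(enc_dim, attn_dim)),
    rng.normal(size=attn_dim),
)
```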
## Luong Attention
The second type of Attention follows a similar step-by-step process (a minimal sketch follows the list):
* Producing the Encoder Hidden States
* Decoder RNN
* Calculating Alignment Scores
* Softmaxing the Alignment Scores
* Calculating the Context Vector
* Producing the Final Output
* The process (steps 2-6) repeats itself for each time step of the decoder until an end-of-sequence token is produced or the output is past the specified maximum length
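The sketch below shows one decoder time step of Luong-style attention, covering steps 3-6. Unlike the Bahdanau sketch above, the decoder RNN runs first (step 2) and the scores are computed against its *current* hidden state; the simple "dot" scoring function is used here as one of Luong's options, and all names and dimensions are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def luong_step(enc_states, dec_state, W_c):
    """One decoder time step of Luong-style (multiplicative) attention.

    enc_states: (src_len, dim)  encoder hidden states (step 1)
    dec_state:  (dim,)          decoder hidden state produced by the decoder RNN
                                at the current step (step 2)
    """
    # Step 3: alignment scores, here with the simple "dot" scoring function
    scores = enc_states @ dec_state
    # Step 4: softmax the alignment scores into attention weights
    weights = softmax(scores)
    # Step 5: context vector = weighted sum of the encoder hidden states
    context = weights @ enc_states
    # Step 6: combine context and decoder state to produce the final output state
    attn_out = np.tanh(np.concatenate([context, dec_state]) @ W_c)
    return weights, attn_out

# Illustrative shapes; all names and sizes are assumptions for this sketch.
rng = np.random.default_rng(0)
src_len, dim = 5, 6
weights, attn_out = luong_step(
    rng.normal(size=(src_len, dim)),
    rng.normal(size=dim),
    rng.normal(size=(2 * dim, dim)),
)
```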