# Attention in Deep Learning

Attention is a mechanism used in neural machine translation with Seq2Seq models. A Seq2Seq model is composed of an encoder, which processes the input into a compressed context vector, and a decoder, which uses the context vector to produce the translated output. The main idea of attention is to focus on a smaller set of input words or values when predicting an output: the "attention" is restricted to a limited part of the input, which improves performance, especially on long sentences. It is implemented in an interface connecting the decoder to the encoder: when the context vector is passed to the decoder, information about all the encoder hidden states is added, so the model can focus on a selected part of the input and learn the associations.

To implement attention, we first prepare the encoder hidden states and the first hidden state of the decoder. We then compute a score for each encoder hidden state; different score functions can be used, such as the Luong dot, Luong multiplicative, and Bahdanau score functions. Next, we pass all the scores through a softmax layer, so that they represent an attention distribution. Once we have the distribution, we multiply each attention weight by its encoder hidden state, obtaining the alignment vectors, which are then summed to produce the context vector. Finally, the context vector, which aggregates all the alignment vectors, is given as input to the decoder, which uses it for the translation.
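The sketch below illustrates these steps for a single decoding step, assuming the Luong dot score function; the shapes, variable names, and the use of NumPy are illustrative assumptions, not a reference implementation.

```python
# Minimal sketch of Luong dot-product attention for one decoder step (NumPy).
# All names and shapes are assumptions chosen for illustration.
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of scores."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def luong_dot_attention(decoder_hidden, encoder_hiddens):
    """Compute the context vector for one decoding step.

    decoder_hidden:  (hidden_dim,)         current decoder hidden state
    encoder_hiddens: (src_len, hidden_dim) all encoder hidden states
    """
    # 1. Score each encoder hidden state against the decoder state (dot product).
    scores = encoder_hiddens @ decoder_hidden      # (src_len,)
    # 2. Softmax turns the scores into an attention distribution.
    attn_weights = softmax(scores)                 # (src_len,)
    # 3. Weight the encoder hidden states (alignment vectors) and sum them
    #    to obtain the context vector.
    context = attn_weights @ encoder_hiddens       # (hidden_dim,)
    return context, attn_weights

# Toy usage: a source sentence of 5 tokens with hidden size 8.
rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))
dec = rng.normal(size=(8,))
ctx, weights = luong_dot_attention(dec, enc)
print("attention weights:", np.round(weights, 3))
print("context vector shape:", ctx.shape)
```

The multiplicative and Bahdanau variants differ only in how the scores in step 1 are computed; the softmax and weighted-sum steps stay the same.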