**Attention**

How does the concept of attention work, and why do we need it? The first step is to look at the Seq2Seq model.

![](https://i.imgur.com/mgsF1rw.png)

A Seq2Seq model is simple in structure: it consists of an encoder and a decoder. The encoder takes a sequence of items (numbers or words) as input, captures the relevant information, and represents it in a context, i.e. a fixed-size vector. The decoder then uses this vector to generate an output from that information; translation is a typical example. Attention was proposed by Bahdanau (2014) and Luong (2015) to address a weakness of this setup: compressing the whole input into a single fixed-size vector can misrepresent or drop information, so, for instance, a sentence may get translated but lose part of its meaning.

**Cognitive idea behind the concept of Attention**

Our brain knows what, when, and where attention should be directed. This cognitive process is one of selective concentration: we focus on a few things and ignore the rest. The idea behind the encoder-decoder is the same, namely to focus on the relevant features in order to carry out the desired task, such as detection or translation.

**How it works**

The encoder consists of cells, each of which produces a state that is fed to the following cell and ultimately used to produce the context vector. Without attention, not every encoder state is passed to the decoder; only the final state is. Attention is implemented by passing all the intermediate states, called hidden states, to the decoder as well. The context vector now takes these cell outputs as input and computes a probability distribution over them, which leads to better-quality output. Concretely, the context vector is obtained as a weighted sum of the encoder's hidden states.

![](https://i.imgur.com/P5REwSX.jpg)

The Bahdanau and Luong score functions both involve trainable weights and align a decoder state (the previous state in Bahdanau's formulation, the current one in Luong's) with the encoder hidden states.

![](https://i.imgur.com/AC1t2jT.png)

The Luong approach is simpler, using only one weight matrix, and follows a multiplicative style. Bahdanau's approach uses an additive style, with two weight matrices and a small hidden-layer network.

To conclude, attention is the technique in which the past and current states of the encoder are used to compile the relevant information about the input, which is then directed to the decoder.
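To make the two score functions concrete, here is a minimal NumPy sketch of attention as a weighted sum of encoder hidden states. It is an illustration rather than a reference implementation: the weight matrices `W`, `W1`, `W2`, the vector `v`, and the toy dimensions are assumed for the example, and the decoder state is passed generically (Bahdanau's formulation uses the previous decoder state, Luong's the current one).

```python
import numpy as np

def softmax(x):
    # Normalize scores into a probability distribution (the attention weights).
    e = np.exp(x - np.max(x))
    return e / e.sum()

def luong_score(decoder_state, encoder_states, W):
    # Multiplicative (Luong) style: a single weight matrix between the
    # decoder state and each encoder hidden state.
    return encoder_states @ (W @ decoder_state)

def bahdanau_score(decoder_state, encoder_states, W1, W2, v):
    # Additive (Bahdanau) style: two weight matrices and a small
    # hidden-layer network with a tanh non-linearity.
    return np.tanh(encoder_states @ W1.T + (W2 @ decoder_state)) @ v

def attention_context(decoder_state, encoder_states, score_fn, **params):
    scores = score_fn(decoder_state, encoder_states, **params)
    weights = softmax(scores)            # probability distribution over encoder states
    context = weights @ encoder_states   # weighted sum of encoder hidden states
    return context, weights

# Toy example: 5 encoder hidden states of size 8, one decoder state of size 8.
rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))
dec = rng.normal(size=8)

ctx_luong, w_luong = attention_context(dec, enc, luong_score, W=rng.normal(size=(8, 8)))
ctx_bahdanau, w_bahdanau = attention_context(
    dec, enc, bahdanau_score,
    W1=rng.normal(size=(8, 8)), W2=rng.normal(size=(8, 8)), v=rng.normal(size=8))

print("Luong weights:   ", np.round(w_luong, 3))
print("Bahdanau weights:", np.round(w_bahdanau, 3))
```

In a trained model the matrices would be learned parameters and the context vector would be combined with the decoder state to produce the next output token; here random values simply show how the scoring, softmax, and weighted sum fit together.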