# What is Attention?

The definition according to the Oxford English Dictionary goes as follows: "Notice taken of someone or something; the regarding of someone or something as interesting or important." We can translate this from our intuitive human understanding to what it means for a computer, or an AI, to "pay attention". Unlike humans, an AI will only pay attention when it is turned on and instructed when and how to do so. We have to define what is worth noticing, or important, so an AI can use that knowledge.

First, imagine if humans could not pay attention. We would still be walking around taking in all of the input from our senses, but imagine if we had to process every single pixel of everything we see before deciding if or how to take an action. Or every feeling we have, from our toes to the top of our head, at every second of the day before we could decide if we were hungry or needed a nap. Or if every distinguishable background sound at a party had to be consciously processed by your brain before you could understand and respond to the person asking in your ear if you want another drink. My point is: you would go thirsty at the party, and probably still go home with a headache.

The AI equivalent of a headache caused by the lack of an attention mechanism is the context vector. A context vector is an encoding of a sentence. The problem with it is that it tries to represent all of the input information in a single vector, and then to produce a variable-length output from that single reference. If our input is large, the important details might get drowned out in the context vector, or it may take longer to learn and complete the task.

What's the cure for a headache? Attention. Attention allows our AI to take notice of the important information in the process of decoding our context. It creates a context at each step, using a hidden state created by the encoder at each point of the input, along with the decoder's own hidden state, which takes into account all of the input processed so far.
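To make the bottleneck concrete, here is a minimal NumPy sketch of a toy RNN encoder. Everything in it (the `encode` function, the weight names, the sizes) is illustrative and not from any particular library; the point is only that one fixed-size vector must carry the whole input, no matter how long it is.

```python
import numpy as np

# Toy RNN encoder: the entire input sequence gets squeezed into one
# fixed-size context vector (the final hidden state). All names and
# sizes are hypothetical, chosen just to show the bottleneck.
rng = np.random.default_rng(0)
hidden_size, input_dim = 4, 8
W_h = rng.normal(size=(hidden_size, hidden_size)) * 0.1
W_x = rng.normal(size=(hidden_size, input_dim)) * 0.1

def encode(inputs):
    """Fold a whole sequence into a single context vector."""
    h = np.zeros(hidden_size)
    for x in inputs:
        h = np.tanh(W_h @ h + W_x @ x)  # recurrence step
    return h  # one vector must represent *all* of the input

short_seq = rng.normal(size=(3, input_dim))   # 3 input units
long_seq = rng.normal(size=(50, input_dim))   # 50 input units

# Same-sized context regardless of input length: the bottleneck.
print(encode(short_seq).shape, encode(long_seq).shape)  # (4,) (4,)
```

Whether the input is 3 units or 50, the decoder gets the same 4 numbers to work from, which is exactly why long inputs can drown out the important details.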
These vectors are multiplied (sometimes with a trainable weight matrix, e.g. in the Bahdanau attention scoring technique) and usually pushed through a softmax layer. The output of the softmax layer is a score for each unit of the input: the higher the score, the more significant that unit is. This is a very significant realisation, as it allows the AI to drown out the noise at a given point of processing the input and focus on what's important for decoding.

The attention mechanism is a breakthrough in artificial intelligence that is, once again, quite amazing: just as neural networks mimic the brain with neurons, and activation functions can be compared to action potentials, the attention mechanism can be compared to our own ability to notice the things that matter most for completing a given task. You could even say that attention in deep learning brings us one step closer to true artificial intelligence.
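The multiply-then-softmax step above can be sketched in a few lines of NumPy. This uses the simpler dot-product scoring rather than Bahdanau's additive variant (which would route the two hidden states through a small trainable feed-forward network); the function and variable names are my own, not from any library.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention_context(encoder_states, decoder_state):
    """Dot-product attention: score each encoder hidden state against
    the decoder state, softmax the scores, and build a weighted context."""
    scores = encoder_states @ decoder_state  # one score per input unit
    weights = softmax(scores)                # sum to 1; higher = more important
    context = weights @ encoder_states       # weighted sum of encoder states
    return context, weights

rng = np.random.default_rng(1)
enc = rng.normal(size=(5, 4))  # 5 input units, hidden size 4 (toy sizes)
dec = rng.normal(size=4)       # decoder hidden state

context, weights = attention_context(enc, dec)
print(np.round(weights, 3))  # the scores: a probability over the 5 input units
```

Because the weights sum to 1, the context built at each decoding step is a fresh blend of the encoder states, dominated by whichever input units scored highest, which is the "drowning out the noise" described above.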