LSTM and GRU

Brief Outline

Problem

  • The state of an RNN records information from all previous time steps.
  • At each new time step, the old information gets morphed by the current input.
  • After many time steps, the information stored at an earlier time step may be morphed so much that the original information can no longer be extracted.

Key Idea

To get more flexibility and a better model, we introduce three operations:

  • Selective read: Selectively read required information
  • Selective write: Selectively write information
  • Selective forget: Selectively forget useless information.

Recall

$c_t = \sigma(W c_{t-1} + U x_t + b)$
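
To make the problem concrete, the recurrence above can be sketched in NumPy. This is only an illustration: the state size, the random weights, and the rescaling of `W` to unit spectral norm are assumptions chosen so the effect is easy to see, and the helper names `rnn_step` and `state_gap` are hypothetical. Two runs differ only in their first input, and the gap between their states shrinks at every later step, which is the sense in which early information gets morphed away.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(c_prev, x, W, U, b):
    """One step of the plain recurrence c_t = sigmoid(W c_{t-1} + U x_t + b)."""
    return sigmoid(W @ c_prev + U @ x + b)

def state_gap(steps=20, d=4, n=3, seed=0):
    """Distance between the states of two runs that differ only in x_1."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(d, d))
    W = W / np.linalg.norm(W, 2)   # assumption: rescale to unit spectral norm
    U = rng.normal(size=(d, n))
    b = np.zeros(d)
    xs = rng.normal(size=(steps, n))
    xs_alt = xs.copy()
    xs_alt[0] += 1.0               # change only the first input
    c_a = np.zeros(d)
    c_b = np.zeros(d)
    gaps = []
    for x_a, x_b in zip(xs, xs_alt):
        c_a = rnn_step(c_a, x_a, W, U, b)
        c_b = rnn_step(c_b, x_b, W, U, b)
        gaps.append(np.linalg.norm(c_a - c_b))
    return gaps

gaps = state_gap()
print(f"gap after step 1: {gaps[0]:.3e}, after step {len(gaps)}: {gaps[-1]:.3e}")
```

Under these assumptions the gap contracts by at least a factor of 4 per step, because the sigmoid's slope never exceeds 0.25. With other weights the state may instead saturate or be overwritten, but the first input is still hard to recover.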

Selective Write

  • We don't want to write the whole of $c_{t-1}$ to $c_t$; we just want to write selected portions of it.
  • We introduce a vector $o_{t-1}$ which decides what fraction of each element should be passed, i.e. Selective Write = $c_{t-1} \odot o_{t-1}$.
  • But how does the RNN know what the values of $o_{t-1}$ should be? We introduce parameters for it and learn them along with the other parameters.
  • We compute $o_{t-1}$ and $h_{t-1}$ as
    • $o_{t-1} = \sigma(W_o h_{t-2} + U_o x_{t-1} + b_o)$
    • $h_{t-1} = c_{t-1} \odot o_{t-1}$
  • $o_t$ is known as the output gate.
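
The selective-write step can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the sizes `d` and `n`, the random values standing in for the state, previous output and input, and the function name `selective_write` are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def selective_write(c_prev, h_prev2, x_prev, W_o, U_o, b_o):
    """Return (o_{t-1}, h_{t-1}) for the selective-write step.

    o_{t-1} = sigmoid(W_o h_{t-2} + U_o x_{t-1} + b_o)
    h_{t-1} = c_{t-1} * o_{t-1}   (elementwise product)
    """
    o_prev = sigmoid(W_o @ h_prev2 + U_o @ x_prev + b_o)
    h_prev = c_prev * o_prev
    return o_prev, h_prev

rng = np.random.default_rng(0)
d, n = 4, 3                      # assumed state and input sizes
c_prev = rng.uniform(size=d)     # stand-in for c_{t-1}
h_prev2 = rng.uniform(size=d)    # stand-in for h_{t-2}
x_prev = rng.normal(size=n)      # stand-in for x_{t-1}
W_o = rng.normal(size=(d, d))
U_o = rng.normal(size=(d, n))
b_o = np.zeros(d)

o_prev, h_prev = selective_write(c_prev, h_prev2, x_prev, W_o, U_o, b_o)
print("gate:   ", np.round(o_prev, 3))
print("written:", np.round(h_prev, 3))
```

Each gate entry lies between 0 and 1, so every element of the state is passed on only in part: an entry near 1 writes that element almost unchanged, and an entry near 0 suppresses it.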

LSTM

Selective Read

  • We now have $h_{t-1}$, which contains only the selectively written values from $c_{t-1}$.
  • We may not want to pass this along with $x_t$ directly to $c_t$, as $x_t$ may also contain irrelevant information. Hence, we define an intermediate step $\tilde{c}_t$:
    • $\tilde{c}_t = \sigma(W h_{t-1} + U x_t + b)$
  • Then Selective Read = $\tilde{c}_t \odot i_t$, where $i_t$, known as the input gate, is defined as
    • $i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i)$
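
The selective-read step can be sketched the same way. This is a minimal illustration under assumed sizes and random weights, and the function name `selective_read` is hypothetical. The intermediate candidate (written `c_tilde` below) uses a sigmoid because that is what these notes write; many LSTM formulations use tanh there instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def selective_read(h_prev, x, W, U, b, W_i, U_i, b_i):
    """Return (c_tilde, i_t, c_tilde * i_t) for the selective-read step."""
    # Intermediate candidate built from h_{t-1} and x_t (sigmoid, as in the notes).
    c_tilde = sigmoid(W @ h_prev + U @ x + b)
    # Gate i_t: one value between 0 and 1 per element of the candidate.
    i_t = sigmoid(W_i @ h_prev + U_i @ x + b_i)
    return c_tilde, i_t, c_tilde * i_t

rng = np.random.default_rng(0)
d, n = 4, 3                       # assumed state and input sizes
h_prev = rng.uniform(size=d)      # stand-in for h_{t-1}
x = rng.normal(size=n)            # stand-in for x_t
W, W_i = rng.normal(size=(d, d)), rng.normal(size=(d, d))
U, U_i = rng.normal(size=(d, n)), rng.normal(size=(d, n))
b, b_i = np.zeros(d), np.zeros(d)

c_tilde, i_t, read = selective_read(h_prev, x, W, U, b, W_i, U_i, b_i)
print("candidate:", np.round(c_tilde, 3))
print("read:     ", np.round(read, 3))
```

Note that the candidate and the gate have the same functional form but separate parameters, so the network can learn what to propose and how much of it to read independently.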

Selective Forget

  • We now have $c_{t-1}$ and $\tilde{c}_t \odot i_t$, and have to combine them to get the new state $c_t$.
  • One simple way of doing this is to add the two terms. However, we may want to forget some parts of $c_{t-1}$ instead of passing it on directly. We introduce another gate, the forget gate:
    • $f_t = \sigma(W_f h_{t-1} + U_f x_t + b_f)$
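
A small NumPy sketch shows how the forget gate scales the old state. The sizes, the small random weights, and the two hand-picked bias values are assumptions made only to push the gate towards its two extremes; in a trained network these parameters are learned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x, W_f, U_f, b_f):
    """f_t = sigmoid(W_f h_{t-1} + U_f x_t + b_f)."""
    return sigmoid(W_f @ h_prev + U_f @ x + b_f)

rng = np.random.default_rng(0)
d, n = 4, 3                          # assumed state and input sizes
c_prev = rng.uniform(size=d)         # stand-in for c_{t-1}
h_prev = rng.uniform(size=d)         # stand-in for h_{t-1}
x = rng.normal(size=n)               # stand-in for x_t
W_f = 0.1 * rng.normal(size=(d, d))  # small weights so the bias dominates
U_f = 0.1 * rng.normal(size=(d, n))

# A large positive bias pushes f_t towards 1 (remember everything),
# a large negative bias pushes it towards 0 (forget everything).
keep = forget_gate(h_prev, x, W_f, U_f, np.full(d, 10.0))
drop = forget_gate(h_prev, x, W_f, U_f, np.full(d, -10.0))
print("old state:", np.round(c_prev, 3))
print("kept:     ", np.round(c_prev * keep, 3))
print("dropped:  ", np.round(c_prev * drop, 3))
```

In practice the gate takes intermediate values that differ per element, so some parts of the old state are retained while others fade.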

Update new state

  • Finally, combining the above, we get the new state as
    • $c_t = \tilde{c}_t \odot i_t + c_{t-1} \odot f_t$
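
Putting the pieces together, one full step can be sketched in NumPy as below. This is a minimal sketch that follows the equations as written in these notes (sigmoid candidate, $h_t = c_t \odot o_t$); library implementations typically use tanh for the candidate and for the output, so results will not match them. The parameter dictionary, the helper names `init_params` and `lstm_step`, and all sizes are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(d, n, seed=0):
    """Small random parameters; key names mirror the symbols in the notes."""
    rng = np.random.default_rng(seed)
    p = {}
    for suffix in ("", "_i", "_f", "_o"):
        p["W" + suffix] = 0.1 * rng.normal(size=(d, d))
        p["U" + suffix] = 0.1 * rng.normal(size=(d, n))
        p["b" + suffix] = np.zeros(d)
    return p

def lstm_step(c_prev, h_prev, x, p):
    """One step: returns the new state c_t and the new output h_t."""
    # Selective read: candidate and input gate.
    c_tilde = sigmoid(p["W"] @ h_prev + p["U"] @ x + p["b"])
    i_t = sigmoid(p["W_i"] @ h_prev + p["U_i"] @ x + p["b_i"])
    # Selective forget: gate applied to the old state.
    f_t = sigmoid(p["W_f"] @ h_prev + p["U_f"] @ x + p["b_f"])
    # New state.
    c_t = c_tilde * i_t + c_prev * f_t
    # Selective write: output gate applied to the new state.
    o_t = sigmoid(p["W_o"] @ h_prev + p["U_o"] @ x + p["b_o"])
    h_t = c_t * o_t
    return c_t, h_t

d, n, T = 4, 3, 5                     # assumed sizes and sequence length
p = init_params(d, n)
xs = np.random.default_rng(1).normal(size=(T, n))
c, h = np.zeros(d), np.zeros(d)
for x in xs:
    c, h = lstm_step(c, h, x, p)
print("c_T:", np.round(c, 3))
print("h_T:", np.round(h, 3))
```

Because the old state enters the update only through the elementwise product with $f_t$, a gate value close to 1 carries that element forward almost unchanged, which is what lets information survive many time steps.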

GRU