Seq2Seq with Attention
Brief Outline
Previously, we took a single encoder state summarizing the entire input sentence and fed that same vector to the decoder at every time step.
However, at a given decoder time step we don't need the entire encoder state, since the word produced at that step does not depend on the whole sentence.
Cramming everything into one fixed vector also overloads the decoder.
Can we instead use a weighted sum of the encoder states at each decoder time step, where the weights indicate which encoder states are important?
The answer is attention.
Attention
To enable attention, we define a function
$$e_{jt} = f_{ATT}(s_{t-1}, h_j)$$
This quantity captures the importance of the $j^{th}$ input word for decoding the $t^{th}$ output word.
Since the attention weights must sum to one, we apply the softmax function to these scores:
$$\alpha_{jt} = \frac{\exp(e_{jt})}{\sum_{k=1}^{M} \exp(e_{kt})}$$
where $M$ is the number of input words.
One of many possible choices for $f_{ATT}$ is
$$f_{ATT} = V_{att}^{T} \tanh(U_{att}\, s_{t-1} + W_{att}\, h_j)$$
where
$$h_j \in \mathbb{R}^{d_1 \times 1}, \quad s_t \in \mathbb{R}^{d_2 \times 1}$$
and
$$V_{att} \in \mathbb{R}^{d_1 \times 1}, \quad U_{att} \in \mathbb{R}^{d_1 \times d_2}, \quad W_{att} \in \mathbb{R}^{d_1 \times d_1}$$
Clearly, $e_{jt}$, and hence $\alpha_{jt}$, will be a scalar.
These parameters will be learned along with the other parameters of the encoder and decoder.
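The following is a small NumPy sketch of this additive scoring function; the sizes $d_1$, $d_2$ and the randomly initialized parameters are made-up stand-ins for learned values:

```python
import numpy as np

d1, d2 = 4, 6                      # hypothetical encoder / decoder hidden sizes
rng = np.random.default_rng(0)

# Attention parameters (in practice, learned with the encoder and decoder).
V_att = rng.normal(size=(d1, 1))   # d1 x 1
U_att = rng.normal(size=(d1, d2))  # d1 x d2
W_att = rng.normal(size=(d1, d1))  # d1 x d1

def f_att(s_prev, h_j):
    """Score e_{jt} = V_att^T tanh(U_att s_{t-1} + W_att h_j)."""
    return (V_att.T @ np.tanh(U_att @ s_prev + W_att @ h_j)).item()

s_prev = rng.normal(size=(d2, 1))  # previous decoder state s_{t-1}
h_j = rng.normal(size=(d1, 1))     # one encoder state h_j
print(f_att(s_prev, h_j))          # a single scalar score
```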
Architecture
(Figure: encoder-decoder architecture with attention; image unavailable.)
Forward propagation
Encoder
$$x_j = \text{Word Embeddings} \in \mathbb{R}^{e_1 \times 1}$$
$$h_j = \text{RNN}(h_{j-1}, x_j) \in \mathbb{R}^{d_1 \times 1}$$
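A minimal sketch of the encoder loop with a plain tanh RNN cell, assuming made-up sizes and random parameters (a GRU or LSTM cell would be used the same way):

```python
import numpy as np

e1, d1 = 5, 4                      # hypothetical embedding / hidden sizes
rng = np.random.default_rng(1)

W_xh = rng.normal(size=(d1, e1))   # input-to-hidden weights
W_hh = rng.normal(size=(d1, d1))   # hidden-to-hidden weights
b_h = np.zeros((d1, 1))

def rnn_step(h_prev, x_j):
    """h_j = RNN(h_{j-1}, x_j) for a simple tanh RNN cell."""
    return np.tanh(W_xh @ x_j + W_hh @ h_prev + b_h)

# Toy "sentence" of three random word embeddings.
xs = [rng.normal(size=(e1, 1)) for _ in range(3)]
h = np.zeros((d1, 1))
encoder_states = []
for x_j in xs:
    h = rnn_step(h, x_j)
    encoder_states.append(h)       # keep every h_j for the attention step
```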
Attention
$$e_{jt} = V_{att}^{T} \tanh(U_{att}\, s_{t-1} + W_{att}\, h_j)$$
$$\alpha_{jt} = \text{softmax}(e_{jt})$$
$$c_t = \sum_{j=1}^{T} \alpha_{jt}\, h_j$$
This $c_t$ is the attended encoder context, a weighted sum of the encoder hidden states, that is passed to the decoder at every time step $t$ to obtain the decoder hidden state $s_t$.
Decoder
$$s_t = \text{RNN}(s_{t-1}, c_t)$$
$$l_t = \text{softmax}(V s_t + b)$$
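And a sketch of one decoder step under the same assumptions, with a made-up vocabulary size and random parameters standing in for the learned $V$ and $b$:

```python
import numpy as np

d1, d2, vocab = 4, 6, 10            # hypothetical sizes
rng = np.random.default_rng(3)

W_cs = rng.normal(size=(d2, d1))    # context-to-hidden weights
W_ss = rng.normal(size=(d2, d2))    # hidden-to-hidden weights
V = rng.normal(size=(vocab, d2))    # output projection
b = np.zeros((vocab, 1))

def decoder_step(s_prev, c_t):
    """s_t = RNN(s_{t-1}, c_t);  l_t = softmax(V s_t + b)."""
    s_t = np.tanh(W_cs @ c_t + W_ss @ s_prev)
    logits = V @ s_t + b
    l_t = np.exp(logits) / np.sum(np.exp(logits))   # distribution over the vocabulary
    return s_t, l_t

s_prev = np.zeros((d2, 1))
c_t = rng.normal(size=(d1, 1))      # context vector from the attention step
s_t, l_t = decoder_step(s_prev, c_t)
print(l_t.sum())                    # 1.0
```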