
CS224N (2019) Lecture 1 Further Readings

Gradient for Skip-Gram Model (p30)

For a center word $c$ and a context word $o$, how do we calculate $\frac{\partial}{\partial v_c}\log p(o\mid c)$?

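$$\frac{\partial}{\partial v_c}\log p(o\mid c)$$

$$=\frac{\partial}{\partial v_c}\log\frac{\exp(u_o^T v_c)}{\sum_{w=1}^{V}\exp(u_w^T v_c)}$$

$$=\frac{\partial}{\partial v_c}\log\exp(u_o^T v_c)-\frac{\partial}{\partial v_c}\log\sum_{w=1}^{V}\exp(u_w^T v_c)$$
(Rule for the logarithm of a quotient: $\log\frac{a}{b}=\log a-\log b$)

$$=\frac{\partial}{\partial v_c}u_o^T v_c-\frac{\partial}{\partial v_c}\log\sum_{w=1}^{V}\exp(u_w^T v_c)$$
(Logarithm and exponential cancel out)

$$=u_o-\frac{\partial}{\partial v_c}\log\sum_{w=1}^{V}\exp(u_w^T v_c)$$
(Useful basics in p28: $\frac{\partial}{\partial x}a^T x=a$)

$$=u_o-\frac{1}{\sum_{w=1}^{V}\exp(u_w^T v_c)}\frac{\partial}{\partial v_c}\sum_{w=1}^{V}\exp(u_w^T v_c)$$
(Chain rule: $\frac{\partial y}{\partial v_c}=\frac{\partial y}{\partial z}\frac{\partial z}{\partial v_c}$, with $y=\log z$, $z=\sum_{w=1}^{V}\exp(u_w^T v_c)$, $\frac{\partial y}{\partial z}=\frac{\partial}{\partial z}\log z=\frac{1}{z}$)

$$=u_o-\frac{1}{\sum_{w=1}^{V}\exp(u_w^T v_c)}\sum_{x=1}^{V}\frac{\partial}{\partial v_c}\exp(u_x^T v_c)$$
(Move the derivative inside the sum; its index is renamed to $x$ so it is not confused with the $w$ of the normalizer)

$$=u_o-\frac{1}{\sum_{w=1}^{V}\exp(u_w^T v_c)}\sum_{x=1}^{V}\exp(u_x^T v_c)\frac{\partial}{\partial v_c}u_x^T v_c$$
(Chain rule: $\frac{\partial y}{\partial v_c}=\frac{\partial y}{\partial z}\frac{\partial z}{\partial v_c}$, with $y=\exp(z)$, $z=u_x^T v_c$, $\frac{\partial y}{\partial z}=\frac{\partial}{\partial z}\exp(z)=\exp(z)$)

$$=u_o-\frac{1}{\sum_{w=1}^{V}\exp(u_w^T v_c)}\sum_{x=1}^{V}\exp(u_x^T v_c)\,u_x$$
(Useful basics in p28)

$$=u_o-\sum_{x=1}^{V}\frac{\exp(u_x^T v_c)}{\sum_{w=1}^{V}\exp(u_w^T v_c)}\,u_x$$
(Rearrange)

$$=u_o-\sum_{x=1}^{V}p(x\mid c)\,u_x$$
(From the definition: $p(x\mid c)=\frac{\exp(u_x^T v_c)}{\sum_{w=1}^{V}\exp(u_w^T v_c)}$)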

Conclusion

The gradient is the observed representation of the context word, $u_o$, minus what the model expects the context word to look like. That expectation is the average of the output vectors $u_x$ of all words in the vocabulary, each weighted by its probability $p(x\mid c)$ under the current model.
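As a sanity check on the derivation, the sketch below compares the closed-form gradient $u_o - \sum_x p(x\mid c)\,u_x$ with a central finite-difference estimate of $\partial \log p(o\mid c)/\partial v_c$. It is a minimal illustration, assuming tiny random vectors (the vocabulary size, dimension, and seed are arbitrary choices) and plain Python lists instead of any particular library; the helper names are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_p(o, v_c, U):
    """log p(o|c) = u_o^T v_c - log sum_w exp(u_w^T v_c), computed stably."""
    scores = [dot(u_w, v_c) for u_w in U]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[o] - log_z


def analytic_grad(o, v_c, U):
    """u_o - sum_x p(x|c) u_x, the result of the derivation above."""
    scores = [dot(u_x, v_c) for u_x in U]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    expected = [sum(p * u_x[k] for p, u_x in zip(probs, U)) for k in range(len(v_c))]
    return [U[o][k] - expected[k] for k in range(len(v_c))]


def numeric_grad(o, v_c, U, eps=1e-6):
    """Central finite differences of log p(o|c) with respect to each entry of v_c."""
    grad = []
    for k in range(len(v_c)):
        plus, minus = list(v_c), list(v_c)
        plus[k] += eps
        minus[k] -= eps
        grad.append((log_p(o, plus, U) - log_p(o, minus, U)) / (2 * eps))
    return grad


random.seed(0)
V, d = 5, 3  # illustrative vocabulary size and embedding dimension
U = [[random.gauss(0, 1) for _ in range(d)] for _ in range(V)]  # output vectors u_w
v_c = [random.gauss(0, 1) for _ in range(d)]  # center word vector
o = 2  # index of the observed context word

g_a = analytic_grad(o, v_c, U)
g_n = numeric_grad(o, v_c, U)
print(max(abs(a - n) for a, n in zip(g_a, g_n)))  # expected to be close to 0
```

If the two gradients agree to several decimal places, the algebra (including the sign of the expectation term) is consistent.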

Continuous Bag of Words (CBOW) Model (p32)

Source: CS224N lecture 1 notes

Predicting a center word from the surrounding context.

For each word, we want to learn 2 vectors:

  • $v$: input vector, used when the word is in the context
  • $u$: output vector, used when the word is in the center

Notation


Model Architecture


Forward Pass


Loss Function

Use the cross entropy $H(\hat y, y)$ as our loss function:

$$H(\hat y, y) = -\sum_{j=1}^{|V|} y_j \log(\hat y_j)$$

Because $y$ is a one-hot vector, with $y_i = 1$ and all other entries 0, the loss simplifies to:

$$H(\hat y, y) = -y_i \log(\hat y_i) = -\log(\hat y_i)$$

So, for center word $c$, what we want to minimize is:

$$J = -\log \hat y_c = -\log \frac{\exp(u_c^T \hat v)}{\sum_{j=1}^{|V|} \exp(u_j^T \hat v)} = -u_c^T \hat v + \log \sum_{j=1}^{|V|} \exp(u_j^T \hat v)$$

where $\hat v$ is the average of the input vectors $v$ of the context words.
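A minimal sketch of the CBOW forward pass and the loss described in this section, for a single training example. It assumes the context is given as a list of word indices, that $\hat v$ is the plain average of the context words' input vectors, and that `V_in` / `U_out` are small hypothetical lookup tables (lists of lists) holding the $v$ and $u$ vectors; the function name `cbow_loss` is also illustrative.

```python
import math


def cbow_loss(context_ids, center_id, V_in, U_out):
    """Return -log y_hat[center_id] for one CBOW example.

    V_in[w]  : input vector v_w (used when w is a context word)
    U_out[w] : output vector u_w (used when w is the center word)
    """
    d = len(V_in[0])
    # 1. Average the context words' input vectors: v_hat.
    v_hat = [sum(V_in[w][k] for w in context_ids) / len(context_ids) for k in range(d)]
    # 2. Scores z_j = u_j^T v_hat for every word j in the vocabulary.
    z = [sum(u_k * v_k for u_k, v_k in zip(u_j, v_hat)) for u_j in U_out]
    # 3. log sum_j exp(z_j), with the max subtracted for numerical stability.
    m = max(z)
    log_norm = m + math.log(sum(math.exp(z_j - m) for z_j in z))
    # 4. Cross entropy against a one-hot target reduces to -log y_hat_c.
    return -(z[center_id] - log_norm)


# Toy vocabulary of 4 words with 2-dimensional vectors (illustrative numbers only).
V_in = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.0, 0.5]]
U_out = [[0.2, 0.1], [-0.3, 0.2], [0.4, 0.0], [0.1, -0.4]]
loss = cbow_loss(context_ids=[0, 1, 3], center_id=2, V_in=V_in, U_out=U_out)
print(loss)
```

Computing the loss as $-u_c^T \hat v + \log\sum_j \exp(u_j^T \hat v)$ with the max-subtraction trick avoids forming the softmax probabilities explicitly, which would overflow for large scores.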

Evaluation of Word2Vec

Source: word2vec paper

The authors define a comprehensive test set containing five types of semantic questions and nine types of syntactic questions.

Example table: (image not available)

The questions in each category were created in two steps:

  1. A list of similar word pairs was created manually.
  2. A large list of questions was formed by connecting two word pairs.

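A question is counted as correctly answered only if the word closest to the computed vector (for "big is to biggest as small is to ?", the vector $\mathrm{vec}(\text{biggest}) - \mathrm{vec}(\text{big}) + \mathrm{vec}(\text{small})$) is exactly the correct word in the question; synonyms are thus counted as mistakes.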
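The two-step question construction and the exact-match scoring rule can be sketched as follows. Everything here is hypothetical: the word pairs and the tiny 3-D embeddings are made up for illustration, cosine similarity is used as the closeness measure (as in the word2vec paper), and excluding the three question words from the candidates is an assumption of this sketch.

```python
import itertools
import math


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def make_questions(pairs):
    """Step 2: connect every ordered combination of two pairs (a, b), (c, d) into a question."""
    return [(a, b, c, d) for (a, b), (c, d) in itertools.permutations(pairs, 2)]


def answer(a, b, c, emb):
    """Return the word closest (cosine) to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = [eb - ea + ec for ea, eb, ec in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


def accuracy(questions, emb):
    """Exact match only: a synonym of the expected word still counts as a mistake."""
    correct = sum(answer(a, b, c, emb) == d for a, b, c, d in questions)
    return correct / len(questions)


# Step 1 (done by hand in the paper): a list of similar word pairs. Hypothetical examples.
pairs = [("big", "biggest"), ("small", "smallest"), ("fast", "fastest")]
# Hypothetical embeddings: each superlative adds the same offset (0, 0, 1).
emb = {
    "big": (1.0, 0.0, 0.0), "biggest": (1.0, 0.0, 1.0),
    "small": (0.0, 1.0, 0.0), "smallest": (0.0, 1.0, 1.0),
    "fast": (-1.0, 0.0, 0.0), "fastest": (-1.0, 0.0, 1.0),
}
questions = make_questions(pairs)
print(len(questions), accuracy(questions, emb))  # 6 1.0
# "largest" is a reasonable synonym, but the model answers "biggest", so it is scored as wrong.
print(accuracy([("small", "smallest", "big", "largest")], emb))  # 0.0
```

The last line shows why the metric is strict: a semantically acceptable answer still lowers the score if it is not the exact expected word.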

Example evaluation result: (image not available)

Hierarchical Softmax

Source: CS224N lecture 1 notes

Hierarchical softmax is an efficient alternative to the full softmax for computing the probability of a word given another word's vector.

The method uses a binary tree to represent all words in the vocabulary:

  • Each leaf of the tree is a word, and there is a unique path from the root to each leaf.
  • There are no output representations for words.
  • Each inner node of the tree (every node except the leaves, including the root) is associated with a vector that the model learns.


The probability of a word $w$ given a vector $w_i$, $P(w \mid w_i)$, is equal to the probability of a random walk that starts at the root and ends at the leaf node corresponding to $w$.

Example: (source)

$$P(\text{I'm} \mid C) = 0.57 \times 0.68 \times 0.72 = 0.279072$$

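The cost of computing one probability is only $O(\log \lvert V\rvert)$ for a balanced tree, since it needs one sigmoid per inner node on the path, compared with $O(\lvert V\rvert)$ for the original softmax.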

Notation

  • $L(w)$: the number of nodes in the path from the root to the leaf $w$
  • $n(w, i)$: the $i$-th node on this path, with associated vector $v_{n(w,i)}$
  • $ch(n)$: the left child of inner node $n$

Formula

$$P(w \mid w_i) = \prod_{j=1}^{L(w)-1} \sigma\left( [n(w, j+1) = ch(n(w, j))] \cdot v_{n(w,j)}^T v_{w_i} \right)$$

where $[x] = 1$ if $x$ is true and $-1$ otherwise, and $\sigma$ is the sigmoid function.

The formula has two useful properties:

  1. For any value of $x$: $\sigma(x) + \sigma(-x) = 1$
  2. For any word $w_i$: $\sum_{w=1}^{|V|} P(w \mid w_i) = 1$

Property 1 means the left and right branches at every inner node split the probability mass exactly, which is why property 2 holds without any normalization over the whole vocabulary.

Example: (images not available)
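The path-product formula can be checked numerically with a small sketch. It assumes a hand-built complete binary tree over four hypothetical words (`w0` to `w3`), stores one illustrative vector per inner node with the root included, and treats "the next node is the left child" as the $[\cdot] = 1$ case. It also confirms property 2: the leaf probabilities sum to 1.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hs_prob(path, node_vecs, v_in):
    """P(w | w_i) as a product of sigmoids along the root-to-leaf path of w.

    path      : list of (inner_node_id, went_left) pairs from the root downwards
    node_vecs : inner_node_id -> learned vector v_n
    v_in      : vector of the given word w_i
    """
    p = 1.0
    for node, went_left in path:
        sign = 1.0 if went_left else -1.0  # [n(w, j+1) = ch(n(w, j))] with ch = left child
        p *= sigmoid(sign * dot(node_vecs[node], v_in))
    return p


# Hypothetical tree:      root
#                        /    \
#                      n1      n2
#                     /  \    /  \
#                    w0  w1  w2  w3
paths = {
    "w0": [("root", True), ("n1", True)],
    "w1": [("root", True), ("n1", False)],
    "w2": [("root", False), ("n2", True)],
    "w3": [("root", False), ("n2", False)],
}
node_vecs = {"root": [0.3, -0.2], "n1": [0.5, 0.1], "n2": [-0.4, 0.6]}  # illustrative values
v_in = [1.0, 2.0]

probs = {w: hs_prob(path, node_vecs, v_in) for w, path in paths.items()}
print(probs)
print(sum(probs.values()))  # 1.0 up to floating-point rounding (property 2)
```

Each probability here costs $L(w) - 1 = 2 = \log_2 4$ sigmoid evaluations, whereas a full softmax would need a dot product with every one of the 4 output vectors.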

Speedup

Use a binary Huffman tree, which assigns frequent words shorter paths in the tree.
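A sketch of how a Huffman tree gives frequent words shorter paths, using Python's `heapq` and made-up word counts. This is a generic Huffman construction written for illustration, not word2vec's own code; the function name `huffman_depths` is hypothetical, and a word's depth here equals the number of inner nodes on its path, i.e. $L(w) - 1$.

```python
import heapq
import itertools


def huffman_depths(counts):
    """Return {word: depth of its leaf} for a binary Huffman tree over `counts`.

    Depth = number of inner nodes on the root-to-leaf path = L(w) - 1.
    """
    tie = itertools.count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(c, next(tie), {w: 0}) for w, c in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)  # the two least frequent subtrees
        c2, _, d2 = heapq.heappop(heap)
        # Merging puts both subtrees one level deeper under a new inner node.
        merged = {w: depth + 1 for w, depth in itertools.chain(d1.items(), d2.items())}
        heapq.heappush(heap, (c1 + c2, next(tie), merged))
    return heap[0][2]


counts = {"the": 500, "of": 300, "cat": 40, "sat": 30, "mat": 20, "zebra": 5}  # hypothetical counts
depths = huffman_depths(counts)
print(depths)
```

With these counts, "the" sits one step below the root while the rare "zebra" is five steps down, so the count-weighted average path length (about 1.64 inner nodes per word) is well below the roughly $\log_2 6 \approx 2.6$ of a balanced tree over the same six words.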

Word2Vec Online Demo

https://turbomaze.github.io/word2vecjson/