As the Skip-gram architecture models a classifier in Euclidean space, a non-linear activation function in the hidden layer is not required; adding one would defeat the purpose.
In the final layer, the SoftMax function is applied to the score vector $u$ to get the predicted distribution $\hat{y}$.

The SoftMax function is defined as follows:

$$\hat{y}_j = \text{softmax}(u)_j = \frac{e^{u_j}}{\sum_{k=1}^{V} e^{u_k}}$$
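As a minimal NumPy sketch of this formula (the shift by the maximum is a standard numerical-stability trick, not part of the definition above):

```python
import numpy as np

def softmax(u):
    # SoftMax over the score vector u of shape [V, 1].
    # Subtracting the max before exponentiating avoids overflow.
    e = np.exp(u - np.max(u))
    return e / np.sum(e)
```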
Forward Propagation
We have the input $x$, a one-hot vector of dimension $[V,1]$, where $V$ is the vocabulary size.
Take the dot product of $x$ with the embedding matrix $W_1$ of dimension $[V,d]$ to get the hidden layer $h = W_1^T x$ of dimension $[d,1]$.
Take the dot product of $h$ and the output matrix $W_2$ of dimension $[d,V]$ to get the score vector $u = W_2^T h$ of dimension $[V,1]$.
Apply the SoftMax function to $u$ to get $\hat{y} = \text{softmax}(u)$.
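These steps can be sketched in NumPy as follows, reusing the softmax helper defined earlier ($W_1$ and $W_2$ are the matrix names assumed above; the function name forward is illustrative):

```python
def forward(x, W1, W2):
    # x  : one-hot input vector, shape [V, 1]
    # W1 : embedding matrix,     shape [V, d]
    # W2 : output matrix,        shape [d, V]
    h = W1.T @ x        # hidden layer,            shape [d, 1]
    u = W2.T @ h        # score vector,            shape [V, 1]
    y_hat = softmax(u)  # predicted distribution,  shape [V, 1]
    return y_hat, h, u
```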
Compute Loss
Cross Entropy Loss

For the one-hot true distribution $y$ and the prediction $\hat{y}$, the cross-entropy loss is

$$H(y, \hat{y}) = -\sum_{j=1}^{V} y_j \log \hat{y}_j$$
Cost Function

Since $y$ is one-hot, summing the cross-entropy loss over the $C$ context words gives the cost

$$E = -\sum_{c=1}^{C} u_{j_c^*} + C \cdot \log \sum_{k=1}^{V} e^{u_k}$$

where $j_c^*$ is the vocabulary index of the $c$-th context word.
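A sketch of this cost in NumPy, assuming the indices of the $C$ context words are available as a list (cost and context_indices are illustrative names):

```python
import numpy as np

def cost(u, context_indices):
    # u : score vector from the forward pass, shape [V, 1]
    # context_indices : vocabulary indices j*_c of the C context words
    C = len(context_indices)
    # E = -(sum of true-context scores) + C * log(sum of exponentiated scores)
    return float(-np.sum(u[context_indices]) + C * np.log(np.sum(np.exp(u))))
```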
Gradient Change
Let $e = \sum_{c=1}^{C} (\hat{y} - y_c)$ be the prediction error summed over the $C$ context words, of dimension $[V,1]$.

With respect to $W_1$:

$$\frac{\partial E}{\partial W_1} = x \, (W_2 \, e)^T \qquad [V,d] = [V,1]\,[1,d]$$
With respect to $W_2$:

$$\frac{\partial E}{\partial W_2} = h \, e^T = (x^T W_1)^T \, e^T \qquad [d,V] = ([1,V]\,[V,d])^T \, ([V,1])^T$$
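In NumPy, these two gradients might look like the following (the function name and argument order are illustrative; the comments track the matrix dimensions above):

```python
def gradients(x, h, e, W2):
    # x : one-hot input,    shape [V, 1]
    # h : hidden layer,     shape [d, 1]
    # e : prediction error summed over context words, shape [V, 1]
    dW2 = h @ e.T           # [d,1][1,V] -> [d,V]
    dW1 = x @ (W2 @ e).T    # [V,1][1,d] -> [V,d]
    return dW1, dW2
```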
Optimization
Update the weights $W_1$ and $W_2$ using the gradients calculated above. Here $\eta$ is the learning rate.

$$W_1 = W_1 - \eta \, \frac{\partial E}{\partial W_1} \qquad W_2 = W_2 - \eta \, \frac{\partial E}{\partial W_2}$$
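A sketch of the update step (the function name and the example value eta=0.01 are assumptions, not from the original):

```python
def update(W1, W2, dW1, dW2, eta=0.01):
    # Plain gradient descent; eta is the learning rate.
    W1 -= eta * dW1
    W2 -= eta * dW2
    return W1, W2
```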