# <center><i class="fa fa-edit"></i> NN Example on Tensorflow </center>
###### tags: `Internship`
:::info
**Goal:**
- [x] Create a simple 3-layer network
- [x] Use MNIST dataset of grayscale images of hand-written digits
- [x] Each image is grayscale 28 x 28 pixels
- [x] 55,000 training rows, 10,000 testing rows, and 5,000 validation rows
**Resources:**
- [Python TensorFlow Tutorial](https://adventuresinmachinelearning.com/python-tensorflow-tutorial/)
- [LSTM Implementation in Python/Numpy](https://gist.github.com/tmatha/f1c7082acdc9af21aade33b98687f2c6)
- [LSTM Implementation in TensorFlow eager execution](https://gist.github.com/tmatha/905ae0c0d304119851d7432e5b359330)
:::
### Neural Network Example
Load data:
```
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
```
- ```one_hot=True``` returns each label as a one-hot vector rather than as the digit itself: for the digit '4', only the node at index 4 is "hot" (set to 1) and every other node is zero.
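A minimal NumPy sketch (not from the tutorial) of what the one-hot encoding of the digit '4' looks like:
```
import numpy as np

# One-hot encoding of the digit 4 over the 10 classes (0-9):
# only the node at index 4 is "hot"; all other nodes are zero.
label = 4
one_hot_label = np.eye(10)[label]
print(one_hot_label)  # [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
```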
Set up Python optimization variables:
```
learning_rate = 0.5
epochs = 10
batch_size = 100
```
Declare training data placeholders:
- Input x: 28 x 28 pixels = 784
- Output: 10 digits
```
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
```
*A network with L layers always has L − 1 sets of weight/bias tensors.*
Since this is a 3-layer neural network, we must declare weights and biases for two layers: input-to-hidden and hidden-to-output.
- 300 nodes in the hidden layer
- Weights initialized randomly from a normal distribution with mean 0 and standard deviation 0.03
Declare weights connecting input to hidden layer:
```
W1 = tf.Variable(tf.random_normal([784, 300], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([300]), name='b1')
```
Declare weights connecting hidden layer to output:
```
W2 = tf.Variable(tf.random_normal([300, 10], stddev=0.03), name='W2')
b2 = tf.Variable(tf.random_normal([10]), name='b2')
```
Calculate the output of the hidden layer.
Multiply the input vector x by the weight matrix W1 using tf.matmul, then add the bias b1 using tf.add:
```hidden_out = tf.add(tf.matmul(x, W1), b1)```
Apply rectified linear unit activation function using tf.nn.relu:
```hidden_out = tf.nn.relu(hidden_out)```
:::success
The above operations can be represented mathematically as:
$$
z = W_1 x + b_1
$$
$$
h = f(z) = \max(0, z)
$$
where $h$ is ```hidden_out``` and $f$ is the ReLU activation.
:::
Set up output layer y_:
- Use softmax activation, denoted by tf.nn.softmax
```y_ = tf.nn.softmax(tf.add(tf.matmul(hidden_out, W2), b2))```
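For intuition, here is a small NumPy sketch (an illustration, not part of the tutorial code) of what softmax does: it turns a vector of raw scores into a probability distribution over the 10 digits.
```
import numpy as np

# Raw output scores ("logits") for the 10 digit classes
scores = np.array([2.0, 1.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

# Softmax: exponentiate, then normalize so the values sum to 1
probs = np.exp(scores) / np.sum(np.exp(scores))
print(probs)        # the largest score gets the largest probability
print(probs.sum())  # ~1.0: a valid probability distribution
```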
Use the cross entropy cost function:
1) Sum over all output nodes j for each sample
2) Take the mean across all m training samples
```
y_clipped = tf.clip_by_value(y_, 1e-10, 0.9999999)
cross_entropy = -tf.reduce_mean(tf.reduce_sum(y * tf.log(y_clipped)
                                              + (1 - y) * tf.log(1 - y_clipped), axis=1))
```
- tf.clip_by_value() clips the output y_ so that it stays within that range.
- This prevents a log(0) case, which would break the operation.
- tf.reduce_sum() takes the sum along the given axis of a tensor.
- The inner sum over the output nodes must be performed first, over the second axis (axis=1). A small numeric sketch follows the formula below.
:::success
**Mathematical representation of cross entropy cost function:**
$$
J = -\frac{1}{m} \sum_{i=1}^m \sum_{j=1}^n \left[\, y_j^{(i)} \log\big(\hat{y}_j^{(i)}\big) + \big(1 - y_j^{(i)}\big) \log\big(1 - \hat{y}_j^{(i)}\big) \,\right]
$$
where $\hat{y}_j^{(i)}$ denotes the network output ```y_``` for node $j$ of training sample $i$, and $y_j^{(i)}$ is the corresponding label.
:::
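To make the clipping and the two reductions concrete, here is a small NumPy sketch (toy numbers, not the TensorFlow graph) of the same computation on a batch of two samples with three classes:
```
import numpy as np

# Toy batch: 2 samples, 3 classes (instead of 10) to keep it readable
y      = np.array([[1., 0., 0.],
                   [0., 1., 0.]])             # true one-hot labels
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.3, 0.6, 0.1]])          # softmax outputs

y_clipped = np.clip(y_pred, 1e-10, 0.9999999) # avoid log(0)

# Inner sum over the output nodes (axis=1), then mean over the batch
per_sample = np.sum(y * np.log(y_clipped)
                    + (1 - y) * np.log(1 - y_clipped), axis=1)
cross_entropy = -np.mean(per_sample)
print(cross_entropy)
```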
Add optimizer:
- TensorFlow takes care of gradient descent and backpropagation
```optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cross_entropy)```
Set up initialization operator:
```init_op = tf.global_variables_initializer()```
Define accuracy assessment operation:
```
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
```
- ```correct_prediction``` is a boolean tensor indicating, for each sample, whether the index of the largest predicted value (tf.argmax(y_, 1)) matches the index of the true label (tf.argmax(y, 1))
- To apply ```tf.reduce_mean```, we must first cast the booleans to tf.float32
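A small NumPy illustration (toy values) of how the argmax / equal / cast steps combine into an accuracy score:
```
import numpy as np

# Toy example: 3 samples, true one-hot labels vs. predicted probability vectors
y      = np.array([[0., 1., 0.], [1., 0., 0.], [0., 0., 1.]])
y_pred = np.array([[0.2, 0.7, 0.1], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]])

correct = np.argmax(y, axis=1) == np.argmax(y_pred, axis=1)  # [True, False, True]
accuracy = correct.astype(np.float32).mean()                 # cast booleans, then average
print(accuracy)  # 0.6666667
```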
Set up training:
```
with tf.Session() as sess:
    sess.run(init_op)
    total_batch = int(len(mnist.train.labels) / batch_size)
    for epoch in range(epochs):
        avg_cost = 0
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size=batch_size)
            _, c = sess.run([optimizer, cross_entropy],
                            feed_dict={x: batch_x, y: batch_y})
            avg_cost += c / total_batch
        print("Epoch:", (epoch + 1), "cost =", "{:.3f}".format(avg_cost))
    print("Training complete!")
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))
```
- tf.Session() starts the session
- sess.run(init_op) initializes the variables
- total_batch: since we use a mini-batch scheme, we calculate how many batches to run in each epoch
- Loop through the epochs
- avg_cost keeps track of the average cross entropy cost for each epoch
- batch_x and batch_y are batches of samples randomly extracted from the MNIST training dataset
- The next_batch function makes the extraction easy
- Run optimizer and cross_entropy using sess.run
- Feed batch_x and batch_y into the placeholders x and y via feed_dict
- We only care about the returned value of cross_entropy; the optimizer's return value is discarded as _
- Print the average cost for each epoch
- After training is done, run accuracy on the test set
Output:
```
Epoch: 1 cost = 0.586
Epoch: 2 cost = 0.213
Epoch: 3 cost = 0.150
Epoch: 4 cost = 0.113
Epoch: 5 cost = 0.094
Epoch: 6 cost = 0.073
Epoch: 7 cost = 0.058
Epoch: 8 cost = 0.045
Epoch: 9 cost = 0.036
Epoch: 10 cost = 0.027
Training complete!
0.9787
```