# <center><i class="fa fa-edit"></i> NN Example on TensorFlow </center>

###### tags: `Internship`

:::info
**Goal:**
- [x] Create a simple 3-layer network
- [x] Use the MNIST dataset of grayscale images of hand-written digits
- [x] Each image is grayscale, 28 x 28 pixels
- [x] 55,000 training rows, 10,000 testing rows, and 5,000 validation rows

**Resources:**
- [Python TensorFlow Tutorial](https://adventuresinmachinelearning.com/python-tensorflow-tutorial/)
- [LSTM Implementation in Python/Numpy](https://gist.github.com/tmatha/f1c7082acdc9af21aade33b98687f2c6)
- [LSTM Implementation in TensorFlow eager execution](https://gist.github.com/tmatha/905ae0c0d304119851d7432e5b359330)
:::

### Neural Network Example

Load data:
```
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
```
- ```one_hot=True``` encodes each label as a vector rather than as the digit itself. For the digit '4', only the node at index 4 is "hot" (set to 1); every other node is zero.

Set up Python optimization variables:
```
learning_rate = 0.5
epochs = 10
batch_size = 100
```

Declare training data placeholders:
- Input x: 28 x 28 pixels = 784
- Output: 10 digits
```
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
```

*There are always L - 1 weight/bias tensor pairs, where L is the number of layers.* Since this is a 3-layer neural network, we must declare weights and biases for two layers.
- 300 nodes in the hidden layer
- Weights initialized from a normal distribution with mean 0 and standard deviation 0.03

Declare weights connecting input to hidden layer:
```
W1 = tf.Variable(tf.random_normal([784, 300], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random_normal([300]), name='b1')
```

Declare weights connecting hidden layer to output:
```
W2 = tf.Variable(tf.random_normal([300, 10], stddev=0.03), name='W2')
b2 = tf.Variable(tf.random_normal([10]), name='b2')
```

Calculate the output of the hidden layer. Multiply the input vector x by the weights W1 using tf.matmul, then add the bias b1 using tf.add:

```hidden_out = tf.add(tf.matmul(x, W1), b1)```

Apply the rectified linear unit activation function using tf.nn.relu:

```hidden_out = tf.nn.relu(hidden_out)```

:::success
The above operations can be mathematically represented as such:
![](https://i.imgur.com/270tFSv.png)
:::

Set up output layer y_:
- Use softmax activation, denoted by tf.nn.softmax

```y_ = tf.nn.softmax(tf.add(tf.matmul(hidden_out, W2), b2))```

Use the cross entropy cost function:
1) Sum over all output nodes j
2) Take the mean across all m training samples
```
y_clipped = tf.clip_by_value(y_, 1e-10, 0.9999999)
cross_entropy = -tf.reduce_mean(tf.reduce_sum(y * tf.log(y_clipped) + (1 - y) * tf.log(1 - y_clipped), axis=1))
```
- tf.clip_by_value() clips the output y_ so its values stay within the given range.
    - Prevents taking log(0), which would break the operation.
- tf.reduce_sum() takes the sum over a given axis of a tensor.
    - The first sum must be taken over the second axis (axis=1), i.e., across the output nodes.

:::success
**Mathematical representation of the cross entropy cost function** (with $\hat{y}$ denoting the network output `y_`):

$$
J = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{n} \left[ y_j^{(i)} \log\big(\hat{y}_j^{(i)}\big) + \big(1 - y_j^{(i)}\big) \log\big(1 - \hat{y}_j^{(i)}\big) \right]
$$
:::
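To make the cost function concrete, here is a minimal NumPy sketch of the same clipped cross entropy computation, using a toy batch of two samples and three classes (the values are chosen purely for illustration and are not taken from the tutorial):

```
import numpy as np

# Toy batch: 2 samples, 3 classes (illustrative values only)
y = np.array([[0., 1., 0.],
              [1., 0., 0.]])          # one-hot true labels
y_ = np.array([[0.1, 0.8, 0.1],
               [0.6, 0.3, 0.1]])      # softmax outputs of the network

# Clip to avoid log(0), mirroring tf.clip_by_value
y_clipped = np.clip(y_, 1e-10, 0.9999999)

# Sum over the class axis (axis=1), then average over the batch
cross_entropy = -np.mean(np.sum(y * np.log(y_clipped)
                                + (1 - y) * np.log(1 - y_clipped), axis=1))
print(cross_entropy)   # ~0.70 for this toy batch
```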
Add optimizer:
- TensorFlow takes care of gradient descent and backpropagation

```optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cross_entropy)```

Set up initialization operator:

```init_op = tf.global_variables_initializer()```

Define accuracy assessment operation:
```
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
```
- ```correct_prediction``` returns a boolean for whether the index of the maximum value is the same in the predicted and actual vectors/tensors
- To apply ```tf.reduce_mean```, we must cast the boolean to tf.float32

Set up training:
```
with tf.Session() as sess:
    sess.run(init_op)
    total_batch = int(len(mnist.train.labels) / batch_size)
    for epoch in range(epochs):
        avg_cost = 0
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size=batch_size)
            _, c = sess.run([optimizer, cross_entropy], feed_dict={x: batch_x, y: batch_y})
            avg_cost += c / total_batch
        print("Epoch:", (epoch + 1), "cost =", "{:.3f}".format(avg_cost))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels}))
```
- tf.Session() starts the session
- sess.run(init_op) initializes the variables
- total_batch: we use a mini-batch scheme, so we have to calculate how many batches to run in each epoch
- Loop through the epochs
    - avg_cost keeps track of the average cross entropy cost for each epoch
    - batch_x and batch_y are randomly extracted batches of samples from the MNIST training dataset
        - The next_batch function makes the extraction easy
    - Run optimizer and cross_entropy using sess.run
        - Feed them batch_x and batch_y, respectively, via feed_dict
        - We only care about the result of cross_entropy
    - Print the average cost
- After training is done, run accuracy

Output:
```
Epoch: 1 cost = 0.586
Epoch: 2 cost = 0.213
Epoch: 3 cost = 0.150
Epoch: 4 cost = 0.113
Epoch: 5 cost = 0.094
Epoch: 6 cost = 0.073
Epoch: 7 cost = 0.058
Epoch: 8 cost = 0.045
Epoch: 9 cost = 0.036
Epoch: 10 cost = 0.027
Training complete!
0.9787
```
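As a shape sanity check, below is a minimal NumPy sketch of a single forward pass through the same 784 → 300 → 10 architecture. The weights are random placeholders rather than the trained model, so the output is purely illustrative; the np.argmax call mirrors the tf.argmax comparison used for accuracy above.

```
import numpy as np

# One flattened 28 x 28 image and randomly initialized parameters
rng = np.random.default_rng(0)
x = rng.random((1, 784))
W1 = rng.normal(0, 0.03, (784, 300)); b1 = np.zeros(300)
W2 = rng.normal(0, 0.03, (300, 10));  b2 = np.zeros(10)

hidden_out = np.maximum(x @ W1 + b1, 0)        # ReLU, shape (1, 300)
logits = hidden_out @ W2 + b2                  # shape (1, 10)
y_ = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)   # softmax

print(y_.shape, np.argmax(y_, 1))              # (1, 10) and the predicted digit
```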