# PW 09 Bütikofer Jaggi
## 1. The Perceptron and the Delta rule
### 1_activation_function
```python=
def relu(neta):
    '''The activation function of a Rectified Linear Unit (ReLU).'''
    output = neta * (neta > 0)     # max(0, neta), element-wise
    d_output = 1.0 * (neta > 0)    # derivative: 1 where neta > 0, else 0
    return (output, d_output)
```

The sigmoid has two horizontal asymptotes, at y = 0 and y = 1; the hyperbolic tangent has them at y = -1 and y = 1; the linear function has no asymptote. The derivatives of the sigmoid and the hyperbolic tangent are bell-shaped (they look like a Gaussian), while the derivative of the linear function is a constant horizontal line.
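For comparison with `relu` above, here is a minimal sketch of the three other activation functions in the same `(output, d_output)` style; the exact names and signatures used in the notebook may differ.
```python=
import numpy as np

def sigmoid(neta):
    '''Logistic sigmoid: horizontal asymptotes at y = 0 and y = 1.'''
    output = 1.0 / (1.0 + np.exp(-neta))
    d_output = output * (1.0 - output)   # bell-shaped derivative
    return (output, d_output)

def htan(neta):
    '''Hyperbolic tangent: horizontal asymptotes at y = -1 and y = 1.'''
    output = np.tanh(neta)
    d_output = 1.0 - output ** 2         # bell-shaped derivative
    return (output, d_output)

def linear(neta):
    '''Identity activation: no asymptote, constant derivative.'''
    output = neta
    d_output = np.ones_like(neta)
    return (output, d_output)
```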
### 4_1_delta_rule_points
#### When well defined

The perceptron separates the two classes well. The error drops close to 0 within a few iterations.
#### When classes overlap

Because of the overlap, the error function does not go near 0 but converges to about 0.4 (caused by the overlapping points). It does not take more iterations to converge to 0.4 than in the well-defined case.
There is only one oscillation, and it is not significant.
#### Not with single line

The error converges to about 1.0. When the classes cannot be separated by a single line, the number of iterations increases and the error stays higher.
Local minima are reached after fewer iterations than the global minimum, but we still converge to the global minimum.
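For reference, a minimal sketch of one online epoch of the delta rule for a single perceptron, assuming inputs that already contain a bias column and the `relu` activation from above (the actual activation and variable names used in the practical may differ):
```python=
import numpy as np

def delta_rule_epoch(weights, X, targets, learning_rate=0.1):
    '''One online pass over the dataset; X is assumed to already contain a bias column.'''
    squared_errors = []
    for x, t in zip(X, targets):
        neta = np.dot(weights, x)        # weighted sum of the inputs
        output, d_output = relu(neta)    # activation and its derivative
        error = t - output               # difference between target and output
        weights = weights + learning_rate * error * d_output * x  # delta rule update
        squared_errors.append(error ** 2)
    return weights, np.mean(squared_errors)  # updated weights and MSE of this epoch
```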
## 2. Backpropagation
### 5_backpropagation
#### When well defined

The error converges to 0 within a few iterations.
#### When classes overlap

Because of the overlap, the error function does not go near 0 but converges to about 0.4 (caused by the overlapping points). It does not take more iterations to converge to 0.4.
There is only one oscillation, and it is not significant.
#### Not with single line

It is possible to separate the dataset with 2 lines. The error converges to 0, but it takes more iterations. There is no local minimum.
#### Separated in subgroups (blobs)

It is possible to separate the classes into 2 groups. The error converges to 0 and only a few iterations are needed.
```python=
class MLP:
    ...
    def init_weights(self):
        '''
        This function creates the matrices of weights and initializes their values to small values.
        '''
        self.weights = []        # Start with an empty list
        self.delta_weights = []
        for i in range(1, len(self.layers) - 1):  # Iterate through the hidden layers
            # np.random.random((M, N)) returns an MxN matrix
            # of random floats in [0.0, 1.0).
            # (self.layers[i] + 1) is the number of neurons in layer i plus the bias unit
            self.weights.append((2 * np.random.random((self.layers[i - 1] + 1,
                                                        self.layers[i] + 1)) - 1) * 0.25)
            # delta_weights are initialized to zero
            self.delta_weights.append(np.zeros((self.layers[i - 1] + 1,
                                                self.layers[i] + 1)))
        # Append a last set of weights connecting to the output of the network
        self.weights.append((2 * np.random.random((self.layers[i] + 1,
                                                    self.layers[i + 1])) - 1) * 0.25)
        self.delta_weights.append(np.zeros((self.layers[i] + 1,
                                            self.layers[i + 1])))

    def fit(self, data_train, data_test=None,
            learning_rate=0.1, momentum=0.0, epochs=100):
        '''
        Online learning.
        :param data_train: A tuple (X, y) with input data and targets for training
        :param data_test: A tuple (X, y) with input data and targets for testing
        :param learning_rate: parameter defining the speed of learning
        :param momentum: fraction of the previous weight change kept in the current update
        :param epochs: number of times the dataset is presented to the network for learning
        '''
        ...
                # Update (inside the loop over epochs k and over training examples)
                for i in range(len(self.weights)):    # Iterate through the layers
                    layer = np.atleast_2d(a[i])       # Activation of layer i
                    delta = np.atleast_2d(deltas[i])  # Delta of layer i
                    # Compute the weight change using the delta for this layer
                    # and the change computed for the previous example for this layer
                    self.delta_weights[i] = (-learning_rate * layer.T.dot(delta)
                                             + momentum * self.delta_weights[i])
                    self.weights[i] += self.delta_weights[i]  # Update the weights
            error_train[k] = np.mean(error_it)  # Average error over all training examples
            if data_test is not None:           # If a testing dataset was provided
                error_test[k], _ = self.compute_MSE(data_test)  # Testing error after epoch k
        if data_test is None:      # If only training data was provided
            return error_train     # Return the error during training
        else:
            return (error_train, error_test)  # Otherwise, return both training and testing error
    ...
```
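A possible way to call the class above; the constructor signature `MLP(layers, activation)` and the variable names are assumptions, not taken from the notebook.
```python=
# Hypothetical usage sketch; constructor signature and data names are assumptions.
mlp = MLP([2, 4, 1], 'tanh')             # 2 inputs, 4 hidden neurons, 1 output
error_train, error_test = mlp.fit(
    (X_train, y_train),                  # training inputs and targets
    data_test=(X_test, y_test),          # optional test set, evaluated after each epoch
    learning_rate=0.1,
    momentum=0.5,
    epochs=100)
```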
## 4. Crossvalidation

The results vary a lot. We can see that as the spread increases, the error rate logically increases too.

Similarly to the hold-out validation, when the spread increases the error rate increases too.
If we compare the results of the two methods, we see that hold-out can give the best result in some cases but also the worst. K-fold gives slightly worse results on average, but the results do not vary as much.
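As an illustration of why the k-fold estimate varies less (it averages k test errors instead of relying on a single split), here is a minimal sketch of a k-fold evaluation built on the `MLP` class above; the constructor signature and the helper name are assumptions.
```python=
import numpy as np

def k_fold_mse(X, y, k=5, hidden=4, epochs=100):
    '''Hypothetical k-fold helper; the MLP constructor and arguments are assumptions.'''
    indices = np.random.permutation(len(X))   # shuffle before splitting
    folds = np.array_split(indices, k)        # k roughly equal folds
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        mlp = MLP([X.shape[1], hidden, 1], 'tanh')
        mlp.fit((X[train_idx], y[train_idx]), epochs=epochs)
        mse, _ = mlp.compute_MSE((X[test_idx], y[test_idx]))
        errors.append(mse)
    # the k estimates are averaged, which is why the k-fold result varies less
    return np.mean(errors), np.std(errors)
```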
## 5. Model building
### Spread (0.3, 0.5, 0.7)


The final model uses 4 hidden neurons trained for 60 epochs. We can see that even with a spread of 0.7 we obtain a fairly good result. With 4 neurons the results are good and the computation time stays low.
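A sketch of how the selected model could be built with these hyperparameters (4 hidden neurons, 60 epochs); the constructor signature and data names are assumptions.
```python=
# Hypothetical final-model sketch; constructor signature and data names are assumptions.
final_mlp = MLP([X.shape[1], 4, 1], 'tanh')   # 4 hidden neurons
error_train = final_mlp.fit((X, y), learning_rate=0.1, momentum=0.5, epochs=60)
```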