# NN - Multilayer Perceptrons (MLP)
## Summary
* [Overview](#overview)
* [Structure of a Multilayer Perceptron Neural Network](#structure-of-a-multilayer-perceptron-neural-network)
* [Backpropagation](#backpropagation)
* [The key differences between a multilayer perceptron and a convolutional neural network](#the-key-differences-between-a-multilayer-perceptron-and-a-convolutional-neural-network)
* [Applications of multilayer perceptrons in machine learning](#applications-of-multilayer-perceptrons-in-machine-learning)
* [Conclusion](#conclusion)
## Overview
A Multi-Layer Perceptron (MLP) is an artificial neural network comprising multiple layers of neurons or nodes organized in a hierarchical structure. It is a widely used and straightforward type of neural network, especially in supervised learning tasks like classification and regression.
The fundamental principle underlying the operation of an MLP is backpropagation, a crucial algorithm employed for training the network. During backpropagation, the network iteratively adjusts its weights and biases by propagating the error backward from the output layer to the input layer. This iterative process allows the model's parameters to be fine-tuned, progressively enhancing its predictive capabilities.

## Structure of a Multilayer Perceptron Neural Network
This network is built from three main types of layers: input, hidden, and output. Together with a few other key components, they form a complete artificial neural network:
- **Input layer**: Receives input data and passes it on to the hidden layers. The number of neurons in the input layer is equal to the number of input features.
- **Hidden layers**: Consist of one or more layers of neurons that perform computations and transform the input data. The number of hidden layers and the number of neurons within each layer can be adjusted to optimize the network’s performance.
- **Activation function**: Applies a non-linear transformation to the output of each neuron in the hidden layers. Common activation functions include sigmoid, hyperbolic tangent (tanh), and rectified linear unit (ReLU).
- **Output layer**: Produces the final output of the network, such as a classification label or a regression target. The number of neurons in the output layer depends on the specific task, such as the number of classes in a classification problem.
- **Weights and biases**: Adjustable parameters that determine the strength of the connection between neurons in adjacent layers and the bias of each neuron. These parameters are learned during the training process to minimize the difference between the network’s predictions and the actual target values.
- **Loss function**: Measures the discrepancy between the network’s predictions and the actual target values. Common loss functions for MLPs include mean squared error for regression tasks and cross-entropy for classification tasks.
MLPs are trained using an optimization algorithm, such as gradient descent, to iteratively adjust the weights and biases based on the gradient of the loss function. This process continues until the network converges to a set of parameters that minimizes the loss function; the sketch below maps these components to code.
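Here is a minimal PyTorch sketch of the pieces listed above. It is an illustration under assumed sizes (4 input features, 16 hidden units, 3 output classes) and an assumed learning rate, not a canonical implementation:

```python
# A minimal MLP mapping the components above to code. The layer
# sizes and learning rate are illustrative assumptions.
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(4, 16),   # input -> hidden: weights and biases
    nn.ReLU(),          # activation function
    nn.Linear(16, 3),   # hidden -> output layer (3 classes)
)

loss_fn = nn.CrossEntropyLoss()                          # loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # gradient descent

# One training step on a random mini-batch of 8 samples.
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))
loss = loss_fn(model(x), y)
optimizer.zero_grad()
loss.backward()   # backpropagation computes the gradients
optimizer.step()  # update weights and biases
```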
The term “multilayer perceptron” is often used interchangeably with “deep neural network,” although, strictly speaking, an MLP is a specific deep network architecture, characterized by its fully connected layers and its use of backpropagation for training.
There are a few limitations to consider when employing MLPs:
- **Computational cost**: Training MLPs can be computationally expensive, especially with large datasets or complex architectures.
- **Tuning hyperparameters**: Finding the optimal number of hidden layers, neurons, and activation functions can require extensive experimentation, often automated with a systematic search (sketched below).
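As an illustration of such a search, here is a hedged scikit-learn sketch using `GridSearchCV`; the synthetic dataset and the candidate layer sizes and activations in the grid are illustrative assumptions:

```python
# A hedged sketch of MLP hyperparameter tuning via grid search.
# The dataset is synthetic and the grid is an illustrative choice.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

grid = GridSearchCV(
    MLPClassifier(max_iter=1000, random_state=0),
    param_grid={
        "hidden_layer_sizes": [(32,), (64,), (64, 32)],
        "activation": ["relu", "tanh"],
    },
    cv=3,  # 3-fold cross-validation for each configuration
)
grid.fit(X, y)
print("best configuration:", grid.best_params_)
```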
## Backpropagation
Backpropagation is short for “backward propagation of errors.” It is usually paired with stochastic gradient descent (SGD), which updates the network’s parameters iteratively based on gradients computed from batches of training data. Instead of computing the gradients over the entire training dataset (which can be computationally expensive for large datasets), SGD computes them on small random subsets of the data called mini-batches. Here’s an overview of how the backpropagation algorithm works (a worked example follows the list):
- **Forward pass**: During the forward pass, input data is fed into the neural network, and the network's output is computed layer by layer. Each neuron computes a weighted sum of its inputs, applies an activation function to the result, and passes the output to the neurons in the next layer.
- **Loss computation**: After the forward pass, the network's output is compared to the true target values, and a loss function is computed to measure the discrepancy between the predicted output and the actual output.
- **Backward pass (gradient calculation)**: In the backward pass, the gradients of the loss function with respect to the network's parameters (weights and biases) are computed using the chain rule of calculus. The gradients represent the rate of change of the loss function with respect to each parameter and provide information about how to adjust the parameters to decrease the loss.
- **Parameter update**: Once the gradients have been computed, the network's parameters are updated in the direction opposite to the gradients in order to minimize the loss function. This update is typically performed using an optimization algorithm such as the stochastic gradient descent (SGD) described above.
- **Iterative process**: These four steps are repeated for a fixed number of epochs or until convergence criteria are met. During each iteration, the network's parameters are adjusted based on the gradients computed in the backward pass, gradually reducing the loss and improving the model's performance.
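To make the four steps concrete, here is a minimal NumPy sketch of one backpropagation step for a tiny two-layer MLP. The layer sizes, batch size, and learning rate are illustrative assumptions:

```python
# One backpropagation + SGD step for a tiny 2-layer MLP on random
# data. Sizes and learning rate are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy mini-batch: 8 samples, 4 input features, 3 output classes.
X = rng.normal(size=(8, 4))
y = rng.integers(0, 3, size=8)

# Parameters: input -> hidden (ReLU) -> output (softmax).
W1, b1 = rng.normal(scale=0.1, size=(4, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.1, size=(16, 3)), np.zeros(3)
lr = 0.1

# 1. Forward pass.
z1 = X @ W1 + b1
h = np.maximum(z1, 0.0)                       # ReLU activation
logits = h @ W2 + b2
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)  # softmax

# 2. Loss computation: mean cross-entropy over the mini-batch.
loss = -np.log(probs[np.arange(len(y)), y]).mean()

# 3. Backward pass: chain rule, layer by layer.
dlogits = probs.copy()
dlogits[np.arange(len(y)), y] -= 1.0
dlogits /= len(y)
dW2 = h.T @ dlogits
db2 = dlogits.sum(axis=0)
dh = dlogits @ W2.T
dz1 = dh * (z1 > 0)                           # ReLU gradient
dW1 = X.T @ dz1
db1 = dz1.sum(axis=0)

# 4. Parameter update: one SGD step against the gradient.
for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
    param -= lr * grad
```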
## The key differences between a multilayer perceptron and a convolutional neural network
MLPs are more general-purpose networks, while CNNs are specialized for tasks involving spatial data like images. When dealing with visual data, CNNs are typically the preferred choice due to their efficiency in capturing spatial features and superior performance in computer vision tasks.
|Feature| Multi-Layer Perceptron (MLP) |Convolutional Neural Network (CNN)|
|----|----|-----|
|Architecture |Fully connected |Convolutional|
|Data Handling |Flattened data |Grid-like data (images)|
|Feature Learning| General non-linear activation functions |Convolutional layers and pooling|
|Applications |General purpose tasks |Computer vision tasks|
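One way to see the difference in data handling is to compare parameter counts. In this hedged PyTorch sketch, a fully connected layer over a flattened 28×28 grayscale image (an assumed input size) needs far more parameters than a small convolutional layer over the same grid:

```python
# Comparing parameter counts: flattened input for an MLP layer vs.
# grid-like input for a conv layer. Sizes are illustrative assumptions.
from torch import nn

def n_params(module: nn.Module) -> int:
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters())

fc = nn.Linear(28 * 28, 128)           # MLP-style: flattened image input
conv = nn.Conv2d(1, 8, kernel_size=3)  # CNN-style: grid-like image input

print("fully connected:", n_params(fc))    # 100480
print("convolutional:  ", n_params(conv))  # 80
```

This parameter efficiency, which comes from sharing small filters across the whole image, is one reason CNNs scale better to visual data.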
## Applications of multilayer perceptrons in machine learning
MLPs are versatile tools used in a variety of tasks, including the following; a short end-to-end classification sketch appears after the list:
- **Image recognition**: Classifying images into different categories like cats, dogs, or cars.
- **Speech recognition**: Converting spoken language into text.
- **Natural language processing**: Understanding the meaning of text and performing tasks like sentiment analysis or machine translation.
- **Time series forecasting**: Predicting future values based on past data, such as stock prices or weather patterns.
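As a generic end-to-end example, here is a hedged scikit-learn sketch that trains an `MLPClassifier` on a synthetic classification task; the dataset and hyperparameters are illustrative assumptions:

```python
# Training an MLP classifier end to end on synthetic data. The
# dataset and hyperparameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                    max_iter=500, random_state=0)
clf.fit(X_train, y_train)                       # backprop + SGD-style training
print("test accuracy:", clf.score(X_test, y_test))
```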
## Conclusion
The earliest deep learning algorithm was relatively basic compared to the advanced techniques available today. The Perceptron, a neural network consisting of a single neuron, could learn only linearly separable relationships between input and output data.
However, the Multilayer Perceptron revolutionized the field by enabling the incorporation of multiple layers of neurons, thus unlocking the ability to learn more intricate and complex patterns.
I hope you found this exploration of algorithms enjoyable and informative!