# MEL457 AI for Engineers (till midsems) <center>B SLOT</center>

<br><hr>

Here, Machine Learning and Deep Learning concepts are discussed.

**Libraries**: TensorFlow and Keras

Books:
- <a href="https://www.deeplearningbook.org/">Goodfellow</a> <- Mathematical
- <a href="">Francois Chollet</a> <- Suggested for beginners
- <a href="">Rest</a>

## Units
1. Introduction to AI
2. Fundamentals of ML
3. Fundamentals of DL
4. Model Development and <br> Programming with Python
5. DL and open-source Keras
6. Problem Based Learning
7. Group Project

## Evaluation
- 20 - Mid Term
- 50 - Final Term
- 30 - Daily 1-mark quiz + Assignments

## Lecture Notes

#### Lecture 1
- A computer program is said to learn from experience *E*, with respect to a task *T* and performance measure *P*, if its performance on *T*, as measured by *P*, improves with experience *E*.

#### Lecture 2
- <b>Experience (*E*)</b> -> The data that is fed into the machine.
- <b>Task (*T*)</b> -> The type of decision the machine needs to perform.
- <b>Performance Measure (*P*)</b> -> The measure of how well the machine performs the given task.
- With ML, scientists can extract information from huge datasets that human eyes cannot process fast enough.
- <diagram> Read from ppt
- What is AI?
    - **Turing Test** -> A method of inquiry in AI for determining whether a computer is capable of thinking like a human being. Named after *Alan Turing* (1950).

#### Lecture 3
- Discussion about project
- Terminology of AI
    - Model -> Mathematical representation of a real-world process.
    - Feature -> Measurable property or parameter of a dataset.
    - Feature Vector -> Set of multiple numeric features combined into a single representation.
    - Training -> The algorithm uses training data to learn patterns and train the model.
    - Prediction -> The trained model generates predicted outputs for new input data.
    - Target -> The value the model aims to predict.
    - Overfitting -> The model learns noise/inaccurate data and fails to generalize to new data.
    - Underfitting -> The model fails to capture the underlying trends in the data; lacks accuracy.

#### Lecture 4
- Missed

#### Lecture 5
- Date: 8-8-24
- Python Libraries for AI
    - Scikit-Learn
    - TensorFlow
    - PyTorch
    - NLTK
    - SpaCy
    - OpenCV
    - Pandas
    - NumPy
    - Keras
    - Matplotlib
- Steps for Machine Learning (see the scikit-learn sketch after this list)
    - <u>*Data Collection*</u>: Gather relevant data from various sources.
    - <u>*Data Preprocessing*</u>: Clean and prepare the data to remove noise, missing values and inconsistencies.
    - <u>*Feature Engineering*</u>: Select and create features that best represent the data.
    - <u>*Data Splitting*</u>: Divide the dataset into training, validation and test sets.
    - <u>*Model Selection*</u>: Choose an appropriate machine learning algorithm based on the problem type and data characteristics.
    - <u>*Model Training*</u>: Feed the training data into the chosen algorithm to adjust model parameters.
    - <u>*Model Evaluation*</u>: Assess model performance on the validation set using evaluation metrics such as accuracy, precision, recall, F1 score and so on.
    - <u>*Hyperparameter Tuning*</u>: Fine-tune model hyperparameters to optimize performance. Common techniques include grid search, random search, and Bayesian optimization.
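A minimal sketch of these steps with scikit-learn, using a built-in toy dataset and an illustrative model and hyperparameter grid (not taken from the lecture):

```python
# Minimal ML workflow sketch: collect -> preprocess -> split -> select -> train -> evaluate -> tune.
# Dataset, model, and parameter grid are illustrative assumptions, not from the lecture.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Data collection: a built-in toy dataset stands in for real data gathering
X, y = load_iris(return_X_y=True)

# Data splitting: hold out a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Data preprocessing + model selection: scale features, then fit a classifier
pipe = Pipeline([("scaler", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])

# Hyperparameter tuning: grid search with 5-fold cross-validation (this also trains the model)
grid = GridSearchCV(pipe, param_grid={"clf__C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

# Model evaluation on held-out data
y_pred = grid.predict(X_test)
print("Best C:", grid.best_params_["clf__C"])
print("Test accuracy:", accuracy_score(y_test, y_pred))
```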
- Need to revise:
    - Linear Algebra for data analysis (vectors, scalars, matrices)
    - Mathematical Analysis (derivatives and gradients)
    - Probability Theory and Statistics
    - Multivariate Calculus
    - Algorithms and complex optimizations
- AI Techniques:
    - Supervised Learning
        - Labelled data.
        - Learn a mapping function to accurately predict the output for unseen inputs.
        - Key Concepts: Classification, Regression.
        - Applications: Image Classification, NLP, Medical Diagnosis, Recommendation Systems, Financial Forecasting.
    - Unsupervised Learning
        - Unlabelled data.
        - Aims to find patterns and structures within the data and form clusters.
        - Clustering: Similar data points are grouped together based on inherent patterns.
        - Dimensionality Reduction: Reduce the number of features or variables in the data while retaining its essential characteristics.
        - Density Estimation: Estimate the underlying probability distribution of the data.
        - Anomaly Detection: Identify rare or abnormal instances in the data.
        - Self-Organising Maps: Map high-dimensional data onto a lower-dimensional grid.
        - Applications: Customer Segmentation, Image Compression, Anomaly Detection, Topic Modelling.
    - Reinforcement Learning
        - An agent learns to make decisions by interacting with the environment, using rewards and penalties.
        - Objective: Maximize cumulative reward.
        - Policy: Strategy the agent uses to determine its next action.
        - Reward Function: Feedback to the agent after each action.
        - Value Function
        - Exploration-Exploitation Tradeoff
        - Markov Decision Process
    - Deep Learning
        - Subset of ML.
        - Enables computers to learn from large amounts of data and perform complex tasks.
        - Shallow ML models typically use at most one hidden layer; DL models stack many.
        - Concepts: Artificial Neural Networks, Activation Functions, Backpropagation.

#### Lecture 6
- 12/8/24
- **Convolutional Neural Networks**: DL models good at image and video processing.
    - Convolutional Layers: Kernels that slide over the input image to extract features.
    - Feature Maps: High-level patterns in the input image.
    - Pooling Layers: Layers that reduce spatial dimensions while retaining important information.
    - Stride and Padding: Step size and extra pixels for filters.
    - Benefits of CNNs:
        - Hierarchical feature learning
        - Translation invariance
        - Parameter sharing
        - Local connectivity
    - Applications: Image Classification, Object Detection, Segmentation, etc.
- **GANs**
    - Generative Adversarial Networks: Consist of 2 neural networks, a generator and a discriminator/critic. Used for generative tasks.
    - Generator: Generates fake data to "fool" the discriminator.
    - Discriminator: Classifies data as real or fake and provides feedback to the generator.
    - Adversarial process: Generator and discriminator are trained simultaneously.
    - Applications: Image Generation, Super-Resolution, Style Transfer, etc.
- **Transformers**
    - Introduced in the paper "Attention Is All You Need".
    - SOTA performance in NLP tasks.
    - Self-attention mechanism: Captures dependencies between different words in a sentence; weighs the importance of each word w.r.t. the other words for better context understanding.
    - Encoder-Decoder Architecture: The encoder processes the input, the decoder generates the output. Highly parallelizable.
    - Multi-Head Attention, Positional Encoding, Feed-Forward Neural Networks.
    - Applications: Machine Translation, Language Modelling, Speech Recognition, Question Answering, Sentiment Analysis.
    - Question: Find, from the literature, applications other than those listed.
- **Causal Learning**
    - Interpreting the inner workings of a model.
    - Fundamental concept that focuses on understanding the cause-and-effect relationships between variables.
    - Refer ppt.
- **Explainable AI**
    - Refer ppt.

#### Lecture 7
- 14-08-2024
- <a href="https://colab.research.google.com/drive/15wc9HtGztd62BgX-6dmZ-yecUIFBtqRp">Colab</a>
- Scalar: only magnitude.
- Vector: a 1-D array of scalars (magnitude and direction).
- Tensor: an n-dimensional generalization of scalars and vectors.
- Linear Algebra
- Calculus
- Python, tf & keras, pytorch

```python
import sympy as sp

# Define the symbolic variable
x = sp.symbols('x')

# Define the loss function
def loss_function(x):
    return 3*x**2 + 4*x + 6

# Define the derivative of the loss function symbolically
def derivative_loss_function(x):
    return sp.diff(loss_function(x), x)  # OR use sp.lambdify() to get a numeric function

# Gradient descent settings
learning_rate = 0.1
iterations = 100
initial_x = 0

# Precompute the derivative expression
gradient_function = derivative_loss_function(x)

for i in range(iterations):
    # Substitute the current value of initial_x into the derivative
    gradient = gradient_function.subs(x, initial_x)
    initial_x -= learning_rate * gradient

print("Optimized value of x:", initial_x)
```

#### Lecture 8
- 19-08-24
- Probability and Statistics
- Code for Gradient Descent using TensorFlow:

```python
import numpy as np
import tensorflow as tf

# Loss function to minimize
def loss_function(x):
    return x**2 + 2*x + 1

x = tf.Variable(0.0)
learning_rate = 0.1
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
iterations = 100

for i in range(iterations):
    # Record operations for automatic differentiation
    with tf.GradientTape() as tape:
        loss = loss_function(x)
    gradients = tape.gradient(loss, x)
    optimizer.apply_gradients([(gradients, x)])

print("Optimized value of x:", x.numpy())
```

#### Lecture 9
- 21-08-24
- 4 basic aspects of OOP:
    - Encapsulation
    - Inheritance
    - Polymorphism
    - Abstraction
- `__init__`: constructor; sets the initial values for the attributes of an object.
- `self`: a reference to the current instance of the class.

#### Lecture n
- 02-09
- Data Preprocessing with Scikit-Learn
    - Look at the max and min values and then scale. If the variation is very large, the data needs to be processed (scaled) so that the variation is reduced.
    - Data Encoding: Categorical variables need to be encoded for ML models.
- *Features of TensorFlow*:
    - Flexible and efficient
    - Scalable
    - Support for multiple platforms
    - Extensive ecosystem
- *Tensors*:
    - Multidimensional arrays with a uniform type.
    - dtype: type of all elements in the tensor.
    - shape: size along all axes.
    - Rank: scalar -> 0, vector -> 1.

#### Lecture n+1
- 04-09
- *tf.GradientTape*: Automatic differentiation and gradient computation with inputs of *tf.Variable*.
- *Back-Propagation*: Calculation of gradients using the chain rule to propagate gradients backward.
- Code using GradientTape: [Colab Notebook](https://colab.research.google.com/drive/1kYrOF0i57xcfM-gmlw6GM-jgE-lDA8Zk#scrollTo=GRwv7h7ZYiOV).

#### Lecture n+2
- 05-09
- Coding 2-layer gradient descent.
- tf.function
- *Keras*
    - Open-source higher-level API.
    - Designed to quickly build and experiment with DL models.
    - 2 APIs (a minimal sketch of both follows below):
        1. Sequential API: simpler, but less flexible.
        2. Functional API: more complex, but more flexible.
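A minimal sketch contrasting the two APIs; the layer sizes (13 inputs, 64 and 32 hidden units, 1 output) are borrowed from the Q3 snippet in the 2023 midsem purely for illustration:

```python
import tensorflow as tf
from tensorflow import keras

# (1) Sequential API: a plain stack of layers
seq_model = keras.Sequential([
    keras.Input(shape=(13,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),  # linear output, e.g. for regression
])

# (2) Functional API: layers are called like functions on tensors,
# which allows multiple inputs/outputs and non-sequential topologies
inputs = keras.Input(shape=(13,))
h = keras.layers.Dense(64, activation="relu")(inputs)
h = keras.layers.Dense(32, activation="relu")(h)
outputs = keras.layers.Dense(1)(h)
func_model = keras.Model(inputs=inputs, outputs=outputs)

# Both models are compiled and trained the same way
seq_model.compile(optimizer="adam", loss="mse")
func_model.compile(optimizer="adam", loss="mse")
seq_model.summary()
```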
#### Lecture n+3
- <a href="https://colab.research.google.com/drive/1SHcoQRXkmozlqpSslQbl6ebDQ1MeAkF4">Colab Notebook</a>

### PYQ Solutions

- **Sessional 2 (2022)**

![WhatsApp Image 2024-09-24 at 2.45.41 PM](https://hackmd.io/_uploads/BkEUY7bR0.jpg)

---

### Q1. Consider a neural network with one hidden layer of 26 neurons and a sigmoid activation function. Also consider an output layer with one neuron. Answer the following questions:

#### a) Draw the computational graph for the given network.

**Answer:** The computational graph involves the following layers and components:
1. **Input Layer**: 15 features (input dimension is 15).
2. **Hidden Layer**: 26 neurons with sigmoid activation.
   Formula: \( h = \sigma(W_1 X + b_1) \), where \( W_1 \in \mathbb{R}^{26 \times 15} \), \( b_1 \in \mathbb{R}^{26} \), and \( \sigma \) is the sigmoid function.
3. **Output Layer**: 1 neuron with no activation specified.
   Formula: \( y_{\text{out}} = W_2 h + b_2 \), where \( W_2 \in \mathbb{R}^{1 \times 26} \), \( b_2 \in \mathbb{R}^{1} \).

#### b) If the input has 15 features and a batch size of 20, write the shape of each output after every function in the forward pass. Also, use MSE as the loss function.

**Answer:**
- **Input Layer**: Shape = (20, 15) (20 is the batch size, 15 is the number of features).
- **Hidden Layer**: Shape = (20, 26) (26 is the number of neurons in the hidden layer).
- **Output Layer**: Shape = (20, 1) (1 output neuron for binary classification or regression).
- **MSE Loss Calculation**: Mean Squared Error compares the predicted output \( \hat{y} \) of shape (20, 1) with the true labels \( y \) of shape (20, 1), producing a scalar loss.

#### c) For the given network, create the back-propagation and write the gradient of the loss with respect to each input of every function, including the shape of the array.

**Answer:** For backpropagation:
- **Output Layer (loss gradient w.r.t. output)**: \( \frac{\partial L}{\partial \hat{y}} = \hat{y} - y \), where \( \hat{y} \) is the predicted output and \( y \) is the true label.
  **Shape of \( \frac{\partial L}{\partial \hat{y}} \)**: (20, 1)
- **Output-layer weights**: \( \frac{\partial \hat{y}}{\partial W_2} = h \) and \( \frac{\partial \hat{y}}{\partial h} = W_2 \).
  **Shape of \( \frac{\partial L}{\partial W_2} \)**: (1, 26) (same as \( W_2 \))
- **Backpropagation into the hidden layer**: \( \frac{\partial L}{\partial h} = \frac{\partial L}{\partial \hat{y}} \, W_2 \).
  **Shape of \( \frac{\partial L}{\partial h} \)**: (20, 26)
- **Gradient w.r.t. hidden-layer weights**: using the sigmoid derivative \( \sigma'(z_1) = h \odot (1 - h) \),
  \( \frac{\partial L}{\partial W_1} = \left( \frac{\partial L}{\partial h} \odot \sigma'(z_1) \right)^{T} X \).
  **Shape of \( \frac{\partial L}{\partial W_1} \)**: (26, 15) (same as \( W_1 \))

#### d) To create the code for forward and backward propagation, how many total operations are needed for the network with two hidden layers?

**Answer:** For two hidden layers, the total number of operations breaks down into:
- Forward pass: 3 weighted layers (Input -> Hidden Layer 1 -> Hidden Layer 2 -> Output Layer), each performing a matrix multiplication, a bias addition, and (for the hidden layers) an activation.
- Backward pass: gradients w.r.t. each layer's weights, biases, and activations, mirroring the forward operations in reverse order.
- For each layer, the key operations are matrix multiplication (dot products), the activation function (e.g., sigmoid), and element-wise operations.

Hence, 3 forward layer evaluations and 3 backpropagation steps are needed for a two-hidden-layer network.

---

### Q2. Answer the following questions:

#### a) Why is a non-linear activation function needed in a neural network?

**Answer:** Non-linear activation functions (like sigmoid and ReLU) are essential because they allow the network to model complex, non-linear relationships. Without non-linearity, stacked layers collapse into a single linear model, limiting the network's ability to solve tasks like image recognition, language processing, etc.

#### b) What do you mean by deep learning models?
**Answer:** Deep learning models are neural networks with multiple layers between the input and output. These layers allow the model to learn hierarchical representations, making deep learning suitable for tasks like image classification, natural language processing, etc.

#### c) What functions should you have when you design `Operation` as a base class?

**Answer:** When designing an `Operation` base class for a neural network, the following methods are typically required:
- **Forward method**: Defines how the input data is transformed as it passes through the operation.
- **Backward method**: Defines how to compute gradients during backpropagation.
- **Update method** (optional): For updating parameters like weights and biases.

#### d) Why do we need `ParamOperation` as a class when we already have `Operation` as a base class?

**Answer:** The `ParamOperation` class extends the `Operation` class to cover operations that have trainable parameters, such as weights and biases. While `Operation` might only perform parameter-free operations (like activation functions), `ParamOperation` adds the logic for computing parameter gradients during backpropagation.

#### e) Write, in order, the operations needed in a fully connected layer.

**Answer:** In a fully connected layer, the following operations occur in order:
1. **Linear Transformation**: \( Z = W X + b \)
2. **Activation Function**: Apply an activation (e.g., sigmoid, ReLU).
3. **Loss Calculation**: Compare the output with the true labels using a loss function.
4. **Backpropagation**: Compute gradients for the weights and biases.
5. **Parameter Update**: Adjust weights and biases using an optimization algorithm such as SGD or Adam.

#### f) How do we combine the NeuralNetwork and Optimizer classes?

**Answer:** The `NeuralNetwork` class holds the architecture (layers and operations), while the `Optimizer` class is responsible for updating the weights based on the computed gradients. We combine them by passing the neural network's parameters to the optimizer, which then uses the gradients to update those parameters after each training step.

---

### Q3. Answer the following questions:

#### a) Explain the following lines from the fit function in the Trainer class:

```python
for ii, (X_batch, y_batch) in enumerate(batch_generator):
    self.net.train_batch(X_batch, y_batch)
    self.optim.step()
```

**Answer:** This code snippet is part of the training loop:
- **batch_generator**: Yields batches of input and output data for mini-batch training.
- **self.net.train_batch(X_batch, y_batch)**: Performs a forward pass, computes the loss, and calculates the gradients for the current batch of data.
- **self.optim.step()**: Updates the model's parameters using the gradients calculated during the backpropagation step.

#### b) Explain the following lines from the forward function in the NeuralNetwork class:

```python
def forward(self, X_batch: ndarray) -> ndarray:
    x_out = X_batch
    for layer in self.layers:
        x_out = layer.forward(x_out)
    return x_out
```

**Answer:** This function performs a forward pass through the neural network:
- **x_out = X_batch**: Initializes the input to the first layer.
- **for layer in self.layers**: Loops through each layer in the network.
- **x_out = layer.forward(x_out)**: Feeds the input through each layer, updating `x_out` with the output of the current layer.
- **return x_out**: Returns the final output after passing through all layers.

A minimal end-to-end sketch of how these pieces fit together is given below.
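The following is a simplified, hypothetical sketch (not the course's exact classes) of how a `NeuralNetwork`, an `Optimizer`, and a `Trainer` combine, echoing the Q2(f) and Q3 snippets above. The layer sizes mirror the Q1 network (15 -> 26 -> 1, batch size 20); for simplicity, every layer here applies a sigmoid, unlike the linear output in Q1.

```python
import numpy as np
from numpy import ndarray

class Dense:
    """Fully connected layer followed by a sigmoid activation (simplification)."""
    def __init__(self, n_in: int, n_out: int):
        self.W = np.random.randn(n_in, n_out) * 0.1
        self.b = np.zeros((1, n_out))
        self.params = [self.W, self.b]
        self.grads = [np.zeros_like(self.W), np.zeros_like(self.b)]

    def forward(self, x: ndarray) -> ndarray:
        self.x = x
        self.out = 1.0 / (1.0 + np.exp(-(x @ self.W + self.b)))  # sigmoid(XW + b)
        return self.out

    def backward(self, grad_out: ndarray) -> ndarray:
        grad_z = grad_out * self.out * (1.0 - self.out)            # sigmoid derivative
        self.grads[0][...] = self.x.T @ grad_z                     # dL/dW, same shape as W
        self.grads[1][...] = grad_z.sum(axis=0, keepdims=True)     # dL/db
        return grad_z @ self.W.T                                   # dL/dx, passed to the previous layer

class NeuralNetwork:
    def __init__(self, layers):
        self.layers = layers

    def forward(self, X_batch: ndarray) -> ndarray:
        x_out = X_batch
        for layer in self.layers:              # same pattern as the Q3(b) snippet
            x_out = layer.forward(x_out)
        return x_out

    def train_batch(self, X_batch: ndarray, y_batch: ndarray) -> float:
        preds = self.forward(X_batch)
        loss = float(np.mean((preds - y_batch) ** 2))              # MSE loss
        grad = 2.0 * (preds - y_batch) / len(X_batch)              # dL/dpreds
        for layer in reversed(self.layers):                        # backpropagation
            grad = layer.backward(grad)
        return loss

class SGD:
    """Optimizer: holds a reference to the network and applies gradient steps."""
    def __init__(self, net: NeuralNetwork, lr: float = 0.1):
        self.net, self.lr = net, lr

    def step(self):
        for layer in self.net.layers:
            for param, grad in zip(layer.params, layer.grads):
                param -= self.lr * grad

class Trainer:
    """Combines the network and optimizer into a mini-batch training loop."""
    def __init__(self, net: NeuralNetwork, optim: SGD):
        self.net, self.optim = net, optim

    def fit(self, X: ndarray, y: ndarray, epochs: int = 5, batch_size: int = 20):
        for epoch in range(epochs):
            batch_generator = ((X[i:i + batch_size], y[i:i + batch_size])
                               for i in range(0, len(X), batch_size))
            for ii, (X_batch, y_batch) in enumerate(batch_generator):
                loss = self.net.train_batch(X_batch, y_batch)      # forward + backward
                self.optim.step()                                  # parameter update
            print(f"epoch {epoch}: last batch loss {loss:.4f}")

# Usage: the Q1-style network (15 features -> 26 sigmoid units -> 1 output), batch size 20
X = np.random.randn(100, 15)
y = np.random.rand(100, 1)   # targets in (0, 1) since the last layer is also a sigmoid here
net = NeuralNetwork([Dense(15, 26), Dense(26, 1)])
Trainer(net, SGD(net, lr=0.1)).fit(X, y)
```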
---

- **Midsem 2023** [Paper](https://drive.google.com/file/d/1QuEl-jZM6gfbJTosrlBavs-UI7LkGFki/view?usp=sharing)

Solution:

### Q1:

**a)** **Find the derivatives \( \frac{df}{dx} \) and \( \frac{df}{dy} \) for the function \( f(x, y) = \text{ReLU}(x \cdot y) \):**
- The ReLU (Rectified Linear Unit) function is defined as \( \text{ReLU}(z) = \max(0, z) \), so \( f(x, y) = \text{ReLU}(x \cdot y) \).
- For \( x \cdot y > 0 \), \( f(x, y) = x \cdot y \), so:
$$ \frac{df}{dx} = y, \quad \frac{df}{dy} = x $$
- For \( x \cdot y \leq 0 \), \( f(x, y) = 0 \), so:
$$ \frac{df}{dx} = 0, \quad \frac{df}{dy} = 0 $$

**b)** **Write the important operations, in proper order, for a Dense layer (fully connected layer):**
1. Matrix multiplication between the input and the weights.
2. Add the bias.
3. Apply the activation function (if any).
4. Produce the output.

**c)** **When do we say that a computer program is learning?**
- A computer program is said to be learning if its performance on a task improves over time as it processes more data, typically by updating parameters to minimize a cost or loss function.

**d)** **What are the two neural networks that comprise Generative Adversarial Networks (GANs)?**
- **Generator Network**: Produces fake data samples.
- **Discriminator Network**: Distinguishes between real and fake data samples.

**e)** **What is the primary focus of causal learning in artificial intelligence?**
- Causal learning aims to identify and understand the cause-and-effect relationships between variables, which is more informative than simply detecting correlations.

**f)** **Using Gradient Descent, find the optimum value of \( x \) that minimizes the function \( f(x) = x^3 - 4x^2 + 2x + 1 \):**
1. Compute the derivative:
$$ f'(x) = 3x^2 - 8x + 2 $$
2. Gradient descent repeatedly updates \( x \leftarrow x - \eta f'(x) \); it converges where the derivative is 0:
$$ 3x^2 - 8x + 2 = 0 $$
Using the quadratic formula:
$$ x = \frac{8 \pm \sqrt{64 - 24}}{6} = \frac{8 \pm \sqrt{40}}{6} \approx 2.39,\ 0.28 $$
Since \( f''(x) = 6x - 8 > 0 \) at \( x \approx 2.39 \), gradient descent converges to the local minimum \( x \approx 2.39 \) (the cubic itself has no global minimum).

**g)** **Given a rank-4 tensor with the shape [3, 2, 4, 5], what could be a possible interpretation of each dimension in a deep learning context?**
- **3**: Batch size (number of samples in the batch).
- **2**: Channels (an RGB image would have 3 channels; here there are 2).
- **4**: Height of the data (e.g., image height).
- **5**: Width of the data (e.g., image width).

**h)** **What TensorFlow API is designed for automatic differentiation, and what does it commonly operate on?**
- The TensorFlow API for automatic differentiation is **`tf.GradientTape`**, which commonly operates on tensors (typically `tf.Variable`s) to compute gradients during backpropagation.

**i)** **What mathematical rule is utilized in the gradient calculation process and what is its function?**
- The **chain rule** is utilized in gradient calculation. It computes the derivative of composite functions by multiplying the derivatives of the inner and outer functions, which is essential in backpropagation for neural networks.

**j)** **In the context of creating a neural network for regression analysis, which activation function should ideally be utilized in the output layer?**
- For regression tasks, a **linear activation function** (or no activation function) is typically used in the output layer to predict continuous values.

---

### Q2:

**Consider a scenario where we have an input matrix \( X \) with dimensions \( 1 \times 4 \) (representing 1 sample with 4 features) and a weight matrix \( W \). The output \( N \) is obtained by multiplying \( X \) and \( W \).
Demonstrate that the gradient of \( N \) with respect to \( W \) (denoted as \( \frac{dN}{dW} \)) will be equal to the transpose of \( X \) (denoted as \( X^T \)):**

- The output is \( N = X \cdot W \). Each entry of \( N \) is \( N_{j} = \sum_i X_{i} W_{ij} \), so \( \frac{\partial N_{j}}{\partial W_{ij}} = X_{i} \).
- Collecting these partial derivatives (as done in backpropagation, where \( \frac{\partial L}{\partial W} = X^T \frac{\partial L}{\partial N} \)), the gradient of the matrix product \( XW \) with respect to \( W \) is \( X^T \) (the transpose of \( X \)).

---

### Q3: Given the code snippet and assuming \( X \) has the shape \( (500, 13) \):

- **a)** **What will be the input dimension for the neural network, and how is it determined?**
  The input dimension is **13**, as determined by the number of features in the input matrix \( X \), which has 13 columns.

- **b)** **Determine the shape of the weight matrix for the first hidden layer:**
  The first hidden layer has **64 neurons** and the input dimension is 13, so the weight matrix has shape **(13, 64)**.

- **c)** **What do the 500 and 13 represent in the shape of \( X \) (500, 13)?**
  - **500**: The number of samples in the dataset.
  - **13**: The number of features for each sample.

- **d)** **If a bias term is added to each neuron in the hidden layers and output layer, what will be the total number of parameters in the first hidden layer?**
  For the first hidden layer:
  - The number of weights is \( 13 \times 64 = 832 \).
  - Adding biases (one for each of the 64 neurons), the total number of parameters is \( 832 + 64 = 896 \).

- **e)** **How many neurons are there in the second hidden layer and what is their activation function?**
  There are **32 neurons** in the second hidden layer, and the activation function is **ReLU**.

- **f)** **What kind of model is being created in the code snippet?**
  The model is a **feedforward neural network** (fully connected neural network), used for regression or classification depending on the loss function applied later.

---

### Q4:

**Consider the function defined as \( f(x, y, z) = (x + y) \cdot z \), where \( x, y, z \) are 2D matrices with shapes \( 2 \times 2 \):**

- **a)** **Represent the function in a computational graph, clearly denoting the operations and dimensions of the matrices at each node:**
  - Operation 1: \( s = x + y \) (all matrices \( 2 \times 2 \)).
  - Operation 2: \( f = s \cdot z \), multiplying \( s \) by \( z \) (both \( 2 \times 2 \)).

- **b)** **Calculate the following partial derivatives using the given values of \( x, y, z \):**
  Treating the product element-wise and using the chain rule through \( s = x + y \):
  1. \( \frac{df}{dz} = x + y \)
  2. \( \frac{df}{dx} = \frac{df}{ds} \cdot \frac{ds}{dx} = z \)
  3. \( \frac{df}{dy} = \frac{df}{ds} \cdot \frac{ds}{dy} = z \)
  (Substitute the given matrices for numeric values; the specific values were not recorded in these notes.)

---

### Estimated Syllabus Based on the Midsem Exam Paper:

1. **Partial Derivatives and Functions (ReLU, Gradient Descent)**:
   - Partial differentiation (e.g., \( \frac{df}{dx}, \frac{df}{dy} \)).
   - The ReLU function and its properties.
   - Gradient descent and optimization techniques.
2. **Neural Networks (Fully Connected Layers, Generative Adversarial Networks)**:
   - Dense (fully connected) layers in neural networks.
   - Operations in neural networks (forward pass, backpropagation).
   - Generative Adversarial Networks (GANs).
3. **Causal Learning and Tensor Representation**:
   - Causal learning and its applications in AI.
   - Tensors and their dimensions in deep learning.
   - Rank, shape, and operations on tensors.
4. **TensorFlow and Automatic Differentiation**:
   - TensorFlow APIs (e.g., `tf.GradientTape` for automatic differentiation).
   - Gradient calculation using the chain rule.
5. **Neural Networks for Regression Analysis**:
   - Building neural networks for regression.
   - Activation functions in regression tasks.
6. **Matrix Operations and Gradients**:
   - Matrix multiplication and gradients in neural networks.
   - Calculating gradients with respect to weight matrices.

---

### Additional Questions for Each Topic:

#### 1. **Partial Derivatives and Functions**:
1. Find the partial derivatives \( \frac{\partial f}{\partial x} \) and \( \frac{\partial f}{\partial y} \) for \( f(x, y) = \sin(x \cdot y) \).
2. For the function \( f(x, y) = \max(0, x^2 + y) \), calculate \( \frac{\partial f}{\partial x} \) and \( \frac{\partial f}{\partial y} \).
3. Using gradient descent, minimize the function \( f(x) = 5x^2 - 4x + 2 \).
4. What is the derivative of the function \( f(x, y) = e^{x \cdot y} \) with respect to \( x \) and \( y \)?
5. How does ReLU affect the derivative of a neural network during backpropagation?
6. Find the gradient of \( f(x) = 4x^2 - 2x + 3 \) at \( x = 1 \).
7. Calculate the partial derivative \( \frac{\partial f}{\partial x} \) for \( f(x, y) = \log(x \cdot y) \).
8. Derive the gradient descent update rule for minimizing \( f(x) = x^2 - 6x + 8 \).
9. What happens to the gradient if we apply ReLU to a negative value?
10. Find the critical points of \( f(x) = x^4 - 4x^3 + 6x^2 \).

#### 2. **Neural Networks**:
1. Describe the sequence of operations in a dense layer with an input of size 10 and 5 neurons.
2. What role does the activation function play in a neural network?
3. How do you determine the number of neurons in a hidden layer?
4. Explain the difference between the Generator and Discriminator in a GAN.
5. What are the key components of a neural network used for classification tasks?
6. How does backpropagation help update weights in a fully connected layer?
7. Describe how a GAN can be used to generate synthetic images.
8. How is the loss function calculated in a neural network with a softmax output?
9. What is the advantage of using ReLU over sigmoid activation in hidden layers?
10. Explain the concept of weight sharing in Convolutional Neural Networks (CNNs).

#### 3. **Causal Learning and Tensor Representation**:
1. What is the difference between correlation and causation in causal learning?
2. How can causal inference be applied in a healthcare AI model?
3. Interpret the shape of a rank-3 tensor in a deep learning model.
4. Explain the significance of tensor rank in representing multidimensional data.
5. How does the dimensionality of tensors affect the computational cost in deep learning?
6. What is a rank-5 tensor, and how might it be used in a neural network?
7. Describe a scenario where causal learning could improve the performance of a recommendation system.
8. What techniques are used to estimate causal relationships in AI models?
9. How do tensors represent video data in deep learning models?
10. How do you calculate the derivative of a function involving a rank-4 tensor?

#### 4. **TensorFlow and Automatic Differentiation**:
1. How does TensorFlow's `tf.GradientTape` facilitate automatic differentiation?
2. What are the benefits of using `tf.GradientTape` for training neural networks?
3. Explain how gradients are computed in TensorFlow during backpropagation.
4. Describe a scenario where automatic differentiation is essential for optimizing a model.
5. How does TensorFlow handle gradients for complex models with multiple outputs?
6. Why is automatic differentiation critical for deep learning optimization?
7. What is the role of `tf.Variable` in TensorFlow's gradient calculation?
8. How would you modify a TensorFlow model to include a custom loss function?
9. Explain the concept of gradient clipping and why it is used.
10. How can you monitor gradient flow through a TensorFlow model during training?

#### 5. **Neural Networks for Regression Analysis**:
1. What activation function is ideal for the output layer of a regression neural network?
2. How does Mean Squared Error (MSE) serve as a loss function for regression models?
3. Why would you avoid using a sigmoid activation function in regression problems?
4. Describe how you would construct a neural network for predicting housing prices.
5. How is the output of a regression model different from that of a classification model?
6. What are the consequences of not normalizing inputs for a regression neural network?
7. How would you modify a neural network model to perform multivariate regression?
8. What is the purpose of using a linear activation function in the output layer?
9. Explain how L2 regularization helps reduce overfitting in regression networks.
10. What metrics would you use to evaluate a regression neural network's performance?

#### 6. **Matrix Operations and Gradients**:
1. What is the derivative of the matrix product \( XW \) with respect to \( W \)?
2. Explain how matrix multiplication works in the context of a neural network forward pass.
3. How is the gradient of a loss function propagated through layers during backpropagation?
4. Demonstrate how to compute the gradient of a matrix function using the chain rule.
5. What is the role of the transpose in calculating gradients for matrix multiplications?
6. How do matrix dimensions affect the computational complexity of a neural network?
7. Explain the process of calculating gradients in a network with matrix inputs.
8. What are the properties of matrix operations in deep learning frameworks like TensorFlow?
9. Why is it important to understand matrix differentiation when building deep learning models?
10. How does gradient descent optimize weights in matrix form for linear regression?

#### Lecture 1 (After mids)
- Midsem Solution Discussion