# Assignment 1: Setup and Mathematical Foundations

In this assignment we are going to go through all the setup and background we need for this course! We've divided it into 3 sections as follows:
1. Setting up our Python environment.
2. Reviewing some Python constructs we will rely on.
3. Reviewing math that we will use throughout this course.

:::info
This homework should be completed by **Thursday, September 18, 2025 at 10:00 PM EST**
:::

## 1. Environment Setup

### Roadmap

In order to complete programming assignments for this course, you will need a way to code, run, and debug your own Python code. While you are always free to use department machines for this (they have a pre-installed version of the course environment that every assignment has been tested against), you are also free to work on your own machine. Below we give you some information that is helpful for either of these situations.

### A. Configuring Environment

#### Recommended Workspace Structure

The setup script will automatically create the virtual environment (set up below) in the parent directory of your assignment repo. This creates a clean, organized structure:

```text
csci1470-course/     ← Parent directory (create this)
├── csci1470/        ← Virtual environment (will be generated)
├── HW0/             ← Assignment repos (you clone here)
├── HW1/
├── HW2/
└── ...
```

**This means you are creating a parent directory to host all of the assignments you will clone from GitHub Classroom, keeping your virtual environment accessible!**

### Getting started

1. Let's create the parent directory that will be the home of all your future assignment repositories. First, make sure you are in the directory where you want it to live (e.g. `Desktop`, `Documents`, etc.):
```bash
mkdir csci1470-course
cd csci1470-course
```
2. Please click <ins>[here](https://classroom.github.com/a/krLV2FRR)</ins> to get the stencil code.
    - *Reference this <ins>[guide](https://hackmd.io/gGOpcqoeTx-BOvLXQWRgQg)</ins> for more information about GitHub and GitHub Classroom.*
3. 
Once you have cloned the GitHub Classroom repository for this assignment, you can proceed to the following section to set up your virtual environment.

#### Developing Locally

In order to set up your virtual environment for this course, we use Python's built-in venv (virtual environment) system. This ensures compatibility with the GPU cluster and provides a more streamlined setup process.

### Virtual Environment

1. Verify you have **Python** installed on your computer and your version is **3.11.x** (IT MUST BE 3.11.x) by running the following command in your terminal:
```bash
python --version  # Should show: Python 3.11.x
```
:::danger
If you get an error along the lines of `python: command not found` or a version other than 3.11.x, you may need to install **Python** manually. However, first try running the following command and see if you get an output:
```bash
python3.11 --version  # Should show: Python 3.11.x
```
If you get an output, replace any instances of `python` with `python3.11` in the following code blocks. If you do not see an output, proceed to the instructions below for your operating system.
:::

:::spoiler **Mac Users**
1. Check if you have Homebrew (a package manager) or install it:
```bash
brew --version
# Install Homebrew only if the above does not print a version number
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
2. Install Python 3.11:
```bash
brew install python@3.11
```
3. Verify Python installation:
```bash
python3.11 --version  # Should show: Python 3.11.x
```
:::

:::spoiler **Windows Users**
1. Install Python 3.11:
    - Go to python.org/downloads
    - Download Python 3.11.x for Windows
    - IMPORTANT: Check "Add Python to PATH" during installation
    - Choose "Install for all users" if you have admin rights
2. Verify Python installation:
```bash
python3.11 --version  # Should show: Python 3.11.x
```
:::
2. 
Open a new terminal window and navigate to the root of the **cloned assignment** (*HW0-Setup*), not the parent directory, in a terminal (such as the one in VSCode) using `cd` and `ls`. You are now going to set up your virtual environment named `csci1470` on your computer. **If you run into any issues, refer to the troubleshooting notes below.** Run the following command:
```bash
python3.11 env_setup/setup.py
```
:::warning
If the above procedure doesn't work for you, here are some alternate pathways.
:::spoiler Common Issues and Solutions
- **"Python 3.13 not supported by TensorFlow 2.15"**
    - TensorFlow 2.15 only supports Python 3.11-3.12
    - **macOS**: `brew install python@3.11` then use python3.11
    - **Windows/Linux**: Download Python 3.11 from python.org
    - **Check version**: `python3.11 --version` should show 3.11.x
- **"Permission denied" errors**
    - **macOS/Linux**: Use `sudo` with the setup script or `chmod +x <script>`
    - **Windows**: Run Command Prompt as Administrator if needed
- **TensorFlow installation fails**
    - Try updating pip first: `pip install --upgrade pip`
    - Use the no-cache option: `pip install --no-cache-dir tensorflow==2.15.*`
    - On Apple Silicon Macs, ensure you have the latest macOS
- **"Microsoft Visual C++ 14.0 is required" (Windows)**
    - Download from [Microsoft Visual Studio](https://visualstudio.microsoft.com/visual-cpp-build-tools/)
    - Install "C++ build tools" workload
- **Import errors for scientific packages**
    - **macOS**: Install system dependencies: `brew install hdf5 pkg-config`
    - **Linux**: Install development headers: `sudo apt install libhdf5-dev pkg-config`
- **Platform-Specific Notes**
    - **Apple Silicon (M1/M2/M3) Macs**
        - Some packages may install `x86_64` versions that run under `Rosetta 2`
        - TensorFlow 2.15 has native Apple Silicon support
        - If you encounter issues, try: `arch -arm64 python3.11 env_setup/setup.py`
:::
:::danger
If all else fails, you can run the following steps
:::spoiler
1. 
**Create virtual environment**: Navigate to the parent directory and run the following command. This will create a new virtual environment called `csci1470`.
```bash
python -m venv csci1470   # Windows
python3 -m venv csci1470  # macOS/Linux
```
2. **Activate environment**: You need to activate the environment before you can install any of the packages.
```bash
source csci1470/bin/activate  # macOS/Linux
csci1470\Scripts\activate     # Windows
```
3. **Install the following packages** in the virtual environment. Make sure to update your pip first, and then install the packages.
```bash
pip install --upgrade pip  # update pip
pip install tensorflow==2.15.* numpy pandas matplotlib scipy Pillow tqdm h5py PyYAML pytest ipykernel ipywidgets scikit-learn pandas-datareader imageio kaggle wandb gradescope_utils tensorflow-datasets
python -m ipykernel install --user --name csci1470 --display-name "DL-S25 (3.11)"
```
:::
3. Run the following command to activate your environment depending on your operating system. **You will need to do this step before working on every assignment in this course. Read more in section B**.
```bash
# Activate environment
source ../csci1470/bin/activate  # macOS/Linux
..\csci1470\Scripts\activate     # Windows
```
Once this is complete, you should have a local environment to use for the course!

#### Department Machines

:::info
**Note:** Sometimes even if you set up your local environment correctly, you may experience unexpected bugs and errors that are unique to your local setup. To prevent this from hindering your ability to complete assignments, we **highly recommend** that you familiarize yourself with the department machines, even if you expect to usually be working locally.
:::

Department machines serve as a common, uniform way to work on and debug assignments. There are a variety of ways in which you can use department machines:
1. **In Person.** If you are in the CIT, you can (almost) always head into the Sunlab/[ETC] and work on a department machine.
2. **FastX**. 
FastX allows you to VNC into a department machine from your own computer, from anywhere! A detailed guide to getting FastX working on your own computer can be found [here](https://cs.brown.edu/about/system/connecting/fastx/).
3. **SSH**. The department machines can also be accessed via SSH (Secure Shell) from anywhere, which should allow you to perform command-line activities (cloning repositories, running assignment code). You can check out an SSH guide [here](https://cs.brown.edu/about/system/connecting/ssh/).

When using the department machines, you can activate the course virtual environment (which we have already installed) using:
```
source /course/cs1470/cs1470_env/bin/activate
```
From here, you should be able to clone the repository (see a GitHub guide for more information on using Git via the command line) and work on your assignment.

:::info
**Note**: Python files using `tensorflow` may take a little longer to start up on department machines (likely because files are being pulled from the department filesystem), but they should all run nonetheless.
:::

### B. Test your environment

#### What is an environment?

Python packages, or libraries, are external collections of code written by others that can save you a lot of work. (Imagine having to write graph-plotting code from scratch every single time!) However, different classes, tasks, and even projects might require different sets of Python packages. We can manage these as separate virtual environments, each with its own set of installed packages.

#### Virtual Environment Specifics

When you activate your virtual environment, you'll notice the `(csci1470)` prefix in your terminal. This signifies that you're using the CSCI1470 course environment with all the correct packages installed.
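If you want a quick extra check beyond the prompt prefix (optional, and the package list below is just illustrative), you can ask Python which of the course packages the active environment can see:

```python
# Optional sanity check: report which core packages are importable from the
# currently active environment. The package list here is illustrative.
import importlib.util
import sys

print("Python:", sys.version.split()[0])  # expect 3.11.x for this course
for pkg in ("numpy", "tensorflow", "matplotlib", "sklearn"):
    status = "OK" if importlib.util.find_spec(pkg) else "MISSING"
    print(f"{pkg}: {status}")
```

If any line prints `MISSING`, your environment is likely not activated or the install step did not finish.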
:::info
**Key Commands**
- **Activate environment:** Depending on your operating system, you can run:
```bash
source ../csci1470/bin/activate  # macOS/Linux
..\csci1470\Scripts\activate     # Windows
```
- **Deactivate environment:**
```bash
deactivate
```
- **Check if environment is active:**
    - Look for ``(csci1470)`` at the start of your terminal prompt
:::

:::success
**TODO: Running a Test Network**
To make sure all your packages are installed correctly, run the following command to train a small neural network. This will ensure your TensorFlow and NumPy are ready for the next assignment. Follow the steps below:
1. Make sure you are in the root directory for this assignment, `hw1-setup-<github username>`.
2. Make sure your virtual environment is active (`csci1470` prefix in your terminal).
3. Adjust the number of epochs on line 10 of `code/test_network.py` from 75 to 10 (this will make it train faster).
4. Run the following command:
```bash
python code/test_network.py
```
5. Your network will take a few minutes to train (no longer than 5; if it takes longer, you have an issue). You can then observe your network's learned decision boundaries in the image `neural_galaxy.png` in your root directory.
:::

## 2. Python Review

### A. Basic Python Review

For an overview of Python syntax and common Python uses, we recommend checking out <ins>[this](https://www.w3schools.com/python)</ins> Python tutorial for a refresher.

### B. Advanced Python

Throughout this section, you should be filling in the corresponding classes and methods in `python_tutorial.py`.

#### I. Dunder Methods

Double underscore (dunder) methods [overload operators](https://en.wikipedia.org/wiki/Operator_overloading) for Python objects (e.g., `+`, `*`, `[]`). While operator overloading can be [controversial](https://medium.com/@rwxrob/operator-overloading-is-evil-8052a8ae6c3a), it is commonly used in TensorFlow, NumPy, and many other common libraries.
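As a quick illustration (a hypothetical `Vec2` class, separate from the tasks below), here is what overloading `+` and `*` looks like:

```python
# Hypothetical example of operator overloading via dunder methods:
# a tiny 2-D vector supporting `+` and scalar `*`.
class Vec2:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):   # overloads v + w
        return Vec2(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):  # overloads v * 2
        return Vec2(self.x * scalar, self.y * scalar)

    def __repr__(self):         # controls how the object prints
        return f"Vec2({self.x}, {self.y})"

v = Vec2(1, 2) + Vec2(3, 4)  # calls Vec2.__add__
w = v * 2                    # calls Vec2.__mul__
print(v, w)                  # Vec2(4, 6) Vec2(8, 12)
```

This is exactly how `numpy` arrays make `a + b` mean element-wise addition.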
##### Constructors

:::info
**Task:** Write up a Python class for a `Square` whose constructor (the `__init__` method) takes in a string `name` and a numeric `length`. You can use this code to verify functionality:
```python
square1 = Square("square1", 5)
square1.name == "square1"
square1.length == 5
```
:::

##### Calling

We can give a class some special interaction patterns with other dunder methods. If we give a class, say a `Multiplier`, a `__call__` method, then we can specify what happens when we call an instance of the class! For example:
```python
multer = Multiplier()
multer(5, 10)
```
This is effectively the same as calling:
```python
multer.__call__(5, 10)
```
:::info
**Task:** Write up a Python class `Multiplier` which, when called on two integers, returns the product of the two integers. Use this code to check:
```python
multer = Multiplier()
multer(5, 10) == 50
```
:::

#### II. Classes and OOP

:::warning
Throughout this course, we will be working with Python extensively. Though you won't need to be an OOP expert, we do expect some basics which help deep learning libraries work in organized and efficient ways.
:::

##### Objects vs Classes

OOP (Object-Oriented Programming) strongly focuses on objects, which encompass any value, variable, *etc*. It's really any "tangible" thing.
```python
thing1 = "i am an object!"
thing2 = 1234567
...
```
However, we might find it useful to organize these things into classes of things, with properties like instance variables or methods shared across all things that are members of the same class. You might be familiar with Python's `str`, `int`, and `float` classes, for example. When working with Python, you'll almost always be working with objects which are instances of classes. You can check the type of a variable with the built-in `type` function!

##### Inheritance

If classes are sets and objects are set elements, then we also need subsets and supersets!
In Python, we can make a "child" class inherit all the methods from a "parent" class like so:
```python
class ChildClass(ParentClass1):
```

##### Class-level vs Instance Variables

Consider the instance of our `Square` from earlier.
```python
square1 = Square("square1", 5)
square1.name == "square1"
square1.length == 5
```
:::success
Notice that the `name` and `length` variables are instance variables!
:::
In contrast with instance variables, class-level variables are shared across all instances of the class. Say in the declaration of `Square`, we included this:
```python
class Square:
    shape = "square"
    def __init__(...):
        ...
```
Then, if we checked `square1.shape` or `square2.shape`, you'll notice that they both return the string `"square"`. This also applies to checking the class directly: `Square.shape` will also return `"square"`! `shape` is a class-level variable because it is shared across the whole class!

:::warning
Class variables can be reassigned from any instance (*e.g.* `square1.shape = "rectangle"`) or directly through the class (*e.g.* `Square.shape = "rectangle"`). Note that assigning through an instance actually creates a new instance attribute that shadows the class variable, so we **strongly** recommend doing it through the class directly.
:::

##### Putting it Together

:::info
__Task:__ Make a parent class named `Logger`. Above the constructor, include this line:
```python
logging_tape: LoggingTape | None = None
```
This will be our log of things that happen! The `: LoggingTape | None` indicates that the variable `logging_tape` will either be of type `LoggingTape` (which we'll make in just a second), or `None`.
:::

#### III. Context Managers

Context managers in Python are a great tool for temporarily defined things. For instance, you have probably seen
```python
with open('file.txt', 'r') as f:
    # do things with the file f
# f is now closed, do other things!
```
You'll notice that `f` is only properly defined as being the file opened with read permissions while in the "context" of the `with` statement (within the `with` statement's indent block). This is a context manager!
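As a minimal warm-up (a hypothetical `Timer` class, not part of the assignment), here is a complete hand-rolled context manager showing the moving parts:

```python
import time

# Hypothetical example: a context manager that times its `with` block.
class Timer:
    def __enter__(self):
        self.start = time.perf_counter()
        return self  # whatever __enter__ returns is bound after `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        return False  # returning False means exceptions are not suppressed

with Timer() as t:
    total = sum(range(100_000))

print(f"Summing took {t.elapsed:.6f} seconds")
```

Entering the `with` block calls `__enter__`, and leaving it (even via an exception) calls `__exit__`.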
Context managers derive their functionality from the special dunder methods `__enter__` and `__exit__`. Let's work through an example! Say we want to set the **class variable** `Logger.logging_tape` to be a new `LoggingTape`, but only temporarily. Sounds perfect for a `with` statement, huh? Here's a starter:
```python
class LoggingTape:
    def __init__(self):
        ...
    def __enter__(self):
        ...
    def __exit__(self, *args):
        ...
    def add_to_log(self, new_log):
        ...
    def print_logs(self):
        for log in self.logs:
            print(log)
```
We might see some code using the `LoggingTape` like
```python=
with LoggingTape() as tape:
    ...
...
```
On line 1, `LoggingTape`'s `__enter__` method is called to enter the `with` statement. Then, after the indent block (so after line 2 but before line 3), `LoggingTape`'s `__exit__` method is called to exit from the `with` statement.

:::info
__Task:__ In `LoggingTape`'s constructor, make an empty list called `logs`. We'll store strings in it as messages logging whatever happened.

Then, in `__enter__`, set `Logger.logging_tape = self` and `return self`. We're setting a class-level variable!

Next, in `__exit__`, set `Logger.logging_tape = None`.

In `add_to_log`, append `new_log` to the end of `logs`.
:::

Now, check out this code block:
```python=
with LoggingTape() as tape:  # runs LoggingTape's __enter__()
    # Logger.logging_tape is now defined as tape (from line 1)!
    tape.add_to_log("Hi!")
# runs LoggingTape's __exit__()
# Now Logger.logging_tape is defined as None
```
This might seem a little trivial now, but what this enables us to do is have any `Logger` class record to `tape` while inside the `with` statement (lines 2-3 in the example)! Say we have a car class:
```python=
class Car(Logger):
    def travel(self, distance):
        self.logging_tape.add_to_log(f"Traveled Distance {distance}")
```
```python=
car = Car()
with LoggingTape() as tape:
    car.travel(5)
tape.print_logs()
```
:::success
The output will be "Traveled Distance 5". 
The LoggingTape kept track of the logged item automatically for us. I wonder if this will be useful in Homework 3...
:::

## 3. Math Review

:::info
**Task 0:** In each of the following sections, please read through the review and complete the problems in each of the blue boxes. You are welcome to write these up however you'd like, although we encourage you to use LaTeX. Please include your solutions in your submission on Gradescope.
:::

### A. Matrix Multiplication

1. Given two column vectors $a \in \mathbb{R}^{m \times 1}, \; b \in \mathbb{R}^{n \times 1}$, the _outer product_ is
$$\mathbf{a}\mathbf{b}^T = \begin{bmatrix}a_0 \\ \vdots \\ a_{m-1}\end{bmatrix} \begin{bmatrix}b_0 \\ \vdots \\ b_{n-1}\end{bmatrix}^T = \begin{bmatrix} a_0 \mathbf{b}^T\\ \vdots \\ a_{m-1} \mathbf{b}^T\\ \end{bmatrix} = \begin{bmatrix} a_0 b_0 & \cdots & a_0 b_{n-1}\\ \vdots & \ddots & \vdots \\ a_{m-1} b_0 & \cdots & a_{m-1} b_{n-1}\\ \end{bmatrix} \in \mathbb{R}^{m\times n} $$
2. Given two column vectors $\mathbf{a}$ and $\mathbf{b}$ both in $\mathbb{R}^{r\times 1}$, the _inner product_ (or the _dot product_) is defined as:
$$ \mathbf{a} \cdot \mathbf{b} = \mathbf{a}^T\mathbf{b} = \begin{bmatrix} a_0\ \cdots\ a_{r-1} \end{bmatrix} \begin{bmatrix}b_0 \\ \vdots \\ b_{r-1}\end{bmatrix} = \sum_{i=0}^{r-1} a_i b_i $$
where $\mathbf{a}^T$ is the _transpose_ of a vector, which converts between column and row vector alignment. The same idea extends to matrices as well.
3. Given a matrix $\mathbf{M} \in \mathbb{R}^{r\times c}$ and a vector $x \in \mathbb{R}^c$, let $M_i$ be the $i$'th row of $M$. 
The matrix-vector product is defined as:
$$\mathbf{Mx} \ =\ \mathbf{M}\begin{bmatrix} x_0\\ \vdots \\ x_{c-1}\\ \end{bmatrix} \ =\ \begin{bmatrix} \mathbf{M_0}\\ \vdots \\ \mathbf{M_{r-1}}\\ \end{bmatrix}\mathbf{x} \ =\ \begin{bmatrix} \ \mathbf{M_0 \cdot x}\ \\ \vdots \\ \ \mathbf{M_{r-1} \cdot x}\ \\ \end{bmatrix} $$
Further, given a matrix $N \in \mathbb{R}^{c\times m}$, we define
$$ MN = \begin{bmatrix} \mathbf{M_0 \cdot N^T_0} & \cdots & \mathbf{M_0 \cdot N^T_{m-1}} \\ \vdots & \ddots & \vdots \\ \mathbf{M_{r-1} \cdot N^T_0} & \cdots & \mathbf{M_{r-1} \cdot N^T_{m-1}} \end{bmatrix} $$
(here $N^T_j$, the $j$'th row of $N^T$, is the $j$'th column of $N$), and we have $MN \in \mathbb{R}^{r\times m}$.
4. $\mathbf{M} \in \mathbb{R}^{r\times c}$ implies that the function $f(x) = \mathbf{Mx}$ can map $\mathbb{R}^{c\times 1} \to \mathbb{R}^{r\times 1}$.
5. $\mathbf{M_1} \in \mathbb{R}^{d\times c}$ and $\mathbf{M_2} \in \mathbb{R}^{r\times d}$ implies $f(x) = \mathbf{M_2M_1x}$ can map $\mathbb{R}^c \to \mathbb{R}^r$.

:::info
**Task 1:** Given this and your own knowledge, try solving these:
- __Prove that $(2) + (3)$ implies $(4)$__. In other words, use your understanding of the inner and matrix-vector products to explain why $(4)$ has to be true.
- __Prove that $(4)$ implies $(5)$__
:::

### B. Differentiation

Recall that differentiation is finding the rate of change of one variable relative to another variable. Some nice reminders:
$$\begin{align} \frac{df(x)}{dx} & \text{ is how $f(x)$ changes with respect to $x$}.\\ \frac{\partial f(x,y)}{\partial x} & \text{ is how $f(x,y)$ changes with respect to $x$ (and ignoring other factors)}.\\ \frac{dz}{dx} &= \frac{dy}{dx} \cdot \frac{dz}{dy} \text{ via chain rule if these factors are easier to compute}. 
\end{align}$$
Some common derivative patterns include:
$$\frac{d}{dx}(2x^3 + 4x + 5) = 6x^2 + 4 $$
$$\frac{\partial}{\partial y}(x^2y^3 + xy + 5x^2) = 3x^2y^2 + x $$
$$\frac{d}{dx}(x^3 + 5)^3 = 3(x^3 + 5)^2 \times (3x^2) $$
$$\frac{d}{dx}\ln(x) = \frac{1}{x} $$

:::info
**Task 2:** Given this and your own knowledge:

Use (and internalize) the log properties to solve the following:
$$\frac{\partial}{\partial y}\ln(x^5/y^2)$$
The properties are as follows:
$$\log(x^p) = p\log(x)$$
$$\log(xy) = \log(x) + \log(y)$$
$$\log(x/y) = \log(x) - \log(y)$$

Solve the following partial for a valid $j$ and all valid $i$:
$$\frac{\partial}{\partial x_j}\ln\bigg[\sum_i x_iy_i\bigg]$$
Consider using the chain rule. Let $g_1(x) = \sum_i x_iy_i$...
:::

### C. Jacobians

Now, the previous examples focused on scalar functions (functions that output a single number), but many functions output vectors. For example, consider the function:
$$ f(x,y)= \begin{bmatrix} x^2+y \\ 2xy \\ x-y^2 \end{bmatrix}$$
This function takes two inputs $(x,y)$ and produces three outputs. When we want to understand how this vector function changes with respect to its inputs, we organize all the partial derivatives into a matrix called the **Jacobian**. For a function mapping $\mathbb{R}^n \to \mathbb{R}^m$, the Jacobian is **always** an $m \times n$ matrix: it has as many rows as outputs and as many columns as inputs. The Jacobian matrix $\mathbf{J}$ has the form:
$$\mathbf{J} = \frac{\partial \mathbf{f}}{\partial (x,y)} = \begin{bmatrix} \frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\ \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y} \\ \frac{\partial f_3}{\partial x} & \frac{\partial f_3}{\partial y} \end{bmatrix}$$
Each **row** corresponds to one output component, and each **column** corresponds to one input variable. The entry in row $i$, column $j$ tells us how output $i$ changes with respect to input $j$. 
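A useful way to sanity-check a hand-derived Jacobian (an optional NumPy sketch, not required for the tasks) is to compare it against a finite-difference approximation. Here we check the example function $f(x,y)$ from above at the point $(1, 2)$:

```python
import numpy as np

# The example function from the text: f(x, y) = [x^2 + y, 2xy, x - y^2].
def f(v):
    x, y = v
    return np.array([x**2 + y, 2 * x * y, x - y**2])

def numerical_jacobian(func, v, eps=1e-6):
    """Approximate the m x n Jacobian of func at v with central differences."""
    v = np.asarray(v, dtype=float)
    m, n = func(v).shape[0], v.shape[0]
    J = np.zeros((m, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        J[:, j] = (func(v + step) - func(v - step)) / (2 * eps)
    return J

x, y = 1.0, 2.0
# Hand-derived Jacobian of f, evaluated at (x, y) = (1, 2):
J_exact = np.array([[2 * x, 1.0], [2 * y, 2 * x], [1.0, -2 * y]])
print(np.allclose(numerical_jacobian(f, [x, y]), J_exact, atol=1e-4))  # True
```

The same trick works for any hand computation in this section: if the analytic and numerical Jacobians disagree, recheck your partial derivatives.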
Returning to our example above, we can define our component functions as: - $f_1(x,y) = x^2 + y$ - $f_2(x,y) = 2xy$ - $f_3(x,y) = x - y^2$ Therefore, the complete Jacobian is: $$\mathbf{J} = \begin{bmatrix} 2x & 1 \\ 2y & 2x \\ 1 & -2y \end{bmatrix}$$ :::info **Task 3:** Higher-Dimensional Jacobians and Multi-Stage Function Analysis Consider the following vector functions: $$\mathbf{f}(s,t) = \begin{bmatrix} s^2t \\ s+e^t \\ \ln(1+s^2) \end{bmatrix}$$ $$\mathbf{g}(u,v,w) = \begin{bmatrix} e^{u} + \ln(1+e^{v}) \\ w(u^2+1) \\ \frac{1}{1+e^{-v}} + w^3 \end{bmatrix}$$ $$\mathbf{p}(a,b) = \begin{bmatrix} a^2 + b^2 \\ 2ab \end{bmatrix}$$ **Part A: Jacobian Computation and Analysis** 1. Compute the complete Jacobian matrix $\mathbf{J}_g$ for function $\mathbf{g}$. Show all partial derivatives explicitly. 2. Evaluate $\mathbf{J}_g$ at the point $(u,v,w) = (1,0,1)$. What do you notice about the values in the first column versus the other columns? **Part B: Function Composition and Chain Rule** 1. Consider the potential composition $\mathbf{p}(\mathbf{g}(\mathbf{f}(s,t)))$. Is this composition valid? If not, explain what's wrong in terms of dimensional compatibility. If it is valid, what would be the dimensions of the resulting Jacobian when applying the chain rule? 2. 
Analyze the composition $\mathbf{g}(\mathbf{f}(s,t))$: - Write out the result of the composition explicitly as a function of $(s,t)$ - Find $\frac{\partial}{\partial s}\mathbf{g}(\mathbf{f}(s,t))$ using direct differentiation - Now compute the same derivative using the chain rule: $\mathbf{J}_g(\mathbf{f}(s,t)) \cdot \mathbf{J}_f(s,t)$ - Verify that both methods yield identical results ::: A special class of vector functions applies the same scalar function to each component independently: An element-wise function $\mathbf{h}: \mathbb{R}^n \to \mathbb{R}^n$ has the form: $$\mathbf{h}(\mathbf{x}) = \begin{bmatrix} h(x_1) \\ h(x_2) \\ \vdots \\ h(x_n) \end{bmatrix}$$ Since each output depends only on its corresponding input, the Jacobian is **diagonal**: $$\mathbf{J}_h = \begin{bmatrix} h'(x_1) & 0 & \cdots & 0 \\ 0 & h'(x_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & h'(x_n) \end{bmatrix}$$ :::info **Task 4: Element-wise Functions** 1. Answer the following questions where $\mathbf{r}(\mathbf{x}) = \max(0, \mathbf{x})$: a) Find the scalar derivative $\mathbf{r}'(\mathbf{x})$. b) Write the Jacobian $\mathbf{J}_{\mathbf{r}}$ for $\mathbf{x} = [x_1, x_2, x_3]^T$. 2. Compare the following functions for the input, $\mathbf{x}$: - **X (Input)**: $\mathbf{x} = [x_1, x_2]^T$ - **A (Element-wise):** $\mathbf{f}_A(\mathbf{x}) = \begin{bmatrix} x_1^2 \\ x_2^2 \end{bmatrix}$ - **B (Non-element-wise):** $\mathbf{f}_B(\mathbf{x}) = \begin{bmatrix} x_1^2 + x_2 \\ x_1 + x_2^2 \end{bmatrix}$ Compute both Jacobians $\mathbf{J}_A$ and $\mathbf{J}_B$. Which one is diagonal and why? ::: ### D. Probability #### Fundamental Concepts **Random Variables**: A random variable $X$ is a function that assigns numerical values to the outcomes of a random experiment. We write $X \sim P(x)$ to indicate that $X$ follows probability distribution $P$. **Independence**: Events $A$ and $B$ are independent if $P(A \cap B) = P(A)P(B)$. 
For random variables, $X$ and $Y$ are independent if knowing the value of $X$ tells us nothing about the probability distribution of $Y$. **Conditional Probability**: $P(A|B) = \frac{P(A \cap B)}{P(B)}$ represents the probability of $A$ given that $B$ has occurred. :::info **Task 5:** Given this and your own knowledge: - You're trying to train a cat/dog classifier which takes in an image $x$ from our dataset, X, and outputs a prediction, $\hat{y}\in \{0, 1\}$ (0 if the image is a cat, 1 if it is a dog). Let $\hat{Y}(x)$ be a random variable that represents our classifier. Suppose that the dataset of cats and dogs is balanced (i.e. there are an equal number of cat and dog examples). Your friend argues that since the dataset is balanced, the classifier should ignore the input data and produce each prediction with equal probability $$\mathbb{P}[\hat{Y}=0] = \mathbb{P}[\hat{Y}=1]$$ - If your friend's assumption were correct, what value of $\mathbb{P}[\hat{Y}=0]=\mathbb{P}[\hat{Y}=1]$ would make this a valid probability distribution? - Is your friend's assumption correct? Why or why not? ::: #### Expectation and Variance: The Computational Foundation The **expectation** (or mean) of a random variable $X$ is: $$\mathbb{E}[X] = \begin{cases} \sum_{x} x \cdot P(X = x) & \text{if } X \text{ is discrete} \\ \int_{-\infty}^{\infty} x \cdot p(x) \, dx & \text{if } X \text{ is continuous} \end{cases}$$ The **variance** measures spread around the mean: $$\mathbb{V}[X] = \mathbb{E}[(X - \mathbb{E}[X])^2] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$$ **Key Properties** (these will be crucial for optimization algorithms): - **Linearity of expectation**: $\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]$ (even if $X, Y$ are not independent!) 
- **Variance of scaled variables**: $\mathbb{V}[aX] = a^2\mathbb{V}[X]$ - **Independence and variance**: If $X, Y$ independent, then $\mathbb{V}[X + Y] = \mathbb{V}[X] + \mathbb{V}[Y]$ **Normal Distribution**: $X \sim \mathcal{N}(\mu, \sigma^2)$ has the familiar bell curve shape. - **Density**: $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ - **Expectation**: $\mathbb{E}[X] = \mu$ - **Variance**: $\mathbb{V}[X] = \sigma^2$ **Standard Normal**: $\mathcal{N}(0, 1)$ is the normal distribution with mean 0 and variance 1. **Key Transformation**: If $X \sim \mathcal{N}(\mu, \sigma^2)$, then $\frac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1)$. :::info **Task 6**: Essential Probability Calculations **Part A: Basic Computations** 1. A random variable $Z$ takes values $\{-2, 0, 2\}$ with probabilities $\{0.3, 0.4, 0.3\}$. - Compute $\mathbb{E}[Z]$ and $\mathbb{V}[Z]$ - What is $\mathbb{E}[Z^2]$? Verify that $\mathbb{V}[Z] = \mathbb{E}[Z^2] - (\mathbb{E}[Z])^2$ 2. If $X \sim \mathcal{N}(2, 9)$ and $Y \sim \mathcal{N}(-1, 4)$ are independent: - What is the distribution of $X + Y$? - What is the distribution of $3X - 2Y + 5$? **Part B: Matrix-Vector Products with Random Matrices** 3. Consider a $3 \times 2$ matrix $\mathbf{A}$ where each entry $A_{ij}$ is independent with $\mathbb{E}[A_{ij}] = 0$ and $\mathbb{V}[A_{ij}] = 1$. - For the deterministic vector $\mathbf{v} = [1, -1]^T$, compute $\mathbb{E}[\mathbf{A}\mathbf{v}]$ - What is $\mathbb{V}[(\mathbf{A}\mathbf{v})_1]$? (Note: $(\mathbf{A}\mathbf{v})_1 = A_{11} \cdot 1 + A_{12} \cdot (-1)$) - More generally, if $\mathbf{v}$ is any vector with $\|\mathbf{v}\|^2 = c$, what is $\mathbb{V}[(\mathbf{A}\mathbf{v})_i]$ for any component $i$? **Part C: Optimization from Probabilistic Assumptions** 4. Suppose you observe noisy measurements: $y_i = 2x_i + 3 + \epsilon_i$ where each $\epsilon_i \sim \mathcal{N}(0, 1)$ independently. 
- Given data points $(x_1, y_1) = (1, 4.8)$, $(x_2, y_2) = (2, 7.2)$, $(x_3, y_3) = (3, 9.1)$, what's the probability density of observing $y_1 = 4.8$ given $x_1 = 1$?
- If you want to find the best estimates $\hat{a}$ and $\hat{b}$ for the model $y = ax + b$, explain why minimizing $\sum_{i=1}^3 (y_i - ax_i - b)^2$ makes sense from a probabilistic perspective.

**Part D: Averaging Independent Quantities**

5. You measure the same quantity 16 times, getting independent measurements $M_1, M_2, \ldots, M_{16}$ where each $M_i$ has $\mathbb{E}[M_i] = \mu$ (the true value) and $\mathbb{V}[M_i] = \sigma^2$.
    - What is $\mathbb{E}[\bar{M}]$ where $\bar{M} = \frac{1}{16}\sum_{i=1}^{16} M_i$?
    - What is $\mathbb{V}[\bar{M}]$?
    - How many measurements would you need to make the variance of your average 4 times smaller?
:::

## Submission

All you need to submit for this assignment are your typed or written answers to the Math Review section on Gradescope. The assignment is linked [here](https://www.gradescope.com/courses/1107451)! This assignment is graded on completion and a good-faith effort!

## Congratulations 🥳

You have just completed the first assignment for deep learning! You are now set up to begin working on the course. Good luck :)