# Assignment 1: Setup and Mathematical Foundations

In this assignment we are going to go through all the setup and background we need for this course! We've divided it into 3 sections as follows:

1. Setting up our Python environment.
2. Reviewing some Python constructs we will rely on.
3. Reviewing math that we will use throughout this course.

:::info
This homework should be completed by **Thursday, February 5, 2026 at 11:59 PM EST**
:::

## 0. Onboarding Form

Ok, enough about all of the work you are GOING to do. Here's some real work for you to do now. Please take a moment to let us collect your data and fill out this [form](https://forms.gle/L6rD64vDKzFogpJ39).

## 1. Environment Setup

### Getting started

Please click <ins>[here](https://classroom.github.com/a/nTxTB6pN)</ins> to get the stencil code. Reference this <ins>[guide](https://hackmd.io/gGOpcqoeTx-BOvLXQWRgQg)</ins> for more information about GitHub and GitHub Classroom.

### Roadmap

In order to complete (programming) assignments for this course, you will need a way to code, run, and debug your own Python code. While you are always free to use department machines for this (they have a pre-installed version of the course environment that every assignment has been tested against), you are also free to work on your own machine. Below we give you some information that is helpful for either of these situations.

### A. Configuring Environment

#### Developing Locally

In order to set up your virtual environment for this course, we _highly recommend_ (and only formally support) the use of Anaconda to create and manage your Python environment. Once you have cloned the GitHub Classroom repository for this assignment, you can do the following to set up your virtual environment:

1. Download the **Anaconda** installer from [here](https://www.anaconda.com/download#downloads), and install it on your computer. We recommend using the Graphical Installer for the correct system (Windows / (Intel) Mac / Mac M1).

    :::info
    **Note:** If you have an existing Anaconda or Miniconda installation (such as from CS200), then you don't need to reinstall, and can just use that! You can tell if you have an existing install if the command `conda --version` is recognized.
    :::

    :::warning
    **Windows**: When installing using the graphical installer, **be sure to check the box which adds `conda` to your `PATH`**.
    :::

2. Open a new terminal window, navigate to the root of the cloned assignment (using `cd` and `ls`, in a terminal such as the one in VSCode), and run `./env_setup/Other/conda_create.sh`. This should set up a virtual environment named `csci1470` on your computer. **If you have an Apple Silicon chip (M1, M2, M3), this script will be different.** (See below.)

    :::info
    You may need to restart your terminal after installing Anaconda in order for this to work.
    :::

    :::warning
    **Note:** This might be slightly different depending on your platform:
    - **Apple Silicon**: We provide a slightly different script, `./env_setup/Apple_Silicon/conda_create_silicon.sh`, for those who have Apple Silicon.
    - **Windows and Others**: If you are using Windows PowerShell, then you can just run `./env_setup/Other/conda_create.sh` (forward slashes), but if you are using *Command Prompt*, then you need to run `.\env_setup\Other\conda_create.sh` (backslashes).
    :::
:::spoiler Troubleshooting Tips
- **If you are experiencing problems running the `*.sh` files entirely:**
    - If you are getting a permission issue, try running `chmod a+x <script>` in the command line from inside your repo.
    - If you are getting an unexpected token/syntax error, try running `dos2unix <script>`, then run the script again (without `dos2unix`). If you run into the error `command not found`, follow this <ins>[guide](https://formulae.brew.sh/formula/dos2unix)</ins>.
- If those do not work, you can just open the `conda_create.sh` script in a text editor (such as VSCode) and run each line individually in your terminal.
:::

3. Run `conda activate csci1470`. **You will need to do this in every shell where you want to use the virtual environment**.

:::warning
If the above procedure doesn't work for you (although we **highly recommend trying to troubleshoot that first**), here is another method that does not rely on `conda` commands:
:::

:::spoiler
1. **Install Python 3.11** (or Python 3.9; we have not found any differences in functionality between them for our projects).
2. **Install the following packages** in either a virtual environment or your main Python environment:
    ```
    ipython==8.8.0
    matplotlib==3.5.3
    numpy==1.23.5
    Pillow==9.4.0
    scipy==1.9.3
    tensorflow==2.15.0
    tqdm==4.64.1
    ```
    - We suggest you create a virtual environment. You can do so by running the following commands:
        1. Create a folder where you plan to keep all your homework assignments.
        2. In that folder, run `python -m venv cs1470`. This will create a new virtual environment called `cs1470`.
        3. Activate your virtual environment by running `cs1470/Scripts/activate` on Windows machines or `source cs1470/bin/activate` on Mac and Linux machines. Do this before starting any homework assignment.
        4. You can deactivate the virtual environment by typing `deactivate`.
    - You can install individual packages using `pip` commands (i.e. `pip install ipython==8.8.0`).
    - You can install a group of packages by pasting them into a `requirements.txt` file and running `pip install -r requirements.txt`.
:::

Once this is complete, you should have a local environment to use for the course!

#### Department Machines

:::info
**Note:** Sometimes even if you set up your local environment correctly, you may experience unexpected bugs and errors that are unique to your local setup. To prevent this from hindering your ability to complete assignments, we **highly recommend** that you familiarize yourself with the department machines, even if you expect to usually be working locally.
:::

Department machines serve as a common, uniform way to work on and debug assignments. There are a variety of ways in which you can use department machines:

1. **In Person.** If you are in the CIT, you can (almost) always head into the Sunlab/[ETC] and work on a department machine.
2. **FastX.** FastX allows you to VNC into a department machine from your own computer, from anywhere! A detailed guide to getting FastX working on your own computer can be found [here](https://cs.brown.edu/about/system/connecting/fastx/).
3. **SSH.** The department machines can also be accessed via SSH (Secure Shell) from anywhere, which should allow you to perform command line activities (cloning repositories, running assignment code). You can check out an SSH guide [here](https://cs.brown.edu/about/system/connecting/ssh/).
When using the department machines, you can activate the course virtual environment (which we have already installed) using:

```
source /course/cs1470/cs1470_env/bin/activate
```

From here, you should be able to clone the repository (see a GitHub guide here for more information on using Git via the command line) and work on your assignment.

:::info
**Note**: Python files using `pytorch` may require a little more time on startup to run on department machines (likely because it is pulling files from the department filesystem), but they should all run nonetheless.
:::

### B. Test your environment

#### What is an environment?

Python packages, or libraries, are external sets of code written by other developers which might prove really helpful! (Imagine coding how to draw a graph in Python every single time.) However, different classes, tasks, and even projects might require different sets of Python packages. We can manage these as different virtual environments, each with its own set of installed packages.

#### Conda Specifics

If you are using `conda`, you might notice the `(base)` prefix in your terminal. This signifies that you're in the default (hence `(base)`) environment. To access CSCI1470's virtual environment, you can use `conda activate csci1470`. You should now see the `(csci1470)` prefix in your terminal! To return back to the base environment, you can use `conda deactivate`.

:::success
**TODO: Running a Test Network**

To make sure all your packages are installed correctly, run the following command to train a small neural network. This will ensure your PyTorch and NumPy are ready for the next assignment. Follow the steps below:

1. Make sure you are in the root directory for this assignment, `hw1-setup`.
2. Make sure your conda environment is active (`csci1470` prefix in your terminal).
3. Adjust the number of epochs on line 81 of `code/test_network.py` from 75 to 10 (this will make it train faster). You can adjust this later to yield better results.
4. Run the following command:
    ```
    python code/test_network.py
    ```
5. Your network will take a few minutes to train (no longer than 5; if it takes longer, you have an issue). You can then observe your network's learned decision boundaries in the image `neural_galaxy.png` in your root directory.
:::

## 2. Python Review

Welcome to the Python review session! This 30-45 minute tutorial covers essential Object-Oriented Programming (OOP) concepts that you'll need for this course. While we won't be discussing specific deep learning libraries or concepts yet, the patterns you learn here will form the foundation for building neural networks, optimizers, and other ML components that you will **need** in the next assignment.

### A. Basic Python Review

For an overview of Python syntax and common Python uses, we recommend checking out <ins>[this](https://www.w3schools.com/python)</ins> Python tutorial for a refresher.

### B. Advanced Python

Throughout this section, you should be filling in the corresponding classes and methods in `python_tutorial.py` in the repository you cloned. You can test each of these methods/classes you implement by running the file, as we have provided you with test cases at the bottom.

## Section Breakdown

### Section 1: Classes and Objects

OOP (Object Oriented Programming) strongly focuses on objects, which encompass any value, variable, *etc*. It's really any "tangible" thing.

```python
thing1 = "i am an object!"
thing2 = 1234567
...
```
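Even these built-in values are instances of classes. As a quick illustration (you can paste this into a Python REPL), you can ask any object for its class with Python's built-in `type`:

```python
thing1 = "i am an object!"
thing2 = 1234567

print(type(thing1))  # <class 'str'>  -- strings are instances of the str class
print(type(thing2))  # <class 'int'>  -- integers are instances of the int class
```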
However, we might find it useful to organize these things into classes of things, with properties like instance variables or methods shared across all things that are members of the same class. You might be familiar with Python's `str`, `int`, and `float` classes, for example.

When working with Python, you'll almost always be working with objects which are instances of classes. This is just a quirk of Python being so "versatile". You can check the type of a variable with the built-in `type` function!

The main advantage of classes is that they allow you to bundle data attributes and functionality together. This makes for a much cleaner model, where we can work with class objects directly and let them keep track of their state and properties internally. Most mainstream ML systems and frameworks require objects that maintain state, like the parameters of a network, while providing computational functionality. Classes give you this structure.

### Section 2: Inheritance and Polymorphism

If classes are sets and objects are set elements, then we also need **subsets and supersets**! In Python, we can make a "child" class **inherit** all the methods from a "parent" class like so:

```python
class ChildClass(ParentClass):
    pass  # Inherits everything from ParentClass
```

This allows us to create **specialized versions** of classes while maintaining a common interface. The child class gets all the parent's methods and attributes automatically, but can also:

- **Add new methods** that the parent doesn't have
- **Override methods** to change their behavior
- **Extend methods** by calling the parent's version with `super()`

**Polymorphism** is the companion concept. From your intro courses, it means different classes can be used interchangeably if they share a common interface. This is like having different types of vehicles (car, motorcycle, truck) that all have a `drive()` method: you can work with any vehicle generically, as shown in the sketch below.

**Why This Matters**

Throughout the course, you'll build many types of transformations, layers, and operations that share common behaviors but differ in specific implementations. Inheritance provides **code reuse** (don't repeat yourself!), while polymorphism lets you treat different objects **uniformly**.
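Here is a small sketch of that vehicle analogy (the class names and methods are purely illustrative, not part of the stencil):

```python
class Vehicle:
    """Parent class defining the shared interface."""
    def __init__(self, name):
        self.name = name

    def drive(self):
        return f"{self.name} is driving"

class Truck(Vehicle):
    def __init__(self, name, cargo_kg):
        super().__init__(name)  # Extend: reuse the parent's constructor
        self.cargo_kg = cargo_kg

    def drive(self):  # Override: change the parent's behavior
        return f"{super().drive()} with {self.cargo_kg} kg of cargo"

class Motorcycle(Vehicle):
    def pop_wheelie(self):  # Add: a method the parent doesn't have
        return f"{self.name} pops a wheelie!"

# Polymorphism: any Vehicle subclass works through the same interface.
for vehicle in [Vehicle("Car"), Truck("Semi", 500), Motorcycle("Bike")]:
    print(vehicle.drive())
```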
### Section 3: Special Methods (Dunder Methods)

**Special methods** (also called "magic methods" or "dunder methods" because of the **d**ouble **under**scores) let you define how your objects behave with Python's built-in operations. Want your custom class to work with `len()`? Implement `__len__()`. Want to support `obj[index]`? Implement `__getitem__()`. Want to add two objects with `+`? Implement `__add__()`. (This is known as "desugaring", and there is a lot more to learn about it in CS1730.)

These methods make your objects feel **native** to Python. Using dunders, they integrate seamlessly with the language's syntax rather than requiring awkward method calls like `obj.get_length()` or `obj.add(other)`.

**Common Special Methods**

```python
class MyClass:
    def __init__(self, data):             # Constructor: MyClass(data)
        self.data = data

    def __str__(self):                    # str(obj) or print(obj)
        return f"MyClass with {self.data}"

    def __repr__(self):                   # repr(obj)
        return f"MyClass(data={self.data})"

    def __len__(self):                    # len(obj)
        return len(self.data)

    def __getitem__(self, index):         # obj[index]
        return self.data[index]

    def __setitem__(self, index, value):  # obj[index] = value
        self.data[index] = value

    def __add__(self, other):             # obj1 + obj2
        return MyClass(self.data + other.data)

    def __mul__(self, scalar):            # obj * number
        return MyClass(self.data * scalar)

    def __call__(self, arg):              # obj(arg) to make object callable
        return self.process(arg)
```

Implementing these special methods makes your components feel natural and Pythonic to use.

**Example: Vector Math**

```python
v1 = Vector([1, 2, 3])
v2 = Vector([4, 5, 6])

# Instead of: v1.add(v2).multiply(2)
# You can write:
result = (v1 + v2) * 2  # Much cleaner!
```

*This is a part of what is going on behind the scenes in NumPy!*
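For instance, a minimal `Vector` supporting the snippet above could look like this (a sketch with no error handling; a real implementation would validate that the lengths match):

```python
class Vector:
    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):  # v1 + v2, element-wise
        return Vector(a + b for a, b in zip(self.data, other.data))

    def __mul__(self, scalar):  # v * 2, scales every component
        return Vector(a * scalar for a in self.data)

    def __repr__(self):  # Readable printout
        return f"Vector({self.data})"

v1 = Vector([1, 2, 3])
v2 = Vector([4, 5, 6])
print((v1 + v2) * 2)  # Vector([10, 14, 18])
```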
### Section 4: Properties

Sometimes you want attributes that **look** like regular variables but actually **compute** their values on the fly, or perform validation when set. Properties give you this capability.

**Convention**: Python uses a single underscore (`_variable`) to indicate "this is internal/private; don't access directly." Properties provide the public interface.

**Syntax**

```python
class Rectangle:
    def __init__(self, width, height):
        self._width = width    # Store in "private" variable with _
        self._height = height

    @property
    def width(self):           # Getter
        return self._width

    @width.setter
    def width(self, value):    # Setter
        if value <= 0:
            raise ValueError("Width must be positive")
        self._width = value

    @property
    def area(self):            # Computed property (read-only)
        return self._width * self._height

# Usage looks like regular attributes!
rect = Rectangle(5, 3)
print(rect.area)   # 15 (computed)
rect.width = 10    # Validation happens automatically
print(rect.area)   # 30 (recomputed)
```

**Why This Matters**

In ML, you'll often have:

- **Computed shapes**: `output_shape` based on `input_shape` and layer parameters
- **Parameter counts**: `num_parameters` computed from layer dimensions
- **Validation**: Ensuring hyperparameters stay in valid ranges
- **Derived values**: Normalized versions of stored values

Properties make these accessible with clean syntax while keeping the computation logic hidden.

### Section 5: Class Methods and Class Variables

So far, we've seen **instance methods**, which operate on `self`, and **instance variables**. But we can also have methods and variables that belong to the **class itself**.

**Class Variables**

Class variables are shared across **all instances** of the class. Change one once, and it changes for everyone:

```python
class Counter:
    total_count = 0  # Class variable shared by all instances

    def __init__(self):
        self.count = 0  # Instance variable unique to each object
        Counter.total_count += 1

c1 = Counter()
c2 = Counter()
print(Counter.total_count)  # 2 -- tracks all Counter objects created
```

This is useful for:

- **Global counters**, such as tracking total instances
- **Shared configuration** that dictates default values used by all instances
- **Constants**, for values that don't change per instance

**Class Methods**

Class methods operate on the **class** rather than an instance. They're most commonly used for **alternative constructors**: different ways to build objects.

**Key Difference**: `@classmethod` receives the class as its first argument (`cls`), not an instance (`self`).

```python
class Configuration:
    def __init__(self, lr, batch_size, epochs):
        self.lr = lr
        self.batch_size = batch_size
        self.epochs = epochs

    @classmethod
    def default(cls):  # Note: 'cls' not 'self'
        return cls(lr=0.01, batch_size=32, epochs=10)

    @classmethod
    def from_file(cls, path):
        data = load_json(path)
        return cls(**data)

    @classmethod
    def for_testing(cls):
        return cls(lr=0.1, batch_size=8, epochs=2)

# Multiple ways to create objects:
config1 = Configuration(0.001, 64, 20)  # Normal constructor
config2 = Configuration.default()       # Using class method
config3 = Configuration.for_testing()   # Another class method
```

**Why This Matters**

In ML contexts, you will likely want to experiment with different hyperparameters and configurations. Class methods like these make it possible to:

- Load **pre-trained models** from files (like in TensorFlow): `Model.from_pretrained(path)`
- Create instances with **common configurations**, such as `Optimizer.adam_default()`
- Build objects from **different formats**, such as `Model.from_config(config_dict)`

Class methods provide these factory patterns elegantly.

### Section 6: Context Managers

Have you used `with open(file) as f:`? That's a **context manager**! Context managers ensure resources are properly acquired and released, even if errors occur.

The `with` statement executes code in a controlled environment where:

1. **Setup** happens when entering the block (`__enter__`)
2. **Your code** runs in the middle
3. **Cleanup** happens when exiting, **even if there's an error** (`__exit__`)

**Basic Pattern**

```python
class ResourceManager:
    def __enter__(self):
        # Acquire resource
        print("Setting up...")
        return self  # This gets assigned to the 'as' variable

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Release resource
        print("Cleaning up...")
        return False  # Don't suppress exceptions

# Usage:
with ResourceManager() as manager:
    # Do work with resource
    pass
# Cleanup happens automatically!
```

**Why This Matters**

Context managers are perfect for:

**Timing Code**

```python
with Timer("Training epoch"):
    train_one_epoch()
# Automatically prints: "Training epoch took 45.23 seconds"
```

**Temporarily Changing State**

```python
with model.eval_mode():
    predictions = model(test_data)
# Automatically returns to previous mode
```

**Managing Resources**

```python
with DataLoader(dataset) as loader:
    for batch in loader:
        process(batch)
# Automatically closes/cleans up loader
```

**Temporary Overrides**

```python
with set_learning_rate(optimizer, 0.001):
    # Use different LR temporarily
    optimizer.step()
# Automatically restores original LR
```

The reason context managers are so powerful is that they **guarantee** the cleanup code in `__exit__` **always runs**, even if an exception is raised, you `return` early, or you `break` out of a loop. This makes your code more robust and prevents resource leaks.
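As a concrete example, the `Timer` used above isn't a Python built-in; here's one way it might be written, as a minimal sketch:

```python
import time

class Timer:
    def __init__(self, label):
        self.label = label

    def __enter__(self):
        self.start = time.perf_counter()  # Setup: record the start time
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        elapsed = time.perf_counter() - self.start
        print(f"{self.label} took {elapsed:.2f} seconds")  # Cleanup: always runs
        return False  # Don't suppress exceptions

with Timer("Counting"):
    total = sum(range(10_000_000))
# Prints something like: "Counting took 0.21 seconds"
```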
## Connection to Deep Learning

These patterns appear constantly in ML frameworks:

| Pattern | ML Example |
|---------|------------|
| Classes | Neural network layers, models |
| Inheritance | Different layer types (Conv2D, Dense, Dropout) |
| Special methods | `model(input)`, `dataset[i]`, `len(dataloader)` |
| State management | Optimizer momentum, batch norm statistics |
| Properties | `model.num_parameters`, `layer.output_shape` |
| Class methods | `Model.from_pretrained()`, `Optimizer.adam()` |
| Context managers | `torch.no_grad()`, `model.eval()` |

You'll see these exact patterns in PyTorch, TensorFlow, JAX, and every major ML library. Understanding them deeply will make you much more effective throughout the course!

## 3. Math Review

:::info
**Task 0:** In each of the following sections, please read through the review and complete the problems in each of the blue boxes. You are welcome to write these up however you'd like, although we encourage you to use LaTeX. Here is a stencil for you to [use](https://www.overleaf.com/read/pbyqsczsswfm#dc04cf). Please make a copy and attach your solutions to your submission on Gradescope.
:::

### A. Matrix Multiplication

1. Given two column vectors $\mathbf{a} \in \mathbb{R}^{m \times 1}, \; \mathbf{b} \in \mathbb{R}^{n \times 1}$, the _outer product_ is

$$\mathbf{a}\mathbf{b}^T = \begin{bmatrix}a_0 \\ \vdots \\ a_{m-1}\end{bmatrix} \begin{bmatrix}b_0 \\ \vdots \\ b_{n-1}\end{bmatrix}^T = \begin{bmatrix} a_0 \mathbf{b}^T\\ \vdots \\ a_{m-1} \mathbf{b}^T\\ \end{bmatrix} = \begin{bmatrix} a_0 b_0 & \cdots & a_0 b_{n-1}\\ \vdots & \ddots & \vdots \\ a_{m-1} b_0 & \cdots & a_{m-1} b_{n-1}\\ \end{bmatrix} \in \mathbb{R}^{m\times n}$$

2. Given two column vectors $\mathbf{a}$ and $\mathbf{b}$, both in $\mathbb{R}^{r\times 1}$, the _inner product_ (or the _dot product_) is defined as:

$$\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^T\mathbf{b} = \begin{bmatrix} a_0\ \cdots\ a_{r-1} \end{bmatrix} \begin{bmatrix}b_0 \\ \vdots \\ b_{r-1}\end{bmatrix} = \sum_{i=0}^{r-1} a_i b_i$$

where $\mathbf{a}^T$ is the _transpose_ of a vector, which converts between column and row vector alignment. The same idea extends to matrices as well.

3. Given a matrix $\mathbf{M} \in \mathbb{R}^{r\times c}$ and a vector $\mathbf{x}\in \mathbb{R}^c$, let $\mathbf{M}_i$ be the $i$'th row of $\mathbf{M}$. The matrix-vector product is defined as:

$$\mathbf{Mx} \ =\ \mathbf{M}\begin{bmatrix} x_0\\ \vdots \\ x_{c-1}\\ \end{bmatrix} \ =\ \begin{bmatrix} \mathbf{M}_0\\ \vdots \\ \mathbf{M}_{r-1}\\ \end{bmatrix}\mathbf{x} \ =\ \begin{bmatrix} \ \mathbf{M}_0 \cdot \mathbf{x}\ \\ \vdots \\ \ \mathbf{M}_{r-1} \cdot \mathbf{x}\ \\ \end{bmatrix}$$

Further, given a matrix $\mathbf{N} \in \mathbb{R}^{c\times m}$, we define

$$\mathbf{MN} = \begin{bmatrix} \mathbf{M}_0\cdot \mathbf{N}^T_0 & \cdots & \mathbf{M}_0\cdot \mathbf{N}^T_{m-1} \\ \vdots & \ddots & \vdots \\ \mathbf{M}_{r-1}\cdot \mathbf{N}^T_0 & \cdots & \mathbf{M}_{r-1}\cdot \mathbf{N}^T_{m-1}\end{bmatrix}$$

where $\mathbf{N}^T_j$ (the $j$'th row of $\mathbf{N}^T$) is the $j$'th column of $\mathbf{N}$. We have $\mathbf{MN} \in \mathbb{R}^{r\times m}$.

4. $\mathbf{M} \in \mathbb{R}^{r\times c}$ implies that the function $f(x) = \mathbf{Mx}$ can map $\mathbb{R}^{c\times 1} \to \mathbb{R}^{r\times 1}$.

5. $\mathbf{M_1} \in \mathbb{R}^{d\times c}$ and $\mathbf{M_2} \in \mathbb{R}^{r\times d}$ imply that $f(x) = \mathbf{M_2M_1x}$ can map $\mathbb{R}^c \to \mathbb{R}^r$.

:::info
**Task 1:** Given this and your own knowledge, try solving these:

- __Prove that $(2) + (3)$ implies $(4)$__. In other words, use your understanding of the inner and matrix-vector products to explain why $(4)$ has to be true.
- __Prove that $(4)$ implies $(5)$__.
:::
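These definitions are also easy to poke at with NumPy, which we'll lean on all semester. A quick sketch checking the shapes in (1)-(5):

```python
import numpy as np

m, n, r, c, d = 4, 3, 2, 5, 6

a = np.random.rand(m)        # a in R^m
b = np.random.rand(n)        # b in R^n
print(np.outer(a, b).shape)  # (4, 3): the outer product is m x n, as in (1)
print(np.dot(a, a))          # inner product of a with itself: a single scalar, as in (2)

M = np.random.rand(r, c)
x = np.random.rand(c)
print((M @ x).shape)         # (2,): Mx maps R^c to R^r, as in (3) and (4)

M1 = np.random.rand(d, c)
M2 = np.random.rand(r, d)
print((M2 @ M1 @ x).shape)   # (2,): M2 M1 x maps R^c to R^r, as in (5)
```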
### B. Differentiation

Recall that differentiation is finding the rate of change of one variable relative to another variable. Some nice reminders:

$$\begin{align} \frac{df(x)}{dx} & \text{ is how $f(x)$ changes with respect to $x$}.\\ \frac{\partial f(x,y)}{\partial x} & \text{ is how $f(x,y)$ changes with respect to $x$ (ignoring other factors)}.\\ \frac{dz}{dx} &= \frac{dy}{dx} \cdot \frac{dz}{dy} \text{ via the chain rule, if these factors are easier to compute}. \end{align}$$

Some common derivative patterns include:

$$\frac{d}{dx}(2x^3 + 4x + 5) = 6x^2 + 4$$

$$\frac{\partial}{\partial y}(x^2y^3 + xy + 5x^2) = 3x^2y^2 + x$$

$$\frac{d}{dx}(x^3 + 5)^3 = 3(x^3 + 5)^2 \times (3x^2)$$

$$\frac{d}{dx}\ln(x) = \frac{1}{x}$$

:::info
**Task 2:** Given this and your own knowledge:

Use (and internalize) the log properties to solve the following:

$$\frac{\partial}{\partial y}\ln(x^5/y^2)$$

The properties are as follows:

$$\log(x^p) = p\log(x)$$
$$\log(xy) = \log(x) + \log(y)$$
$$\log(x/y) = \log(x) - \log(y)$$

Solve the following partial for a valid $j$ and all valid $i$:

$$\frac{\partial}{\partial x_j}\ln\bigg[\sum_i x_iy_i\bigg]$$

Consider using the chain rule. Let $g_1(x) = \sum_i x_iy_i$...
:::

### C. Jacobians

Now, the previous examples focused on scalar functions (functions that output a single number), but many functions output vectors. For example, consider the function:

$$f(x,y)= \begin{bmatrix} x^2+y \\ 2xy \\ x-y^2 \end{bmatrix}$$

This function takes two inputs $(x,y)$ and produces three outputs. When we want to understand how this vector function changes with respect to its inputs, we organize all the partial derivatives into a matrix called the **Jacobian**.

For a function mapping $\mathbb{R}^n \to \mathbb{R}^m$, the Jacobian is **always** an $m \times n$ matrix: it has as many rows as outputs and as many columns as inputs. The Jacobian matrix $\mathbf{J}$ has the form:

$$\mathbf{J} = \frac{\partial \mathbf{f}}{\partial (x,y)} = \begin{bmatrix} \frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\ \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y} \\ \frac{\partial f_3}{\partial x} & \frac{\partial f_3}{\partial y} \end{bmatrix}$$

Each **row** corresponds to one output component, and each **column** corresponds to one input variable. The entry in row $i$, column $j$ tells us how output $i$ changes with respect to input $j$.

Returning to our example above, we can define our component functions as:

- $f_1(x,y) = x^2 + y$
- $f_2(x,y) = 2xy$
- $f_3(x,y) = x - y^2$

Therefore, the complete Jacobian is:

$$\mathbf{J} = \begin{bmatrix} 2x & 1 \\ 2y & 2x \\ 1 & -2y \end{bmatrix}$$
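A handy habit when deriving Jacobians by hand (including in Task 3 below) is to sanity-check your answer against a finite-difference approximation. Here's a sketch, using NumPy, that verifies the worked example above at one point:

```python
import numpy as np

def f(v):
    x, y = v
    return np.array([x**2 + y, 2*x*y, x - y**2])

def jacobian_analytic(v):
    # The hand-derived Jacobian from the text
    x, y = v
    return np.array([[2*x, 1.0],
                     [2*y, 2*x],
                     [1.0, -2*y]])

def jacobian_numeric(f, v, eps=1e-6):
    # Column j is approximated by a central difference along input j
    v = np.asarray(v, dtype=float)
    cols = []
    for j in range(len(v)):
        e = np.zeros_like(v)
        e[j] = eps
        cols.append((f(v + e) - f(v - e)) / (2 * eps))
    return np.stack(cols, axis=1)

v = np.array([1.5, -0.5])
print(np.allclose(jacobian_analytic(v), jacobian_numeric(f, v)))  # True
```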
:::info
**Task 3:** Higher-Dimensional Jacobians and Multi-Stage Function Analysis

Consider the following vector functions:

$$\mathbf{f}(s,t) = \begin{bmatrix} s^2t \\ s+e^t \\ \ln(1+s^2) \end{bmatrix}$$

$$\mathbf{g}(u,v,w) = \begin{bmatrix} e^{u} + \ln(1+e^{v}) \\ w(u^2+1) \\ \frac{1}{1+e^{-v}} + w^3 \end{bmatrix}$$

$$\mathbf{p}(a,b) = \begin{bmatrix} a^2 + b^2 \\ 2ab \end{bmatrix}$$

**Part A: Jacobian Computation and Analysis**

1. Compute the complete Jacobian matrix $\mathbf{J}_g$ for function $\mathbf{g}$. Show all partial derivatives explicitly.
2. Evaluate $\mathbf{J}_g$ at the point $(u,v,w) = (1,0,1)$.

**Part B: Function Composition and Chain Rule**

1. Consider the potential composition $\mathbf{p}(\mathbf{g}(\mathbf{f}(s,t)))$. Is this composition valid? If not, explain what's wrong in terms of dimensional compatibility. If it is valid, what would be the dimensions of the resulting Jacobian when applying the chain rule?
2. Analyze the composition $\mathbf{g}(\mathbf{f}(s,t))$:
    - Write out the result of the composition explicitly as a function of $(s,t)$
    - Find $\frac{\partial}{\partial s}\mathbf{g}(\mathbf{f}(s,t))$ using direct differentiation
    - Now compute the same derivative using the chain rule: $\mathbf{J}_g(\mathbf{f}(s,t)) \cdot \mathbf{J}_f(s,t)$
    - Verify that both methods yield identical results
:::

A special class of vector functions applies the same scalar function to each component independently. An element-wise function $\mathbf{h}: \mathbb{R}^n \to \mathbb{R}^n$ has the form:

$$\mathbf{h}(\mathbf{x}) = \begin{bmatrix} h(x_1) \\ h(x_2) \\ \vdots \\ h(x_n) \end{bmatrix}$$

Since each output depends only on its corresponding input, the Jacobian is **diagonal**:

$$\mathbf{J}_h = \begin{bmatrix} h'(x_1) & 0 & \cdots & 0 \\ 0 & h'(x_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & h'(x_n) \end{bmatrix}$$

:::info
**Task 4: Element-wise Functions**

1. Answer the following questions where $\mathbf{r}(\mathbf{x}) = \max(0, \mathbf{x})$:
    a) Find the scalar derivative $r'(x)$.
    b) Write the Jacobian $\mathbf{J}_{\mathbf{r}}$ for $\mathbf{x} = [x_1, x_2, x_3]^T$.
2. Compare the following functions for the input $\mathbf{x}$:
    - **X (Input)**: $\mathbf{x} = [x_1, x_2]^T$
    - **A (Element-wise):** $\mathbf{f}_A(\mathbf{x}) = \begin{bmatrix} x_1^2 \\ x_2^2 \end{bmatrix}$
    - **B (Non-element-wise):** $\mathbf{f}_B(\mathbf{x}) = \begin{bmatrix} x_1^2 + x_2 \\ x_1 + x_2^2 \end{bmatrix}$

    Compute both Jacobians $\mathbf{J}_A$ and $\mathbf{J}_B$. Which one is diagonal and why?
:::

### D. Probability

#### Fundamental Concepts

**Random Variables**: A random variable $X$ is a function that assigns numerical values to the outcomes of a random experiment. We write $X \sim P(x)$ to indicate that $X$ follows probability distribution $P$.

**Independence**: Events $A$ and $B$ are independent if $P(A \cap B) = P(A)P(B)$. For random variables, $X$ and $Y$ are independent if knowing the value of $X$ tells us nothing about the probability distribution of $Y$. (See the quick simulation after Task 5 for an empirical check of this definition.)

**Conditional Probability**: $P(A|B) = \frac{P(A \cap B)}{P(B)}$ represents the probability of $A$ given that $B$ has occurred.

:::info
**Task 5:** Given this and your own knowledge:

- You're trying to train a cat/dog classifier which takes in an image $x$ from our dataset $X$ and outputs a prediction $\hat{y}\in \{0, 1\}$ (0 if the image is a cat, 1 if it is a dog). Let $\hat{Y}(x)$ be a random variable that represents our classifier. Suppose that the dataset of cats and dogs is balanced (i.e. there are an equal number of cat and dog examples). Your friend argues that since the dataset is balanced, the classifier should ignore the input data and produce each prediction with equal probability:

$$\mathbb{P}[\hat{Y}=0] = \mathbb{P}[\hat{Y}=1]$$

- If your friend's assumption were correct, what value of $\mathbb{P}[\hat{Y}=0]=\mathbb{P}[\hat{Y}=1]$ would make this a valid probability distribution?
- Is your friend's assumption correct? Why or why not?
:::
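To ground the independence definition above, here is a quick Monte Carlo check with two simulated fair coins, a sketch using NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
A = rng.integers(0, 2, n).astype(bool)  # event A: first coin lands heads
B = rng.integers(0, 2, n).astype(bool)  # event B: second coin lands heads

p_A, p_B = A.mean(), B.mean()
p_AB = (A & B).mean()
print(p_AB, p_A * p_B)  # both close to 0.25, since A and B are independent
```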
#### Expectation and Variance: The Computational Foundation

The **expectation** (or mean) of a random variable $X$ is:

$$\mathbb{E}[X] = \begin{cases} \sum_{x} x \cdot P(X = x) & \text{if } X \text{ is discrete} \\ \int_{-\infty}^{\infty} x \cdot p(x) \, dx & \text{if } X \text{ is continuous} \end{cases}$$

The **variance** measures spread around the mean:

$$\mathbb{V}[X] = \mathbb{E}[(X - \mathbb{E}[X])^2] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$$

**Key Properties** (these will be crucial for optimization algorithms):

- **Linearity of expectation**: $\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]$ (even if $X, Y$ are not independent!)
- **Variance of scaled variables**: $\mathbb{V}[aX] = a^2\mathbb{V}[X]$
- **Independence and variance**: If $X, Y$ are independent, then $\mathbb{V}[X + Y] = \mathbb{V}[X] + \mathbb{V}[Y]$

**Normal Distribution**: $X \sim \mathcal{N}(\mu, \sigma^2)$ has the familiar bell curve shape.

- **Density**: $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
- **Expectation**: $\mathbb{E}[X] = \mu$
- **Variance**: $\mathbb{V}[X] = \sigma^2$

**Standard Normal**: $\mathcal{N}(0, 1)$ is the normal distribution with mean 0 and variance 1.

**Key Transformation**: If $X \sim \mathcal{N}(\mu, \sigma^2)$, then $\frac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1)$. (See the quick simulation after Task 6.)

:::info
**Task 6**: Essential Probability Calculations

**Part A: Basic Computations**

1. A random variable $Z$ takes values $\{-2, 0, 2\}$ with probabilities $\{0.3, 0.4, 0.3\}$.
    - Compute $\mathbb{E}[Z]$ and $\mathbb{V}[Z]$
    - What is $\mathbb{E}[Z^2]$? Verify that $\mathbb{V}[Z] = \mathbb{E}[Z^2] - (\mathbb{E}[Z])^2$
2. If $X \sim \mathcal{N}(2, 9)$ and $Y \sim \mathcal{N}(-1, 4)$ are independent:
    - What is the distribution of $X + Y$?
    - What is the distribution of $3X - 2Y + 5$?

**Part B: Matrix-Vector Products with Random Matrices**

3. Consider a $3 \times 2$ matrix $\mathbf{A}$ where each entry $A_{ij}$ is independent with $\mathbb{E}[A_{ij}] = 0$ and $\mathbb{V}[A_{ij}] = 1$.
    - For the deterministic vector $\mathbf{v} = [1, -1]^T$, compute $\mathbb{E}[\mathbf{A}\mathbf{v}]$
    - What is $\mathbb{V}[(\mathbf{A}\mathbf{v})_1]$? (Note: $(\mathbf{A}\mathbf{v})_1 = A_{11} \cdot 1 + A_{12} \cdot (-1)$)
    - More generally, if $\mathbf{v}$ is any vector with $\|\mathbf{v}\|^2 = c$, what is $\mathbb{V}[(\mathbf{A}\mathbf{v})_i]$ for any component $i$?

**Part C: Optimization from Probabilistic Assumptions**

4. Suppose you observe noisy measurements: $y_i = 2x_i + 3 + \epsilon_i$ where each $\epsilon_i \sim \mathcal{N}(0, 1)$ independently.
    - Given data points $(x_1, y_1) = (1, 4.8)$, $(x_2, y_2) = (2, 7.2)$, $(x_3, y_3) = (3, 9.1)$, what's the probability density of observing $y_1 = 4.8$ given $x_1 = 1$?

**Part D: Averaging Independent Quantities**

5. You measure the same quantity 16 times, getting independent measurements $M_1, M_2, \ldots, M_{16}$, where each $M_i$ has $\mathbb{E}[M_i] = \mu$ (the true value) and $\mathbb{V}[M_i] = \sigma^2$.
    - What is $\mathbb{E}[\bar{M}]$ where $\bar{M} = \frac{1}{16}\sum_{i=1}^{16} M_i$?
    - What is $\mathbb{V}[\bar{M}]$?
    - How many measurements would you need to make the variance of your average 4 times smaller?
:::
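As one last empirical illustration (not required for the assignment), the Key Transformation above is easy to verify by simulation; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=3.0, size=100_000)  # X ~ N(2, 9), so sigma = 3

Z = (X - 2.0) / 3.0       # standardize: (X - mu) / sigma
print(Z.mean(), Z.std())  # approximately 0 and 1, i.e. Z ~ N(0, 1)
```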
## Submission

All you need to submit for this assignment are your typed or written answers to the Math Review section on Gradescope. You do not need to submit your Python Tutorial files! The assignment submission is located on Gradescope. This assignment is graded on completion and a good-faith effort!

## Congratulations 🥳

You have just completed the first assignment for deep learning! You are now set up to begin working on the course. Good luck :)