---
tags: hw0, conceptual
---
# Homework 0 Conceptual: Warm-up
:::info
Conceptual section due **Friday, February 3 at 6:00 PM EST**
Programming section due **Friday, February 3 at 6:00 PM EST**
:::
Welcome to the first (conceptual) homework assignment of CSCI1470/2470! This assignment is a short math review of concepts from Linear Algebra and Probability that you will need for this course. It also gets you set up with a course virtual environment so that you are ready to start the first programming assignment (Homework 1).
First we'll introduce some starting concepts and ask you to expound on the details. These questions are not necessarily easy, but they should prepare you for some of the early material and help you gauge your comfort with ideas that come up often in the course.
:::info
We encourage the use of $\LaTeX$ to typeset your answers. A non-editable homework template is linked, so copy the .tex file into your own Overleaf project and go from there!
> #### [**LaTeX Template**](https://www.overleaf.com/read/hxjmfvmpgwxx)
:::
:::warning
Do **NOT** include your name anywhere in your submission. Submissions are graded anonymously, and named submissions will incur deductions.
:::
## Theme
![](https://cdn.dribbble.com/users/55017/screenshots/2074320/fishworkout.gif)
*This fish is warming up for his race this Friday*
# Conceptual Questions
## Vectors
The following are some common (and _important_) properties and definitions about vectors:
1. Given two column vectors $\mathbf{a} \in \mathbb{R}^{r\times1}$ and $\mathbf{b} \in \mathbb{R}^{c\times1}$, the _outer product_ is:
$$
\mathbf{a} \otimes \mathbf{b} = \mathbf{a}\mathbf{b}^T =
\begin{bmatrix}a_0 \\ \vdots \\ a_{r-1}\end{bmatrix}
\begin{bmatrix}b_0 & \cdots & b_{c-1}\end{bmatrix} =
\begin{bmatrix}
a_0 \mathbf{b}^T\\ \vdots \\
a_{r-1} \mathbf{b}^T\\
\end{bmatrix}
= \begin{bmatrix}
a_0 b_0 & \cdots & a_0 b_{c-1}\\
\vdots & \ddots & \vdots \\
a_{r-1} b_0 & \cdots & a_{r-1} b_{c-1}\\
\end{bmatrix}
\in \mathbb{R}^{r\times c}
$$
where $\mathbf{v}^T$ is the _transpose_ of a vector, which converts between column and row vector alignment. The same idea extends to matrices as well.
2. Given two column vectors $\mathbf{a}$ and $\mathbf{b}$ both in $\mathbb{R}^{r\times 1}$, the _inner product_ (or the _dot product_) is defined as:
$$
\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^T\mathbf{b}
= \begin{bmatrix} a_0\ \cdots\ a_{r-1} \end{bmatrix}
\begin{bmatrix}b_0 \\ \vdots \\ b_{r-1}\end{bmatrix}
= \sum_{i=0}^{r-1} a_i b_i
$$
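3. Given a matrix $\mathbf{M} \in \mathbb{R}^{r\times c}$ with rows $\mathbf{M_0}, \dots, \mathbf{M_{r-1}}$ and a column vector $\mathbf{x} \in \mathbb{R}^{c\times 1}$, the _matrix-vector product_ is defined as: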
$$\mathbf{Mx} \ =\ \mathbf{M}\begin{bmatrix} x_0\\ \vdots \\ x_{c-1}\\ \end{bmatrix}
\ =\ \begin{bmatrix} \mathbf{M_0}\\ \vdots \\ \mathbf{M_{r-1}}\\ \end{bmatrix}\mathbf{x}
\ =\ \begin{bmatrix} \ \mathbf{M_0 \cdot x}\ \\ \vdots \\ \ \mathbf{M_{r-1} \cdot x}\ \\ \end{bmatrix}
$$
4. $\mathbf{M} \in \mathbb{R}^{r\times c}$ implies that the function $f(\mathbf{x}) = \mathbf{Mx}$ can map $\mathbb{R}^{c\times 1} \to \mathbb{R}^{r\times 1}$.
5. $\mathbf{M_1} \in \mathbb{R}^{d\times c}$ and $\mathbf{M_2} \in \mathbb{R}^{r\times d}$ implies $f(\mathbf{x}) = \mathbf{M_2M_1x}$ can map $\mathbb{R}^{c\times 1} \to \mathbb{R}^{r\times 1}$.
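To make the shapes in these definitions concrete, here is a minimal sketch in plain Python (lists instead of a linear algebra library). The helper names `outer`, `dot`, and `matvec` and the example vectors are illustrative assumptions, not part of the course code.

```python
def outer(a, b):
    """Outer product: an r x c matrix whose (i, j) entry is a[i] * b[j]."""
    return [[ai * bj for bj in b] for ai in a]


def dot(a, b):
    """Inner product of two vectors of the same length."""
    if len(a) != len(b):
        raise ValueError("inner product needs vectors of the same length")
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(M, x):
    """Matrix-vector product: one inner product per row of M."""
    return [dot(row, x) for row in M]


a = [1, 2, 3]      # r = 3
b = [4, 5]         # c = 2
M = outer(a, b)    # 3 x 2 matrix
x = [1, -1]        # vector with c = 2 entries

print(M)                    # [[4, 5], [8, 10], [12, 15]]
print(dot(a, [1, 0, 2]))    # 1*1 + 2*0 + 3*2 = 7
print(matvec(M, x))         # [-1, -2, -3]
```

Note that `matvec` returns one entry per row of `M`, so a matrix with 3 rows and 2 columns turns a 2-entry vector into a 3-entry vector.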
### Questions (Vectors)
Given the vector rules above and your own knowledge, try solving these:
1. **Prove that $(2) + (3)$ implies $(4)$**. In other words, use your understanding of the inner and matrix-vector products to explain why $(4)$ has to be true.
2. **Prove that $(4)$ implies $(5)$**.
## Differentiation
Recall that differentiation is finding the rate of change of one variable relative to another variable. Some nice reminders:
\begin{align}
\frac{dy}{dx} & \text{ is how $y$ changes with respect to $x$}.\\
\frac{\partial y}{\partial x} & \text{ is how $y$ changes with respect to $x$ (and ignoring other factors)}.\\
\frac{dz}{dx} &= \frac{dy}{dx} \cdot \frac{dz}{dy} \text{ via chain rule if these factors are easier to compute}.
\end{align}
Some common derivative patterns include:
$$\frac{d}{dx}(2x^3 + 4x + 5) = 6x^2 + 4$$
$$\frac{\partial}{\partial y}(x^2y^3 + xy + 5x^2) = 3x^2y^2 + x$$
$$\frac{d}{dx}(x^3 + 5)^3 = 3(x^3 + 5)^2 \cdot (3x^2)$$
$$\frac{d}{dx}\ln(x) = \frac{1}{x}$$
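A hand-derived derivative can be sanity-checked numerically. The sketch below compares the patterns above against a central finite difference; the step size `h`, the evaluation point, and the value `X_FIXED` used to hold $x$ constant for the partial derivative are arbitrary choices made for illustration.

```python
import math


def central_difference(f, x, h=1e-5):
    """Approximate f'(x) with a symmetric finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


X_FIXED = 2.0  # hypothetical value at which x is held constant for the partial

# (description, function, hand-derived derivative)
cases = [
    ("polynomial", lambda x: 2 * x**3 + 4 * x + 5, lambda x: 6 * x**2 + 4),
    ("partial in y",
     lambda y: X_FIXED**2 * y**3 + X_FIXED * y + 5 * X_FIXED**2,
     lambda y: 3 * X_FIXED**2 * y**2 + X_FIXED),
    ("chain rule",
     lambda x: (x**3 + 5) ** 3,
     lambda x: 3 * (x**3 + 5) ** 2 * (3 * x**2)),
    ("natural log", math.log, lambda x: 1 / x),
]

point = 1.5  # arbitrary evaluation point
results = []
for name, f, df in cases:
    numeric = central_difference(f, point)
    analytic = df(point)
    results.append((name, numeric, analytic))
    print(f"{name:12s} numeric={numeric:.6f} analytic={analytic:.6f}")
```

If the two columns disagree beyond a small rounding error, the hand derivation is the more likely culprit.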
### Questions (Differentiation)
Given the above and your own knowledge:
1. Use (and internalize) the log properties to solve the following:
$$\frac{\partial}{\partial y}\ln(x^5/y^2)$$
The properties are as follows:
1. $\log(x^p) = p\log(x)$
2. $\log(xy) = \log(x) + \log(y)$
3. $\log(x/y) = \log(x) - \log(y)$
2. Let $g_1(x) = \sum_i x_iy_i$. Solve the following partial for a valid $j$ and all valid $i$:
$$\frac{\partial}{\partial x_j} \ln g_1(x) = \frac{\partial}{\partial x_j}\ln\bigg[\sum_i x_iy_i\bigg]$$
**_Hint_**: Consider using the chain rule.
## Probability
There exist events that are **independent** of each other, meaning that the probability of each event stays the same regardless of the outcome of other events.
For example, consider picking a particular 3-digit number at random, where each digit is chosen independently and uniformly from 0 to 9:
$$P(x = 123) = P(x_0 = 1)P(x_1 = 2)P(x_2 = 3) = (1/10)^3 = 1/1000$$
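The product rule for independent events can be checked by brute force. This sketch assumes the number is a string of three digits, each drawn uniformly from 0 to 9 (so leading zeros are allowed), and counts outcomes exactly with `fractions.Fraction`.

```python
from fractions import Fraction
from itertools import product

# All 10^3 equally likely digit strings (x_0, x_1, x_2).
strings = list(product(range(10), repeat=3))

# Probability of the joint event, by direct counting.
matches = sum(1 for s in strings if s == (1, 2, 3))
p_joint = Fraction(matches, len(strings))

# Probability from multiplying the three independent digit events.
p_product = Fraction(1, 10) ** 3

print(p_joint, p_product)  # 1/1000 1/1000
```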
Alternatively, some events are **dependent** on other events. For example, consider 3 draws without replacement from a set of 1 red, 1 green, and 1 blue ball.
\begin{align}
P(b_0 = R) &= 1/3 \\
P(b_1 = G\ |\ b_0 = R) &= 1/2 \\
P(b_2 = B\ |\ (b_0 = R) \cap (b_1 = G)) &= 1/1
\end{align}
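These conditional probabilities can be reproduced by enumerating every draw order. The sketch assumes the balls are drawn without replacement and that all 6 orders are equally likely; the helper `prob` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Every order in which the three balls can be drawn without replacement.
orders = list(permutations("RGB"))


def prob(event, given=lambda order: True):
    """P(event | given), counted over the equally likely draw orders."""
    conditioned = [o for o in orders if given(o)]
    return Fraction(sum(1 for o in conditioned if event(o)), len(conditioned))


p0 = prob(lambda o: o[0] == "R")
p1 = prob(lambda o: o[1] == "G", given=lambda o: o[0] == "R")
p2 = prob(lambda o: o[2] == "B", given=lambda o: o[0] == "R" and o[1] == "G")

print(p0, p1, p2)  # 1/3 1/2 1
```

Each conditioning step shrinks the set of orders still possible, which is why the probabilities change from draw to draw.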
This introduces the notion of _conditional probability_: the probability of one event given that other events have occurred. An important formula for conditional probability is Bayes' Theorem:
$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$
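As a quick worked example (added for illustration, reusing the ball draws above), Bayes' Theorem lets us reverse the conditioning and ask how likely it is that the first ball was red given that the second was green. By symmetry, each ball is equally likely to land in any position, so $P(b_1 = G) = 1/3$:

$$P(b_0 = R\ |\ b_1 = G) = \frac{P(b_1 = G\ |\ b_0 = R)\,P(b_0 = R)}{P(b_1 = G)} = \frac{(1/2)(1/3)}{1/3} = \frac{1}{2}$$

This matches intuition: once green is known to be second, the first ball is red or blue with equal chance.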
Whenever events happen at random, they happen with some probability, which is governed by a _probability distribution_. For example, $X \sim P(x)$ means that $X$ is a _random variable_ (or _variate_) distributed according to $P(x)$; a particular sampled value of $X$ is called a _realization_. Of note:
1. The distribution may be parameterized by some factors. For example, $X \sim \mathcal{N}(\mu=0, \sigma=1)$ means that $X$ is distributed according to (AKA drawn from) the unit normal distribution.
2. The distribution may depend on something. For example, the variate may depend on the realizations of some other distribution, i.e. $P(X|Z)$.
These distributions are equipped with the functions $\mathbb{E}$ (_expectation_) and $\mathbb{V}$ (_variance_), which summarize their expected behavior (mean and spread, respectively). These also usually suggest the _long-term equilibrium behavior_, or what the realizations look like in aggregate after many are drawn and accumulated.
1. **Discrete Probability Distribution** governs discrete events $\{e_0, e_1, ...\}$.
1. If the number of possible events is finite such that $x \in \{e_0, e_1, ..., e_n\}$, there is a finite set of associated probabilities $\{P(e_0), P(e_1), ..., P(e_n)\}$.
2. The list of probabilities must add up to 1. This implies there is a 100\% chance of an event being... one of the possible events.
2. **Continuous Probability Distribution** governs continuous values. For example, the unit normal distribution mentioned before.
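For a concrete discrete distribution, the sketch below uses a fair six-sided die (an example chosen here, not taken from the handout). It checks that the probabilities sum to 1, computes $\mathbb{E}$ and $\mathbb{V}$ exactly, and then draws many samples with a fixed seed to show the long-run average settling near the expectation.

```python
import random
from fractions import Fraction

# Probability mass function of a fair six-sided die.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

total = sum(pmf.values())  # must be 1 for a valid distribution
mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(total, mean, variance)  # 1 7/2 35/12

# Long-run behavior: the sample mean of many draws approaches the expectation.
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.randint(1, 6) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean)
```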
### Questions (Probability)
Given the above probability review and your own knowledge:
1. You're trying to train a cat/dog classifier that outputs a prediction between 0 and 1. Given that the input is in fact an image of a cat or dog, the truth is always one of those two. As such, the output is a probability distribution $Y$ with unknown $P(Y = y)$ for all possible $y$ in the domain of $Y$. Your friend knows that their dataset $\mathbb{D} = (\mathbb{X}, \mathbb{Y})$ is balanced between cats and dogs, and so argues that $P(Y=y)$ is equal for all plausible $y$.
1. If your friend's argument was correct, what value of $P(Y=y)$ would make this a valid probability distribution for all $y$ in the domain of $Y$?
2. Is your friend's assumption correct? Why or why not?
## Conceptual Questions: Submission
Once you have completed the above questions, please submit your answers to the **Homework 0: Conceptual** assignment on Gradescope.
:::info
Your solutions for the conceptual component must be **typeset**. We highly recommend using _LaTeX_ to write clean mathematical formulas.
:::
# [Answers](https://hackmd.io/SewHzh9yRoul3Qb8EOu6_w?view)

Brown Deep Learning Spring 2023
Handouts for CSCI1470/2470 @ Brown
