---
tags: hw0, conceptual
---
# Homework 0 Conceptual: Warm-up
:::info
Conceptual section due **Friday, February 2, 2024 at 6:00 PM EST**
Programming section due **Friday, February 2, 2024 at 6:00 PM EST**
:::
Welcome to the first (conceptual) homework assignment of CSCI1470/2470! This assignment is a short math review of concepts from Linear Algebra and Probability that you will need for this course; it will also get you set up with the course virtual environment so that you are ready to start the first programming assignment (Homework 1).
First, we'll introduce some starting concepts and ask you to fill in the details. While these questions are not necessarily easy, they should prepare you for some of the early material and help you gauge your comfort with ideas that will come up often in the course.
:::info
We encourage the use of $\LaTeX$ to typeset your answers. A non-editable homework template is linked below; copy the .tex file into your own Overleaf project and go from there!
> #### [**Latex Template**](https://www.overleaf.com/read/hxjmfvmpgwxx)
:::
:::warning
Do **NOT** include your name anywhere in your submission. Submissions are graded anonymously, and named submissions will incur deductions.
:::
## Theme
![](https://cdn.dribbble.com/users/55017/screenshots/2074320/fishworkout.gif)
*This fish is warming up for his race this Friday*
# Conceptual Questions
## Vectors
The following are some common (and _important_) properties and definitions about vectors:
1. Given two column vectors $\mathbf{a} \in \mathbb{R}^{r\times1}$ and $\mathbf{b} \in \mathbb{R}^{c\times1}$, the _outer product_ is:
$$
\mathbf{a} \otimes \mathbf{b} =
\begin{bmatrix}a_0 \\ \vdots \\ a_{r-1}\end{bmatrix}
\otimes
\begin{bmatrix}b_0 \\ \vdots \\ b_{c-1}\end{bmatrix} =
\begin{bmatrix}
a_0 \mathbf{b}^T\\ \vdots \\
a_{r-1} \mathbf{b}^T\\
\end{bmatrix}
= \begin{bmatrix}
a_0 b_0 & \cdots & a_0 b_{c-1}\\
\vdots & \ddots & \vdots \\
a_{r-1} b_0 & \cdots & a_{r-1} b_{c-1}\\
\end{bmatrix}
\in \mathbb{R}^{r\times c}
$$
where $\mathbf{v}^T$ is the _transpose_ of a vector, which converts a column vector into a row vector (and vice versa), so the outer product can equivalently be written $\mathbf{a}\mathbf{b}^T$. The same idea extends to matrices as well.
2. Given two column vectors $\mathbf{a}$ and $\mathbf{b}$ both in $\mathbb{R}^{r\times 1}$, the _inner product_ (or the _dot product_) is defined as:
$$
\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^T\mathbf{b}
= \begin{bmatrix} a_0\ \cdots\ a_{r-1} \end{bmatrix}
\begin{bmatrix}b_0 \\ \vdots \\ b_{r-1}\end{bmatrix}
= \sum_{i=0}^{r-1} a_i b_i
$$
3. Given a matrix $\mathbf{M} \in \mathbb{R}^{r\times c}$ with rows $\mathbf{M_0}, \ldots, \mathbf{M_{r-1}}$ and a column vector $\mathbf{x} \in \mathbb{R}^{c\times 1}$, the matrix-vector product is defined as:
$$\mathbf{Mx} \ =\ \mathbf{M}\begin{bmatrix} x_0\\ \vdots \\ x_{c-1}\\ \end{bmatrix}
\ =\ \begin{bmatrix} \mathbf{M_0}\\ \vdots \\ \mathbf{M_{r-1}}\\ \end{bmatrix}\mathbf{x}
\ =\ \begin{bmatrix} \ \mathbf{M_0 \cdot x}\ \\ \vdots \\ \ \mathbf{M_{r-1} \cdot x}\ \\ \end{bmatrix}
$$
4. $\mathbf{M} \in \mathbb{R}^{r\times c}$ implies that the function $f(\mathbf{x}) = \mathbf{Mx}$ can map $\mathbb{R}^{c\times 1} \to \mathbb{R}^{r\times 1}$.
5. $\mathbf{M_1} \in \mathbb{R}^{d\times c}$ and $\mathbf{M_2} \in \mathbb{R}^{r\times d}$ implies $f(\mathbf{x}) = \mathbf{M_2M_1x}$ can map $\mathbb{R}^c \to \mathbb{R}^r$.
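To make these properties concrete, here is a short, optional NumPy sketch (the shapes and variable names are our own choices) that checks properties $(1)$ through $(5)$ on random vectors:
```python
import numpy as np

r, c, d = 4, 3, 5
a = np.random.rand(r)   # column vector in R^r
b = np.random.rand(c)   # column vector in R^c

# (1) Outer product: an r x c matrix whose (i, j) entry is a_i * b_j.
assert np.outer(a, b).shape == (r, c)

# (2) Inner (dot) product of two vectors in R^r: sum of elementwise products.
a2 = np.random.rand(r)
assert np.isclose(np.dot(a, a2), sum(a[i] * a2[i] for i in range(r)))

# (3) Matrix-vector product: entry i of Mx is the dot product M_i . x.
M = np.random.rand(r, c)
x = np.random.rand(c)
assert np.allclose(M @ x, [np.dot(M[i], x) for i in range(r)])

# (4) f(x) = Mx maps R^c to R^r.
assert (M @ x).shape == (r,)

# (5) Composing M2 in R^{r x d} with M1 in R^{d x c} maps R^c to R^r.
M1, M2 = np.random.rand(d, c), np.random.rand(r, d)
assert (M2 @ (M1 @ x)).shape == (r,)
```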
### Questions (Vectors)
Given the vector rules above and your own knowledge, try solving these:
1. **Prove that $(2) + (3)$ implies $(4)$**. In other words, use your understanding of the inner and matrix-vector products to explain why $(4)$ has to be true.
2. **Prove that $(4)$ implies $(5)$**.
## Differentiation
Recall that differentiation is finding the rate of change of one variable relative to another variable. Some nice reminders:
\begin{align}
\frac{dy}{dx} & \text{ is how $y$ changes with respect to $x$}.\\
\frac{\partial y}{\partial x} & \text{ is how $y$ changes with respect to $x$ (holding all other variables constant)}.\\
\frac{dz}{dx} &= \frac{dz}{dy} \cdot \frac{dy}{dx} \text{ via the chain rule, which helps when these factors are easier to compute}.
\end{align}
Some common derivative patterns include:
$$\frac{d}{dx}(2x^3 + 4x + 5) = 6x^2 + 4$$
$$\frac{\partial}{\partial y}(x^2y^3 + xy + 5x^2) = 3x^2y^2 + x$$
$$\frac{d}{dx}(x^3 + 5)^3 = 3(x^3 + 5)^2 \cdot 3x^2$$
$$\frac{d}{dx}\ln(x) = \frac{1}{x}$$
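If you'd like to sanity-check derivatives like these, an optional SymPy sketch can verify them symbolically:
```python
import sympy as sp

x, y = sp.symbols('x y')

# d/dx (2x^3 + 4x + 5) = 6x^2 + 4
assert sp.simplify(sp.diff(2*x**3 + 4*x + 5, x) - (6*x**2 + 4)) == 0

# d/dy (x^2 y^3 + x y + 5x^2) = 3x^2 y^2 + x   (partial; x held constant)
assert sp.simplify(sp.diff(x**2*y**3 + x*y + 5*x**2, y) - (3*x**2*y**2 + x)) == 0

# d/dx (x^3 + 5)^3 = 3(x^3 + 5)^2 * 3x^2       (chain rule)
assert sp.simplify(sp.diff((x**3 + 5)**3, x) - 3*(x**3 + 5)**2 * 3*x**2) == 0

# d/dx ln(x) = 1/x
assert sp.simplify(sp.diff(sp.log(x), x) - 1/x) == 0
```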
### Questions (Differentiation)
Given the above and your own knowledge:
1. Use (and internalize) the log properties to solve the following:
$$\frac{\partial}{\partial y}\ln(x^5/y^2)$$
The properties are as follows:
1. $\log(x^p) = p\log(x)$
2. $\log(xy) = \log(x) + \log(y)$
3. $\log(x/y) = \log(x) - \log(y)$
2. Let $g_1(\mathbf{x}) = \sum_i x_iy_i$. Compute the following partial derivative for an arbitrary valid index $j$ (the sum runs over all valid $i$):
$$\frac{\partial}{\partial x_j} \ln g_1(x) = \frac{\partial}{\partial x_j}\ln\bigg[\sum_i x_iy_i\bigg]$$
**_Hint_**: Consider using the chain rule.
## Probability
There exist events that are **independent** of each other, meaning that the probability of each event stays the same regardless of the outcome of other events.
For example, consider picking a 3-digit number by choosing each digit independently and uniformly at random:
$$P(x = 123) = P(x_0 = 1)P(x_1 = 2)P(x_2 = 3) = (1/10)^3 = 1/1000$$
Alternatively, some events are **dependent** on other events. For example, consider 3 draws, without replacement, from a set of 1 red, 1 green, and 1 blue ball.
\begin{align}
P(b_0 = R) &= 1/3 \\
P(b_1 = G\ |\ b_0 = R) &= 1/2 \\
P(b_2 = B\ |\ (b_0 = R) \cap (b_1 = G)) &= 1/1
\end{align}
This introduces the notion of _conditional probability_, where the probability of one event is conditioned on the outcomes of others. An important formula for conditional probability is Bayes' Theorem:
$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$
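As a quick numerical illustration of Bayes' Theorem, here is a short Python sketch; all of the rates below are invented for this example:
```python
# Hypothetical rates, chosen only to illustrate Bayes' Theorem.
p_A = 0.01              # prior P(A): the condition is present
p_B_given_A = 0.95      # likelihood P(B|A): test positive given the condition
p_B_given_not_A = 0.05  # false-positive rate P(B|not A)

# Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)

# Bayes' Theorem: P(A|B) = P(B|A)P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(f"P(A|B) = {p_A_given_B:.3f}")  # ~0.161: a positive is still mostly a false alarm
```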
Whenever events happen at random, they happen with some probability, governed by a _probability distribution_. For example, $X \sim P(x)$ denotes a _random variable_ $X$ distributed according to $P(x)$; a sampled value of $X$ is called a _realization_ (or _variate_). Of note:
1. The distribution may be parameterized by some factors. For example, $X \sim \mathcal{N}(\mu=0, \sigma=1)$ says $X$ follows the unit (standard) normal distribution, parameterized by its mean $\mu$ and standard deviation $\sigma$.
2. The distribution may depend on something else. For example, the variate may depend on the realizations of some other distribution, written $P(X|Z)$.
These distributions are equipped with an _expectation_ $\mathbb{E}$ (mean) and a _variance_ $\mathbb{V}$ that summarize their expected behavior. These also describe the _long-term equilibrium behavior_: the distribution of outcomes that accumulates after many realizations are drawn.
1. A **Discrete Probability Distribution** governs discrete events $\{e_0, e_1, ...\}$.
    1. If the number of possible events is finite, such that $x \in \{e_0, e_1, ..., e_n\}$, there is a finite set of associated probabilities $\{P(e_0), P(e_1), ..., P(e_n)\}$.
    2. The list of probabilities must add up to 1. This implies there is a 100% chance of an event being... one of the possible events.
2. A **Continuous Probability Distribution** governs continuous values; for example, the unit normal distribution mentioned above.
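To tie these ideas together, here is an optional NumPy sketch (the specific events and probabilities are our own choices) showing that empirical frequencies and moments of many realizations approach a distribution's probabilities, mean, and variance:
```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete distribution: three events whose probabilities sum to 1.
events, probs = np.array([0, 1, 2]), np.array([0.2, 0.3, 0.5])
draws = rng.choice(events, size=100_000, p=probs)
print(np.bincount(draws) / draws.size)  # empirical frequencies ~ [0.2, 0.3, 0.5]

# Continuous distribution: the unit normal N(mu=0, sigma=1).
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)
print(samples.mean(), samples.var())    # ~ E[X] = 0 and V[X] = 1
```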
### Questions (Probability)
Given the above probability review and your own knowledge:
1. You're trying to train a cat/dog classifier that outputs a prediction between 0 and 1. Given that the input is in fact an image of a cat or a dog, the true label is always one of those two. As such, the output is a probability distribution $Y$ with unknown $P(Y = y)$ for all possible $y$ in the domain of $Y$. Your friend knows that their dataset $\mathbb{D} = (\mathbb{X}, \mathbb{Y})$ is balanced between cats and dogs, and so argues that $P(Y=y)$ is equal for all plausible $y$.
1. If your friend's argument was correct, what value of $P(Y=y)$ would make this a valid probability distribution for all $y$ in the domain of $Y$?
2. Is your friend's assumption correct? Why or why not?
## Conceptual Questions: Submission
Once you have completed the above questions, please submit your answers to the **Homework 0: Conceptual** assignment on Gradescope.
:::info
Your solutions for the conceptual component must be **typeset**. We highly recommend using _LaTeX_ to write clean mathematical formulas.
:::
# [Answers](https://hackmd.io/SewHzh9yRoul3Qb8EOu6_w?view)