Homework 0 Conceptual: Warm-up

Conceptual section due Friday, February 2, 2024 at 6:00 PM EST
Programming section due Friday, February 2, 2024 at 6:00 PM EST

Welcome to the first (conceptual) homework assignment of CSCI1470/2470! This assignment is a short math review of concepts from linear algebra and probability that you will need for this course. It will also get you set up with the course virtual environment so that you are ready to start the first programming assignment (Homework 1).

First, we'll introduce some starting concepts and ask you to expound on the details. While this isn't necessarily easy, it should prepare you for some of the early material and can be used to gauge your comfort with concepts that will come up often in the course.

We encourage the use of LaTeX to typeset your answers. A non-editable homework template is linked, so copy the .tex file into your own Overleaf project and go from there!

LaTeX Template

Do NOT include your name anywhere in your submission. Submissions are graded anonymously, and named submissions will incur deductions.

Theme


This fish is warming up for his race this Friday

Conceptual Questions

Vectors

The following are some common (and important) properties and definitions about vectors:

  1. Given two column vectors $a \in \mathbb{R}^{r \times 1}$ and $b \in \mathbb{R}^{c \times 1}$, the outer product is:

     $$a \times b = \begin{bmatrix} a_0 \\ \vdots \\ a_{r-1} \end{bmatrix} \times \begin{bmatrix} b_0 & \cdots & b_{c-1} \end{bmatrix} = \begin{bmatrix} a_0 b^T \\ \vdots \\ a_{r-1} b^T \end{bmatrix} = \begin{bmatrix} a_0 b_0 & \cdots & a_0 b_{c-1} \\ \vdots & \ddots & \vdots \\ a_{r-1} b_0 & \cdots & a_{r-1} b_{c-1} \end{bmatrix} \in \mathbb{R}^{r \times c}$$

     where $v^T$ is the transpose of a vector, which converts between column and row vector alignment. The same idea extends to matrices as well.

  2. Given two column vectors $a$ and $b$, both in $\mathbb{R}^{r \times 1}$, the inner product (or the dot product) is defined as:

     $$a \cdot b = a^T b = \begin{bmatrix} a_0 & \cdots & a_{r-1} \end{bmatrix} \begin{bmatrix} b_0 \\ \vdots \\ b_{r-1} \end{bmatrix} = \sum_{i=0}^{r-1} a_i b_i$$

  3. Given a matrix $M \in \mathbb{R}^{r \times c}$ with rows $M_0, \ldots, M_{r-1}$, a matrix product is defined as:

     $$Mx = M \begin{bmatrix} x_0 \\ \vdots \\ x_{c-1} \end{bmatrix} = \begin{bmatrix} M_0 \\ \vdots \\ M_{r-1} \end{bmatrix} x = \begin{bmatrix} M_0 \cdot x \\ \vdots \\ M_{r-1} \cdot x \end{bmatrix}$$

  4. $M \in \mathbb{R}^{r \times c}$ implies that the function $f(x) = Mx$ can map $\mathbb{R}^{c \times 1} \to \mathbb{R}^{r \times 1}$.

  5. $M_1 \in \mathbb{R}^{d \times c}$ and $M_2 \in \mathbb{R}^{r \times d}$ imply that $f(x) = M_2 M_1 x$ can map $\mathbb{R}^c \to \mathbb{R}^r$.
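The properties above can be checked concretely with small vectors. A minimal sketch using NumPy (assuming it is installed; the example sizes $r = 3$, $c = 2$ are arbitrary choices for illustration):

```python
import numpy as np

# Column vectors a ∈ R^{3×1} and b ∈ R^{2×1} (small example sizes)
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0])

# (1) Outer product: an r×c matrix whose (i, j) entry is a_i * b_j
outer = np.outer(a, b)           # shape (3, 2)

# (2) Inner (dot) product: both vectors must live in the same R^{r×1}
inner = np.dot(a, a)             # 1 + 4 + 9 = 14

# (3) Matrix-vector product: each output entry is a row of M dotted with x
M = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])  # M ∈ R^{2×3}, so f(x) = Mx maps R^3 → R^2
x = np.array([1.0, 2.0, 3.0])
y = M @ x                        # shape (2,), consistent with property (4)

print(outer.shape, inner, y)
```

Note how the output shape of `M @ x` follows property (4): a $2 \times 3$ matrix sends a length-3 vector to a length-2 vector.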

Questions (Vectors)

Given the vector rules above and your own knowledge, try solving these:

  1. Prove that (2) and (3) together imply (4). In other words, use your understanding of the inner and matrix-vector products to explain why (4) has to be true.
  2. Prove that (4) implies (5).

Differentiation

Recall that differentiation is finding the rate of change of one variable relative to another variable. Some nice reminders:

  • $\frac{dy}{dx}$ is how $y$ changes with respect to $x$.
  • $\frac{\partial y}{\partial x}$ is how $y$ changes with respect to $x$ (ignoring other factors).
  • $\frac{dz}{dx} = \frac{dy}{dx} \cdot \frac{dz}{dy}$ via the chain rule, if these factors are easier to compute.

Some common derivative patterns include:

$$\frac{d}{dx}\left(2x^3 + 4x + 5\right) = 6x^2 + 4$$
$$\frac{\partial}{\partial y}\left(x^2 y^3 + xy + 5x^2\right) = 3x^2 y^2 + x$$
$$\frac{d}{dx}\left(x^3 + 5\right)^3 = 3\left(x^3 + 5\right)^2 \times \left(3x^2\right)$$
$$\frac{d}{dx}\ln(x) = \frac{1}{x}$$
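One way to sanity-check derivative patterns like these is with a symmetric finite difference, which approximates $f'(x_0)$ numerically. A small sketch (the evaluation points $x_0 = 2$, $1$, and $4$ are arbitrary choices):

```python
import math

def numderiv(f, x0, h=1e-6):
    """Symmetric finite-difference approximation of f'(x0)."""
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

# d/dx (2x^3 + 4x + 5) = 6x^2 + 4, so at x = 2 we expect 28
poly = lambda t: 2 * t**3 + 4 * t + 5

# Chain rule: d/dx (x^3 + 5)^3 = 3(x^3 + 5)^2 * 3x^2, so at x = 1 we expect 324
chain = lambda t: (t**3 + 5) ** 3

# d/dx ln(x) = 1/x, so at x = 4 we expect 0.25
log_fn = lambda t: math.log(t)

print(numderiv(poly, 2.0), numderiv(chain, 1.0), numderiv(log_fn, 4.0))
```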

Questions (Differentiation)

Given the above and your own knowledge:

  1. Use (and internalize) the log properties to solve the following:

     $$\frac{\partial}{\partial y} \ln\left(x^5 / y^2\right)$$

     The properties are as follows:

     1. $\log(x^p) = p\log(x)$
     2. $\log(xy) = \log(x) + \log(y)$
     3. $\log(x/y) = \log(x) - \log(y)$
  2. Let $g_1(x) = \sum_i x_i y_i$. Solve the following partial for a valid $j$ and all valid $i$:

     $$\frac{\partial}{\partial x_j} \ln g_1(x) = \frac{\partial}{\partial x_j} \ln\left[\sum_i x_i y_i\right]$$

     Hint: Consider using the chain rule.

Probability

There exist events that are independent of each other, meaning that the probability of each event stays the same regardless of the outcome of other events.

For example, consider picking a particular 3-digit number at random:

$$P(x = 123) = P(x_0 = 1) \cdot P(x_1 = 2) \cdot P(x_2 = 3) = (1/10)^3 = 1/1000$$
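Because the digits are independent, the joint probability is just the product of the per-digit probabilities. A minimal check using exact fractions:

```python
from fractions import Fraction

# Each digit of a random 3-digit string is drawn independently and
# uniformly from {0, ..., 9}, so the joint probability of a specific
# string is the product of the per-digit probabilities.
p_digit = Fraction(1, 10)
p_123 = p_digit * p_digit * p_digit
print(p_123)   # 1/1000
```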

Alternatively, some events are dependent on other events. For example, consider 3 draws (without replacement) from a set of 1 red, 1 green, and 1 blue ball.

$$P(b_0 = R) = 1/3$$
$$P(b_1 = G \mid b_0 = R) = 1/2$$
$$P(b_2 = B \mid (b_0 = R) \cap (b_1 = G)) = 1/1$$
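Since there are only 6 equally likely orderings of the three balls, these conditional probabilities can be verified by exhaustive enumeration. A small sketch:

```python
from fractions import Fraction
from itertools import permutations

balls = ("R", "G", "B")
orderings = list(permutations(balls))   # 6 equally likely draw orders

# P(b0 = R): count orderings whose first draw is R
p_b0_R = Fraction(sum(o[0] == "R" for o in orderings), len(orderings))

# P(b1 = G | b0 = R): restrict to orderings that start with R,
# then count those whose second draw is G
given_R = [o for o in orderings if o[0] == "R"]
p_b1_G_given = Fraction(sum(o[1] == "G" for o in given_R), len(given_R))

print(p_b0_R, p_b1_G_given)   # 1/3 1/2
```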

This introduces the notion of conditional probability, where some outcomes are realized conditional on other outcomes. An important formula for conditional probability is Bayes' Theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
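As a worked instance of the formula, here is a small sketch with hypothetical numbers (the 60%/90%/20% figures below are made up for illustration, not taken from the assignment):

```python
from fractions import Fraction

# Hypothetical setup: 60% of images are cats; a detector fires on
# 90% of cat images and 20% of dog images.
p_cat = Fraction(6, 10)
p_dog = 1 - p_cat
p_fire_given_cat = Fraction(9, 10)
p_fire_given_dog = Fraction(2, 10)

# Denominator P(B) expanded via the law of total probability
p_fire = p_fire_given_cat * p_cat + p_fire_given_dog * p_dog

# Bayes' Theorem: P(cat | fire) = P(fire | cat) P(cat) / P(fire)
p_cat_given_fire = p_fire_given_cat * p_cat / p_fire
print(p_cat_given_fire)   # 27/31
```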

Whenever events happen at random, they happen with some probability. This is governed by some probability distribution. For example, $X \sim P(x)$ is a realization (or variate, or random variable) of the $P(x)$ distribution. Of note:

  1. The distribution may be parameterized by some factors. For example, $X \sim \mathcal{N}(\mu = 0, \sigma = 1)$ is a realization of (AKA an instance of) the unit normal distribution.
  2. The distribution may depend on something. For example, the variate may depend on the realizations of some other distribution, i.e. with $P(X \mid Z)$.

These distributions are equipped with expectation functions $\mathbb{E}$ and $\mathbb{V}$ that reveal their expected behavior (mean and variance, respectively). These also usually suggest the long-term equilibrium behavior, or the distribution of realizations after many realizations are drawn and accumulated.
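This long-run behavior can be seen empirically: the sample mean and sample variance of many draws from $\mathcal{N}(\mu = 0, \sigma = 1)$ approach $\mathbb{E}[X] = 0$ and $\mathbb{V}[X] = 1$. A minimal sketch using only the standard library (the sample size is an arbitrary choice):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Draw many realizations from the unit normal N(mu=0, sigma=1)
n = 100_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]

# Sample mean and (biased) sample variance should be near 0 and 1
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
print(round(mean, 2), round(var, 2))
```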

  1. A Discrete Probability Distribution governs discrete events $\{e_0, e_1, \ldots\}$.

     1. If the number of possible events is finite, such that $x \in \{e_0, e_1, \ldots, e_n\}$, there is a finite set of associated probabilities $\{P(e_0), P(e_1), \ldots, P(e_n)\}$.
     2. The list of probabilities must add up to 1. This implies there is a 100% chance of the outcome being one of the possible events.
  2. A Continuous Probability Distribution governs continuous values; for example, the unit normal distribution mentioned before.
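The sum-to-1 requirement from (1.2) is easy to verify for a concrete finite distribution. A minimal sketch using a fair six-sided die as the example:

```python
from fractions import Fraction

# A finite discrete distribution: one probability per face of a fair die.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Property (1.2): the probabilities must sum to exactly 1.
total = sum(pmf.values())
print(total)   # 1
```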

Questions (Probability)

Given the above probability review and your own knowledge:

  1. You're trying to train up a cat/dog classifier which outputs a prediction between 0 and 1. Given that the input is in fact an image of a cat or a dog, the truth is always one of those two. As such, the output is a probability distribution $Y$ with unknown $P(Y = y)$ for all possible $y$ in the domain of $Y$. Your friend knows that their dataset $D = (X, Y)$ is balanced between cats and dogs, and so argues that $P(Y = y)$ is equal for all plausible $y$.
     1. If your friend's argument were correct, what value of $P(Y = y)$ would make this a valid probability distribution for all $y$ in the domain of $Y$?
     2. Is your friend's assumption correct? Why or why not?

Conceptual Questions: Submission

Once you have completed the above questions, please submit your answers to the Homework 0: Conceptual assignment on Gradescope.

Your solutions for the conceptual component must be typeset. We highly recommend using LaTeX to write clean mathematical formulas.
