# CS410 Homework 5: Linear Algebra & Probability
> **Due Date: 10/16/2024**
> **Need help?** Remember to check out Edstem and our website for TA assistance.
## Assignment Overview
### Learning Objectives
What you will know:
* A reinforced understanding of linear algebra fundamentals
* Proficiency with introductory NumPy functions
* How to translate mathematical equations into code
What you will be able to do:
* develop...
* interpret...
## Data Structures & Algorithms
## Tasks
### Latest Version of Topics - MAKE QUESTIONS ABOUT THESE:
We need several linear algebra questions on the following subjects:
- Invertible linear transformations
- When you can invert a matrix + how
- Projections (for PCA)
- Matrix multiplication and dot products and other manipulations of matrices
### Step 1: Written Questions
We need 3-4 written mathematical questions on the above topics.
Make these examples highly applied so as not to bore the students! Applied examples are more fun than "please find the eigenvalues and eigenvectors of this matrix :("
For reference, look at these slides:
https://ocw.mit.edu/courses/9-40-introduction-to-neural-computation-spring-2018/23203cb47ede79bfc5c8c6b1ae2774f2_MIT9_40S18_Lec17.pdf
Add some questions about probability as well.
:::info
**Possible ideas**
1. Proving PCA / explaining PCA
   -> Why do you want to maximize variance?
   -> How do you maximize variance?
   -> Can you calculate/formulate the covariance matrix? What is on the diagonal?
   [Refer to this link, pages 3-4](https://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch18.pdf)
2. A physics-based eigenvalue explanation problem (e.g., smoothness of a fabric) or a stock-market explanation
3. [minecraft related image compression pack.png](https://docs.google.com/document/d/1PpZqHWXPLjOsXf_T7uyH4rWuxUMxzBlxvv5gm19P_Z8/edit)
4. Bayesian problems (not strictly Bayes' rule, but using Bayesian notation)
:::
### Question 1: Invertible Linear Transformations
**Scenario:**
You are developing a cryptographic system where data is encrypted using a linear transformation. The encryption matrix $\mathbf{A} = \begin{pmatrix} 2 & 1 \\ 3 & 2 \end{pmatrix}$ is used to encode data vectors. However, during testing, you need to verify the correctness of the decryption process by recovering the original data.
**Task:**
1. Compute the inverse matrix $\mathbf{A}^{-1}$ using both the determinant method and row reduction method. Verify that both methods yield the same result.
2. Given an encoded data vector $\mathbf{b} = \begin{pmatrix} 7 \\ 10 \end{pmatrix}$, use $\mathbf{A}^{-1}$ to decrypt and recover the original data vector $\mathbf{x}$ such that $\mathbf{A} \mathbf{x} = \mathbf{b}$.
3. Now consider that a slight error was introduced during encryption, and the vector became $\mathbf{b}' = \begin{pmatrix} 7.1 \\ 10.2 \end{pmatrix}$. Calculate the impact of this error on the decrypted data vector using $\mathbf{A}^{-1}$.
4. Discuss the significance of the condition number of matrix $\mathbf{A}$ in the context of error propagation during decryption.
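For the reference solution, parts 1-4 can be sanity-checked with NumPy. This is a sketch with our own variable names; note that $\det(\mathbf{A}) = 2\cdot 2 - 1\cdot 3 = 1$, so the inverse has integer entries.

```python
import numpy as np

A = np.array([[2.0, 1.0], [3.0, 2.0]])
b = np.array([7.0, 10.0])
b_err = np.array([7.1, 10.2])    # perturbed vector from part 3

A_inv = np.linalg.inv(A)         # det(A) = 1, so A_inv = [[2, -1], [-3, 2]]
x = A_inv @ b                    # part 2: decrypted vector, [4, -1]
x_err = A_inv @ b_err            # part 3: decryption of the perturbed vector
cond = np.linalg.cond(A)         # part 4: 2-norm condition number, about 17.9
```

The relatively large condition number (about 17.9 for a 2x2 matrix with entries this small) is what makes part 4 interesting: a small perturbation of $\mathbf{b}$ can be amplified noticeably in $\mathbf{x}$.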
### Question 2: Projections for PCA
#### Eigenvector refresher:
An eigenvector is a special kind of vector in linear algebra. When a matrix acts on an eigenvector, it simply stretches or shrinks the vector but doesn't change its direction.
In other words, if you have a square matrix $A$ and a vector $\mathbf{v}$, the vector $\mathbf{v}$ is an eigenvector of $A$ if applying $A$ to $\mathbf{v}$ results in a new vector that is just a scaled version of $\mathbf{v}$. This scaling factor is called the eigenvalue $\lambda$. Mathematically, this relationship is written as:
$$
A \mathbf{v} = \lambda \mathbf{v}
$$
Here:
- $A$ is the matrix.
- $\mathbf{v}$ is the eigenvector.
- $\lambda$ is the eigenvalue.
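If it helps students connect the refresher to code, the defining relation $A\mathbf{v} = \lambda\mathbf{v}$ can be checked numerically. The matrix below is a hypothetical example of ours, not part of any question:

```python
import numpy as np

# Hypothetical symmetric matrix, used only to illustrate A v = lambda v
A = np.array([[2.0, 1.0], [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(A)

# NumPy returns eigenvectors as the *columns* of eigvecs
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)   # the defining relation holds
```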
**Scenario:**
You are analyzing a dataset for a robotics project where you must reduce the dimensionality to focus on the most significant movement patterns. After computing the covariance matrix, you find the principal eigenvectors:
$$
\mathbf{v}_1 = \begin{pmatrix} 0.6 \\ 0.8 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} 0.8 \\ -0.6 \end{pmatrix}
$$
**Task:**
1. Given a data vector $\mathbf{x} = \begin{pmatrix} 3 \\ 4 \end{pmatrix}$, compute the projection of $\mathbf{x}$ onto the principal eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$.
2. Generalize the process by deriving the formula for projecting any vector $\mathbf{y}$ onto an arbitrary eigenvector $\mathbf{v}$.
3. Using the projections onto $\mathbf{v}_1$ and $\mathbf{v}_2$, reconstruct an approximation of the original vector $\mathbf{x}$ in the reduced space and calculate the reconstruction error.
4. If an additional eigenvector $\mathbf{v}_3 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ is added, discuss how this would change the projection process and the implications for dimensionality reduction.
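A reference-solution sketch for parts 1-3 (variable names are ours; it relies on $\mathbf{v}_1$ and $\mathbf{v}_2$ being unit vectors, which they are here):

```python
import numpy as np

x = np.array([3.0, 4.0])
v1 = np.array([0.6, 0.8])
v2 = np.array([0.8, -0.6])

# For a unit vector v, the projection of x onto v is (x . v) v
c1 = x @ v1          # coordinate along v1
c2 = x @ v2          # coordinate along v2
proj1 = c1 * v1
proj2 = c2 * v2

# v1 and v2 are orthonormal and span R^2, so reconstruction is exact
x_hat = proj1 + proj2
error = np.linalg.norm(x - x_hat)
```

One thing worth flagging for the question writers: with this particular $\mathbf{x}$, the coordinate along $\mathbf{v}_2$ is exactly 0 ($\mathbf{x}$ lies along $\mathbf{v}_1$), so the reconstruction error in part 3 is zero even if only $\mathbf{v}_1$ is kept. If a nonzero error is desired, pick a different $\mathbf{x}$.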
<!-- ### Question 3: Eigenvalues and Eigenvectors
**Scenario:**
You are working on a data clustering algorithm for social network analysis, where the adjacency matrix of the network is given by $\mathbf{M} = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$. Eigenvalues and eigenvectors of this matrix can help identify community structures and influential individuals.
**Task:**
1. Use `numpy.linalg.eig` to compute the eigenvalues and eigenvectors of the matrix $\mathbf{M}$.
2. Manually derive the characteristic polynomial of $\mathbf{M}$, solve for the eigenvalues $\lambda$, and verify them against the results from `numpy.linalg.eig`.
3. For each eigenvalue, find the corresponding eigenvectors by solving the system $(\mathbf{M} - \lambda \mathbf{I}) \mathbf{v} = 0$. Normalize the eigenvectors.
4. Analyze the eigenvectors and explain what they reveal about the network structure, such as the presence of clusters or central nodes. Specifically, interpret the significance of the eigenvector corresponding to the largest eigenvalue. -->
### Question 3: Matrix Multiplication and Manipulations
**Scenario:**
You are processing an image represented by the matrix $\mathbf{I} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$, and you want to apply a series of transformations to enhance specific features. The transformations include applying a filter matrix $\mathbf{F} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, followed by a scaling operation using a matrix $\mathbf{S} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$.
**Task:**
1. Perform the matrix multiplication $\mathbf{I}' = \mathbf{F} \mathbf{I}$ to apply the filter to the image.
2. Apply the scaling transformation to the filtered image by computing $\mathbf{I}'' = \mathbf{S} \mathbf{I}'$.
3. If instead of applying the transformations sequentially, you want to combine them into a single transformation matrix $\mathbf{T}$, derive $\mathbf{T} = \mathbf{S} \mathbf{F}$ and use it to transform the original image matrix $\mathbf{I}$ in one step.
4. Discuss how the order of multiplication affects the final result and what this implies for image processing tasks where multiple transformations are applied.
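The computations for parts 1-4 can be sketched as follows (names are ours):

```python
import numpy as np

I = np.array([[1, 2], [3, 4]])     # image matrix
F = np.array([[0, 1], [-1, 0]])    # filter matrix
S = np.array([[2, 0], [0, 2]])     # scaling matrix

I1 = F @ I          # part 1: filtered image
I2 = S @ I1         # part 2: scaled, filtered image
T = S @ F           # part 3: combined transformation
assert np.array_equal(T @ I, I2)   # one-step result matches sequential result

# Part 4: matrix multiplication is not commutative in general,
# e.g. F @ I differs from I @ F here.
assert not np.array_equal(F @ I, I @ F)
```

A caveat for part 4: because $\mathbf{S} = 2\mathbf{I}$ is a scalar multiple of the identity, $\mathbf{S}$ commutes with everything ($\mathbf{S}\mathbf{F} = \mathbf{F}\mathbf{S}$), so a non-uniform scaling matrix would make the point about ordering sharper.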
### Question 4: Naive Bayes
To address the challenge of distinguishing between legitimate emails and spam, various techniques have been developed, one of the earliest being the Naive Bayes classifier. Naive Bayes is a probabilistic algorithm commonly used in machine learning for classification tasks, including spam detection. It leverages the probability of observing certain features (such as words or phrases) in different classes (e.g., normal messages versus spam) to make predictions about the class of new instances.
In this context, the provided data serves as an example illustrating the application of Naive Bayes in spam email detection. The table presents counts of specific words ("Dear," "Friend," "Lunch," and "Money") in both normal and spam emails. These counts are used to calculate the probabilities of encountering each word in each class, forming the foundation for the Naive Bayes classifier's decision-making process.
| | "Dear" | "Friend" | "Lunch" | "Money" | Total |
|----------------|--------|----------|---------|---------|-------|
| Normal Emails | 8 | 5 | 3 | 1 | 17 |
| Spam Emails | 2 | 1 | 0 | 4 | 7 |
| Total | 10 | 6 | 3 | 5 | 24 |
Naive Bayes is a simple probabilistic classifier based on Bayes' theorem with strong independence assumptions between the features. Here's a quick explanation of how it works using the provided equation for the case with multiple features (our case):
$$\Pr(A \mid B_1, B_2, \ldots, B_n) = \frac{\Pr(A) \prod_{i=1}^n \Pr(B_i \mid A)}{\prod_{i=1}^n \Pr(B_i)} \propto \Pr(A) \prod_{i=1}^n \Pr(B_i \mid A)$$
Naive Bayes predicts the class of a given data point by calculating the probabilities of each class given the features and selecting the class with the highest probability. Despite its simplicity and strong assumptions, Naive Bayes often performs well in practice, especially for text classification and spam filtering tasks.
1. Calculate the conditional probability of each word appearing given the email is either normal or spam.
2. Let's say we have just received an email that reads "Dear Friend". Use the Naive Bayes algorithm to calculate the relative probability that the email is normal versus spam, then classify the email.
3. You just received another email that reads "Lunch Money Money Money". Repeat the same process to calculate the relative probability that the email is normal versus spam, then classify the email. Notice anything off? (Hint: you should run into a classification error.) Come up with a method to modify the algorithm to avoid this.
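For checking parts 1-3, here is a sketch. Note the table gives *word* counts, not email counts, so the class priors below are taken from the word totals (17/24 and 7/24) — an assumption the question may want to state explicitly. The `alpha` parameter is Laplace smoothing, which is the fix part 3 is hinting at (the zero count for "Lunch" in spam makes the unsmoothed spam score exactly 0).

```python
import numpy as np

words = ["Dear", "Friend", "Lunch", "Money"]
normal = np.array([8, 5, 3, 1], dtype=float)   # counts in normal emails
spam = np.array([2, 1, 0, 4], dtype=float)     # counts in spam emails

def score(message, counts, prior, alpha=0.0):
    """Unnormalized Naive Bayes score with optional Laplace smoothing."""
    probs = (counts + alpha) / (counts.sum() + alpha * len(counts))
    s = prior
    for w in message.split():
        s *= probs[words.index(w)]
    return s

p_normal = normal.sum() / (normal.sum() + spam.sum())  # 17/24 (our assumption)
p_spam = 1 - p_normal                                  # 7/24

# Part 2: "Dear Friend" is classified as normal
assert score("Dear Friend", normal, p_normal) > score("Dear Friend", spam, p_spam)

# Part 3: without smoothing, the spam score is exactly 0 because
# Pr("Lunch" | spam) = 0/7 -- the classification error the hint refers to.
assert score("Lunch Money Money Money", spam, p_spam) == 0.0
```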
### Step 2: Coding Questions
Use this DL lab for reference (note: it's in tensorflow, not numpy): https://colab.research.google.com/drive/1mvwO8ELQNjiyIZnY-4RC09G-T9R6h2WA?usp=sharing
These need to be something like "given this formula in the handout, implement this in code with numpy"
The goal is that they will translate existing equations into numpy so that they will learn the functions and how numpy works.
We will also need an introduction to NumPy (e.g., explain how a Python list behaves like an ArrayList while a NumPy `ndarray` is a fixed-type array!), since we are assuming this is the students' first time using this module. It would be helpful to link the docs so they can see which functions to use, but also suggest the most common/helpful functions that you've used. HAVE FUN WITH IT!
Suggestion: Implementing gradient descent without using the function call (seems like this would be helpful/very re-usable in the later homeworks and other classes)
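Following that suggestion, a minimal sketch of what we might ask students to implement: gradient descent on a least-squares loss, written with explicit NumPy operations rather than any library optimizer. All names, the learning rate, and the toy data are ours.

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, steps=2000):
    """Minimize mean squared error ||X w - y||^2 / n by plain gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = (2.0 / n) * X.T @ (X @ w - y)  # gradient of the MSE loss
        w -= lr * grad                        # step downhill
    return w

# Tiny check: recover a known linear relationship y = 2*x0 + 3*x1
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, 3.0])
w = gradient_descent(X, y)
```

Writing the update rule by hand like this is exactly the "translate the equation into NumPy" skill we want, and the same loop structure reuses directly in later homeworks.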
:::info
**Task 1**
:::
:::spoiler **Hint: Signature**
:::
:::info
**Task 2**
:::
:::spoiler **Hint: Signature**
:::
## Downloads
Please click [here](https://github.com) to download the assignment code.
### Support Code
Please click [here](https://github.com) to get the stencil code. It should contain these files: `assignment.py` and `local_test.py`.
### Stencil Code
:::warning
**Reminder:** Your solution should modify the stencil code *only*. You will be penalized for modifying the support code, especially if it interferes with the autograder.
:::
## Submission
### Grading
### Handin
Your handin should contain:
- all modified files, including comments describing the logic of your algorithmic modifications, and your tests
- a README, containing a brief overview of your implementation, and the outcomes of all tests
### Gradescope
Submit your assignment via Gradescope.
To submit through GitHub, follow these commands:
1. `git add -A`
2. `git commit -m "commit message"`
3. `git push`
Now, you are ready to upload your repo to Gradescope.
*Tip*: If you are having difficulties submitting through GitHub, you may submit by zipping up your hw folder.
### Rubric
> <span style="color: red">will we include a rubric for all assignments with the handout?</span>
:::success
Congrats on submitting your homework; Steve is proud of you!!


:::