Course: Neural Nets
Deadline: Feb 20
Collaboration policy. You may collaborate with other students and submit a report as a group of 1 to 5 members. Note that submitting individually provides no grading advantage, so we encourage you to work in groups. Please note that this group policy has been slightly modified from the course syllabus.
Submission policy. Please submit your reports to Canvas for evaluation. Reports can be PDF files or a link to a Markdown page on hackmd.io or a similar website. Sharing a link to a HackMD page allows us to post comments and ask questions about your report.
Generative AI policy. We recommend avoiding generative AI or, at the very least, ensuring that you fully understand any generated solution before including it in your report. If you do not find the exercises interesting, you have other options for credit, such as helping with the lecture notes instead.
Q & A policy. You can add questions as comments on this page: select a part of the text and attach a comment to it, and I will clarify.
Suppose that \(x_1, \dots, x_n \in \mathbb{R}^d\) are drawn i.i.d. from distribution \(\mu\) and \(y_1, \dots, y_n\) are drawn i.i.d. from \(\nu\). In the Introduction Lecture, we defined the empirical Maximum Mean Discrepancy (MMD) between \(\mu\) and \(\nu\) as
\[\text{MMD}_n(\mu,\nu):=\frac{1}{n^2} \sum_{i,j=1}^n k(x_i,x_j) - \frac{2}{n^2} \sum_{i,j=1}^n k(x_i,y_j) + \frac{1}{n^2} \sum_{i,j=1}^n k(y_i,y_j)\]
Recall that this metric can be used as a statistical test of whether \(\mu\) and \(\nu\) are identical. In this exercise, we assume \(k(x,y) = e^{-\|x-y\|^2/2}\).
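For concreteness, here is a minimal sketch of computing the empirical MMD directly from the definition above (assuming both samples have size \(n\), as in the formula; the function name and NumPy usage are our own illustration, not part of the assignment):

```python
import numpy as np

def mmd_direct(X, Y):
    """Empirical MMD with Gaussian kernel k(x, y) = exp(-||x - y||^2 / 2),
    computed directly from the three n-by-n Gram matrices."""
    def gram(A, B):
        # Pairwise squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2.
        sq = np.sum(A**2, axis=1)[:, None] - 2.0 * A @ B.T + np.sum(B**2, axis=1)[None, :]
        return np.exp(-sq / 2.0)
    n = X.shape[0]
    return (gram(X, X).sum() - 2.0 * gram(X, Y).sum() + gram(Y, Y).sum()) / n**2
```

Since each Gram matrix has \(n^2\) entries, this direct approach is what part a asks you to reason about.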
Part a.
Can we compute MMD\(_n\) for \(n=100{,}000\)? What is the time complexity of computing MMD\(_n\) (\(O\)-complexity in \(n\))?
Part b.
To tackle the above challenge, propose a method to approximate MMD\(_n\) for large \(n\). Hint: see the Random Features lecture notes or slides.
Part c.
Implement your proposed method and report the approximate MMD you obtain for the points \(x_1, \dots, x_n\) and \(y_1, \dots, y_n\) available in this colab. You may include your implementation in your report.
X = data[0, 0, :, :]  # n x d matrix containing the x_i's
Y = data[0, 1, :, :]  # n x d matrix containing the y_i's
# Your implementation
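As one possible sketch (not necessarily the intended solution): for the Gaussian kernel, random Fourier features \(\varphi(x) = \sqrt{2/D}\,\cos(W^\top x + b)\) with \(W \sim \mathcal{N}(0, I)\) and \(b \sim \mathrm{Unif}[0, 2\pi)\) satisfy \(k(x,y) \approx \varphi(x)^\top \varphi(y)\), so MMD\(_n\) reduces to a squared distance between the feature-space means of the two samples. The function name, the default \(D\), and the seed below are illustrative choices:

```python
import numpy as np

def mmd_rff(X, Y, D=500, seed=0):
    """Approximate MMD via random Fourier features for k(x,y)=exp(-||x-y||^2/2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(size=(d, D))             # frequencies ~ N(0, I), matching the Gaussian kernel
    b = rng.uniform(0.0, 2 * np.pi, size=D)  # random phases
    phi = lambda Z: np.sqrt(2.0 / D) * np.cos(Z @ W + b)
    mean_x = phi(X).mean(axis=0)            # feature-space mean of the x sample
    mean_y = phi(Y).mean(axis=0)            # feature-space mean of the y sample
    return float(np.sum((mean_x - mean_y) ** 2))
```

Note that the feature maps are computed in a single pass over the samples, which is what makes this approach relevant for part d.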
Part d.
What is the time complexity of your proposed method in \(n\)? Compare with your answer in part a.
Part e.
What is the approximation error of your proposed method? There is no need to derive the error bound; just state it in \(O(\cdot)\) notation. Hint: see Claim 1 in the original paper on random features.
Part f. How can you improve the estimation error in part e?
Recall: A function \(f\) obeys the reproducing property (RP) associated with kernel \(k\) if
\[\textbf{RP:} \quad \quad \quad f(x) = \int k(x,y) f(y) dy\]
Assume \(k(x,y)= e^{-\| x - y \|^2/2}\). We want to provide insights on this property.
Background
A function \(f\) is Lipschitz continuous if there is a finite constant \(L\) such that \(|f(x)-f(z)|\leq L \| x- z \|\) holds for all inputs \(x\) and \(z\).
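For concreteness, a standard one-dimensional example of this definition (our own illustration, not from the lecture notes): \(f(x) = \sin x\) is Lipschitz continuous with \(L = 1\), since
\[|\sin x - \sin z| \le |x - z| \quad \text{for all } x, z \in \mathbb{R},\]
which follows from the derivative bound \(|f'(x)| = |\cos x| \le 1\).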
Part a.
Prove: a differentiable \(f\) is Lipschitz continuous if \(\max_x \| \nabla f(x)\|\) is bounded. Recall that \(\nabla f(x)\) denotes the gradient of \(f\) evaluated at \(x\). Hint: use Taylor's remainder theorem.
Part b.
Prove that \(f(x) = k(x,b)\) is Lipschitz continuous. Hint: use part a.
Part c.
Suppose that \(\int |f(y)|dy\) is finite and \(f\) obeys RP.
Prove: \(f\) is Lipschitz continuous. Hint: use part b.
Part d.
Do you think the following function obeys RP? Why? Hint: use part c.
Part e.
Can you now explain why kernel regression (shown below and studied in the lecture notes on kernel methods) performs poorly for this function? Hint: recall the representer theorem in the lecture notes and use part d.