# Tutorial Week 10 Qn 4 Proof
## Problem statement:
> Given points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ the linear regression problem is to find 𝑎 and 𝑏 that fits the point as close as possible. Use linear programming to find a solution $(a, b)$ that minimizes:
> $$ \sum_{i=1}^n|y_i-ax_i-b| $$
## Solution
For simplicity, let's call the original as problem $P$. We will convert this into a LP problem $P'$.
We define the LP as follows:
> Objective function:
$$\min \sum_{i=1}^n e_i$$
> Constraints:
> - $\forall i \in [n], \, e_i \geq 0$
> - $\forall i \in [n], y_i - ax_i - b \leq e_i$
> - $\forall i \in [n], y_i - ax_i - b \geq -e_i$
Where the variables are $a, b, e_1, e_2, ..., e_n$.
We will prove that the objective value in LP is the same as the solution to $P$, i.e.
$$\min \sum_{i=1}^n e_i \:\text{in}\: P' = \min \sum_{i=1}^n|y_i-ax_i-b| \:\text{in}\: P$$
Notice that the value of $a$ and $b$ in RHS is specific to that of problem $P$. They are not the same as those in $P'$ (a.k.a. the LP)
## Proof of Correctness
The linear constraints in LP can be simplified as $|y_i - ax_i - b| \leq e_i$ where $e_i \geq 0$.
We will first establish a lemma.
> Lemma: If $(\bar{a}, \bar{b}, \bar{e}_1, \bar{e}_2, ..., \bar{e}_n)$ is an optimal solution to the LP, then
$$\sum_{i=1}^n|y_i-\bar{a}x_i-\bar{b}| = \sum_{i=1}^n{\bar{e}_i}$$
**Proof**
Suppose not, then $\sum_{i=1}^n|y_i-\bar{a}x_i-\bar{b}| < \sum_{i=1}^n{\bar{e}_i}$ (as we know that $|y_i - ax_i - b| \leq e_i$)
This implies that there exists some index $k \in [n]$ such that $|y_k-\bar{a}x_k-\bar{b}| < \bar{e}_k$.
Choosing $e_k = |y_k-\bar{a}x_k-\bar{b}| < \bar{e}_k$ will give a more optimal solution to the objective function. This contradicts with our assumption that our original solution to LP is optimal.
---
We have proven our lemma is true. Next we will prove our main argument, i.e.
$$\min \sum_{i=1}^n e_i \:\text{in}\: P' = \min \sum_{i=1}^n|y_i-ax_i-b| \:\text{in}\: P$$
Let the optimal solution to problem $P$ is $opt$.
Let $(\bar{a}, \bar{b}, \bar{e}_1, \bar{e}_2, ..., \bar{e}_n)$ be one of the optimal solution to the LP. There are two cases:
**Case 1: $\sum_{i=1}^n|y_i-\bar{a}x_i-\bar{b}| = opt$**
Then from our lemma, $\sum_{i=1}^n{\bar{e}_i} = \sum_{i=1}^n|y_i-\bar{a}x_i-\bar{b}| = opt$
**Case 2: $\sum_{i=1}^n|y_i-\bar{a}x_i-\bar{b}| \neq opt$**
In this case, $\sum_{i=1}^n|y_i-\bar{a}x_i-\bar{b}| > opt$ since we know that $opt$ gives the minimum value of $\sum_{i=1}^n|y_i-ax_i-b|$.
From our assumption, there exists $a^*$ and $b^*$ such that $\sum_{i=1}^n|y_i-a^*x_i-b^*|=opt$ (because we know the solution in $P$). Setting $a = a^*$, $b = b^*$ and $e_i = |y_i - a^*x_i - b^*|$, we obtain:
$$\sum_{i=1}^ne^*_i = \sum_{i=1}^n{|y_i - a^*x_i - b^*|}=opt$$
which is more optimal than our original solution. Contradiction.
Hence, we've proven that we can solve $P$ by solving the LP $P'$