<style>
:root{
--var-theme-rgba: #44663355;
}
.markdown-body h1{
font-size: 28pt;
border: none;
}
.markdown-body h1:nth-of-type(1){
display: block;
position: relative;
padding: 0 70px 10px;
text-align: center;
counter-reset: lec-count;
padding-bottom: 25px;
}
.markdown-body h1:nth-of-type(1)::before,
.markdown-body h1:nth-of-type(1)::after {
content: '';
display: block;
position: absolute;
width: 35px;
height: 3px;
background-color: #C0C0C0;
}
.markdown-body h1:nth-of-type(1)::before {
inset: 50% auto auto 0;
margin-right: 20px;
}
.markdown-body h1:nth-of-type(1)::after {
inset: 50% 0 auto auto;
margin-left: 20px;
}
.markdown-body h1:not(h1:nth-of-type(1)){
position: relative;
padding-top: 7px;
border-bottom: 1px solid #C0C0C0;
text-align: center;
counter-increment: lec-count;
counter-reset: exercise-index
}
.markdown-body h1:not(h1:nth-of-type(1))::before{
content: " " counter(lec-count, decimal-leading-zero) " ";
position: absolute;
top: 50%;
left: 0;
transform: translateY(-50%);
color: #44663355;
font-family: "Roboto";
font-size: 50px;
}
.markdown-body h4{
counter-increment: exercise-index;
padding-left: 32px;
}
.markdown-body h4::before{
content: "Q" counter(exercise-index) ". ";
position: absolute;
left: 14px;
color: #C0C0C0;
font-family: "Roboto";
font-size: 18px;
}
.markdown-body table{
table-layout: fixed;
display: table;
}
.markdown-body em{
font-family: "Copperplate Gothic", "Helvetica", serif;
font-style: normal;
font-weight: 100;
}
.markdown-body strong{
color: #446633;
text-transform: capitalize;
}
.markdown-body mark{
padding-top: 1px;
background-color: #44CC3340;
}
.markdown-body hr{
height: 2px;
width: 50%;
margin-left: auto;
margin-right: auto;
border-radius: 0px;
}
</style>
# Approximation and Online Algorithms
Notes for the lecture *Approximation and Online Algorithms*. Expect typos, grammatical errors, and occasionally false information.
###### tags: `u-tokyo`, `course-notes`, `AaOA`
# Introduction
This course aims to relate theoretical topics to applied (non-theory) topics:
| Theories | Non-Theories |
| :--: | :--: |
| Complexity <br> Data Structures <br> Discrete Math | Databases <br> Machine Learning <br> Operating Systems |
## The Path of Formulating an IST Problem
```graphviz
digraph D{
rankdir = TB
splines=false // straight-lines
node[
shape = rect
style = filled;
fillcolor = "#C0C0C033"
fontname = "Roboto"
width = 5
]
{
node [group=E] // force/encourage vertical alignment
A[ label = "Formalize Problem Into Optimization Model" ]
B[ label = "Is it Possbile To Find A Solving Algo ?" ]
C[ label = "Is The Problem NP-Hard ?" ]
D[ label = "Is It Solvable Using Approximation Algo ?" ]
E[ label = "Is the Problem Inapproximable ?" ]
}
Done1, Done2[
label = "Done"
width = 1
fillcolor = "#44663355"
]
Reform[
label = "Reformulate The Problem"
width = 3;
group= B
]
// Invis nodes to simulate ortho-like edges
invisB1, invisC1, invisD1, invisE1[
shape = point;
width = 0;
]
invisA2 [ group=B
shape = point;
width = 0;
]
edge[
style = solid //default;
fontname = "Roboto"
weight = 100;
]
A -> B
B -> C
C -> D
D -> E
// Should Be at the Left Side
invisC1 -> C [dir=none]
invisC1 -> invisB1 [dir=none]
invisB1 -> B
B -> Done1
invisE1 -> E [dir=none]
invisE1 -> invisD1 [dir=none]
invisD1 -> D
D -> Done2
E -> Reform
Reform -> invisA2 [dir=none]
A -> invisA2 [dir=back label = " "]
{rank=same A invisA2}
{rank = same; invisB1, B, Done1}
{rank = same; invisC1, C}
{rank = same; invisD1, D, Done2}
{rank = same; invisE1, E, Reform}
}
```
## Online Algorithms
An algorithm whose outputs are required **without knowing all inputs**. For example, the [Secretary Problem](https://www.wikiwand.com/ja/%E7%A7%98%E6%9B%B8%E5%95%8F%E9%A1%8C).
## Optimization Models
An optimization model allows us to solve problems using theories.
By answering the following questions using *math formulations*, we obtain the necessary information to construct an optimization model —
1. **Input** — What information is relevant?
2. **Output** — What information are we trying to obtain?
3. **Constraint** — Which outputs are allowed?
4. **Objective Function** — How do we deduce "the best output"?
### Example: The Gyuudon & Melon-pan Problem
Suppose there are only two foods to survive on; we need to decide how much of each food to get. The four parts can be stated as —
+ **Input** —
- Cost per serving, $c_{_G}, c_{_M} \in \mathbb{R}^+$
- Protein and carbs per serving, $a_{_{P, G}}, a_{_{P, M}}, a_{_{C, G}}, a_{_{C, M}} \in \mathbb{R}^+$
- Required protein and carbs per day, $r_{_P}, r_{_C} \in \mathbb{R}^+$
+ **Output** —
- Number of servings of each food, $x_{_G}, x_{_M} \in \mathbb{R}^+$
+ **Constraint** —
- We must have enough protein and carbs:
$$
\begin{cases}
a_{_{P, G}}x_{_G} + a_{_{P, M}}x_{_M} \geq r_{_P} \\
a_{_{C, G}}x_{_G} + a_{_{C, M}}x_{_M} \geq r_{_C} \\
\end{cases}
$$
+ **Objective Function** —
- The least cost to obtain the foods
$$
f(x_{_G}, x_{_M}) = x_{_G}c_{_G} + x_{_M}c_{_M}
$$
To solve the above model, we can write them into matrix forms —
$$
\text{Minimize} \begin{bmatrix}
c_{_G} & c_{_M}
\end{bmatrix} \begin{bmatrix}
x_{_G} \\ x_{_M}
\end{bmatrix}, \text{Subject to} \begin{bmatrix}
a_{_{P, G}} & a_{_{P, M}} \\
a_{_{C, G}} & a_{_{C, M}}
\end{bmatrix}\begin{bmatrix}
x_{_G} \\ x_{_M}
\end{bmatrix} \geq \begin{bmatrix}
r_{_P} \\ r_{_C}
\end{bmatrix}
$$
We can see this becomes a **linear programming** problem if we multiply both sides of the constraint by $-1\;$ —
:::success
Given matrix $A$ and vectors $b$ and $c\;$,
find vector $x$ minimizing $cx$ Subject to $Ax \leq b$.
:::
By constructing the optimization model for the problem, we find that standard algorithms for LP can be used to solve it, as sketched below —
+ Interior Point Method (guaranteed to run in polynomial time, but slower in practice)
+ Simplex Method (no polynomial-time guarantee, but faster in practice)
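As a concrete illustration, the model can be handed to an off-the-shelf LP solver. The sketch below uses `scipy.optimize.linprog` with made-up nutrition numbers (they are not from the lecture); since `linprog` expects $\leq$ constraints, the $\geq$ constraints are multiplied by $-1$ first.
```python
# A minimal sketch of the Gyuudon & Melon-pan LP, using made-up example numbers.
# scipy.optimize.linprog minimizes c @ x subject to A_ub @ x <= b_ub and bounds.
from scipy.optimize import linprog

c = [400, 200]        # cost per serving: gyuudon, melon-pan (assumed values)
A_ge = [[20, 4],      # protein per serving
        [90, 60]]     # carbs per serving
r = [60, 300]         # required protein / carbs per day

# Multiply the ">=" constraints by -1 to get the "<=" form linprog expects.
A_ub = [[-a for a in row] for row in A_ge]
b_ub = [-v for v in r]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2, method="highs")
print(res.x, res.fun)  # optimal servings and the minimum cost
```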
## "Solving" a problem
The definition for a problem to be "solvable" is —
> **Given an input file with *$n$ bits*, <br> the running time is within some polynomial of $n$.**
It could be terribly inefficient, but the problem is still *solvable* as long as such a program exists.
# NP-Hardness
To **formally** prove that a problem cannot be solved, we try to prove that —
> The new problem is ==not simpler than an old problem== widely agreed to be unsolvable.
## Reduction
The approach to prove the proposition is as following:
1. **Assume we can solve the new problem (i.e., assume it is solvable in polynomial time).**
Since we can solve the problem, a solving program terminating within polynomial time exists. Let the interface be
```
Output NewProblem(Input1, Input2, ... Inputk);
```
2. **Based on the solving program, propose a new program solving the old problem using `NewProblem()`.**
```
Output OldProblem(Input1, Input2, ... Inputk){
// Code using NewProblem()
// Still in polynomial time
}
```
If we are able to construct such a program, it indicates that the old problem is also solvable; however, we know the old problem is unsolvable. Thus, it **proves our assumption wrong**, indicating the new problem is unsolvable.
## Example: Using The *satisfiability* problem
The *satisfiability* problem, which is widely known to be NP-Hard, is defined as follows —
> Given a Boolean formula in [Conjunctive Normal Form(CNF)](https://www.wikiwand.com/en/Conjunctive_normal_form),
does there exist a set of input(s) that would output `true`?
+ **Input** — A CNF boolean formula
+ **Output** — Yes / No
+ **Constraint** — "Yes" if the set of input exists, "No" if otherwise
Then, we define our problem —
> Given a boolean formula in CNF,
find a set of input(s) that would output `true`.
+ **Input** — A CNF boolean formula
+ **Output** — Values for the set of input(s)
+ **Constraint** — If it is possible to output `true`, output such a set; if not, output any set of input(s)
Assume there exists a solving program for our problem, `Boolean[] ourProblem(Formula f);`. We are able to use it to solve `satisfiability()` —
```
boolean satisfiability(Formula f){
    Boolean[] set = ourProblem(f);
    if(output_of(f, set) == true){
        return true;
    }
    return false;
}
```
However, we shouldn't be able to do this, since the satisfiability problem is NP-Hard. Thus, we can conclude that `Boolean[] ourProblem(Formula f);` does not exist (as a polynomial-time program), and our problem is not simpler than the satisfiability problem.
## Example: The Hardness of the *Densest Subgraph* Problem
```graphviz
graph G{
rankdir = TB;
node[
shape = circle;
style = filled;
fillcolor = "#44663355"
fontname = "Roboto"
]
invis1, invis2[
style = "invis";
shape = point;
width = 0;
]
D -- invis1 -- invis2 -- B [style = invis; label = " "];
E -- D -- C -- A
A -- B -- C
{rank = same; E, D, invis1, invis2, B}
{rank = same; A, C}
}
```
Given an undirected graph $G = \{V, E\}$ with no multi-edges and an integer $k$, we want to find $S$, a subset of $k$ vertices of $G$ that ==contains the maximum number of edges==.
We can establish the optimization model of the *densest subgraph problem* as following —
+ **Input** —
- Set $V$, the vertices
- Set $E \subseteq \big\{\{u, v\} : u,v \in V \big\}$, the edges
- Positive integer $k$
+ **Output** —
    - $S \subseteq V$, some subset of vertices
+ **Constraint** —
    - $\vert S \vert = k$
+ **Objective Function** —
    - Maximize $\Big\vert \{ e \in E : e \subseteq S \} \Big\vert$, <br>the number of edges with both endpoints in $S$
To prove that the problem is NP-Hard, we utilize another problem *k-clique* which is known to be NP-Hard —
+ **Input** —
- Set $V$
- Set $E \subseteq \big\{ \{u, v\} : u, v \in V \big\}$
- Integer $k$
+ **Output** —
- Yes / No
+ **Objective Function** —
    - Yes, if there exists $S \subseteq V$ s.t. $|S| = k \land \forall u, v \in S, u \neq v : \{u, v\} \in E$
- No, if otherwise
```=
Set densestSubgraph(Set V, Set E, int k);
boolean kClique(Set V, Set E, int k){
    Set S = densestSubgraph(V, E, k);
    // S is a clique iff every pair of distinct vertices in S is joined by an edge
    for(each pair {u, v} of distinct vertices in S){
        if(contains(E, {u, v}) == false){
            return false;
        }
    }
    return true;
}
```
As above, assuming we can solve the densest subgraph problem, we would be able to develop a program solving the *k-clique* problem; however, *k-clique* should be unsolvable, thus ==the densest subgraph problem is also NP-Hard==.
## Example: The *Product Selection* Problem
There is a series of candidate products we could produce, each with its own properties; some products are produced by competitors; and there is a set of customers, each with their own preferences, who purchase the cheapest product satisfying them. Due to a limited budget we can only select a limited number of products to produce, and we want to maximize the number of customers we win.
We can establish the optimization model for the problem —
+ **Input** —
- Positive integer $n$, number of proposed products
    - Vectors of positive real numbers $P_1$, $P_2$, $\ldots P_n$, properties of proposed products
- Positive integers $p_1$, $p_2$, $\ldots p_n$, price of proposed products
- Positive integer $m$, number of proposed products from competitors
    - Vectors of positive real numbers $Q_1$, $Q_2$, $\ldots Q_m$, properties of proposed products from competitors
- Positive integers $q_1$, $q_2$, $\ldots q_m$, price of proposed products from competitors
- Positive integer $d$, number of customers
    - Vectors of positive real numbers $C_1$, $C_2$, $\ldots C_d$, preferences of customers
    - Positive integer $k$, number of products to produce
+ **Output** —
    - Set $S \subset \{1 \ldots n\}$, the set of products to produce
+ **Constraint** —
- $|S| = k$
+ **Objective Function** —
    - Maximize $\Big\vert \{ i \in [1, d] : f_i < f_{i}^{'} \} \Big\vert$, <br> where $f_i$ is our cheapest price satisfying customer $i$, and $f_{i}^{'}$ is the competitors' cheapest price satisfying customer $i$
To prove the product selection problem is NP-Hard, we utilize the *K-most representative skyline operator* problem:
+ **Input** —
- Positive integer $n$
- Vectors of positive real numbers $P_1$, $P_2$, $\ldots P_n$
    - Positive integer $d$
    - Vectors of positive real numbers $C_1$, $C_2$, $\ldots C_d$
- Positive integer $k$
+ **Output** —
- $S \subseteq \{1 \ldots n\}$
+ **Constraint** —
    - $|S| = k$
+ **Objective Function** —
- Maximize $\Big\vert \{ i \in [1, d] : \exists j \in S\; s.t.C_i \leq P_j\} \Big\vert$
For the *Product Selection* problem, in the situation where $m = 0$, no $f^{'}_i$ exists, so customer $i$ is counted whenever some selected product satisfies them. Thus, we can rewrite the objective function as the following, indicating that the two problems are equivalent —
$$
\Big\vert \{ i \in \{1,2,\ldots d\} : \exists j \in S\; s.t.C_i \leq P_j\} \Big\vert
$$
This supports the construction of the following program —
```=
Set productSelection(
int n, Vector[] P, double[] p,
int m, Vector[] Q, double[] q,
int d, Vector[] C, int k
);
Set kMostRepresentativeSkyline(int n, Vector[] P, int d, Vector[] C, int k){
Double p[] = any(Double[]);
return productSelection(n, P, p, 0, [], [], d, C, k);
}
```
Thus, we can conclude that the *Product Selection* problem is an NP-Hard problem.
## NP-Completeness
An NP-Complete problem is ==as difficult as *satisfiability*==. To prove a problem is NP-Complete, we must prove both directions —
1. The problem is not simpler than *satisfiability*
2. *Satisfiability* is not simpler than the problem
:::success
It doesn't have to be satisfiability, but any problem of the NP-Completeness class. A library of NP-Complete problems can be found [Here](https://www.wikiwand.com/en/List_of_NP-complete_problems), or from <br> [Computers and Intractability: A Guide to the Theory of NP-Completeness](https://www.wikiwand.com/en/Computers_and_Intractability)
:::
# Greedy Algorithms
## The *Simplified Knapsack* Problem
Given some objects and their weights, pick a subset to pack, since we only have limited capacity. Try to maximize the packed weight. The optimization model of the problem can be established as —
+ **Input** —
- Positive integer $n$, the number of objects
    - Positive real numbers $w_1$, $w_2$, $\ldots w_n$, the weight of each object
- Positive real number $W$, the maximum capacity
> Assume $w$'s are in increasing order
+ **Output** —
- $S \subseteq \{1 \ldots n\}$, the set of chosen objects
+ **Constraint** —
- $\displaystyle \sum_{i \in S} w_i \leq W$, total weight must not exceed the maximum capacity
+ **Objective Function** —
- Maximize $\displaystyle \sum_{i \in S} w_i$, the total weight
The problem is known to be NP-Hard, even in this simplified form.
### The Algorithm
A greedy algorithm can be used to approach the problem —
1. Take objects from the lightest up, until adding the next object would exceed the capacity
2. Check whether that last object alone is heavier than all chosen objects combined
3. Keep the heavier side
However, this algorithm ==does not obtain the optimal solution==.
:::spoiler Expand for a mathematical representation of the algorithm
$$
\begin{align}
\texttt{1:} &\;\;\;\; S \leftarrow \varnothing \\[1pt]
\texttt{2:} &\;\;\;\; \texttt{For}\;\; j = 1 \;\;\texttt{to} \;\;n \\[1pt]
\texttt{3:} &\;\;\;\;\;\;\;\;\;\;\;\;
\texttt{If}\;\; \textstyle \sum_{i \in S} \;\; w_i + w_j \leq W \\
\texttt{4:} &\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;
S \leftarrow S \cup \{j\} \\[1pt]
\texttt{5:} &\;\;\;\;\;\;\;\;\;\;\;\;
\texttt{Else:} \\[2pt]
\texttt{6:} &\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;
\texttt{If}\;\; w_j \geq \textstyle\sum_{i \in S}\;\;w_i\\
\texttt{7:} &\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;
S \leftarrow \{j\}\\
\texttt{8:} &\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;
\texttt{break}\\
\end{align}
$$
:::
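A small runnable sketch of the greedy algorithm above (my own variable names; the weights are assumed to be given in increasing order, and a guard is added for an item heavier than the capacity):
```python
def greedy_simplified_knapsack(w, W):
    """Greedy 0.5-approximation sketch for the simplified knapsack problem.
    w: weights in increasing order, W: capacity."""
    S, total = [], 0.0
    for j, wj in enumerate(w):
        if total + wj <= W:
            S.append(j)
            total += wj
        else:
            # The offending item alone may outweigh everything chosen so far.
            if wj <= W and wj >= total:
                S, total = [j], wj
            break
    return S, total

# Example: capacity 10, weights already sorted in increasing order.
print(greedy_simplified_knapsack([2, 3, 4, 9], 10))
```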
## Approximation Algorithms
We define the following 2 terms:
+ **SOL** — The **objective value** (the result of the objective function) obtained by some algorithm
+ **OPT** — The **optimal value** (the objective value of an optimal solution)
If, for all inputs of a problem, some algorithm satisfies —
$$
\mathrm{SOL} \gtreqqless \alpha \cdot \mathrm{OPT}
$$
We call it an **$\alpha$-approximation Algorithm**.
- When a problem is a **maximization problem**, we have $\mathrm{SOL} \geq \alpha \cdot \mathrm{OPT}$, where $\alpha < 1$
- When a problem is a **minimization problem**, we have $\mathrm{SOL} \leq \alpha \cdot \mathrm{OPT}$, where $\alpha > 1$
:::success
We can see the greedy approach of the *Simplified Knapsack* Problem is a 0.5-approximation algorithm.
On the last step, the chosen objects together with the offending object have total weight greater than $W$; we divide them into 2 parts and keep the heavier one, so the solution weighs more than $W \over 2$. (If the capacity is never exceeded, the greedy solution is simply optimal.)
Since OPT cannot be heavier than $W$, we know the solution is definitely better than $\mathrm{OPT} \over 2$.
:::
## The *Knapsack* Problem
In the original form of the *Knapsack* problem, each object is associated with a value. Instead of maximizing the total weight, we try to maximize the total value.
The optimization model of the problem can be established as —
+ **Input** —
- Positive integer $n$, the number of objects
    - Positive real numbers $w_1$, $w_2$, $\ldots w_n$, the weight of each object
    - Positive real numbers $v_1$, $v_2$, $\ldots v_n$, the value of each object
- Positive real number $W$, the maximum capacity
+ **Output** —
- $S \subseteq \{1 \ldots n\}$, the set of chosen objects
+ **Constraint** —
- $\displaystyle \sum_{i \in S} w_i \leq W$, total weight must not exceed the maximum capacity
+ **Objective Function** —
- Maximize $\displaystyle \sum_{i \in S} v_i$, the total packed value
### The Algorithm
A greedy algorithm can be used to approach the problem; the difference from the previous algorithm is that ==when we run out of capacity, we compare the total value instead of the total weight==.
This algorithm is also proven to be a **0.5-approximation algorithm**.
:::spoiler Expand for the algorithm
$$
\begin{align}
\texttt{1:} &\;\;\;\; S \leftarrow \varnothing \\[1pt]
\texttt{2:} &\;\;\;\; \texttt{For}\;\; j = 1 \;\;\texttt{to} \;\;n \\[1pt]
\texttt{3:} &\;\;\;\;\;\;\;\;\;\;\;\;
\texttt{If}\;\; \textstyle \sum_{i \in S} \;\; w_i + w_j \leq W \\
\texttt{4:} &\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;
S \leftarrow S \cup \{j\} \\[1pt]
\texttt{5:} &\;\;\;\;\;\;\;\;\;\;\;\;
\texttt{Else:} \\[2pt]
\texttt{6:} &\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;
\texttt{If}\;\; v_j \geq \textstyle\sum_{i \in S}\;\;v_i\\
\texttt{7:} &\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;
S \leftarrow \{j\}\\
\texttt{8:} &\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;
\texttt{break}\\
\end{align}
$$
:::
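A sketch of the value-based variant, mirroring the pseudocode above: when the next item no longer fits, the total value of the chosen set is compared against the value of that single item (variable names and the capacity guard are mine):
```python
def greedy_knapsack(w, v, W):
    """Greedy sketch for the knapsack problem with values."""
    S, weight, value = [], 0.0, 0.0
    for j in range(len(w)):
        if weight + w[j] <= W:
            S.append(j)
            weight += w[j]
            value += v[j]
        else:
            # Compare the value of the chosen set against the single offending item.
            if w[j] <= W and v[j] >= value:
                S, value = [j], v[j]
            break
    return S, value

print(greedy_knapsack([2, 3, 4, 9], [3, 4, 5, 20], 10))
```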
## Bloom Filters
Consider a 2-layered database system: when you access the system to obtain some data,
1. The first layer tries to tell if the data is present. (Fast)
2. The second layer finds the data from the database. (Slow)
We call the first layer the **Filter**.
### Filters
Suppose the number of possible data items is extremely huge.
==It is not possible to keep a boolean array with one entry per possible item==. Instead, we implement a hash function `f()` mapping items into an array `arr` of size $m$, which is much smaller than the number of possible items. We set `arr[f(i)] = true` only when some data item `i` is added into the database.
When checking some data item's presence, we check whether `arr[f(i)]` is `true`.
However, a **false positive** might occur when two data items `i` and `j` have the same hash value (`f(i) == f(j)`), causing the system to enter the second layer unnecessarily.
Suppose $n$ data items are already present; the probability that a data item `i` not in the database triggers a false positive is:
$$
1 - \left( 1 - \frac{1}{m} \right)^n
$$
which, as $n$ grows, gets unbearably close to $1$.
### Bloom Filter
Instead of using a single hash function, we use multiple: `f_1(), f_2(), ..., f_p()`. When some data item `i` is added into the database, all `arr[f_1(i)], arr[f_2(i)], ..., arr[f_p(i)]` are set to `true`.
When checking some data's presence, we check if all `arr[f_1(i)], arr[f_2(i)], ..., arr[f_p(i)]` are `true`.
Suppose $n$ data items are already present and $p$ different hash functions are used. For some data item `i` not in the database and some hash function `f_j()`, the probability of `arr[f_j(i)]` having been set to `true` is:
$$
1 - \left( 1 - \frac{1}{m} \right)^{np}
$$
> In total, the action "set some bit to true" occurred $n \cdot p$ times.
Then, the probability that a false positive occurs is:
$$
\left( 1 - \left( 1 - \frac{1}{m} \right)^{np} \right)^p
$$
> All $p$ hash functions must collide.
Then, we can show that ==the probability of a false positive is minimized== when:
$$
p = \displaystyle\frac{m}{n} \ln{2}
$$
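A minimal Bloom filter sketch. The $p$ hash functions are simulated by salting Python's built-in `hash`, and the number of hash functions follows the $p = \frac{m}{n}\ln 2$ rule above; a real implementation would use proper independent hash functions.
```python
import math

class BloomFilter:
    """Minimal Bloom filter sketch: m bits, p salted hash functions."""
    def __init__(self, m, n_expected):
        self.m = m
        # Optimal number of hash functions: p = (m / n) * ln 2
        self.p = max(1, round(m / n_expected * math.log(2)))
        self.bits = [False] * m

    def _positions(self, item):
        return [hash((salt, item)) % self.m for salt in range(self.p)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(m=1024, n_expected=100)
bf.add("item-42")
print(bf.might_contain("item-42"), bf.might_contain("item-99"))
```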
## Adaptive Bloom Filter
Assume we know:
+ $P_i$ — The probability of a data item `i` present in the database
+ $p_i$ — The number of bits we set to true when `i` is added
Let $S$ be the set of all data items present; then the probability that some `arr[j]` is not set is:
$$
\left( 1 - \frac{1}{m} \right)^{\sum_{i\in S}p_i}
$$
:::success
According to [Optimizing data popularity conscious bloom filters, PODC '08](https://dl.acm.org/doi/10.1145/1400751.1400798), the probability of false positive is minimized when for all `j`, the probability of `arr[j] == true` is $0.5$.
:::
Then, we can get:
$$
\color{white}{\Rightarrow} \left( 1 - \frac{1}{m} \right)^{\sum_{i\in S}p_i} = 0.5 \Rightarrow \left( \displaystyle\sum_{i \in S}p_i \right) \cdot \ln{\left( 1 - \frac{1}{m} \right)} = \ln{0.5}
$$
Since as $x \rightarrow 0$, we get $1 - x \approx e^{-x}$, implying
$$
\begin{align}
&\color{white}{\Rightarrow} \left( \displaystyle\sum_{i \in S}p_i \right) \cdot \ln{\left(e^{-\frac{1}{m}}\right)} = \ln{0.5} \\
&\Rightarrow \left( \displaystyle\sum_{i \in S}p_i \right) \cdot \left( -{1 \over m} \right) = \ln{0.5} \Rightarrow {1 \over m} \displaystyle\sum_{i \in S} p_i = -\ln{0.5} = \ln{2} \\
&\Rightarrow \displaystyle\sum_{i\in S} p_i = m\ln 2
\end{align}
$$
Since $S$ is unknown, we are not able to know the precise sum; however, we do know the probability that $i \in S$, so we require the sum to hold in expectation:
$$
\displaystyle\sum_{i} P_i \cdot p_i = m \ln 2
$$
### The optimization model
For some `i` not in $S$, the probability that all of its $p_i$ bits are true is $0.5^{p_i}$. If we query every data item once, the expected number of false positives is:
$$
\displaystyle\sum_{i} 0.5^{p_i}
$$
Accordingly, we can have the optimization model:
+ **Input** —
- Real numbers $P_i \in [0, 1]$ for all data item `i`, probability of $i \in S$
- Positive integer $m$, size of database
+ **Output** —
- Positive integer $p_i$ for all data item `i`, number of bits to set true when `i` enters database
+ **Constraint** —
- $\displaystyle\sum_i P_i\cdot p_i = m \cdot \ln{2}$
+ **Objective Function** —
- Minimize $\displaystyle\sum_i (0.5)^{p_i}$
### Transforming into the *KNAPSACK PROBLEM*
To deal with exponentials in the objective function, we can break it down into multiple terms. Let there be new variables $p_i^{(1)}, p_i^{(2)}, p_i^{(3)} \ldots$, where:
$$
p_i^{(j)} = \begin{cases}
1, &j \leq p_i \\
0, &\text{otherwise}
\end{cases}
$$
:::success
Effectively, $p_i^{(j)}$ indicates whether $j$ is at most $p_i$. For example, if $p_5 = 3$, then,
$$
p_5^{(1)} = p_5^{(2)} = p_5^{(3)} = 1 \\ p_5^{(4)} = p_5^{(5)} = \ldots = 0
$$
:::
Then we could break the objective function into:
$$
\begin{align}
\displaystyle\sum_i (0.5)^{p_i} &= \displaystyle\sum_i\left( 1 - \displaystyle\sum_j 0.5^j\cdot p_i^{(j)} \right) \\
&= \displaystyle\sum_i 1 - \displaystyle\sum_i\displaystyle\sum_j0.5^j\cdot p_i^{(j)}
\end{align}
$$
Since the total number of data items $N = \sum_i 1$ is a constant, ==minimizing the objective function is equivalent to maximizing==:
$$
\displaystyle\sum_i\displaystyle\sum_j0.5^j\cdot p_i^{(j)}
$$
----
Furthermore, let there be —
+ New variables $w_i^{(j)} = P_i$ for all $i, j$. (That is, $j$ makes no difference)
+ New variables $h_i^{(j)} = 0.5^j$ for all $i, j$. (That is, $i$ makes no difference.)
+ Set $Q = \left\{ p_i^{(j)}: p_i^{(j)} = 1 \right\}$
Since $\sum_j p_i^{(j)} = p_i$, we have
$$
\begin{align}
\displaystyle\sum_i P_i \cdot p_i &= \displaystyle\sum_i P_i \cdot \left( \displaystyle\sum_j p_i^{(j)} \right) = \displaystyle\sum_i \left( \displaystyle\sum_j P_i \cdot p_i^{(j)} \right) \\ &= \displaystyle\sum_i \displaystyle\sum_j w_i^{(j)} \cdot p_i^{(j)} \\ &= \displaystyle\sum_{p_i^{(j)} \in Q} w_i^{(j)}
\end{align}
$$
And we can further modify the objective function into:
$$
\displaystyle\sum_i\displaystyle\sum_j0.5^j\cdot p_i^{(j)}
= \displaystyle\sum_{p_i^{(j)} \in Q} h_i^{(j)}
$$
Now, let $W = m \cdot \ln{2}$. We now have the following optimization model:
+ **Input** —
- Positive integer $N$, number of total data items
    - Real numbers $w_i^{(j)}$ for all $i, j$
- Positive real number $W$
- Positive real number $h_i^{(j)}$ for all $i, j$
+ **Output** —
- Set $Q \subseteq \left\{ p_i^{(j)} : 1 \leq i \leq N \right\}$
+ **Constraint** —
- $\displaystyle\sum_{p_i^{(j)}\in Q} w_i^{(j)} \leq W$
+ **Objective Function** —
- Maximize $\displaystyle\sum_{p_i^{(j)} \in Q} h_i^{(j)}$
:::success
Notice that the constraint has changed from $=$ to $\leq$. This is because it is unlikely we will end up with a sum much less than $W$: since we aim to select as many items as possible, the sum effectively reaches $W$ anyway.
:::
Now that the problem has been transformed into the *KNAPSACK PROBLEM*, we can apply the 0.5-approximation algorithm to decide how to pick the numbers $p_i$.
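A sketch of the transformation (all names are mine): every candidate bit $(i, j)$ becomes a knapsack item of weight $w_i^{(j)} = P_i$ and value $h_i^{(j)} = 0.5^j$, the capacity is $W = m \ln 2$, and the output of any knapsack algorithm is read back as $p_i$, the number of chosen pairs for item $i$.
```python
import math

def adaptive_bloom_knapsack_instance(P, m, j_max=32):
    """Build the knapsack instance from the adaptive Bloom filter model (sketch).
    P: access probabilities P_i, m: bit-array size, j_max: truncation of the j index.
    Returns a list of items (weight, value, (i, j)) and the capacity W."""
    items = [(P[i], 0.5 ** j, (i, j))
             for i in range(len(P))
             for j in range(1, j_max + 1)]
    return items, m * math.log(2)      # W = m * ln 2

def read_back(chosen_items, n):
    """Recover p_i as the number of chosen (i, j) pairs. Items with the same i
    share the weight P_i and have decreasing value, so a sensible solver
    effectively picks j = 1, 2, ... first."""
    p = [0] * n
    for _weight, _value, (i, _j) in chosen_items:
        p[i] += 1
    return p
```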
# Deterministic Rounding Algorithms
## The *VERTEX COVER* problem
```graphviz
graph G{
rankdir = LR;
splines = false;
node[
shape = circle;
style = filled;
fillcolor = "#44663355"
fontname = "Roboto"
]
1 -- 3[color = invis]
1 -- 3
1 -- 3[color = invis]
3 -- 4
1 -- 2 -- 3
}
```
Given a graph $G = \left\{ V, E \right\}$ where $E \subseteq \left\{ \{u, v\} : u, v \in V \right\}$, we want to select a set of vertices that touches every edge.
The optimization model can be described as:
+ **Input** —
- Set $V$, the set of vertices
- Set $E \subseteq \left\{ \{u, v\} : u, v \in V \right\}$, the set of edges.
+ **Output** —
- Set $S \subseteq V$, a subset of $V$
+ **Constraint** —
- $\forall\{u, v\} \in E : u \in S \lor v \in S$
+ **Objective Function** —
- Minimize $\left|S\right|$
### Rewriting into vector form
Let there be variables $x_1, x_2 \ldots x_{\left|V\right|}$, where
$$
x_u = \begin{cases}
1: u \in S \\
0: u \notin S
\end{cases}
$$
Then, we can change the output to $\mathbf{x} = \left[ x_1, x_2 \ldots x_{\left|V\right|}\right]^\mathrm{T}$ instead of $S$. As for $\left|S\right|$, we know $\left|S\right| = \sum_ix_i$.
Since each $x_i$ is either $1$ or $0$, we can write it as follows, where $\mathbf{c} = \left[ 1, 1 \ldots 1 \right]^\mathrm{T}$ —
$$
\left| S \right| = \mathbf{c}^\mathrm{T}\mathbf{x}
$$
For some edge $\left\{ u, v \right\}$ to be covered, it is equivalent to requiring $x_u + x_v \geq 1$. That is, we are able to rewrite the constraints in the form $A\mathbf{x} \geq \mathbf{b}$:
$$
\begin{bmatrix}
1 & 1 & \cdots & 0 \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix}
x_1 \\ x_2 \\ \vdots \\ x_{\left| V \right|}
\end{bmatrix} \geq
\begin{bmatrix}
1 \\ 1 \\ \vdots \\ 1
\end{bmatrix}
$$
:::success
Each row has exactly two entries set to $1$ and the others set to $0$; the rows represent the edges $\{u, v\} \in E$
:::
Thus, we can rewrite the optimization model as the following:
+ **Input** —
+ Matrix $A$
+ Vector $\mathbf{b} = \left[1, 1 \cdots 1\right]^\mathrm{T}$
+ Vector $\mathbf{c} = \left[1, 1 \cdots 1\right]^\mathrm{T}$
+ **Output** —
    - Vector $\mathbf{x}$, where $\forall u : x_u \in \{0, 1\}$
+ **Constraint** —
- $A\mathbf{x} \geq \mathbf{b}$
+ **Objective Function** —
- Minimize $\mathbf{c}^{\mathrm{T}}\mathbf{x}$
## Solving *VERTEX COVER* with *FRACTIONAL VERTEX COVER*
The previous optimization model is almost in linear programming form, though since $x_u \in \{0, 1\}$ we cannot simply hand it to an LP solver like CPLEX.
Consider another problem, *FRACTIONAL VERTEX COVER* problem, where the only difference is that we allow $x_u$ to be any real number between $0$ and $1$:
+ **Input** —
+ Matrix $A$
+ Vector $\mathbf{b} = \left[1, 1 \cdots 1\right]^\mathrm{T}$
+ Vector $\mathbf{c} = \left[1, 1 \cdots 1\right]^\mathrm{T}$
+ **Output** —
    - Vector $\mathbf{x}$, where $\forall u : 0 \leq x_u \leq 1$
+ **Constraint** —
- $A\mathbf{x} \geq \mathbf{b}$
+ **Objective Function** —
- Minimize $\mathbf{c}^{\mathrm{T}}\mathbf{x}$
We can use this problem to help develop an approximation algorithm for the *VERTEX COVER* problem.
## Rounding
Consider the following algorithm for *VERTEX COVER*:
$$
\begin{align}
&\texttt{1.} \qquad \mathbf{x} = \texttt{solveFractionalVertexCover()} \\
&\texttt{2.} \qquad \texttt{forall}\quad x_i \quad\texttt{in}\quad \mathbf{x} \\
&\texttt{3.} \qquad\quad \texttt{if}\quad x_i \geq0.5 \\
&\texttt{4.} \qquad\quad\quad y_i \leftarrow 1 \\
&\texttt{5.} \qquad\quad \texttt{else} \\
&\texttt{6.} \qquad\quad\quad y_i \leftarrow 0 \\
&\texttt{7.} \qquad \texttt{return} \quad \mathbf{y}
\end{align}
$$
**1. The output satisfies the constraint of *VERTEX COVER***. The constraint of *FRACTIONAL VERTEX COVER* states that:
$\forall \{u, v\} \in E,$
$$
\begin{align}
&\color{white}{\Rightarrow\quad }x_u + x_v \geq 1 \\
&\rightarrow \quad x_u \geq 0.5 \lor x_v \geq 0.5 \\
&\rightarrow \quad y_u = 1 \lor y_v = 1 \\
&\rightarrow \quad y_u + y_v \geq 1 \\
\end{align}
$$
which is the constraint of *VERTEX COVER*.
**2. For any input, $\mathrm{SOL} \leq 2 \cdot \mathrm{OPT}$.**
Let
- $\mathbf{x} \in [0, 1]^n$ be the optimal solution for *FRACTIONAL VERTEX COVER*, and $\mathbf{x}^* \in \{0, 1\}^n$ be the optimal solution for *VERTEX COVER*. Since
+ $\mathbf{x}$ is chosen from a bigger set, and
+ both $\mathbf{x}$ and $\mathbf{x}^*$ are under the same constraint,
there is more opportunity to minimize the objective function. Thus we know $\mathbf{c}^\mathrm{T}\mathbf{x} \leq \mathbf{c}^\mathrm{T}\mathbf{x}^*$.
----
By ==rounding== in lines 3 - 6 of the algorithm, we have $\forall i : y_i \leq 2\cdot x_i$. Thus,
$$
\begin{cases}
\mathrm{SOL} = \mathbf{c}^\mathrm{T}\mathbf{y}
= \displaystyle\sum_i y_i \\
\displaystyle\sum_i y_i \leq 2 \cdot \displaystyle\sum_i x_i \\
2 \cdot \displaystyle\sum_i x_i = 2 \cdot \mathbf{c}^\mathrm{T}\mathbf{x} \\
2 \cdot \mathbf{c}^\mathrm{T}\mathbf{x} \leq 2 \cdot \mathbf{c}^\mathrm{T}\mathbf{x}^* \\
2 \cdot \mathbf{c}^\mathrm{T}\mathbf{x}^* = 2 \cdot \mathrm{OPT}
\end{cases} \Rightarrow \mathrm{SOL} \leq 2 \cdot \mathrm{OPT}
$$
This proves the algorithm is a ==2-approximation algorithm for *VERTEX COVER*==.
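A sketch of the whole pipeline for a small graph, using `scipy.optimize.linprog` as the fractional solver and the $0.5$ threshold from the algorithm above (the example graph is made up):
```python
from scipy.optimize import linprog

def vertex_cover_2_approx(n, edges):
    """Fractional VERTEX COVER via LP, then deterministic 0.5-threshold rounding."""
    # Minimize sum(x_u) subject to x_u + x_v >= 1 for every edge, 0 <= x_u <= 1.
    c = [1.0] * n
    A_ub, b_ub = [], []
    for (u, v) in edges:
        row = [0.0] * n
        row[u] = row[v] = -1.0          # -(x_u + x_v) <= -1
        A_ub.append(row)
        b_ub.append(-1.0)
    x = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * n, method="highs").x
    return [u for u in range(n) if x[u] >= 0.5]   # the rounding step

print(vertex_cover_2_approx(4, [(0, 1), (1, 2), (2, 3), (0, 2)]))
```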
# Randomized Algorithms
## The *SET COVER* Problem
<div style="display: flex; align-items: flex-start; justify-content: space-between; margin: 20px 0 30px">
<div>
+ **Input** —
- Set $V = \{1, 2 \ldots n\}$
- Set $E \subseteq 2^V$
+ **Output** —
- Set $S \subseteq V$
+ **Constraint** —
- $\forall e \in E : S \cap e \neq \varnothing$
+ **Objective Function** —
- Minimize $\left\vert S\right\vert$
</div>
<div style="border-left: 1px solid black;">
+ **Input** —
+ Matrix $A$
+ Vector $\mathbf{b}, \mathbf{c}$, both are $\left[1, 1 \cdots 1\right]^\mathrm{T}$
+ **Output** —
- Vector $\mathbf{x} = [x_1, x_2 \ldots x_n]^\mathrm{T}$, where $\forall u : 0 \leq x_u \leq 1$
+ **Constraint** —
- $A\mathbf{x} \geq \mathbf{b}$
+ **Objective Function** —
- Minimize $\mathbf{c}^{\mathrm{T}}\mathbf{x}$
</div>
</div>
It is similar to *VERTEX COVER*, where we change the set of edges into a set of arbitrary subsets. ==We can still transform it into the same matrix form.==
This time, for each set $e_i = \{e_{i,1}, e_{i,2} \ldots e_{i, k_i}\} \in E$, one row in $A$ has those entries set to $1$ and the others set to $0$.
:::spoiler Expand for example
$$
\begin{cases}
\left\vert V \right\vert &= 5 \\
E &= \{\{1, 3, 4\}, \{1, 2\}, \{2, 5\}, \{3, 4, 5\}\}
\end{cases} \\[15px]
\Downarrow \\[15px]
A = \begin{bmatrix}
1 & 0 & 1 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & 1 & 1
\end{bmatrix}
$$
:::
Thus, the optimization model can rather be written as the RHS.
### Deterministic Rounding is not good enough
**It is not enough to just reuse the *VERTEX COVER* algorithm**. Say a constraint only requires $x_1 + x_2 + x_3 \geq 1$; then the solution might be $\left[ \frac{1}{3}, \frac{1}{3}, \frac{1}{3} \right]^\mathrm{T}$, and rounding at $0.5$ gives us $[0, 0, 0]^\mathrm{T}$, which does not satisfy the constraints of *SET COVER*.
Let the largest set in $E$ have size $k$. Using the fact that, for a set of size $m \leq k$:
$$
x_{u_1} + x_{u_2} + \cdots + x_{u_m} \geq 1 \implies \exists i : x_{u_i} \geq \frac{1}{m} \geq \frac{1}{k}
$$
We then modify the algorithm so that, when we round the solution of *FRACTIONAL SET COVER*, we set $y_u = 1$ when $x_u \geq \frac{1}{k}$. By this rounding, we have $\forall i : y_i \leq k \cdot x_i$. Thus,
$$
\begin{cases}
\mathrm{SOL} = \mathbf{c}^\mathrm{T}\mathbf{y}
= \displaystyle\sum_i y_i \\
\displaystyle\sum_i y_i \leq k \cdot \displaystyle\sum_i x_i \\
k \cdot \displaystyle\sum_i x_i = k \cdot \mathbf{c}^\mathrm{T}\mathbf{x} \\
k \cdot \mathbf{c}^\mathrm{T}\mathbf{x} \leq k \cdot \mathbf{c}^\mathrm{T}\mathbf{x}^* \\
k \cdot \mathbf{c}^\mathrm{T}\mathbf{x}^* = k \cdot \mathrm{OPT}
\end{cases} \Rightarrow \mathrm{SOL} \leq k \cdot \mathrm{OPT}
$$
Thus we know the algorithm is a $k$-approximation algorithm, ==but sometimes a terrible one==. For a problem instance with $k = n$, the algorithm that simply returns $\mathbf{y} = [1, 1, \cdots 1]^\mathrm{T}$ is also an $n$-approximation algorithm, since $\mathrm{OPT}$ must set at least one $x_u = 1$.
## Randomized Algorithms
If an algorithm's behavior is controlled by a probability distribution, we call it a **Randomized Algorithm**.
By running a randomized algorithm many times, we can almost guarantee that some run produces a good solution.
## Randomized Rounding
We try combining the rounding process with randomization, evaluating the performance of different settings.
### First Setting
For the new randomized rounding, we generate $\mathbf{y} = \left[ y_1, y_2, \cdots y_n \right]^\mathrm{T}$ by:
$$
\begin{cases}
P(y_i = 1) = x_i \\
P(y_i = 0) = 1 - x_i
\end{cases}
$$
For each set $E_i \in E$, there is a constraint requiring $s = \displaystyle\sum_{u \in E_i} x_u \geq 1$, so the probability of a single constraint failing is:
$$
\prod_{u \in E_i} (1 - x_u) \leq \prod_{u \in E_i} e^{-x_u} = e^{-s} \leq e^{-1} \leq 0.37
$$
So the probability of satisfying each individual constraint is only about $0.63$. ==This is not good enough even when the number of constraints is only moderately large==.
### Second Setting
For the new randomized rounding, we generate $\mathbf{y} = \left[ y_1, y_2, \cdots y_n \right]^\mathrm{T}$ by:
$$
\begin{cases}
P(y_i = 1) = 1 - (1 - x_i)^t \\
P(y_i = 0) = 1 - (1 - (1 - x_i)^t) = (1 - x_i)^t
\end{cases}
$$
For each set $E_i \in E$, there is a constraint requiring $s = \displaystyle\sum_{u \in E_i} x_u \geq 1$,
so the probability of a single constraint failing is:
$$
\prod_{u \in E_i} (1 - x_u)^t \leq \prod_{u \in E_i} e^{-x_ut} = e^{-ts} \leq e^{-t}
$$
Thus, by the union bound, the probability that some constraint fails is at most $e^{-t} \cdot \vert E \vert$.
----
Next, we aim to ==reduce the probability of some constraint failing to at most $1 \over 4$==.
$$
\begin{align}
e^{-t} \cdot \vert E \vert \leq \frac{1}{4}
&\Rightarrow e^{-t} \leq \frac{1}{4 \cdot \vert E \vert} \\
&\Rightarrow \ln e^{-t} \leq \ln{\frac{1}{4\cdot \vert E \vert}} \\
&\Rightarrow -t \leq -\ln{4 \vert E \vert} \\
&\Rightarrow t \geq \ln{\vert E \vert} + \ln{4}
\end{align}
$$
----
Finally, we end with the following algorithm:
$$
\begin{align}
&\texttt{1.} \qquad \mathbf{x} = \texttt{solveFractionalSetCover()} \\
&\texttt{2.} \qquad \texttt{while(} \;\; y \;\; \texttt{doesn't satisfy constraint)}\\
&\texttt{3.} \qquad \quad\texttt{forall}\quad x_i \quad\texttt{in}\quad \mathbf{x} \\
&\texttt{4.} \qquad\quad\quad y_i \leftarrow 1 \quad\texttt{with probability}\quad 1 - (1 - x_i)^{\ln{\vert E \vert} + \ln{4}}\\
&\texttt{5.} \qquad \texttt{return} \quad \mathbf{y}
\end{align}
$$
:::success
**The Markov Inequality**
For some non-negative random variable $X$ with expected value $\mu(X)$ and any $a > 0$,
$$
P(X > a) \leq \frac{\mu(X)}{a}
$$
**Bernoulli's Inequality**
For $x \geq -1$ and $r \leq 0$ or $r \geq 1$,
$$
(1 + x)^r \geq 1 + rx
$$
:::
The expected value of $\mathrm{SOL}$ is:
$$
\begin{align}
\mu(\mathrm{SOL}) = \mu(\sum_{i \in V}y_i) &= \sum_{i \in V}\left\{1 \cdot \left[ ( 1 - (1 - x_i)^{\ln{\vert E \vert} + \ln{4}} ) \right] + 0 \cdot (1 - x_i)^{\ln{\vert E \vert} + \ln{4}}\right\} \\
&= \sum_{i \in V}\left[ ( 1 - (1 - x_i)^{\ln{\vert E \vert} + \ln{4}} ) \right] \\
\color{#C0C0C0}{1 - (1 - x_i)^{\ln{\vert E \vert} + \ln{4}} \leq(\ln{\vert E \vert} + \ln{4})x_i\qquad} &\leq \sum_{i \in V}\left[ \left( \ln{\vert E \vert} + \ln{4} \right)x_i \right] \\
&= \left( \ln{\vert E \vert} + \ln{4} \right) \sum_{i \in V} x_i \\
&\leq \left( \ln{\vert E \vert} + \ln{4} \right)\cdot \mathrm{OPT}
\end{align}
$$
By the Markov inequality, the probability of the solution being far from optimal is low.
$$
P(\mathrm{SOL} \geq 4 \cdot \left( \ln{\vert E \vert} + \ln{4} \right) \cdot \mathrm{OPT} \color{#C8C8C8}{\;\geq\; 4 \cdot \mu(\mathrm{SOL})}) \leq {1 \over 4}
$$
### Summary
+ We have a $\geq 0.5$ chance to get a solution that:
- Satisfies all constraints
- Not far from optimal ($\leq 4 \cdot (\ln{\vert E \vert} + \ln{4}) \cdot \mathrm{OPT}$)
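A sketch of the complete procedure: solve the fractional relaxation with an LP solver, then repeat the randomized rounding with $t = \ln\vert E \vert + \ln 4$ until all sets are covered (the instance at the bottom is the earlier example, with vertices 0-indexed).
```python
import math, random
from scipy.optimize import linprog

def set_cover_randomized_rounding(n, sets):
    """Fractional SET COVER via LP, then randomized rounding with
    P(y_u = 1) = 1 - (1 - x_u)^t where t = ln|E| + ln 4, retried until feasible."""
    c = [1.0] * n
    A_ub = []
    for e in sets:
        row = [0.0] * n
        for u in e:
            row[u] = -1.0               # -(sum of x_u over the set) <= -1
        A_ub.append(row)
    b_ub = [-1.0] * len(sets)
    x = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * n, method="highs").x
    x = [min(max(xi, 0.0), 1.0) for xi in x]   # guard against tiny numerical drift

    t = math.log(len(sets)) + math.log(4)
    while True:
        y = [1 if random.random() < 1 - (1 - xi) ** t else 0 for xi in x]
        if all(any(y[u] for u in e) for e in sets):
            return [u for u in range(n) if y[u]]

print(set_cover_randomized_rounding(5, [{0, 2, 3}, {0, 1}, {1, 4}, {2, 3, 4}]))
```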
## Principal Component Analysis
PCA is a technique to reduce the dimension of records. By lowering the dimension, it simplifies the calculations for extracting information with machine learning algorithms.
+ **Input** —
- Matrix $X$, has $n$ rows (number of records) and $d$ columns (number of dimensions).
+ **Output** —
    - Vector $\mathbf{v} = [v_1, v_2, \cdots v_d]^\mathrm{T}$
+ **Constraint** —
    - $\displaystyle\sum_{i = 1}^d v_i^2 = 1$
+ **Objective Function** —
- Maximize $\mathbf{v}^{\mathrm{T}}X^{\mathrm{T}}X\mathbf{v}$
The above optimization model ==is solvable by semi-definite programming==.
### Sparse PCA
However, when we reduce dimensions, if we combine too many original dimensions into a single new one, it may be too difficult to understand the characteristics of the resulting dimension. Thus, we tend to limit the number of chosen dimensions (the number of non-zero entries in $\mathbf{v}$).
This leads to the following optimization model, which is NP-Hard:
+ **Input** —
- Matrix $X$, has $n$ rows (number of records) and $d$ columns (number of dimensions).
- integer $k$
+ **Output** —
    - Vector $\mathbf{v} = [v_1, v_2, \cdots v_d]^\mathrm{T}$
+ **Constraint** —
    - $\displaystyle\sum_{i = 1}^d v_i^2 = 1$
- $\left\vert\{ v_i : v_i \neq 0 \}\right\vert \leq k$
+ **Objective Function** —
- Maximize $\mathbf{v}^{\mathrm{T}}X^{\mathrm{T}}X\mathbf{v}$
### Approximation Algorithm From Relaxed Sparse PCA
*SPARSE PCA* can be solved using a randomized approximation algorithm, first by solving the relaxed version which is solvable by semi-definite programming:
+ **Input** —
- Matrix $X$, has $n$ rows (number of records) and $d$ columns (number of dimensions).
- integer $k$
+ **Output** —
    - Vector $\mathbf{u} = [u_1, u_2, \cdots u_d]^\mathrm{T}$
+ **Constraint** —
    - $\displaystyle\sum_{i = 1}^d u_i^2 = 1$
    - $\displaystyle\sum_{i = 1}^d \vert u_i \vert \leq \sqrt{k^\color{white}{|}}$
+ **Objective Function** —
- Maximize $\mathbf{u}^{\mathrm{T}}X^{\mathrm{T}}X\mathbf{u}$
From the result $\mathbf{u}$, we define the following probabilities to obtain $\mathbf{v}$, using some parameter $s \geq 1$:
$$
\begin{cases}
\displaystyle P(v_i = \frac{u_i}{p_i}) &= p_i = \min\left\{\frac{s \cdot \vert u_i \vert}{\sum_{j = 1}^d \vert u_j \vert}, 1\right\} \\
\displaystyle P(v_i = 0) &= 1 - p_i
\end{cases}
$$
----
Then,
+ The expected number of non-zero entries is:
$$
\mu(\left\vert\left\{v_i : v_i \neq 0\right\}\right\vert) = \sum_{i = 1}^d p_i \leq \frac{s \cdot \displaystyle\sum_{j = 1}^d |u_j|}{\displaystyle\sum_{j = 1}^d |u_j|} = s
$$
+ The expected value of the sum of squared entries of the vector is:
    - When $p_i = \frac{s \cdot \vert u_i \vert}{\sum_{j = 1}^d \vert u_j \vert}$,
$$
\begin{align}
\frac{u_i^2}{p_i} = \frac{u_i^2}{s \cdot \vert u_i \vert} \cdot \sum_{j = 1}^d \vert u_j \vert = \frac{\vert u_i \vert}{s} \cdot \sum_{j = 1}^d \vert u_j \vert \leq \frac{\vert u_i \vert}{s}\left[ \vert u_i \vert + \sum_{j = 1}^d \vert u_j \vert \right] &= \frac{u_i^2}{s} + \frac{\vert u_i \vert}{s}\sum_{j = 1}^d \vert u_j \vert \\
& \leq u_i^2 + \frac{\vert u_i \vert}{s}\sqrt{k^{\color{white}{|}}}
\end{align}
$$
- When $p_i = 1$,
$$
\frac{u_i^2}{p_i} = u_i^2
$$
- Summing both up,
$$
\begin{align}
\mu(\displaystyle\sum_{i = 1}^{d} v_i^2) &= \displaystyle\sum_{i = 1}^{d} \mu(v_i^2) = \displaystyle\sum_{i = 1}^{d} \left( \frac{u_i}{p_i} \right)^2 \cdot p_i = \displaystyle\sum_{i = 1}^{d} \frac{u_i^2}{p_i} \\
&= \sum_{i\, :\, p_i < 1}\frac{u_i^2}{p_i} + \sum_{i\, :\, p_i = 1} \frac{u_i^2}{p_i} \\
&\leq \sum_{i\, :\, p_i < 1} \left[ u_i^2 + \frac{\vert u_i \vert}{s}\sqrt{k^{\color{white}{|}}} \right] + \sum_{i\, :\, p_i = 1} u_i^2 \\
&\leq \displaystyle\sum_{i = 1}^{d} u_i^2 + \displaystyle\sum_{i = 1}^{d} \frac{\vert u_i \vert}{s}\sqrt{k^{\color{white}{|}}} \\
&= 1 + \frac{\sqrt{k^{\color{white}{|}}}}{s}\displaystyle\sum_{i = 1}^{d} \vert u_i \vert \leq 1 + \frac{k}{s}
\end{align}
$$
----
Now, we have two conflicting requirements:
1. The expected number of non-zero entries, which we must keep low, is $s$
2. The expected value of the sum of squared entries of $\mathbf{v}$, which we must also keep low, is $1 + \frac{k}{s}$
According to research, $s = 200k$ is a good choice.
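A sketch of the sparsification step (assuming `u` is the solution of the relaxed problem, obtained elsewhere): each entry is kept and rescaled to $u_i / p_i$ with probability $p_i$, and zeroed otherwise.
```python
import random

def sparsify(u, s):
    """Randomized sparsification of the relaxed Sparse-PCA solution u (sketch).
    Keeps entry i with probability p_i = min(s * |u_i| / sum_j |u_j|, 1),
    rescaling it to u_i / p_i so that the expectation of each entry is preserved."""
    total = sum(abs(x) for x in u)
    v = []
    for ui in u:
        p = min(s * abs(ui) / total, 1.0)
        v.append(ui / p if random.random() < p else 0.0)
    return v

# Example: a relaxed solution with a few dominant entries; s controls sparsity.
print(sparsify([0.9, 0.3, 0.3, 0.1, 0.05], s=3))
```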
## Additive Approximation Ratio
From research, the above randomized algorithm for *SPARSE PCA* can satisfy
$$
\mathrm{SOL}\geq \mathrm{OPT} - 1
$$
The form $\mathrm{SOL}\geq \mathrm{OPT} - \beta$ is called an **Additive Approximation Ratio**. It is rarely used since it ==cannot provide a good guarantee for every input==.
# Inapproximability
To prove that an $\alpha$-approximation algorithm for some optimization model is impossible, we can modify the optimization model and prove the modified model unsolvable, in such a way that if the new optimization model were solvable, its solving program would be an $\alpha$-approximation algorithm for the original problem.
## $1.9999$-Approximation Algorithm for *VERTEX COVER*
The optimization model of *VERTEX COVER* (LHS) can be rewritten as the RHS, where only the optimal solution is feasible:
<div style="position: relative; display: flex; align-items: flex-start; justify-content: space-between; margin: 20px 0 30px">
<div>
+ **Input** —
- Set $V = \{1, 2 \ldots n\}$
- Set $E \subseteq \left\{ \{u, v\} : u, v \in V \right\}$
+ **Output** —
- Set $S \subseteq V$
+ **Constraint** —
- $\forall \{u, v\} \in E : u \in S \lor v \in S$
+ **Objective Function** —
- Minimize $\left\vert S \right\vert$
</div>
<div style="position: relative; display: block; background-color: #C0C0C0; width: 1px; height: 100%;"></div>
<div>
+ **Input** —
- Set $V = \{1, 2 \ldots n\}$
- Set $E \subseteq \left\{ \{u, v\} : u, v \in V \right\}$
+ **Output** —
- Set $S \subseteq V$
+ **Constraint** —
- $\forall \{u, v\} \in E : u \in S \lor v \in S$
    - For every $S^{'} \subseteq V$ satisfying $\forall \{u, v\} \in E : u \in S^{'} \lor v \in S^{'}$ : $\vert S \vert \leq \vert S^{'} \vert$
+ **Objective Function** —
- None
</div>
</div>
We can then easily change the constraints so that any feasible solution must be within $1.9999$ times the optimal solution. Whether this model is solvable (i.e., whether a $1.9999$-approximation algorithm exists) remains unproven.
+ **Input** —
- Set $V = \{1, 2 \ldots n\}$
- Set $E \subseteq \left\{ \{u, v\} : u, v \in V \right\}$
+ **Output** —
- Set $S \subseteq V$
+ **Constraint** —
- $\forall \{u, v\} \in E : u \in S \lor v \in S$
    - For every $S^{'} \subseteq V$ satisfying $\forall \{u, v\} \in E : u \in S^{'} \lor v \in S^{'}$ : $\vert S \vert \leq 1.9999 \cdot \vert S^{'} \vert$
+ **Objective Function** —
- None
## 1.9999-Approximation Algorithm for *K-CENTER*
*K-CENTER* and *K-MEDIAN* are two common optimization problems for clustering data. The optimization model of *K-CENTER* is as follows:
+ **Input** —
- Integer $n$, the number of data points
- $\forall i, j : d(i, j)$, distance function between all pairs of data points
- Integer $k$, the number of clusters to find
+ **Output** —
- Set $S \subseteq \{1, 2, \cdots n\}$, The set of cluster centers
+ **Constraint** —
- $\vert S \vert = k$
+ **Objective Function** —
    - Minimize $\displaystyle\max_{i \in \{1, \ldots n\}}\left\{ \displaystyle\min_{j \in S}\left\{ d(i, j) \right\}\right\}$, the maximum distance from any data point to its nearest cluster center.
We can rewrite the optimization model so that solutions must fall under $1.9999 \cdot \mathrm{OPT}$:
+ **Input** —
- Integer $n$, the number of data points
- $\forall i, j : d(i, j)$, distance function between all pairs of data points
- Integer $k$, the number of clusters to find
+ **Output** —
- Set $S \subseteq \{1, 2, \cdots n\}$, The set of cluster centers
+ **Constraint** —
- $\vert S \vert = k$
    - For every $S^{'}$ with $\vert S^{'} \vert = k$ : $\displaystyle\max_{i}\left\{ \displaystyle\min_{j \in S}\left\{ d(i, j) \right\}\right\} \leq 1.9999 \cdot \displaystyle\max_{i}\left\{ \displaystyle\min_{j \in S^{'}}\left\{ d(i, j) \right\}\right\}$
+ **Objective Function** —
- None
### Proving *1.9999-APPROXIMATION K-CENTER* is NP-Hard
Using the *DOMINATING SET* problem, which is known to be NP-Hard, we can achieve the proof. The optimization model of *DOMINATING SET* is:
+ **Input** —
- Set $V$, the set of vertices
- Set $E \subseteq \left\{ \{u, v\} : u, v \in V \right\}$, the set of edges
- Positive integer $k$, max size of set of chosen vertices
+ **Output** —
- Yes / No
+ **Constraint** —
- Yes, if $\exists S \subseteq V :$
- $\vert S \vert \leq k$
- $\forall u \in V : \exists v \in S : v = u \lor \{u, v\} \in E$
- No, otherwise
+ **Objective Function** —
- None
If *1.9999-APPROXIMATION K-CENTER* is solvable, we can construct the following program to solve *DOMINATING SET*:
```
Set approxKCenter1_9999(int n, double d[][], int k);
boolean dominatingSet(Set V, Set E, int k){
    double d[V.size()][V.size()];
    // Set up the distance function: 0 to itself, 1 along an edge, 2 otherwise
    for(int i = 0; i < V.size(); ++i){
        for(int j = 0; j < V.size(); ++j){
            if(i == j){
                d[i][j] = 0;
            }
            else if(E.contains({i, j})){
                d[i][j] = 1;
            }
            else{
                d[i][j] = 2;
            }
        }
    }
    Set S = approxKCenter1_9999(V.size(), d, k);
    // A dominating set of size k exists iff the optimal k-center value is 1,
    // in which case the 1.9999-approximation must also return a solution of value 1 (< 2).
    if(S.maxCenterDistance() <= 1){
        return true;
    }
    else{
        return false;
    }
}
```
But *DOMINATING SET* should not be solvable, thus proving *1.9999-APPROXIMATION K-CENTER* is also unsolvable.
## The *RG-TOSS* Problem
The problem aims to find a fixed-size subset of vertices, where each selected vertex provides some value (such as an influence probability) to the elements of another set, so that the total value is maximized. The optimization model is as follows:
+ **Input** —
- Set $V$, the set of vertices
- Set $E \subseteq \left\{ \{u, v\} : u, v \in V \right\}$, the set of edges
- Set $T$, the set of value target
- Value function $\forall t\in T, v\in V : f(t, v)$, the value given to $t$ when $v$ is selected
    - Integer $k$, the minimum number of other chosen vertices each chosen vertex must be adjacent to
- Integer $s$, the number of vertices we must choose
+ **Output** —
- $S \subseteq V$
+ **Constraint** —
- $\forall v \in S : \left\vert \left\{ \{u, v\} \in E : u \in S \right\} \right\vert \geq k$
- $\vert S \vert = s$
+ **Objective Function** —
    - Maximize $\displaystyle\sum_{t \in T} \displaystyle\sum_{v \in S} f(t, v)$
Not only is finding the optimal solution NP-Hard; even finding a single feasible solution is NP-Hard.
> Refer to the lecture document for why any $\alpha$-approximation algorithm is impossible.
# Online Algorithms
## The *SKI RENTAL* problem
We try to decide the best strategy for purchasing ski passes. A one-day pass costs $400$, and a yearly pass costs $9000$. We cannot predict whether we will go skiing in the future. The optimization model can be described as:
+ **Input** —
    - An array $\mathbf{g} = \{g_1, g_2 \cdots g_{365}\} \in \{0, 1\}^{365}$, where $g_i = 1$ if we go skiing on day $i$.
+ **Output** —
    - An array $\mathbf{t} = \{t_1, t_2 \cdots t_{365}\} \in \{0, 400, 9000\}^{365}$, the amount paid on each day
+ **Constraint** —
- $g_i = 1 \implies \left(t_i = 0 \iff \exists j < i : t_j = 9000\right)$
- ==$t_i$ must be decided based on only $g_1, g_2 \cdots g_i$==
+ **Objective Function** —
- Minimize $\displaystyle\sum_{i = 1}^{365} t_i$
The highlighted part implies the optimization model requires an **online algorithm** to solve.
## Competitive Ratio
The **Competitive Ratio** describes how effective our online algorithm is. In online algorithms, we define $\mathrm{SOL}_i$ and $\mathrm{OPT}_i$ as the objective value and optimal value on step $i$.
$$
\textrm{competitive ratio} = \max_{i = 1}^{365}{\frac{\mathrm{SOL}_i}{\mathrm{OPT}_i}} \\
\forall i \in \mathbb{Z}^+, \quad\textrm{competitive ratio} \geq \frac{\mathrm{SOL}_i}{\mathrm{OPT}_i}
$$
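For intuition, here is a classical break-even strategy for the model above (not necessarily the one from the lecture): keep buying day passes while the accumulated spending stays below the yearly price, and buy the yearly pass on the day the spending would reach it. Its total cost never exceeds twice the offline optimum, so it is 2-competitive.
```python
def ski_rental(days, day_price=400, year_price=9000):
    """Break-even strategy: rent until the money spent would reach the yearly
    price, then buy the yearly pass. Processes g_1, g_2, ... online."""
    spent, has_year_pass, t = 0, False, []
    for g in days:
        if g == 0 or has_year_pass:
            t.append(0)
        elif spent + day_price >= year_price:
            t.append(year_price)          # break-even point reached: buy the pass
            has_year_pass = True
        else:
            t.append(day_price)           # keep renting
            spent += day_price
    return t

print(sum(ski_rental([1] * 30)))   # total cost when skiing 30 days (OPT = 9000)
```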
## The *Operating Server* Problem
We want to decide a strategy for booting/shutting down servers in a datacenter according to current and past demand. Cost is incurred while a server is running, and booting a server costs extra. We want to minimize the total cost. The problem can be described with the following optimization model:
+ **Input** —
    - An array $\mathbf{d} = \{d_1, d_2 \cdots d_n\} \in (\mathbb{N} \cup \{ 0 \})^n$, the demand in each time step
    - $c \in \mathbb{R}^+ \cup \{ 0 \}$, the cost of operating one server for one time step
    - $b \in \mathbb{R}^+ \cup \{ 0 \}$, the cost of booting one server
+ **Output** —
    - An array $\mathbf{s} = \{s_1, s_2 \cdots s_n\} \in (\mathbb{N} \cup \{ 0 \})^n$, the number of running servers in each time step
+ **Constraint** —
- $s_i \geq d_i$
- ==$s_i$ must be decided based on only $d_1, d_2 \cdots d_i$==
+ **Objective Function** —
- Minimize $c \cdot\displaystyle\sum_{i = 1}^{n} s_i + b \cdot \displaystyle\sum_{i : s_{i - 1} < s_i} (s_i - s_{i - 1})$
One viable online algorithm is as follows, which is ==2-competitive==:
```
FOR i := 1 TO n - 1
    IF d[i + 1] >= s[i]
        // Demand is at least the current number of servers: boot up to meet it
        s[i + 1] := d[i + 1]
        FOR j := 1 TO n
            count[j] := 0
    ELSE
        FOR j := 1 TO n
            IF d[i + 1] <= j < s[i]
                count[j] += c      // server j is running but idle: accumulate its cost
            ELSE
                count[j] := 0
        // Shrink to the smallest j whose accumulated idle cost exceeds the boot cost
        s[i + 1] := MIN(j) WHERE count[j] > b
```