# Group Assignment 2.4-2.5
## 2.4 (A-Question)
Consider a problem when the input is a stream of $m$ inequalities of the form “$ai ≠ aj$” for some $1 ≤ i < j ≤ n$. Design an analyze an algorithm that assigns value $0$ or $1$ to each of the variables $a_1, . . . , a_n$ so that the number of inequalities that are satisfied are at least $m/2$ in expectation.
For example, if the input stream is ($a_1 ≠ a_2$, $a_1 ≠ a_3$, $a_2 ≠ a_3$), then assigning $a_1 = a_2 = 1$ and $a_3 = 0$ will satisfy two inequalities (all but the first inequality).
Your algorithm should use $O(poly log(n) + poly log(m))$ space. You can assume that n and m are known in advance. After the algorithm finishes reading the stream, it should output the values of $a_1, . . . , a_n$ and the number of satisfied inequalities.
Hint: Try to solve the problem when there is no space limit first. How can you adapt your solution to use less space?
### Solution
**Algorithm:**
Create a Cater-Wegman hash function by generating a random prime $p \in \{m,...,2m\}$, hash to a 1 bit number.
Assign values by assigning each value the hash of their index: $a_i = h(i)$ for all $1 \le i \le m$.
Assign a counter to keep count of satisfied equalities.
For each inequality, check if it is satisfied. If it is: add 1 to counter.
Return assignment and assigned values.
**Input:** A stream of inequalities $s$, two integers $m$ and $n$.
**Output:** An assignment of values to $a_1,...,a_n$ and the number of inequalities satisfied.
```python
h(a,b,p,x): # Cater-Wegman hash
return ((ax+b)mod p)mod 2
Inequalities(s,m,n):
p <- random prime from {m,...,2m}
a <- random integer from {1,...,p-1}
b <- random integer from {0,...,p-1}
counter <- 0
for each inequality (i,j) in s:
if h(a,b,p,i) ≠ h(a,b,p,j):
counter <- counter + 1
for each value i from 1 to m:
output h(a,b,p,i) #print the assginment a_i = h(i)
return counter
```
**Lemma 1:**
The probability of an inequality being satisfied is at least 0.5.
**Proof:**
The inequality $a_i \ne a_j$ is equivalent to $h_{ab}(i) \ne h_{ab}(j)$ after our assignment.
Notice $i \ne j$. The hash function used is universal which means $p(h_{ab}(i) = h_{ab}(j)) \le \frac{1}{n}$ where $n$ is the bound on the generated number. In this case we use $n=2$ since we want either 0 or 1.
$$p(h_{ab}(i) = h_{ab}(j)) \le \frac{1}{2}
\implies p(h_{ab}(i) \ne h_{ab}(j)) \ge \frac{1}{2}
\implies p(a_i \ne a_j) \ge \frac{1}{2}$$
**Lemma 2:**
$E(\text{# of satisfied inequalities}) \ge 0.5m$
**Proof:**
By lemma 2, each inequality has at least probability 0.5 of being satisfied.
Let $X_i$ be 1 if inequality $i$ is satisfied and 0 otherwise, for all $1 \le i \le m$.
$$E(X_i) \ge 0.5$$
$$E(\text{# of satisfied inequalities}) = E(\sum_{i=1}^m X_i) = \sum_{i=1}^m E(X_i) \ge 0.5m$$
**Lemma 3:**
The space required is $O(polylog(n)+polylog(m))$
**Proof:**
The counter will maximum be $m$, if all inequalities are satisfied. This requires $\log m$ bits.
$p$, $a$, and $b$, are all bounded by $2m$. They therefore require $\log 2m$ bits.
In total we require $\log m + 3 \log 2m \in O(polylog(n)+polylog(m))$ bits.
## 2.5 (D-level or E-level)
In this problem you will work out the details behind the last slide on the Misra-Gries algorithm (in the lecture on Nov 14, where we covered the Misra-Gries algorithm, we skipped this).
In particular, the objective is to show that the Misra-Gries algorithm can be adjusted to also output estimates of the count of each item. Recall that the basic Misra-Gries algorithm outputs a list of $k$ values $x_1, . . . , x_k$ and is guaranteed to contain all values that appear more than $m/(k+1)$ times in the input.
Show how to adjust the algorithm to also output $k$ frequency estimates $g_1, . . . , g_k$ such that it is guaranteed that $f_i − m/(k + 1) ≤ g_i ≤ f_i$ where $f_i$ is the exact number of times that $x_i$ appears in the input stream.
### Solution
**Input:** A positive integer $k$ and a finite sequence $s$ with values $x_1,x_2,...,x_m$ for a positive integer $m$
**Output:** An associative array $G$ where $G[i]$ is the frequency estimate for the value $x_i$ ($1 \le i \le m$)
**Pseudocode:**
```
ModifiedMisraGries(s, k) =
let G be hashmap
for each x in s:
if x is in keys(G):
G[x] <- G[x] + 1
else if |keys(G)| < k:
G[x] <- 1
else:
for each y in keys(G):
G[y] <- G[y] - 1
if G[y] = 0:
remove y from keys(G)
return G
```
^[Inspired by Wikipedia pseudocode. https://en.wikipedia.org/wiki/Misra%E2%80%93Gries_summary.]
**Claim:** $f_i − \frac{m}{k + 1} \le G[i] \le f_i$
**Proof:** The statement is equivalent to proving that any outputted counter $G[a]$ can at most have been decremented or not incremented when it *should have* at most $\frac{m}{k + 1}$ times.
To clarify: $G[a]$ can only be decremented when the current element of the iteration $x_i \neq a$ is being considered or *not incremented* when $x_i=a$ is considered (but $a$ is not in $G$ yet) at most $\frac{m}{k + 1}$ times, since then its recorded frequency $G[a]$ will be at most $f_i - \frac{m}{k + 1}$ where $f_i$ is its actual frequency.
To not increment $G[a]$ when $x_i=a$ is considered or to decrement $G[a]$ when $x_i\neq a$ is considered is equivalent to executing the last else statement (independent of whether or not $x_i=a$), otherwise some counter will be incremented by one.
The last else statement is only executed if $|keys(G)| \geq k$. This only holds when each element $G[i] \geq 1$, therefore we find that it can only be executed if $\sum G[i] \geq k$. Otherwise if $|keys(G)| < k$ we find that for each iteration of the loop it increments $\sum G[i]$ with one. We can look at this as if $\sum G[i]$ is always incremented by one after reading an element; let us call this the *default* case. If the last else statement is executed then we can see it as $\sum G[i]$ will be decremented by $k+1$, as it decrements each value in $G$ (which has a size of $k$) and does not add a value (therefore decrementing by $k+1$ compared to the *default* case).
The final else statement can at most be executed every $k+1$ iterations, since we need to add at least $k$ elements before we can remove any of them ($\sum G[i]\geq k$ in order to reach the last else statement). Due to there being $m$ iterations we have the following:
$$
c \cdot (k + 1) \leq m \Leftrightarrow c \leq \frac{m}{k + 1}
$$
where $c$ is the number of total calls to the else function, and thus also the upper bound for the number of decrementations and missed incrementations of any one counter $G[i]$.
We also know the upper bound for the counted frequency. If it is not decremented, or not *not added*, any times then the counted frequency will be equal to its real frequency $f_i$.
The lower bound for $G[i]$ is thus $f_i - \frac{m}{k+1}$ since we decrement it or miss an incrementation a maximum of $\frac{m}{k + 1}$ times. Therefore, we have the following:
$$
f_i - \frac{m}{k+1} \leq G[i] \leq f_i
$$
QED.
<!-- Alldeles för långt men jag vill ha godkänt, feel free to fixa det ni känner för. -->