# Complexity classes

[xkcd #287](https://xkcd.com/287/)
> If P=NP, then the world would be a profoundly different place than we usually assume it to be. There would be no special value in “creative leaps,” no fundamental gap between solving a problem and recognizing the solution once it’s found. Everyone who could appreciate a symphony would be Mozart; everyone who could follow a step-by-step argument would be Gauss; everyone who could recognize a good investment strategy would be Warren Buffett.
*[Scott Aaronson, 2006](https://scottaaronson.blog/?p=122)*
## Problems as languages
Recall how we identified (decision) problems with languages. Without loss of generality, we can work over $\Sigma = \{0,1\}$: a decision problem $P$ corresponds to a formal language $L_P \subseteq \Sigma^*$ with:
$$w \in L_P \iff \text{$P(w)$ is true}$$
Some problems are algorithmically decidable (e.g. "Does a given DFA accept a given word?"), while others aren't ("Does a given TM halt on itself?").
If a problem is decidable, how much work does it take to find an answer? This is where [complexity theory](https://en.wikipedia.org/wiki/Complexity_theory) comes in.
## P vs. NP
Complexity theory provides a "zoology" for the computational (in)tractability of problems via [complexity classes](https://en.wikipedia.org/wiki/Complexity_class).
Important ones are:
* the class $\mathrm{P}$: all problems solvable by a **deterministic** Turing machine (DTM) in polynomial time,
* the class $\mathrm{NP}$: all problems solvable by a **non-deterministic** Turing machine (NTM) in polynomial time.
What does this amount to?
### 1. Polynomial time
Polynomial time refers to runtimes in $\mathcal O(n^k)$, for $n$ the input length and some fixed exponent $k \in \mathbb N$. The idea is that the runtime should be bounded by a polynomial function:
$$f(n) = a_k n^k + a_{k-1} n^{k-1} + \ldots + a_1 n + a_0 = \sum_{i=0}^k a_i n^i$$
In particular, recall that $\mathcal O(\sum_{i=0}^k a_i n^i) = \mathcal O(a_k n^k) = \mathcal O(n^k)$.
### 2. DTMs vs. NTMs
Intuitively, while a DTM has no freedom of choice for its next move, an NTM does. Moreover, an NTM can "guess" among an exponential number of possible transitions and check each resulting computation path in polynomial time, "in parallel".

Formally, we define:
$$\mathrm{P} = \{ L \subseteq \{0,1\}^* \; | \; \text{there is a DTM $\mathcal M$ with $L(\mathcal M) = L$ and runtime $T(\mathcal M) \in \mathcal O(p(n))$ for some polynomial $p(n)$} \}$$
$$\mathrm{NP} = \{ L \subseteq \{0,1\}^* \; | \; \text{there is an NTM $\mathcal M$ with $L(\mathcal M) = L$ and runtime $T(\mathcal M) \in \mathcal O(p(n))$ for some polynomial $p(n)$} \}$$
Any DTM is a special kind of NTM (one where no actual choice is possible), thus $\mathrm{P} \subseteq \mathrm{NP}$. What about the reverse inclusion: does $\mathrm{NP} \stackrel{?}{\subseteq} \mathrm{P}$, hence $\mathrm{NP} \stackrel{?}{=} \mathrm{P}$?
This is a long-standing open problem, see S. Aaronson's quote above!
The widely held expectation is that it is **false**, but to this day we don't have a proof!

Intuitively, problems in $\mathrm{P}$ are quick to **solve**, while problems in $\mathrm{NP}$ are quick to **check**: given a candidate solution, its correctness can be verified in polynomial time. For the hardest problems in $\mathrm{NP}$, no efficient solving algorithm is known, which is why they are thought of as **intractable**, meaning inefficient to solve.
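The solve/check asymmetry can be illustrated with the subset-sum problem (an $\mathrm{NP}$-complete problem: given numbers and a target, is there a subset summing to the target?). The sketch below is illustrative only; the function names are our own choices. Verifying a proposed subset (a *certificate*) takes polynomial time, while the naive search tries all $2^n$ subsets:

```python
from itertools import combinations

def verify_subset_sum(nums, target, certificate):
    """Check a proposed solution in polynomial time:
    the certificate is a list of indices into nums."""
    return sum(nums[i] for i in certificate) == target

def solve_subset_sum(nums, target):
    """Brute-force search: tries all 2^n subsets -- exponential time."""
    for r in range(len(nums) + 1):
        for subset in combinations(range(len(nums)), r):
            if verify_subset_sum(nums, target, subset):
                return subset
    return None

nums = [3, 34, 4, 12, 5, 2]
print(verify_subset_sum(nums, 9, [2, 4]))  # 4 + 5 == 9 -> True
print(solve_subset_sum(nums, 9))           # finds the subset (2, 4)
```

The verifier runs in time linear in the certificate length, but no polynomial-time algorithm is known for the search itself.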
## Examples of problems
| problems in $\mathrm{P}$ | problems in $\mathrm{NP}$ |
| -------- | -------- |
| sorting | traveling salesperson (TSP) |
| Rubik's cube | integer factorization (see cryptography) |
| multiplication | Sudoku |
| ... | protein folding |
| ... | ... |
## Reduction of problems and $\mathrm{NP}$-completeness
How can we show that some $L$ is (likely) **not** in $\mathrm{P}$? The idea is to use a *reduction argument*: given problems/languages $L_1$, $L_2$ with $L_1 \notin \mathrm{P}$, we can try to **reduce** $L_1$ to $L_2$; then $L_2 \notin \mathrm{P}$ as well, since a polynomial-time algorithm for $L_2$ would yield one for $L_1$. We want the **reduction** itself to be doable by a polynomial-time algorithm.
If there exists such a reduction from $L_1$ to $L_2$, we write $L_1 \leq_{\mathrm{P}} L_2$.
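A classic example of such a polynomial-time reduction is $\mathrm{CLIQUE} \leq_{\mathrm{P}} \mathrm{INDEPENDENT\text{-}SET}$: a graph has a clique of size $k$ iff its complement graph has an independent set of size $k$. A minimal sketch (function names and the edge encoding are our own; the independent-set check is brute force purely for demonstration, only the reduction map itself is polynomial):

```python
from itertools import combinations

def complement(n, edges):
    """The polynomial-time reduction map: complement the edge set
    of a graph on vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    return [{u, v} for u, v in combinations(range(n), 2)
            if frozenset({u, v}) not in edge_set]

def has_independent_set(n, edges, k):
    """Brute-force check (exponential; only for the demo)."""
    edge_set = {frozenset(e) for e in edges}
    return any(all(frozenset({u, v}) not in edge_set
                   for u, v in combinations(S, 2))
               for S in combinations(range(n), k))

def has_clique(n, edges, k):
    """Decide CLIQUE by reducing to INDEPENDENT-SET."""
    return has_independent_set(n, complement(n, edges), k)

# Triangle on vertices 0,1,2 plus an isolated vertex 3.
edges = [(0, 1), (1, 2), (0, 2)]
print(has_clique(4, edges, 3))  # True: {0, 1, 2} is a clique
```

The point is that `complement` runs in time $\mathcal O(n^2)$, so an efficient algorithm for independent set would immediately give one for clique.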
Among the problems in $\mathrm{NP}$, the **$\mathrm{NP}$-complete** problems have a special role:
:::info
**Definition ($\mathrm{NP}$-completeness).** We call a problem $L$ **$\mathrm{NP}$-complete** if:
1. $L \in \mathrm{NP}$
2. For every $L' \in \mathrm{NP}$ we have $L' \leq_{\mathrm{P}} L$ (we say, $L$ is **$\mathrm{NP}$-hard**).
:::
:::info
**Theorem.** Assume that $P_1$ is $\mathrm{NP}$-complete, $P_2$ is in $\mathrm{NP}$, and $P_1 \leq_{\mathrm{P}} P_2$, then $P_2$ is also $\mathrm{NP}$-complete.
:::
## $\mathrm{SAT}$
A fundamental $\mathrm{NP}$-complete problem is $\mathrm{SAT}$, the **boolean satisfiability problem**. It asks: given a formula from boolean propositional logic, is it satisfiable?
### Boolean propositional logic
Boolean propositional formulas can be used to model logic states of systems such as circuits, networks, databases,...
We work over an (infinite) alphabet $\mathrm{Var}$ of propositional variables $p,q,r,\ldots$ that can take the values "true" and "false", denoted by $1$ and $0$, resp.
:::info
**Definition (Boolean formulas).**
A **formula** is an expression of one of the following forms:
1. $\varphi \lor \psi$ (**disjunction/"or"**), for formulas $\varphi, \psi$
2. $\varphi \land \psi$ (**conjunction/"and"**), for formulas $\varphi, \psi$
3. $\lnot \varphi$ (**negation/"not"**), for a formula $\varphi$
4. $p$ (**variables**), for $p \in \mathrm{Var}$
5. $\mathrm{b} \in \{0,1\}$ (**truth values**)
:::
:::info
**Definition (Valuation).**
A **valuation** or **assignment** is a function $\mathfrak{J} : \mathrm{Var} \to \{0,1\}$. Any valuation acts on the formulas by defining:
1. $\mathfrak{J}(\varphi \lor \psi) = \max{(\mathfrak{J}(\varphi) , \mathfrak{J}(\psi))}$
2. $\mathfrak{J}(\varphi \land \psi) = \min{(\mathfrak{J}(\varphi), \mathfrak{J}(\psi))}$
3. $\mathfrak{J}(\lnot \varphi) = 1-\mathfrak{J}(\varphi)$
4. $\mathfrak{J}(p)$ for variables $p$ is given directly by the assignment
5. $\mathfrak{J}(\mathrm{b}) = \mathrm{b}$
:::
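The recursive definition of a valuation translates directly into code. A minimal Python sketch (the tuple encoding of formulas and the name `evaluate` are our own choices, not fixed notation):

```python
# Formulas as nested tuples: ("or", f, g), ("and", f, g), ("not", f),
# a variable name like "p", or a constant 0/1.
def evaluate(formula, J):
    """Evaluate a formula under the assignment J (a dict Var -> {0,1}),
    following the recursive definition of a valuation."""
    if formula in (0, 1):              # truth values
        return formula
    if isinstance(formula, str):       # variables: look up the assignment
        return J[formula]
    op = formula[0]
    if op == "or":
        return max(evaluate(formula[1], J), evaluate(formula[2], J))
    if op == "and":
        return min(evaluate(formula[1], J), evaluate(formula[2], J))
    if op == "not":
        return 1 - evaluate(formula[1], J)
    raise ValueError(f"unknown connective: {op}")

# phi = not p or q,  under J(p) = 0, J(q) = 1
phi = ("or", ("not", "p"), "q")
print(evaluate(phi, {"p": 0, "q": 1}))  # 1
```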
The truth tables for the logical connectives are given as follows:
| $p$ | $q$ | $p \lor q$ | $p \land q$ | $\lnot p$
| -------- | -------- | -------- | -------- | --------
| $0$ | $0$ | $0$ | $0$ | $1$
| $0$ | $1$ | $1$ | $0$ | $1$
| $1$ | $0$ | $1$ | $0$ | $0$
| $1$ | $1$ | $1$ | $1$ | $0$
We define **implication** by $\varphi \to \psi := (\lnot \varphi \lor \psi)$, and **equivalence** by $\varphi \leftrightarrow \psi := (\varphi \to \psi) \land (\psi \to \varphi)$.
We call two formulas $\varphi$ and $\psi$ **equivalent**, written $\varphi \equiv \psi$, iff for all assignments $\mathfrak J$ (writing $\varphi^{\mathfrak J} := \mathfrak{J}(\varphi)$) it holds that:
$$\varphi^\mathfrak J = 1 \iff \psi^\mathfrak J = 1$$
One can show that the following laws hold:
1. Distributivity:
$\varphi \land (\psi \lor \xi) \equiv (\varphi \land \psi) \lor (\varphi \land \xi)$
$\varphi \lor (\psi \land \xi) \equiv (\varphi \lor \psi) \land (\varphi \lor \xi)$
2. De Morgan's laws:
$\lnot (\varphi \land \psi) \equiv \lnot \varphi \lor \lnot \psi$
$\lnot (\varphi \lor \psi) \equiv \lnot \varphi \land \lnot \psi$
3. Double negation elimination:
$\lnot \lnot \varphi \equiv \varphi$
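Since the variables range only over $\{0,1\}$, these laws can be verified by exhaustive enumeration, using $\max$ for "or", $\min$ for "and", and $1-x$ for "not" as in the valuation definition. A quick sketch:

```python
from itertools import product

# Check the laws on all 0/1 assignments to p, q, r.
for p, q, r in product((0, 1), repeat=3):
    # De Morgan: not(p and q)  ==  (not p) or (not q)
    assert 1 - min(p, q) == max(1 - p, 1 - q)
    # Distributivity: p and (q or r)  ==  (p and q) or (p and r)
    assert min(p, max(q, r)) == max(min(p, q), min(p, r))
    # Double negation: not not p  ==  p
    assert 1 - (1 - p) == p
print("all laws hold on all assignments")
```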
:::info
**Example.**
Define the valuation $\mathfrak{J}$ by $\mathfrak{J}(p) := 0$ and $\mathfrak{J}(q) := 1$. Consider the formulas:
* $\varphi := \lnot p \lor q$
* $\psi := \lnot p \land (p \lor \lnot q)$
Under $\mathfrak J$ the formulas evaluate to:
* $\varphi^\mathfrak J = \max(1,1) = 1$
* $\psi^\mathfrak J = \min(1,0) = 0$
:::
The formula $\xi := p \land (\lnot p \lor q) \land \lnot q$ is, in fact, **not** satisfiable, in the sense that there exists no assignment $\mathfrak J$ such that $\xi^\mathfrak J = 1$ (why?).
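With only two variables, the unsatisfiability of $\xi$ can be confirmed by checking all four assignments, again using $\min$/$\max$/$1-x$ for the connectives:

```python
from itertools import product

# xi = p and (not p or q) and not q
def xi(p, q):
    return min(p, max(1 - p, q), 1 - q)

models = [(p, q) for p, q in product((0, 1), repeat=2) if xi(p, q) == 1]
print(models)  # [] -- no satisfying assignment, so xi is unsatisfiable
```

Intuitively: to satisfy $\xi$ we need $p = 1$, then $\lnot p \lor q$ forces $q = 1$, contradicting $\lnot q$.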
The problem $\mathrm{SAT}$ is defined as the language of (the codings of) all formulas $\varphi$ which are **satisfiable**, i.e., for which there exists an assignment $\mathfrak J$ such that $\varphi^\mathfrak J = 1$.
:::info
**Theorem (Cook--Levin).**
$\mathrm{SAT}$ is $\mathrm{NP}$-complete.
:::
$\mathrm{SAT}$-solvers are used to solve a wide range of [problems](https://en.wikipedia.org/wiki/SAT_solver#Applications). Since $\mathrm{SAT}$ is $\mathrm{NP}$-complete, no polynomial-time algorithm for it is known, and none is believed to exist. Nevertheless, developing $\mathrm{SAT}$-[solvers](https://en.wikipedia.org/wiki/SAT_solver) that are efficient on practically relevant instances is an active and challenging area.
If one could show that $\mathrm{SAT}$ were in $\mathrm{P}$, then this would [show](https://en.wikipedia.org/wiki/P_versus_NP_problem) $\mathrm{P} = \mathrm{NP}$, resolving a [Millennium problem](https://en.wikipedia.org/wiki/Millennium_Prize_Problems) with a \$1M reward, with many consequences for questions of algorithm efficiency across the board!
## Conjunctive normal form
For algorithmic purposes such as $\mathrm{SAT}$-solving, it is often convenient to assume the input formula to be in a special format, called [**conjunctive normal form (CNF)**](https://en.wikipedia.org/wiki/Conjunctive_normal_form), i.e., a conjunction of disjunctions of literals $\ell_{ij}$:
$$ \varphi = \bigwedge_{i} \bigvee_j \ell_{ij}, $$
where a literal $\ell_{ij}$ is either a variable $p$ or the negation $\lnot p$ of a variable $p$.
For instance, the formula $\psi = \lnot p \land (p \lor \lnot q)$ is in CNF. Another example is $\vartheta := (p \lor \lnot q) \land (\lnot p \lor q \lor r) \land p$.
One can show that every boolean formula can be brought into CNF preserving the set of satisfying assignments, though this equivalence-preserving conversion can increase the formula size exponentially.