$\newcommand{\N}{\mathbb{N}}$ $\newcommand{\E}{\mathbb{E}}$ $\newcommand{\e}{\varepsilon}$ $\newcommand{\R}{\mathbb{R}}$

# Simons HD'20 Proposed Polymath Problems

## Intro

Many thanks to all the contributors of candidate polymath open problems. Now it's your turn, everybody else! Each post below contains one open problem which we could work on this semester, polymath-style -- that is, in an open and massively-collaborative style where, as a large group, we share partial progress, failed attempts, and so on as we go after a solution. (See e.g. the discussion [here](https://www.nature.com/articles/461879a) of the original polymath project.)

Voting instructions: use the :thumbsup: emoji on any open problem you would be interested in participating in solving. You can vote for as many problems as you like! The deadline for voting is midnight (00:00) this Thursday, September 3.

## Cotypes of Projective Tensor Product Spaces

*Contact: Elisabeth Werner*

Let $X$ be a Banach space. We say $X$ has *type* $p$ for $1 \leq p \leq 2$ if there is a constant $T$ such that for all $n \in \N$ and all $x_1,\ldots,x_n \in X$,
$$ \mathbb{E}_{\epsilon_1,\ldots,\epsilon_n} \left \| \sum_{i=1}^n \epsilon_i x_i \right \| \leq T \cdot \left ( \sum_{i=1}^n \|x_i\|^p \right )^{1/p} \, , $$
where $\epsilon_1,\ldots,\epsilon_n$ are independent Rademacher random variables (i.e., uniform on $\{ \pm 1\}$). The infimum over all such $T$ is the *type-$p$ constant of $X$*, denoted $T_p(X)$. Similarly, $X$ has *cotype* $q$ for $2 \leq q \leq \infty$ if there is $C$ such that for all $n \in \N$ and all $x_1,\ldots,x_n \in X$,
$$ C \cdot \mathbb{E}_{\epsilon_1,\ldots,\epsilon_n} \left \| \sum_{i=1}^n \epsilon_i x_i \right \| \geq \left ( \sum_{i=1}^n \|x_i\|^q \right )^{1/q} \, . $$
The infimum over all such $C$ is the *cotype-$q$ constant*, $C_q(X)$.

Type and cotype capture fundamental geometric properties of a Banach space. For instance, a Banach space has type $2$ and cotype $2$ if and only if it is isomorphic to a Hilbert space (Kwapien). A Banach space has cotype $< \infty$ if and only if it does not contain the spaces $\ell_\infty^n$ uniformly, and similarly has type $> 1$ if and only if it does not contain the spaces $\ell_1^n$ uniformly (Maurey-Pisier, Pisier).

If $X,Y$ are Banach spaces, then their *projective tensor product* is a Banach space $X \otimes_\pi Y$ where for $A \in X \otimes_\pi Y$ we define the norm
$$ \|A\|_\pi = \inf \left \{\sum_{i=1}^m \|x_i\|_X \|y_i\|_Y \, : \, A = \sum_{i=1}^m x_i \otimes y_i \right \} \, . $$

It is known that $\ell_2 \otimes_\pi \ell_2$ has cotype $2$ (Tomczak-Jaegermann), and similarly for $\ell_1 \otimes_\pi \ell_1$. For $p \geq 2$, $\ell_p \otimes_\pi \ell_p$ has cotype $p$ (Pisier, Lust-Piquard).

**Problem 1:** What is the cotype of $\ell_p \otimes_\pi \ell_p$ for $1 < p < 2$?

**Problem 2:** What is the behavior for triple (or higher) tensor products? Concretely: does $\ell_2 \otimes_\pi \ell_2 \otimes_\pi \ell_2$ have finite cotype? Some results on cotype for $\ell_p \otimes_\pi \ell_q \otimes_\pi \ell_r$ for specific choices of $p,q,r$ were found by Briët, Naor, and Regev.
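For a concrete feel for these definitions, here is a small Monte Carlo sketch (assuming only `numpy`) which estimates the Rademacher average against the cotype right-hand side for the standard basis of $\ell_p^n$: there the average equals $n^{1/p}$ exactly while the right-hand side is $n^{1/q}$, so for $p > 2$ no cotype $q < p$ is possible, matching the fact quoted above that $\ell_p$ has cotype exactly $p$ when $p \geq 2$.

```python
import numpy as np

rng = np.random.default_rng(0)

def rademacher_average(xs, p, trials=2000):
    """Monte Carlo estimate of E || sum_i eps_i x_i ||_p over Rademacher signs."""
    xs = np.asarray(xs, dtype=float)              # rows are the vectors x_i
    signs = rng.choice([-1.0, 1.0], size=(trials, xs.shape[0]))
    return np.linalg.norm(signs @ xs, ord=p, axis=1).mean()

n, p, q = 64, 3.0, 2.0
basis = np.eye(n)
lhs = rademacher_average(basis, p)   # equals n**(1/p): signs don't affect the p-norm
rhs = n ** (1 / q)                   # (sum_i ||e_i||_p^q)^(1/q) = n**(1/q)
print(f"Rademacher average {lhs:.2f} vs cotype RHS {rhs:.2f}")
# rhs / lhs = n^(1/q - 1/p) -> infinity when q < p, so no constant C works.
```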
## Universality for Semidefinite Optimization

*Contact: Sidhanth Mohanty*

Famously, Parisi predicted and Talagrand proved the following result concerning ground states of the Sherrington-Kirkpatrick spin glass -- also known to computer scientists as the Max-$2$-XOR problem on Gaussian-weighted graphs:
$$ \max_{x \in \{ \pm 1\}^n} \frac{\langle G, xx^\top \rangle}{n} = 1.526\ldots \pm o(1) \quad \text{whp, for } G \sim \mathrm{GOE}(n) \text{, i.e., symmetric with independent entries } \mathcal{N}(0,1/n) \text{ above the diagonal}\, . $$
The matrix $G$ does not even have to be Gaussian: Carmona-Hu show that the same holds under only a bounded third-moment condition on the entries of $G$ (assuming the distribution of the entries does not depend on $n$ -- an assumption we do not necessarily want to make below).

Replacing the hypercube $\{\pm 1\}^n$ by its standard SDP relaxation -- the set of PSD matrices $X \in \mathbb{R}^{n \times n}$, $X \succeq 0$, with $X_{ii} = 1$ -- it is known (Montanari-Sen, implicit) that
$$ \max_{X} \frac{\langle X, G \rangle}{n} = 2 \pm o(1) \text{ whp, } $$
again for $G$ in a large universality class -- in fact, $G$ can even be the (centered) adjacency matrix of a sparse random graph with slightly super-constant degree.

**Problem:** Let $S \subseteq \mathbb{R}^{n \times n}$ be a convex subset of the PSD matrices. Can we impose weak conditions on $S$ and still obtain a universality result for $\max_{X \in S} \langle X,G\rangle / n$? For a concrete conjecture, we could consider the following conditions:

1. $X_{ii} = 1$ for all $i \in [n]$;
2. if $X \in S$ then $\Pi X \Pi^\top \in S$ for all permutation matrices $\Pi$;
3. $S$ contains $xx^\top$ for each $x \in \{ \pm 1\}^n$.

The motivation behind these particular conditions is that they allow $S$ to be the degree-$r$ Sum of Squares relaxation of the hypercube $\{\pm 1\}^n$, for any degree $r$. A sufficiently strong universality result would then allow one to prove Sum of Squares upper/lower bounds in a Gaussian setting and transfer them to combinatorial optimization problems -- Max-Cut, Max-$2$-XOR, etc.
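For small-scale numerical experiments, here is a minimal sketch of the relaxed problem (assuming `numpy` and `cvxpy` with its default SDP solver); already at moderate $n$ the optimum should be close to $2$:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
n = 60

# Symmetric Gaussian matrix with off-diagonal entries ~ N(0, 1/n).
H = rng.normal(scale=1 / np.sqrt(n), size=(n, n))
G = (H + H.T) / np.sqrt(2)

# Standard SDP relaxation of the hypercube: X PSD with unit diagonal.
X = cp.Variable((n, n), PSD=True)
prob = cp.Problem(cp.Maximize(cp.trace(G @ X) / n), [cp.diag(X) == 1])
prob.solve()
print(f"SDP value: {prob.value:.3f} (should approach 2 as n grows)")
```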
## Rapid Mixing of Glauber Dynamics for the Sherrington-Kirkpatrick Model

*Contact: Andrea Montanari*

Consider the Sherrington-Kirkpatrick spin glass model at inverse temperature $\beta > 0$ -- that is, the Gibbs distribution over $\{\pm 1\}^n$ given by $\mu(x) \propto \exp \{ \tfrac \beta 2 \langle x, A x \rangle \}$ where $A \sim \text{GOE}(n)$; i.e., $A$ is symmetric and for $i \leq j$ we have $A_{ij} \sim \mathcal{N}(0,1/n)$ iid.

Glauber dynamics is the following Markov chain on $\{ \pm 1\}^n$. At each step, in state $x$, choose a coordinate $i \in [n]$ uniformly at random, then sample a bit $b \in \{ \pm 1\}$ from the marginal distribution of the $i$-th coordinate under $\mu$, conditioned on the remaining coordinates $x_{-i}$. Transition to the state $x'$ given by $x'_{-i} = x_{-i}$ and $x'_i = b$.

**Problem 1:** Show that Glauber dynamics mixes rapidly in the $\ell_2$ sense for any $\beta \leq 1$. That is, show that for any initial condition $\mu_0$ for this Markov chain, if $\mu_t$ is the distribution after $t$ steps of Glauber dynamics, then $\|\mu_t - \mu\|_{L_2} \leq A e^{-ct}$ for some positive constants $A,c$, where $\|f\|_{L_2} = (\mathbb{E}_{x \sim \{ \pm 1\}^n} f(x)^2)^{1/2}$. A bit more technically, for this one should prove the following Poincaré inequality for the Gibbs distribution $\mu$: for all $f \, : \, \{\pm 1\}^n \rightarrow \mathbb{R}$,
$$ \text{Var}_\mu(f) \leq C \cdot \mathbb{E}_{x \sim \mu} \sum_{i=1}^n (\mathbb{E}_{y \sim \mu}[f(y) \, | \, x_{-i} = y_{-i}] - f(x))^2 \, , $$
for some $C$ independent of $n$. This was recently shown for all $\beta \leq 1/4$ by Eldan, Koehler, and Zeitouni. The goal is to extend this to $\beta \leq 1$, which should be tight.

**Problem 2 (suggested by Ronen Eldan):** Show that for $\beta > 1$, one can partition $\{ \pm 1\}^n$ into a small number of sets, called "pure states", such that restricted to each pure state, Glauber dynamics mixes rapidly, either in the $L_2$ sense above or perhaps in a weaker sense.

Note that when $\beta \rightarrow \infty$ (in which case we are in the optimization rather than the sampling setting), there does exist an efficient algorithm to find a near-ground state, due to Montanari. So when $\beta > 1$ there may be something "in between" efficient samplability and computational intractability going on.
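For intuition (or to test mixing heuristics numerically), here is a minimal simulation sketch of the chain just described, assuming `numpy`; it runs the dynamics from a random start and prints the current energy, and is an illustration rather than a mixing diagnostic:

```python
import numpy as np

rng = np.random.default_rng(2)

def glauber_step(x, A, beta):
    """One step of Glauber dynamics for mu(x) ∝ exp(beta/2 * <x, A x>)."""
    n = len(x)
    i = rng.integers(n)
    # Local field on coordinate i (the diagonal term is constant on {±1}).
    h = A[i] @ x - A[i, i] * x[i]
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * h))   # P(x_i = +1 | x_{-i})
    x[i] = 1 if rng.random() < p_plus else -1
    return x

n, beta, steps = 200, 0.5, 50_000
H = rng.normal(scale=1 / np.sqrt(n), size=(n, n))
A = np.triu(H) + np.triu(H, 1).T     # GOE(n): symmetric, N(0,1/n) for i <= j
x = rng.choice([-1, 1], size=n)
for _ in range(steps):
    x = glauber_step(x, A, beta)
print("current energy <x, A x>/n:", x @ A @ x / n)
```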
## Data Processing for Correlated Samples from the Hypercube

*Contact: Thomas Courtade*

Fix $\alpha \in (0,1)$. Let $X = (X_1, \dots, X_n)$ and $Y = (Y_1, \dots, Y_n)$ each be uniformly distributed on $\{0,1\}^n$, with $\Pr\{X_i \neq Y_i\} = \alpha$ independently for each $1 \leq i \leq n$.

**Problem:** Show that for any $f: \{0,1\}^n\rightarrow \{0,1\}$, we have $I(f(X); Y) \leq 1-h_2(\alpha)$, where $I$ is the mutual information and $h_2$ is the binary entropy function (both computed with respect to the base-$2$ logarithm).

This was conjectured by Courtade and Kumar, 2014. As a start, one could try to show that it suffices to consider balanced Boolean functions $f$ -- even this is not known. The function $f(X) = X_1$ saturates the conjectured inequality, and the Gaussian version of the inequality is a consequence of Borell's isoperimetric inequality, as observed by Kindler, O'Donnell, and Witmer.

## Jointly Sampling a Random CSP and a Random Satisfying Assignment

*Contact: Prasad Raghavendra*

Fix a constraint satisfaction problem -- for instance, $3$SAT.

**Problem:** Design an algorithm which takes $n,m \in \mathbb{N}$ and outputs a pair $(\phi,x)$, where $\phi$ is a random $3$SAT (or other CSP) instance with $m$ clauses and $n$ variables and $x$ is a random satisfying assignment to $\phi$.

For this to make sense, $m,n$ should be chosen below the satisfiability threshold -- where whp a random $\phi$ has satisfying assignments. For $m,n$ such that "quiet planting" is possible -- that is, where the planted and un-planted random CSP models are $o(1)$-close (in, say, total-variation distance) -- this goal is achieved by quiet planting. Morally, quiet planting should be possible before condensation, when the second moment method suffices to analyze the partition function of the model. *Can we design algorithms which go beyond the quiet planting regime?*

## Verifying Communities

*Contact: Prasad Raghavendra*

Studying algorithms which *verify* solutions to computational problems -- as opposed to *finding* solutions -- has been enormously fruitful in worst-case complexity theory, where it leads to the theory of NP-completeness. Can it pay off for average-case problems as well? What follows is just one concrete problem in this direction.

Consider the $k$-community stochastic block model on graphs of average degree $d$. To sample a graph from the block model, first sample a random partition of $[n]$ into $k$ parts. Then, include the edge $\{i,j\}$ in your graph $G$ with probability $\tfrac d n (1 + \varepsilon(\mathbf{1}(i,j \text{ in same community}) - \tfrac 1k))$, for some parameter $\varepsilon > 0$.

The well-studied *community detection* problem asks to recover an estimate of the communities given a sample from the block model. Famously, this problem appears to be polynomial-time solvable if (and only if) the parameters $d,\e,k$ lie above the *Kesten-Stigum threshold*, which requires $d > k^2 / \e^2$ (based on evidence from statistical physics, as well as the performance of algorithms based on low-degree polynomials).

**Problem:** Is *community verification* easier than *community detection*? Design a polynomial-time algorithm which takes an $n$-vertex graph $G$ and a partition $V_1,\ldots,V_k$ of $[n]$ into $k$ parts, such that if $G$ was sampled from the block model with communities $V_1,\ldots,V_k$ the algorithm outputs YES (with high probability), but if $G$ is sampled from the Erdős-Rényi model $G(n,d/n)$ then with high probability, *for all partitions $V_1,\ldots,V_k$*, the algorithm outputs NO.

## Higher-Degree SoS Relaxations for Independent Set in Sparse Random Graphs -- Evidence for Computational Phase Transitions

*Contact: Prasad Raghavendra*

Sum of Squares semidefinite programs are among the most powerful algorithmic tools we have for challenging optimization problems. This makes SoS-based algorithms a useful proxy for the class $P$ of polynomial-time algorithms. To study the *lower bounds* side of conjectured computational phase transitions, one approach is therefore to rule out the existence of SoS-based algorithms in conjectured hard regimes -- although technically challenging, this is sometimes within reach, for instance for random CSPs (Grigoriev, Kothari-Mori-O'Donnell-Witmer), planted clique (Barak-Hopkins-Kelner-Kothari-Moitra-Potechin), and the Sherrington-Kirkpatrick model (Kunisky-Bandeira, Mohanty-Raghavendra-Xu, Ghosh-Jeronimo-Jones-Potechin-Rajendran).

However, current techniques for proving such lower bounds break down for some of the most interesting random optimization problems with (conjectured) computational phase transitions -- optimization problems on sparse random graphs, especially those with $2$-ary predicates (rather than arity $\geq 3$, as in the case of $3$-SAT, etc.). Here is one problem in this direction:

**Problem:** Let $\alpha_i(G) \in [0,1]$ be the (normalized) value of the degree-$i$ SoS semidefinite program for independent set on a graph $G$. If $G$ is a random $n$-node $d$-regular graph, is $\E \alpha_2(G) = \E \alpha_{10}(G)$, up to $o(1)$? More generally, is $\E \alpha_i(G)$ the same for all constant $i$ as $n \rightarrow \infty$?
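For small experiments, one can use the Lovász $\vartheta$ SDP as a stand-in for the degree-$2$ value -- the two are closely related, though the exact program and normalization behind $\alpha_2$ may differ. A minimal sketch, assuming `numpy`, `networkx`, and `cvxpy`:

```python
import cvxpy as cp
import networkx as nx

n, d = 40, 4
G = nx.random_regular_graph(d, n, seed=0)

# Lovász theta SDP: max <J, X> s.t. tr(X) = 1, X_ij = 0 on edges, X PSD.
X = cp.Variable((n, n), PSD=True)
constraints = [cp.trace(X) == 1] + [X[i, j] == 0 for i, j in G.edges()]
prob = cp.Problem(cp.Maximize(cp.sum(X)), constraints)
prob.solve()
print(f"theta(G)/n ≈ {prob.value / n:.3f}  (upper bound on the independence ratio)")
```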
## Diameter of Random (Gaussian) Polytopes

*Contact: Daniel Dadush*

Given a polytope $P$ in $n$ variables with $m$ constraints, which we can write as
$$P = \{x\in\mathbb{R}^n : Ax \leq b\} \, ,$$
we are interested in the *diameter* of $P$: the maximum, over pairs of vertices $v,w$ of $P$, of the length of the shortest path from $v$ to $w$ traversing edges of $P$. A classical conjecture, due to Hirsch, was that this diameter is always at most $m-n$. This was disproven, but it remains possible that the diameter is always at most polynomial in $n$ and $m$. On the other hand, the best upper bounds we know are $\operatorname{diam}(P) = O(m\cdot 2^n)$, due to Barnette and Larman in the 70's, improved upon in the '90s by Kalai and Kleitman, who showed that $\operatorname{diam}(P) = m^{O(\log n)}$.

Improving on these bounds in general is likely to be really hard, but we can consider the average-case version, for *random* (Gaussian) polytopes: specifically, let $P=\{x\in\mathbb{R}^n : Ax \leq \textbf{1}\}$, for a random matrix $A\in\mathbb{R}^{m\times n}$ with i.i.d. $\mathcal{N}(0,1)$ entries. The conjecture is that under this model, there actually is an algorithm, the *shadow simplex method*, which will find polynomial-length paths between any two vertices of the polytope.

**Shadow simplex method:** Given two vertices $v,w$ of $P$, choose two objective functions $c,d$ which are maximized at $v$ and $w$, respectively:
$$ v = \arg\!\max_{x\in P} \langle c, x\rangle, \qquad w = \arg\!\max_{x\in P} \langle d, x\rangle \, , $$
and then the shadow simplex method linearly interpolates between these two objective functions by rotating $c$ into $d$, to obtain a sequence of maximizing faces:
$$ \arg\!\max_{x\in P} \langle (1-\lambda) c+ \lambda d, x\rangle, \quad \forall \lambda\in[0,1] \, . $$
This sequence generically induces a vertex-edge path from $v$ to $w$.

Some evidence that the conjecture is true: it is known (Borgwardt, 1980s) that if (1) the polytope is a random Gaussian polytope as above, and (2) instead of fixing $v,w$ one starts with fixed objectives $c,d$ (so, flipping the problem around), then the shadow path $c\leadsto d$ has length $O(n^{3/2}\sqrt{\log m})$ (as $m\to \infty$).

**Problem:** Prove that the shadow simplex algorithm finds $\operatorname{poly}(n,m)$-length paths in random polytopes, and hence that the diameter of a random polytope is $\operatorname{poly}(n,m)$.
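Here is a crude numerical proxy for the shadow path, assuming `numpy` and `scipy`: discretize the rotation from $c$ to $d$, solve each LP, and count the distinct optimal vertices encountered. This can miss degenerate pivots and is only meant for small experiments:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n, m = 5, 60
A = rng.normal(size=(m, n))          # random Gaussian polytope Ax <= 1
b = np.ones(m)
c, d = rng.normal(size=n), rng.normal(size=n)

vertices = []
for lam in np.linspace(0, 1, 2000):
    obj = (1 - lam) * c + lam * d
    # linprog minimizes, so negate; bounds=(None, None) allows free variables.
    res = linprog(-obj, A_ub=A, b_ub=b, bounds=(None, None))
    v = np.round(res.x, 8)
    if not vertices or not np.allclose(v, vertices[-1]):
        vertices.append(v)
print(f"distinct optimal vertices along the rotation: {len(vertices)}")
```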
## Algorithmic Lower Bounds on the Storage Capacity of the Binary Perceptron

*Contact: Lenka Zdeborova*

Consider the *binary perceptron* with $n$ variables and $\alpha n$ halfspaces, defined as follows: $m = \alpha n$ random unit vectors $w_1,\ldots,w_{m}$ are drawn independently, and then we consider the set of *solutions* $x \in \{ \pm 1\}^n$ such that $\langle x, w_i \rangle \geq K$ for all $i\leq m$, for some *threshold* $K \in \R$. The *capacity* of the perceptron is the critical $\alpha_c = \alpha_c(K)$ such that for $\alpha > \alpha_c(K)$ there are no solutions whp, while for $\alpha < \alpha_c$ solutions do exist whp. (Note that even the existence of such a sharp threshold has only recently been shown rigorously, by Xu.)

The replica method predicts the location of $\alpha_c(0)$ to be $\alpha_c \approx 0.833\ldots$. Recent work by Ding and Sun shows that the lower-bound side of this prediction is rigorously correct -- i.e., $\alpha_c \geq 0.833$. A matching upper bound is still missing -- the best rigorous result known is $\alpha_c \leq 0.9973$, due to Kim and Roche.

An *algorithmic* lower bound on the capacity is an efficient algorithm which takes $w_1,\ldots,w_m$ and finds a solution $x$. The best known algorithmic lower bound is due to Kim and Roche, who give an algorithm for $\alpha \leq 0.005$. (They note that their techniques might be extendable up to $\alpha \approx 0.3$.)

Interestingly, statistical physics suggests that for any $\alpha > 0$, the solution set exhibits "frozen $1$RSB" -- meaning that the solutions lie in an exponential number of clusters, each containing $2^{o(n)}$ solutions, with clusters at Hamming distance $\Omega(n)$ from one another. Such solution-set structures are (conjecturally) hard for sampling algorithms (i.e., algorithms for sampling a random solution). However, the algorithmic lower bound of Kim and Roche already shows that this is not an obstacle to the goal of finding a single solution, and heuristic/numerical work by Baldassi, Ingrosso, Lucibello, Saglietti, and Zecchina suggests that for $\alpha$ as large as $0.75$ there may be a small number of atypical but easy-to-find solutions.

**Problem:** Give an algorithm (and rigorous analysis) which finds a solution to the binary perceptron for $\alpha > 0.005$. Perhaps "replicated message passing" as proposed by Zecchina et al. is a candidate algorithm, as are MCMC methods. Is there a separation in which $\alpha$'s are algorithmically achievable by linear-time as opposed to, say, quadratic-time algorithms?

## David Gamarnik's Problems

See pdf from David.