AlgStats Course Overview, Parts II and III

Part II: Weak Assumptions

"The world is a simple MRF/BayesNet" is a relatively strong assumption. What do we do when we are not in the epistemic position to make such strong assumptions? Can try to perform weaker tasks.

Mean Estimation

Recall that in the Gaussian setting, learning the mean and covariance implies learning in TV. Even if we don't want to assume Gaussianity, learning the mean of a distribution is still meaningful. Why? Here's one example:

Well-Conditioned Linear Regression: Imagine getting samples of the form \((X,y)\) where \(X \in R^d\), \(y \in R\), \(Cov(X) = I\), and \(y = X^\top \beta + \epsilon\) for some random variable \(\epsilon\) with \(E \epsilon = 0\). The goal is to find \(\hat{\beta}\) minimizing the mean-squared prediction error \(E(X^\top \hat{\beta} - y)^2\). It turns out you just need to estimate the mean of the random variable \(yX\). Notice how weak the assumptions on \(X\) are.
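To see why the mean of \(yX\) is enough, here is a small simulation (my own illustration, not course code; it additionally assumes \(EX = 0\) and \(\epsilon\) independent of \(X\), so that \(E[XX^\top] = I\) and hence the population minimizer is exactly \(E[yX]\)):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20_000, 5
beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])

# X: mean zero, identity covariance, deliberately non-Gaussian
# (Rademacher coordinates), to emphasize how weak the assumption on X is.
X = rng.choice([-1.0, 1.0], size=(n, d))
eps = rng.laplace(scale=1.0, size=n)          # mean-zero noise, independent of X
y = X @ beta + eps

# When E[X X^T] = I, the minimizer of E(X^T b - y)^2 is b = E[y X],
# so the empirical mean of y_i * X_i estimates beta.
beta_hat = (y[:, None] * X).mean(axis=0)
print(beta_hat)                               # close to beta
```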

Weak assumption: Assume only that \(X\) has finite mean and covariance. Given iid samples, estimate the mean \(EX\) as well as possible.

Lecture 15: Confidence intervals for high-dimensional mean estimation. We will see that for mean estimation, strong assumptions aren't necessary and you can estimate just as well assuming only bounded covariance, if you use the right notion of high-dimensional median. ("Just as well" meaning that your confidence intervals will be as small as they would have been if you had Gaussian samples.) But the resulting estimator will, in some sense, take exponential time.
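As a one-dimensional warm-up (my own sketch, not the estimator from the lecture), the classical median-of-means estimator already illustrates the idea of replacing an average with a median to get good confidence intervals under only a bounded-variance assumption; the lecture's high-dimensional notion of median is more subtle.

```python
import numpy as np

def median_of_means(samples, num_buckets):
    """Split samples into buckets, average each bucket, and return the median
    of the bucket means. In one dimension, this achieves sub-Gaussian-style
    deviation bounds assuming only finite variance."""
    buckets = np.array_split(np.asarray(samples), num_buckets)
    bucket_means = np.array([b.mean() for b in buckets])
    return np.median(bucket_means)

rng = np.random.default_rng(1)
heavy = rng.standard_t(df=3, size=10_000)   # mean 0, finite variance, heavy tails
print(heavy.mean(), median_of_means(heavy, num_buckets=50))
```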

Sum of Squares Method

We will need to develop a new tool for efficient (polynomial-time) algorithm design in statistics settings. Enter the SoS method, a unified "high level programming language" for algorithm design based on convex programming.
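As a toy preview (my own illustration, not how the lectures will present the method): a sum-of-squares certificate that a polynomial \(p\) is nonnegative is a positive semidefinite Gram matrix \(Q\) with \(p(x) = z(x)^\top Q z(x)\) for a vector of monomials \(z(x)\). Checking a proposed certificate is just linear algebra; finding one is a semidefinite program, which is the convex-programming engine behind the SoS method.

```python
import numpy as np

# Toy SoS certificate: p(x) = x^4 - 2x^2 + 1 is nonnegative because
# p(x) = z(x)^T Q z(x) with z(x) = (1, x, x^2) and Q positive semidefinite.
Q = np.array([[ 1.0, 0.0, -1.0],
              [ 0.0, 0.0,  0.0],
              [-1.0, 0.0,  1.0]])

# 1) Check that Q is positive semidefinite.
assert np.min(np.linalg.eigvalsh(Q)) >= -1e-9

# 2) Check the polynomial identity numerically at a few points.
def p(x):
    return x**4 - 2 * x**2 + 1

for x in np.linspace(-2, 2, 9):
    z = np.array([1.0, x, x**2])
    assert abs(z @ Q @ z - p(x)) < 1e-9

print("valid SoS certificate: p(x) = (x^2 - 1)^2 >= 0")
```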

Lecture 16: Introduction to the SoS method.

Lecture 17: Polynomial-time mean estimation via the SoS method. We will return to the mean estimation problem above and see how to design a polynomial-time algorithm with the same estimation error as our previous exponential-time method.

Beyond iid Sampling: The Contamination Model

What if our samples are not even iid from the underlying population we want to know about? Maybe an adversary is poisoning some of them. Or maybe our model of the world is only \(\epsilon\)-close to the truth: e.g., we assume the world is well-modeled by a Gaussian, but in reality it's only \(99.9\%\) close to a Gaussian in TV distance. Can we still make rigorously justified mathematical conclusions from such data?

Lecture 18: Introduction to the contamination model, where a small constant fraction of data can be corrupted by a malicious adversary. Robust mean estimation from contaminated samples via the SoS method.
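A quick simulation (my own, purely illustrative) of why the empirical mean fails in the contamination model: an adversary controlling an \(\epsilon\) fraction of the samples can drag the empirical mean arbitrarily far, while even a naive coordinate-wise median stays bounded. (The lecture's SoS-based estimator does much better in high dimensions.)

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, eps = 1000, 20, 0.1
true_mean = np.zeros(d)

samples = rng.normal(loc=true_mean, scale=1.0, size=(n, d))
# Adversary replaces an eps fraction of the samples with a far-away point.
num_bad = int(eps * n)
samples[:num_bad] = 100.0 * np.ones(d)

emp_mean = samples.mean(axis=0)
coord_median = np.median(samples, axis=0)

print("empirical mean error:        ", np.linalg.norm(emp_mean - true_mean))
print("coordinate-wise median error:", np.linalg.norm(coord_median - true_mean))
# The empirical mean is off by about eps * 100 * sqrt(d); the coordinate-wise
# median's error scales like eps * sqrt(d), while SoS-based estimators achieve
# dimension-independent error (up to log factors) for, e.g., Gaussian data.
```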

Lecture 19: Coming full circle, we will see how to learn Ising models in polynomial time from contaminated samples.

Part III: Computational Complexity

Parts I and II of the course have largely been "optimistic": we showed that under a variety of reasonable assumptions, we can draw meaningful conclusions from reasonable amounts of data in a rigorously justified way. In many cases we also saw tightness of these methods: with less data, it's statistically impossible to draw the conclusions we want to. (E.g., recall that we saw that learning tree-structured Ising models on \(n\) vertices with \(o(n \log n)\) samples is impossible.)

The last portion of the course introduces another twist: sometimes there is a gap between what is possible statistically and what is possible in polynomial time.

Example: Robust Sparse Mean Estimation: Suppose we get samples \(X_1,\ldots,X_n\), \(\epsilon\)-contaminated, from a \(d\)-dimensional distribution whose mean is promised to have just \(k\) nonzero entries. Without contamination, roughly \(k \log d\) samples suffice to learn the mean. The same is true with contamination, but if we insist on polynomial time, no known algorithm uses fewer than roughly \(k^2\) samples. Why? Should we keep looking for an algorithm? Or is there reason to believe no poly-time algorithm can ever accomplish this task with \(o(k^{1.99})\) samples?
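To see where the \(k \log d\) sample complexity comes from in the uncontaminated case, here is a standard thresholding sketch (my own, with Gaussian data for concreteness): estimate each coordinate by its empirical mean and zero out anything below the noise level \(\approx \sqrt{\log(d)/n}\). This simple estimator breaks under contamination, which is exactly where the gap between statistical and computational feasibility appears.

```python
import numpy as np

rng = np.random.default_rng(3)
d, k = 1000, 10
n = int(4 * k * np.log(d))          # roughly k log d samples

mu = np.zeros(d)
mu[:k] = 1.0                        # k-sparse mean

X = rng.normal(loc=mu, scale=1.0, size=(n, d))

# Threshold the empirical mean at ~ sqrt(2 log(d) / n): below this level a
# coordinate is indistinguishable from pure noise, so we zero it out.
emp = X.mean(axis=0)
tau = np.sqrt(2 * np.log(d) / n)
sparse_est = np.where(np.abs(emp) > tau, emp, 0.0)

print("dense estimate error :", np.linalg.norm(emp - mu))         # ~ sqrt(d / n)
print("sparse estimate error:", np.linalg.norm(sparse_est - mu))  # ~ sqrt(k log d / n)
```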

Lecture 20: Low-degree polynomials, hypothesis testing lower bounds. We will introduce a (relatively) quick-and-dirty way to assess whether known techniques for polynomial-time algorithm design are likely to solve a given high-dimensional hypothesis testing problem, by studying whether it can be solved by "low-degree polynomial tests". We'll see that the problem of distinguishing between \(N(0,I)\) and \(N(v,I)\) where \(v\) is \(k\)-sparse in the presence of contamination requires at least \(k^2\) samples for low-degree polynomial tests.
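As a concrete, much-simplified example of a low-degree polynomial test (my own sketch: no sparsity or contamination is exploited here), consider distinguishing \(N(0,I_d)\) from \(N(v,I_d)\) with \(v\) unknown. The degree-2 statistic \(\sum_j \bar{X}_j^2 - d/n\) has expectation \(0\) under the null and \(\|v\|^2\) under the alternative.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, k = 200, 100, 10

v = np.zeros(d)
v[:k] = 1.0 / np.sqrt(k)            # k-sparse mean shift with ||v|| = 1

def degree_two_statistic(X):
    """Degree-2 polynomial of the samples: sum_j xbar_j^2 - d/n.
    Expectation 0 under N(0, I), ||v||^2 under N(v, I)."""
    n_samples, dim = X.shape
    xbar = X.mean(axis=0)
    return np.sum(xbar**2) - dim / n_samples

null_sample = rng.normal(size=(n, d))
alt_sample = rng.normal(size=(n, d)) + v

print("statistic under the null:       ", degree_two_statistic(null_sample))
print("statistic under the alternative:", degree_two_statistic(alt_sample))
```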

Lecture 21: The statistical query model. We will introduce a second viewpoint on complexity for hypothesis testing problems, the statistical query model, where instead of seeing iid samples, the algorithm gets to query the expected values of (bounded, real-valued) functions under an unknown distribution, and needs to identify the distribution via the answers to such queries. We will see that the statistical query model is essentially equivalent to the low-degree polynomial tests perspective.
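A minimal sketch (my own, not a standard library interface) of what a statistical query oracle looks like: the algorithm submits a bounded function \(\phi\) and receives \(E\,\phi(X)\) up to a tolerance \(\tau\), rather than seeing samples. Here the perturbation is simulated with random noise; in lower-bound arguments it is chosen adversarially.

```python
import numpy as np

class StatisticalQueryOracle:
    """Answers queries E[phi(X)] for bounded phi, up to tolerance tau."""

    def __init__(self, sampler, tau, num_internal_samples=100_000, seed=0):
        self.rng = np.random.default_rng(seed)
        # Approximate the true expectation with a large internal sample.
        self.data = sampler(num_internal_samples, self.rng)
        self.tau = tau

    def query(self, phi):
        true_expectation = np.mean([phi(x) for x in self.data])
        return true_expectation + self.rng.uniform(-self.tau, self.tau)

# Example: query a (clipped, hence bounded) coordinate mean of an unknown
# 3-dimensional Gaussian through the oracle.
oracle = StatisticalQueryOracle(
    sampler=lambda m, rng: rng.normal(loc=[1.0, 0.0, -1.0], size=(m, 3)),
    tau=0.01,
)
print(oracle.query(lambda x: np.clip(x[0], -2.0, 2.0)))
# approx E[clip(X_1, -2, 2)], answered only up to tau (plus internal sampling error)
```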

Lecture 22: Reductions. We'll see a third perspective on complexity via reductions. The story starts with the planted clique problem: given a graph on \(n\) vertices, distinguish between two possibilities:

  1. the graph was drawn from \(G(n,1/2)\)
  2. the graph was drawn from \(G(n,1/2)\) and then an \(\omega\)-clique was added to it.

Despite 30 years of study, no polynomial-time algorithm is known which can solve this problem when \(\omega \ll \sqrt{n}\). We will see that if there were an algorithm for sparse robust mean estimation with fewer than \(k^2\) samples then there would be an algorithm for planted clique with \(\omega \ll \sqrt{n}\).
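Generating the two distributions is straightforward (a quick sketch of mine, not course code); the entire difficulty lies in telling them apart when \(\omega \ll \sqrt{n}\).

```python
import numpy as np

def gnp_half(n, rng):
    """Adjacency matrix of G(n, 1/2): each edge present independently w.p. 1/2."""
    upper = rng.random((n, n)) < 0.5
    A = np.triu(upper, k=1)
    return (A | A.T).astype(int)

def planted_clique(n, omega, rng):
    """G(n, 1/2) with a clique planted on omega randomly chosen vertices."""
    A = gnp_half(n, rng)
    clique = rng.choice(n, size=omega, replace=False)
    A[np.ix_(clique, clique)] = 1
    np.fill_diagonal(A, 0)
    return A

rng = np.random.default_rng(5)
n, omega = 1000, 25            # omega between ~2 log2(n) and sqrt(n): the hard regime
A0 = gnp_half(n, rng)
A1 = planted_clique(n, omega, rng)
# Distinguishing A0 from A1 is possible information-theoretically once
# omega exceeds roughly 2 log2(n), but no polynomial-time algorithm is
# known when omega << sqrt(n).
```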

Lecture 23: Complexity wrap-up: we will widen our horizon past mean estimation to see a high-level survey of the emerging field of complexity of statistical inference, including several other perspectives (impossibility results for SoS algorithms, reductions from cryptographic problems) and a brief tour of some other core statistics problems which appear to be hard for polynomial-time algorithms.
