# Lecture 22: Orthogonal Polynomials, 3-term recurrence, Jacobi Coeffs.
We defined orthogonal polynomials $\{p_n\}$ with respect to a measure (weight function) $\mu$ on an interval. We showed that they form an orthogonal basis of $L^2(\mu)$ when the interval is finite (by the Weierstrass approximation theorem), and that they satisfy a three-term recurrence
$$ p_{n+1}(x) = (x-a_n)p_n(x) - b_np_{n-1}(x)$$
for some $a_n$ and $b_n>0$ called Jacobi coefficients (proof: expand $xp_n(x)$ as a linear combination of $p_0,\ldots,p_{n+1}$ and notice that most of the coefficients vanish by orthogonality).
We also showed that they have real, distinct roots and that the roots of $p_n$ interlace those of $p_{n+1}$, by recognizing that $p_n$ is the characteristic polynomial of an $n\times n$ tridiagonal matrix $J_n$ with diagonal entries $a_0,\ldots,a_{n-1}$ and off-diagonal entries $\sqrt{b_1},\ldots,\sqrt{b_{n-1}}$. This follows by writing the three-term recurrence in matrix form, evaluating the recurrence for $p_0,\ldots,p_{n-1}$ at the zeros $x_1,\ldots,x_n$ of $p_n(x)$, and applying a similarity by the diagonal matrix with entries $\sqrt{c_j}$, where $c_j:=\|p_j\|^2$, using further that $b_j=c_j/c_{j-1}$.
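Here is a minimal numerical sketch of this identification (not from the lecture; Python/NumPy), for the Legendre weight on $[-1,1]$, whose monic recurrence has $a_k=0$ and $b_k=k^2/(4k^2-1)$ (a standard fact assumed here): the eigenvalues of $J_n$ should coincide with the zeros of the degree-$n$ Legendre polynomial.

```python
import numpy as np

# Sketch: for the Legendre weight on [-1,1], the monic recurrence has a_k = 0
# and b_k = k^2/(4k^2 - 1).  The eigenvalues of the n x n Jacobi matrix J_n
# should then coincide with the zeros of the degree-n Legendre polynomial.
n = 8
k = np.arange(1, n)
b = k**2 / (4.0 * k**2 - 1.0)                          # b_1, ..., b_{n-1}
J = np.diag(np.sqrt(b), 1) + np.diag(np.sqrt(b), -1)   # diagonal entries a_k = 0

eigs = np.sort(np.linalg.eigvalsh(J))                  # eigenvalues of J_n
zeros, _ = np.polynomial.legendre.leggauss(n)          # zeros of P_n (Gauss nodes)
print(np.max(np.abs(eigs - np.sort(zeros))))           # ~ 1e-15
```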
# Lecture 23: Gauss Quadrature, Weights, Favard's Theorem
**Theorem.** Let $x_1,\ldots,x_n$ be the zeros of $p_n(x)$, the $n$th orthogonal polynomial with respect to a measure $\mu$. Then there are positive weights $w_i$ such that
$$ \int f(x)d\mu(x) = \sum_{i} f(x_i)w_i$$
for every polynomial $f$ of degree at most $2n-1$.
*Proof.* Use polynomial division to write $f(x)=q(x)p_n(x)+r(x)$ with $q,r$ of degree at most $n-1$. Then $\int q\,p_n\,d\mu=0$ by orthogonality and $f(x_i)=r(x_i)$ since $p_n(x_i)=0$, so it suffices to find weights exact on polynomials of degree at most $n-1$; this is easily done by taking $w_i=\int \ell_i(x)d\mu(x)$ for the Lagrange interpolants $\ell_i$. Positivity follows by applying the resulting quadrature formula to $\ell_i(x)^2$, which has degree $2n-2$.
We then showed that the weights in the quadrature can be computed by computing eigenvectors of the associated Jacobi matrix.
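A minimal sketch of this computation (Python/NumPy, illustrative only), again for the Legendre weight, using the standard Golub-Welsch fact that the weight $w_j$ equals $\mu_0$ times the squared first entry of the $j$th normalized eigenvector, with $\mu_0=\int d\mu=2$:

```python
import numpy as np

# Sketch (Golub-Welsch): nodes = eigenvalues of J_n, weights = mu_0 * (first
# entry of each normalized eigenvector)^2, for the Legendre weight (mu_0 = 2).
n = 8
k = np.arange(1, n)
b = k**2 / (4.0 * k**2 - 1.0)
J = np.diag(np.sqrt(b), 1) + np.diag(np.sqrt(b), -1)

nodes, Q = np.linalg.eigh(J)                    # columns of Q are eigenvectors
weights = 2.0 * Q[0, :]**2                      # mu_0 = int_{-1}^{1} dx = 2

ref_nodes, ref_weights = np.polynomial.legendre.leggauss(n)
print(np.max(np.abs(weights - ref_weights)))    # ~ 1e-16

# exactness on polynomials of degree <= 2n-1 (here: monomials x^m)
for m in range(2 * n):
    exact = 0.0 if m % 2 else 2.0 / (m + 1)     # int_{-1}^{1} x^m dx
    print(m, abs(np.sum(weights * nodes**m) - exact))
```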
Favard's theorem is a converse to the fact that every measure generates a sequence of Jacobi coefficients. We proved the theorem for compactly supported measures.
**Theorem.** Suppose $a_n$ and $b_n>0$ are two bounded sequences. Then there is a unique compactly supported probability measure $\mu$ with them as its Jacobi coefficients.
*Proof.*
*Existence*. Form the infinite tridiagonal Jacobi matrix $J$ with $J(n,n)=a_n$ and $J(n+1,n)=J(n,n+1)=\sqrt{b_{n+1}}$ for $n=0,1,\ldots$, so that its top-left $n\times n$ corner is the matrix $J_n$ from Lecture 22. Since the sequences are bounded, $J$ is a bounded self-adjoint operator on $\ell^2$. Let $\mu$ be the spectral measure of $J$ for the vector $e_0$, i.e.:
$$(e_0,J^ke_0) = \int x^k d\mu(x)$$
for every $k$. Let $v_n$ be the (unnormalized) iterates of Gram-Schmidt run on the vectors $e_0, Je_0,J^2e_0,\ldots$. We will show inductively that $v_n/\|v_n\|=e_n$; this is trivially true for $n=0$. For the induction step, note that by induction $J^ne_0$ lies in the span of $e_0,\ldots,e_n$ with a positive coefficient on $e_n$, so $v_{n+1}$ is a positive multiple of $Je_n$ orthogonalized against $e_0,\ldots,e_n$, namely of
$$ Je_n - (Je_n,e_{n})e_{n} - (Je_n,e_{n-1})e_{n-1} = Je_n - a_ne_{n}-\sqrt{b_n}e_{n-1} = \sqrt{b_{n+1}}e_{n+1}$$
by the tridiagonal structure of the matrix (which gives $Je_n\perp e_k$ for $k<n-1$); normalizing yields $v_{n+1}/\|v_{n+1}\|=e_{n+1}$.
Let $\phi_n$ be the orthonormal polynomials with respect to $\mu$. We will show inductively that $\phi_n(J)e_0=e_n$. By Gram-Schmidt in $L^2(\mu)$, the unnormalized iterates, which we denote $p_{n+1}$ (they are scalar multiples of the monic orthogonal polynomials), satisfy:
$$p_{n+1} = x\phi_{n} - (x\phi_n,\phi_n)\phi_{n} - (x\phi_n,\phi_{n-1})\phi_{n-1}.$$ Evaluating this identity at $J$ and applying to the vector $e_0$, and using that $(\phi_m, x^k \phi_n)=(\phi_m(J)e_0,J^k \phi_n(J)e_0)$ for every $m,n$ by the spectral theorem:
$$p_{n+1}(J)e_0 = J\phi_{n}(J)e_0 - (J\phi_n(J)e_0,\phi_n(J)e_0)\phi_{n}(J)e_0 - (J\phi_n(J)e_0,\phi_{n-1}(J)e_0)\phi_{n-1}(J)e_0$$
$$ = Je_n - (Je_n,e_n)e_n - (Je_n,e_{n-1})e_{n-1},\textrm{ by induction}$$
Thus, the vectors $p_{n+1}(J)e_0$ satisfy exactly the same recurrence as the $v_n$ above, so $p_{n+1}(J)e_0=\sqrt{b_{n+1}}\,e_{n+1}$. Since $\|p_{n+1}\|_{L^2(\mu)}=\|p_{n+1}(J)e_0\|_{\ell^2}$ (again by the spectral theorem), normalizing gives $\phi_{n+1}(J)e_0=e_{n+1}$, completing the induction.
Thus, reading off $(x\phi_n,\phi_n)=(Je_n,e_n)=a_n$, $(x\phi_n,\phi_{n-1})=(Je_n,e_{n-1})=\sqrt{b_n}$, and $\|p_{n+1}\|=\sqrt{b_{n+1}}$, the polynomials $\phi_n$ satisfy the recurrence:
$$\sqrt{b_{n+1}}\phi_{n+1} = (x-a_n)\phi_n - \sqrt{b_n}\phi_{n-1}$$
which is the normalized version of the standard three-term recurrence; hence the Jacobi coefficients of $\mu$ are exactly $a_n, b_n$, as desired.
*Uniqueness.* Suppose $\nu$ is another compactly supported probability measure with the same Jacobi coefficients $a_n,b_n$. Since the orthogonal polynomials are determined by the three-term recurrence, the orthogonal polynomials of $\nu$ coincide with those of $\mu$, namely $p_n(x)=\det(xI-J_n)$, where $J_n$ is the finite $n\times n$ tridiagonal matrix with these coefficients; $J_n$ is a finite submatrix of the infinite $J$ above. We then have, for every $k$ (using that $J$ is tridiagonal, so $(e_0,J^ke_0)$ only involves entries of the top-left corner $J_k$):
$$ (e_0, J^k e_0) = (e_0, J_k^ke_0)= (Qe_0,X^k Qe_0)$$
by the explicit diagonalization $J_k=Q^*XQ$ from the previous lecture, where $X=\mathrm{diag}(x_1,\ldots,x_k)$ and the $x_j$ are the zeros of $p_k$. By that same diagonalization (and since $\mu$ is a probability measure), $(Qe_0)_j^2=w_j$, the $j$th Gauss quadrature weight. Thus
$$ (Qe_0, X^k Qe_0) = \sum_{j\le k} w_j x_j^k = \int x^kd\nu(x)$$
by Gauss quadrature applied to $\nu$: its nodes are the same zeros of $p_k$, its weights coincide with those of $\mu$ since the previous lecture expressed the Gauss weights in terms of the Jacobi matrix $J_k$ alone, and $x^k$ has degree at most $2k-1$ for $k\ge 1$ (the case $k=0$ holds because both are probability measures). Since $\nu$ agrees with $\mu$ in all moments and both measures are compactly supported, we must have $\mu=\nu$. $\square$
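A hedged numerical check of the moment identity used above (Python/NumPy, not part of the lecture), for a randomly chosen truncated Jacobi matrix:

```python
import numpy as np

# Check: (e_0, J_k^m e_0) = sum_j (Q e_0)_j^2 x_j^m, where J_k = Q diag(x) Q^T.
rng = np.random.default_rng(0)
k = 10
a = rng.uniform(-1, 1, size=k)               # bounded diagonal a_0, ..., a_{k-1}
b = rng.uniform(0.1, 1.0, size=k - 1)        # positive b_1, ..., b_{k-1}
Jk = np.diag(a) + np.diag(np.sqrt(b), 1) + np.diag(np.sqrt(b), -1)

x, Q = np.linalg.eigh(Jk)
w = Q[0, :]**2                               # spectral measure of e_0 (sums to 1)

e0 = np.zeros(k); e0[0] = 1.0
for m in range(6):
    lhs = e0 @ np.linalg.matrix_power(Jk, m) @ e0
    print(m, abs(lhs - np.sum(w * x**m)))    # agree to machine precision
```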
Thus, the process of generating OP (and thereby a Jacobi operator) from a measure can be seen as the "inverse" of the spectral theorem, which generates a measure from an operator.
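To make the "measure $\to$ Jacobi coefficients" direction concrete, here is a small sketch (Python/NumPy, illustrative only) of the Stieltjes procedure for a discrete probability measure: run the monic three-term recurrence and read off $a_n=(xp_n,p_n)/(p_n,p_n)$ and $b_n=c_n/c_{n-1}$, which follow from the recurrence and the norm identity of Lecture 22.

```python
import numpy as np

# Sketch (Stieltjes procedure): from a discrete probability measure
# mu = sum_j w_j delta_{x_j}, compute the monic OPs via the three-term
# recurrence and read off the Jacobi coefficients
#   a_n = (x p_n, p_n)/(p_n, p_n),   b_n = (p_n, p_n)/(p_{n-1}, p_{n-1}).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1, 1, size=50))        # support points
w = rng.uniform(0, 1, size=50); w /= w.sum()    # probability weights

inner = lambda p, q: np.sum(w * p * q)          # <p, q> in L^2(mu)

a, b = [], []
p_prev, p_cur = np.zeros_like(x), np.ones_like(x)   # values of p_{-1}, p_0 on supp(mu)
for n in range(5):
    a_n = inner(x * p_cur, p_cur) / inner(p_cur, p_cur)
    b_n = inner(p_cur, p_cur) / inner(p_prev, p_prev) if n > 0 else 0.0
    a.append(a_n); b.append(b_n)
    p_prev, p_cur = p_cur, (x - a_n) * p_cur - b_n * p_prev

print(np.round(a, 4))
print(np.round(b, 4))                           # b_0 is not used by the recurrence
```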
# Lecture 24: Chebyshev Series and Projections
The goal of the next few lectures is to understand efficient methods for approximating smooth enough functions on an interval by polynomials. This is one of the tools used in discretizing differential equations so that they can be solved on a computer, and is also useful in many other areas of mathematics.
We introduced the Chebyshev polynomials
$$ T_n(\cos(\theta)) = \cos(n\theta)$$
which are orthogonal on $[-1,1]$ with respect to the weight function $w(x)=(1-x^2)^{-1/2}$. They form an orthogonal basis of $L^2([-1,1],w(x)dx)$ so every $f$ in this space may be expanded as
$$ f = \sum_{n=0}^\infty a_n T_n(x)$$
where $a_n=(T_n,f)/\|T_n\|^2$ and the convergence is in $L^2$.
The partial sums of the above series are called Chebyshev projections. We showed that (A) the Chebyshev coefficients $a_n$ of a Lipschitz continuous function $f$ decay at rate $O(1/n)$, and (B) those of a function analytic in the Bernstein ellipse $E_\rho$ around $[-1,1]$ (i.e., the image of the annulus $\mathrm{ann}(\rho^{-1},\rho)$ under the Joukowski map $z\mapsto (z+z^{-1})/2$) decay at rate $O(\rho^{-n})$. These imply that the uniform (sup-norm) error of the Chebyshev projections is of order $1/\sqrt{n}$ and $O(\rho^{-n})$, respectively.
The proof of (A) was by interpreting the Chebyshev coefficients as Fourier coefficients of $f(\cos(\theta))$ on the circle. The proof of (B) was by interpreting them as Laurent coefficients of $f((z+z^{-1})/2)$ on an annulus containing the unit circle and shifting the contour in the integral formula for the Laurent coefficients, using analyticity of $f$.
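A quick numerical illustration of (A) versus (B) (Python/NumPy; I use the coefficients of a high-degree Chebyshev interpolant as a proxy for the projection coefficients, which is justified by the aliasing identity of the next lecture, and `numpy.polynomial.chebyshev.chebinterpolate` is assumed available):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Chebyshev coefficient decay: |x| (Lipschitz) vs. 1/(1+25x^2), which is
# analytic in the Bernstein ellipse with rho = (1+sqrt(26))/5 ~ 1.22.
n = 200
c_lip = C.chebinterpolate(np.abs, n)
c_ana = C.chebinterpolate(lambda x: 1.0 / (1.0 + 25.0 * x**2), n)

rho = (1.0 + np.sqrt(26.0)) / 5.0
for k in (20, 60, 100):
    print(k, abs(c_lip[k]), abs(c_ana[k]), rho**(-k))
# |x| decays algebraically (in fact like 1/k^2, faster than the generic O(1/n)
# Lipschitz bound), while the analytic function decays geometrically, ~ rho^{-k}.
```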
# Lecture 25: Chebyshev Interpolation, Hermite Integral Formula
Given a set of distinct points $x_0,\ldots,x_n\in [-1,1]$, define the $n$th polynomial interpolant of a function $f$ as the unique polynomial $\hat{f}_n$ of degree at most $n$ such that $\hat{f}_n(x_j)=f(x_j)$ for all $j=0,\ldots,n$; explicitly we have
$\newcommand{\fh}{\hat{f}}$
$$ \fh_n(x) = \sum_{j=0}^n f(x_j)\ell_j(x)$$
where the $\ell_j$ are the Lagrange basis polynomials (satisfying $\ell_j(x_i)=\delta_{ij}$). The main question is: how fast (and for which choices of $x_j$) does $\|f-\fh_n\|_\infty\rightarrow 0$?
We showed that taking $x_j$ to be the Chebyshev extreme points (i.e. Lobatto points) yields bounds comparable to the error bounds for Chebyshev series. The key phenomenon enabling this is *aliasing*, namely that for $x_j=\cos(\pi j/n)$:
$$T_m(x_j)=T_k(x_j)$$
whenever $m\pm k \equiv 0 \pmod{2n}$. Thus, we could easily relate $\fh_n$ to the Chebyshev series of $f$.
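A short numerical check of the aliasing identity (Python/NumPy, illustrative):

```python
import numpy as np

# Aliasing at the Chebyshev extreme points x_j = cos(pi j / n):
# T_m(x_j) = T_k(x_j) whenever m + k or m - k is a multiple of 2n.
n = 10
x = np.cos(np.pi * np.arange(n + 1) / n)
T = lambda m, x: np.cos(m * np.arccos(np.clip(x, -1.0, 1.0)))   # T_m(cos t) = cos(mt)

print(np.max(np.abs(T(3, x) - T(2 * n - 3, x))))   # m + k = 2n   -> ~ 1e-15
print(np.max(np.abs(T(3, x) - T(2 * n + 3, x))))   # m - k = 2n   -> ~ 1e-15
```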
We then proved the **Hermite Integral Formula**: if $p(x)$ is the interpolant of $f$ through $x_1,\ldots,x_n$, then for every $x\in[-1,1]\setminus\{x_1,\ldots,x_n\}$:
$$ f(x)-p(x)=\frac{1}{2\pi i}\oint_\gamma \frac{\ell(x)}{\ell(s)}\frac{f(s)}{s-x}ds$$
where
$$\ell(s):=\prod_{j=1}^n (s-x_j),$$
$\gamma$ is any contour enclosing $[-1,1]$, and $f$ is assumed to be analytic on and inside $\gamma$. The proof was essentially an application of the residue theorem.
Using this formula, we showed that any choice of distinct interpolation points in $[-1,1]$ gives exponentially decaying error in $n$ when $f$ is analytic in the "stadium" $S_\alpha:=\{z:\mathrm{dist}(z,[-1,1])\le \alpha\}$ for $\alpha>2$ (since then $|\ell(x)/\ell(s)|\le (2/\alpha)^n$ for $x\in[-1,1]$ and $s\in\partial S_\alpha$). This fails when $f$ is only analytic in a smaller region, in which case care must be taken to choose the points.
Using the HIF, we found that Chebyshev interpolants have exponentially decaying error $O(\rho^{-n})$ when $f$ is analytic in the Bernstein ellipse $E_\rho$ (which may be much smaller than the stadium $S_2$), matching the error bound for Chebyshev projection. The proof is clearly robust to small perturbations of the points $x_j$, and implies essentially the same result for the Lobatto points by interlacing of extreme points and zeros.
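A numerical check of this rate (Python/NumPy; `chebinterpolate` interpolates at the Chebyshev points of the first kind, i.e. the zeros; the test function is my choice for illustration):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Sup-norm error of Chebyshev interpolation of f(x) = 1/(1.25 - x), whose only
# singularity is the pole at 1.25, so f is analytic inside the Bernstein
# ellipse E_rho for rho < 1.25 + sqrt(1.25^2 - 1) = 2; expect error ~ 2^{-n}.
f = lambda x: 1.0 / (1.25 - x)
xx = np.linspace(-1.0, 1.0, 2001)

for n in (10, 20, 40):
    err = np.max(np.abs(f(xx) - C.chebval(xx, C.chebinterpolate(f, n))))
    print(n, err, 2.0**(-n))    # the observed error tracks 2^{-n} up to a constant
```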
# Lecture 26: Potential Theory, Lebesgue Constants
### Potential Theory
Let's begin with a motivating example: the Runge function (classically $f(x)=1/(1+25x^2)$ on $[-1,1]$). It is observed that the interpolation error diverges exponentially in $n$ for equispaced points, and converges exponentially for the Chebyshev zeros (see the sketch below). We now develop a framework that explains this phenomenon, and in particular why the Chebyshev points are so good.
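A sketch of the observation (Python/NumPy, using the barycentric formula for stable evaluation; the point counts are my choices for illustration):

```python
import numpy as np

# Runge phenomenon: interpolate f = 1/(1+25x^2) at n+1 equispaced points vs.
# n+1 Chebyshev zeros and compare sup-norm errors on a fine grid.
f = lambda x: 1.0 / (1.0 + 25.0 * x**2)
xx = np.linspace(-1.0, 1.0, 2001)

def interp(nodes, fvals, xx):
    # second barycentric formula; w_j = 1 / prod_{i != j} (x_j - x_i)
    w = np.array([1.0 / np.prod(xj - np.delete(nodes, j)) for j, xj in enumerate(nodes)])
    out = np.empty_like(xx)
    for i, xv in enumerate(xx):
        d = xv - nodes
        hit = np.isclose(d, 0.0)
        out[i] = fvals[hit][0] if hit.any() else np.sum(w * fvals / d) / np.sum(w / d)
    return out

for n in (10, 20, 40):
    equi = np.linspace(-1.0, 1.0, n + 1)
    cheb = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))  # zeros of T_{n+1}
    for name, nodes in (("equispaced", equi), ("chebyshev", cheb)):
        err = np.max(np.abs(f(xx) - interp(nodes, f(nodes), xx)))
        print(n, name, err)     # equispaced error blows up, Chebyshev error -> 0
```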
In the last lecture, we saw that the error of polynomial interpolation at $x_1,\ldots,x_n$ is controlled by the ratio
$$ \ell(x)/\ell(s).$$
An interpolation scheme consists of a specification of interpolation points for every $n$. Letting $u_n(x):=\frac{1}{n}\sum_{j=1}^n \log|x-x_j|$, the above quantity may be written as $|\ell(x)/\ell(s)|=\exp\big(n(u_n(x)-u_n(s))\big)$. For good approximation, we would like $u_n(x)-u_n(s)$ to be bounded above by some $-\delta<0$, uniformly over $x\in[-1,1]$ and $s$ on the contour, as $n\rightarrow\infty$.
Let us study the limiting objects. For a sequence of measures $\mu_n$, we say $\mu_n$ converges weakly to $\mu$ if $\int f\,d\mu_n\rightarrow\int f\,d\mu$ for every continuous $f$. For example, the empirical measures of equispaced points converge weakly to the uniform measure on $[-1,1]$, and those of the Chebyshev points converge to the arcsine law $\frac{dx}{\pi\sqrt{1-x^2}}$. (Remark: the latter is true for the zeros of all orthogonal polynomials with reasonable weights.)
We now define the limiting potential to be $u(t):=\int \log|t-x|\,d\mu(x)$ for $t\notin [-1,1]$. This definition can also be extended to $t$ in the interval, even though $u_n$ is certainly not continuous there.
Let's calculate the limiting potentials for Chebyshev and equispaced points.
Chebyshev: the limiting potential equals $-\log 2 +\log \rho$ on the ellipse $E_\rho$, including $\rho=1$ (i.e., it is constant, equal to $-\log 2$, on $[-1,1]$ itself). So for a function analytic inside $E_\rho$ with $\rho>1$, we have $u(x)-u(s)<0$ for $x\in[-1,1]$ and $s$ on a contour just inside $\partial E_\rho$, yielding exponential convergence.
Equispaced: the limiting potential on $[-1,1]$ is $u(t) = -1 + \tfrac{1}{2}\big((1+t)\log(1+t)+(1-t)\log(1-t)\big)$, which equals $-1$ at $t=0$ but $-1+\log 2$ at $t=\pm 1$. Since the potential is not constant on the interval, the condition $u(x)-u(s)<0$ can fail near the endpoints unless $f$ is analytic in a rather large region (bounded by the equipotential through $\pm 1$), which is exactly what goes wrong for the Runge function.
### Lebesgue Constants
We defined Lebesgue constants $\Lambda_n:=\sup_{x\in[-1,1]}\sum_j|\ell_j(x)|$ and showed that the Lebesgue constant for Chebyshev interpolation is $O(\log n)$; since $\|f-\fh_n\|_\infty\le(1+\Lambda_n)\inf_{\deg p\le n}\|f-p\|_\infty$, Chebyshev interpolation is never much worse than the best polynomial approximation. The proof I presented is from Natanson, *Constructive Function Theory*, Vol. III, Chap. 2.
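A brute-force numerical estimate (Python/NumPy, illustrative only) of the Lebesgue constant for the Chebyshev extreme points, compared against $\tfrac{2}{\pi}\log n$:

```python
import numpy as np

# Lebesgue constant Lambda_n = max_x sum_j |ell_j(x)| for interpolation at the
# Chebyshev extreme points, estimated on a fine (Chebyshev-distributed) grid.
def lebesgue_constant(nodes, xx):
    total = np.zeros_like(xx)
    for j, xj in enumerate(nodes):
        ell = np.ones_like(xx)
        for i, xi in enumerate(nodes):
            if i != j:
                ell *= (xx - xi) / (xj - xi)     # Lagrange basis polynomial ell_j
        total += np.abs(ell)
    return np.max(total)

xx = np.cos(np.linspace(0.0, np.pi, 10001))      # fine grid, clustered like the nodes
for n in (10, 40, 160):
    nodes = np.cos(np.pi * np.arange(n + 1) / n) # Chebyshev extreme points
    print(n, lebesgue_constant(nodes, xx), 2.0 / np.pi * np.log(n))
# the two columns differ by a bounded constant, consistent with Lambda_n = O(log n)
```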