---
title: 數值方法
---

# Numerical Methods NTNU 數值方法 Fall 2020

##### [Back to Note Overview](https://reurl.cc/XXeYaE)

{%hackmd @sophie8909/pink_theme %}

###### tags: `NTNU` `CSIE` `選修` `Numerical Methods`
<!-- tag order: [school] [department, required/elective] or [program, program name] [course] -->

## Score
- Participation + homework: 50%
    - Attended and finished in class: 100
    - Attended and finished the same day: 90
    - Attended and finished within three days: 80
    - Attended but not finished within three days: 40
    - Absent but finished within three days: 40
    - Absent and not finished: 0
- Midterm: 25%
- Final exam: 25%

## Solving equations
## Solving systems
## Interpolating data
## Least squares
### Normal equation

---

## Horner's Method

## Iterative Method
- Intermediate value theorem

### Stopping an iterative procedure
- Various criteria are used to check (almost) convergence. We terminate iterating after $n$ iterations if:
    - $|x_n-x_{n-1}| <$ atol
    - $|x_n - x_{n-1}| <$ rtol $|x_n|$
    - $|f(x_n)| <$ ftol

### Bisection
- Method for finding a root of a scalar equation $f(x) = 0$ in an **interval $[a,b]$** (an initial bracketing interval is required)
- Assumption: $\textbf{f(a)f(b) < 0}$

#### After n bisections
- The bracketing interval has length $\frac{b-a}{2^n}$, so to guarantee an error below $\varepsilon$ we need
- $n > \log_{2}\left({\frac{b-a}{\varepsilon}}\right)$

#### Properties
- Simple
- Safe
- ...

### Fixed Point Iteration
- Definition: The real number $x$ is a fixed point of a function $g$ if $\textbf{g(x) = x}$
- Given a function $f(x)$, select a function $g(x)$ such that $\textbf{f(x) = 0 }\to \textbf{g(x) = x}$
- Then
    - $x_0 = \text{initial guess}$
    - $x_{i+1} = g(x_i) \text{ for } i = 0, 1, 2, \dots$
    - Until $x_{i+1}$ satisfies some stopping criterion (see above)

#### Convergence
- $S = |g'(r)| < 1$ (the slope of the tangent at the fixed point must be less than 1 in magnitude for the iteration to converge)

#### Mean Value Theorem
- Let $f$ be a continuous function on the interval $[a,b]$, differentiable on $(a,b)$.
- Then there exists a number $c$ between $a$ and $b$ such that $f'(c) = \frac{f(b)-f(a)}{b-a}$

##### Proof
- Let $x_i$ denote the iterate at step $i$. There exists a number $c_i$ between $x_i$ and $r$ such that
$$\begin{align} g'(c_i) &= (g(x_i)- g(r)) / (x_i - r) \\ &= (x_{i+1} - r) / (x_i - r) \end{align}$$
$$(x_{i+1} - r) = g'(c_i)(x_i - r)$$
- Define $e_i = |x_i - r|$ (the gap between the true solution and the approximation), so $e_{i+1} = |g'(c_i)|\, e_i$ and
$$\lim_{i\to \infty}{\frac{e_{i+1}}{e_i}} = |g'(r)| = S$$
- If $S < 1$, the errors shrink and the iteration converges (linearly).

### Newton's Algorithm
1. Start from an initial guess $x_0$
2. For $i = 0,1,2, \dots$, compute
$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$$

### Secant method
> Replace the slope of the tangent line in Newton's Algorithm with an approximation: the slope of the secant line through the two most recent iterates.[color=pink]
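As a concrete illustration of the root-finding methods above, here is a minimal Python sketch. The test equation $f(x)=x^3+x-1$, the bracket $[0,1]$, and the tolerances are all assumed for the demo, not taken from the lecture:

```python
# Minimal sketch of bisection and Newton's method for f(x) = x^3 + x - 1
# (an assumed toy equation; its root is approximately 0.6823).

def bisection(f, a, b, atol=1e-8):
    """Halve [a, b] until its half-length is below atol; assumes f(a) * f(b) < 0."""
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while (b - a) / 2 > atol:
        c = (a + b) / 2
        if f(a) * f(c) <= 0:
            b = c          # the root lies in [a, c]
        else:
            a = c          # the root lies in [c, b]
    return (a + b) / 2

def newton(f, df, x0, ftol=1e-12, max_iter=50):
    """Iterate x <- x - f(x)/f'(x) until |f(x)| < ftol."""
    x = x0
    for _ in range(max_iter):
        if abs(f(x)) < ftol:
            break
        x = x - f(x) / df(x)
    return x

f  = lambda x: x**3 + x - 1
df = lambda x: 3 * x**2 + 1
print(bisection(f, 0, 1))    # ~0.6823278
print(newton(f, df, 1.0))    # same root, far fewer iterations
```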
## Solve Linear System

### Solving $Ax = b$
- elimination $O(n^3)$
- substitution $O(n^2)$

#### Direct method
- Gaussian elimination

#### Iterative method

### Error estimation
- the residual $r_a = b - Ax_a$
- How sensitive is the solution? (A small residual does not always mean a small error.)

### Infinity norm
- The infinity norm, or the maximum norm, of the vector $x = [x_1, \dots, x_n]^T$ is
    - $\|x\|_\infty = \max_{1\le i\le n} |x_i|$
- The matrix (absolute row sum) norm of an $n\times n$ matrix $A$ is
    - $\|A\|_\infty = \max_{1\le i\le n} \sum_{j = 1}^n|a_{ij}|$

### Definition
- Residual
    - $r_a = b - Ax_a$
- Backward error
    - $\|b - Ax_a\|_\infty$
- Forward error
    - $\|x - x_a\|_\infty$
- Error magnification factor
    - $\frac{\text{relative forward error}}{\text{relative backward error}} = \frac{\frac{\|x-x_a\|_\infty}{\|x\|_\infty}}{\frac{\|r\|_\infty}{\|b\|_\infty}}$

### Condition number
- The condition number of a square matrix $A$, $\text{cond}(A)$, is the maximum possible error magnification factor for solving $Ax = b$, over all right-hand sides $b$.
- $\text{cond}(A) = \|A\|_\infty \cdot \|A^{-1}\|_{\infty}$

### Summary of error estimation
The error magnification factor is bounded by the condition number:
$$\frac{\text{relative forward error}}{\text{relative backward error}} = \frac{\frac{\|x-x_a\|_\infty}{\|x\|_\infty}}{\frac{\|r\|_\infty}{\|b\|_\infty}} \le \text{cond}(A) = \|A\|_\infty \cdot \|A^{-1}\|_{\infty}$$

### Gaussian elimination with partial pivoting
$PA = LU$

#### Swamping
- A multiplier that is too big causes the other entries to be swamped by rounding error
- Solution:
    - Partial pivoting
        - the pivot is the biggest remaining entry (in absolute value) in the column
    - Permutation matrix
        - A permutation matrix is an $n \times n$ matrix consisting of all zeros, except for a single 1 in every row and column

## Iterative methods
- Faster than direct methods when:
    - the input matrix is large
    - a good approximation to the solution is already known
    - the input matrix is sparse

### Direct vs. iterative linear solvers
- Direct
    - Computation is numerically stable in many relevant cases
    - Can solve economically for several right-hand sides
    - But: fill-in limits usefulness
- Iterative
    - Only a rough approximation to $x$ is required.
    - A good $x_0$ approximating $x$ is generally known (warm start).

### Typical scenarios
- Direct solvers
    - Many linear systems with the same matrix $A$
    - Applications that require very accurate solutions
- Iterative solvers
    - Many linear systems with a "slightly changing" matrix

## Jacobi method

### Strictly diagonally dominant
$|a_{ii}|>\sum_{j\neq i}|a_{ij}|$ for every row $i$

## Solving Nonlinear Systems of Equations (11/13)

### Multivariate Newton's method

#### Review: one-variable Newton's method
1. Start from an initial guess $x_0$
2. For $i = 0, 1, 2, \dots$, compute $x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$

#### Extending to more variables
- Suppose we have 3 unknowns, 3 nonlinear equations:
$$
f_1(u, v, w) = 0\\
f_2(u, v, w) = 0\\
f_3(u, v, w) = 0
$$
- Define the vector-valued function:
$$
F(x) = F(u,v,w)=(f_1,f_2,f_3)
$$
where
$$
x=(u,v,w)
$$

#### Jacobian matrix
- 3 variables: $x=(u,v,w)$
- 3 functions: $f_1,f_2,f_3$
$$
D_F(x)=\begin{bmatrix} \frac{\partial f_1}{\partial u}&\frac{\partial f_1}{\partial v}&\frac{\partial f_1}{\partial w}\\ \frac{\partial f_2}{\partial u}&\frac{\partial f_2}{\partial v}&\frac{\partial f_2}{\partial w}\\ \frac{\partial f_3}{\partial u}&\frac{\partial f_3}{\partial v}&\frac{\partial f_3}{\partial w}\\ \end{bmatrix}
$$

#### Multivariate Newton's method
- The Taylor expansion for $F(x)$ around $x_0$ is $F(x)=F(x_0)+D_F(x_0)\cdot(x-x_0)+O(\|x-x_0\|^2)$
- Let $r$ be the root, $x_0$ be the current guess
$$
\begin{align}
0=F(r)&\approx F(x_0)+D_F(x_0)\cdot(r-x_0)\\
-F(x_0)&\approx D_F(x_0)\cdot(r-x_0)\\
-D_F(x_0)^{-1}F(x_0)&\approx r-x_0\\
x_0-D_F(x_0)^{-1}F(x_0) &\approx r
\end{align}
$$
- $x_{k+1} = x_k-D_F(x_k)^{-1}F(x_k)$
- In practice, solve for the step $s$ instead of inverting $D_F$ (a code sketch follows):
    - $s= -D_F(x_k)^{-1}F(x_k)$, i.e. $D_F(x_k)s= -F(x_k)$
    - $x_{k+1} = x_k+s$
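A minimal Python sketch of the multivariate Newton iteration above. The toy two-variable system $u^2+v^2=1$, $u-v=0$ and the starting guess are assumed for the demo; the step equation $D_F(x_k)\,s = -F(x_k)$ is solved with `numpy.linalg.solve` rather than forming an inverse:

```python
# Multivariate Newton on an assumed toy system:
#   u^2 + v^2 = 1,  u - v = 0,  whose positive solution is u = v = 1/sqrt(2).
import numpy as np

def F(x):
    u, v = x
    return np.array([u**2 + v**2 - 1.0,
                     u - v])

def DF(x):
    # Jacobian matrix of F
    u, v = x
    return np.array([[2*u, 2*v],
                     [1.0, -1.0]])

x = np.array([1.0, 0.5])                  # assumed initial guess x_0
for _ in range(20):
    s = np.linalg.solve(DF(x), -F(x))     # solve D_F(x_k) s = -F(x_k)
    x = x + s                             # x_{k+1} = x_k + s
    if np.linalg.norm(s, np.inf) < 1e-12:
        break

print(x)    # approx [0.70710678, 0.70710678]
```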
## Interpolation

### Interpolating data
- We are given a collection of **data samples** $\{(x_i,y_i)\}_{i=0}^n$.
    - $x_i$: the abscissas
    - $y_i$: the data values
- Interpolation is the reverse of evaluation
    - **Evaluation** (e.g. nested multiplication): given a polynomial, evaluate a y-value for a given x-value.
    - **Interpolation**: given these points, compute a polynomial that can generate them.

### Interpolation, extrapolation, approximation
- Interpolation
    - The new value $x\neq x_i$ is inside the range of the interpolation points $x_0, x_1, \dots, x_n$.
- Extrapolation
    - The new value $x$ is outside this range
- Approximation
    - Some norm $\|v-y\|$ of the difference of the vectors $v = [v(x_0),\dots ,v(x_n)]$ and $y = [y_0,\dots,y_n]$ is minimized.

### Lagrange interpolation
- Given $n$ data points $(x_1,y_1),(x_2, y_2),\dots,(x_n,y_n)$, the polynomial that interpolates the points is:
    - Degree of $P(x)$ is at most $d=n-1$
    - When $d = n-1$, the polynomial through the $n$ points is unique
    - $P(x)$: a polynomial in this form:
$$
P(x) = y_1L_1(x)+y_2L_2(x)+\cdots+y_nL_n(x)\\
L_k(x) = \frac{(x-x_1)\cdots(x-x_{k-1})(x-x_{k+1})\cdots(x-x_n)}{(x_k-x_1)\cdots(x_k-x_{k-1})(x_k-x_{k+1})\cdots(x_k-x_n)}
$$
- Proof sketch (by contradiction)
    - Suppose $P(x)$ and $Q(x)$ have degree at most $n-1$ and both interpolate all $n$ points
    - Now, define $H(x)=P(x)-Q(x)$
    - Degree of $H(x)$?
    - $H(x_i)=P(x_i)-Q(x_i)=y_i-y_i=0$
    - $0=H(x_1)=H(x_2)=\cdots=H(x_n)$
    - $H(x)$ has degree at most $n-1$ and it has $n$ roots.
    - $H(x)$ is the identically zero polynomial, and $P(x)\equiv Q(x)$

### Interpolation error (**will appear in the final exam**)
- Assume we have $y_i = f(x_i),\ i=1,\dots, n$ and an interpolating polynomial $P(x)$. The interpolation error at $x$ is
$$
f(x)-P(x)=\frac{(x-x_1)(x-x_2)\cdots(x-x_n)}{n!}f^{(n)}(c)
$$
where $c$ lies between the smallest and largest of $x, x_1, \dots, x_n$

### Runge Phenomenon
- Polynomials can fit any set of data points
- But there are some shapes that polynomials prefer over others.
- Example
    - interpolate $f(x)={1\over(1+12x^2)}$ at evenly spaced points in $[-1,1]$
    - the shape
    ![](https://i.imgur.com/4RaXvWg.png =300x)

### Newton's divided differences
- $P(x)$: a polynomial in this form:
$$
P(x)=c_0+c_1(x - x_1)+c_2(x-x_1)(x-x_2)+\cdots+c_{n - 1}(x-x_1)\cdots(x-x_{n - 1})\\
= c_0 + (x-x_1)( c_1 + (x - x_2)( c_2 + (x - x_3) \cdots))\\
$$
- Divided difference
    - Denote by $f[x_1\cdots x_n]$ the coefficient of the $x^{n-1}$ term: $f[x_1\cdots x_n] \equiv c_{n-1}$
$$
\begin{align}
P(x)=f[x_1]&+f[x_1\ x_2](x-x_1)\\
&+f[x_1\ x_2\ x_3](x-x_1)(x-x_2)\\
&+f[x_1\ x_2\ x_3\ x_4](x-x_1)(x-x_2)(x-x_3)\\
&+\cdots\\
&+f[x_1\cdots x_n](x-x_1)\cdots(x-x_{n-1})
\end{align}
$$
- 0-th
$$
f[x_i]=P(x_i)=y_i
$$
- 1-st
$$
f[x_i\ x_{i+1}]=\frac{f[x_{i+1}]-f[x_i]}{x_{i+1}-x_i}
$$
- k-th divided difference
$$
f[x_i\ x_{i+1}\cdots x_{i+k}]=\frac{f[x_{i+1}\cdots x_{i+k}]-f[x_i\cdots x_{i+k-1}]}{x_{i+k}-x_i}
$$
- Degree of P?
    - at most (n - 1)
- ![](https://i.imgur.com/EkXQdvz.png)
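A minimal Python sketch of the divided-difference table and the nested (Horner-style) evaluation described above. The three sample points $(0,1),(2,2),(3,4)$ are assumed purely for illustration:

```python
# Newton's divided differences on three assumed sample points.
# coeffs[k] corresponds to f[x_1 ... x_{k+1}] in the notation above.

def divided_differences(xs, ys):
    """Return the Newton-form coefficients c_0, ..., c_{n-1}."""
    n = len(xs)
    table = list(ys)               # 0-th differences: f[x_i] = y_i
    coeffs = [table[0]]
    for k in range(1, n):
        # replace the table with the k-th divided differences
        table = [(table[i + 1] - table[i]) / (xs[i + k] - xs[i])
                 for i in range(n - k)]
        coeffs.append(table[0])
    return coeffs

def newton_eval(coeffs, xs, x):
    """Evaluate P(x) by nested multiplication on the Newton form."""
    result = coeffs[-1]
    for c, xi in zip(reversed(coeffs[:-1]), reversed(xs[:len(coeffs) - 1])):
        result = result * (x - xi) + c
    return result

xs, ys = [0.0, 2.0, 3.0], [1.0, 2.0, 4.0]
coeffs = divided_differences(xs, ys)
print(coeffs)                        # [1.0, 0.5, 0.5]
print(newton_eval(coeffs, xs, 2.0))  # 2.0: the polynomial reproduces the data
```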
# Approximation
- Solving inconsistent systems of equations
$$
x_1 + x_2 = 2\\
x_1 + x_2 = 3\\
x_1 - x_2 = 1
$$
- No solution
- Find the "closest" $x$ instead

## Course contents
- 12/11
    - Normal equations for least squares
    - Solving an inconsistent system
    - Fitting data
    - A survey of models

## Least squares approximation
- Fitting a model to data
    - Linear model $y = at + b$
    - find $a, b$ such that $\sum e_i^2$ is minimized, where $e_i = y_i - (at_i + b)$
    - Seek to locate the specific instance of the model that best fits the data points
- Normal equations for least squares
    - Solving an inconsistent system
    - Fitting data

## An inconsistent system
$$
x_1 + x_2 = 2\\
x_1 + x_2 = 3\\
x_1 - x_2 = 1
$$
- The matrix form ($Ax = b$):
$$
\begin{bmatrix} 1&1\\1&-1\\1&1\end{bmatrix}\begin{bmatrix} x_1\\x_2\end{bmatrix} = \begin{bmatrix} 2\\1\\3\end{bmatrix}
$$
- or
$$
x_1\begin{bmatrix}1\\1\\1\end{bmatrix}_{(v_1)}+x_2\begin{bmatrix}1\\-1\\1\end{bmatrix}_{(v_2)}=\begin{bmatrix}2\\1\\3\end{bmatrix}_{(b)}
$$
- Any $m\times n$ system $Ax = b$ can be viewed as a vector equation
$$x_1v_1+x_2v_2 + \cdots + x_nv_n = b$$
- $b$ is a linear combination of the columns $v_i$ of $A$, with coefficients $x_1,\dots,x_n$
- Has a solution if $b$ lies in the plane spanned by the columns of $A$
- Otherwise, find the point in the plane of all $Ax$ closest to $b$
    - $\widetilde{b} = A\widetilde{x}$
    - Residual vector $b - \widetilde{b} = b - A\widetilde{x}$
    - $(b - A\widetilde{x})\perp Ax$ for every $x$, hence
$$
(Ax)^T(b-A\widetilde{x}) = 0 \quad \forall x\\
\implies x^TA^T(b-A\widetilde{x}) = 0\\
\implies A^T(b-A\widetilde{x}) = 0\\
\implies (A^TA)\widetilde{x} = A^Tb
$$

## Popular methods
- Euclidean length (2-norm)
$$
\|r\|_2=\sqrt{r_1^2+r_2^2+\dots+r_m^2}
$$
- Squared error
$$
\text{SE} = r_1^2+r_2^2+\dots+r_m^2
$$
- Root mean squared error (RMSE)
$$
\text{RMSE}=\sqrt{\text{SE}\over m}
$$

## A survey of models
- Linear: $y= c_1 + c_2x$
- Parabola: $y=c_1+c_2x+c_3x^2$
- Others
    - Periodic models
        - $y = c_1 + c_2 \cos 2\pi t + c_3 \sin 2\pi t$
    - Exponential data
        - $y=c_1e^{c_2t}$
        - Applying the natural logarithm
$$\begin{aligned}
\ln y &= \ln(c_1e^{c_2t}) \\
\ln y &= \ln c_1+\ln e^{c_2t} = \ln c_1 + c_2t\\
\text{let } k &= \ln c_1\\
\ln y &= k + c_2t
\end{aligned}$$
        - Model linearization changes the least squares problem
            - The original problem minimizes
$$
(c_1e^{c_2t_1}-y_1)^2+\cdots+(c_1e^{c_2t_m}-y_m)^2
$$
            - The **linearized** problem minimizes
$$
(\ln{c_1}+c_2t_1-\ln{y_1})^2+\cdots+(\ln{c_1}+c_2t_m-\ln{y_m})^2
$$

## QR Factorization
- Orthogonal set
    - vectors in which $v_i^Tv_j = 0$ whenever $i \neq j$
- Orthonormal set
    - An orthogonal set of **unit vectors**
- Orthogonal matrix
    - The column vectors form an orthonormal set
- Reduced QR factorization ($Q$ is $m\times n$, $R$ is $n\times n$)
$$
(A_1|\cdots |A_n) = (q_1|\cdots |q_n)
\begin{bmatrix}r_{11}&r_{12}&\cdots&r_{1n}\\ &r_{22}&\cdots&r_{2n}\\ &&\ddots&\vdots\\ &&&r_{nn} \end{bmatrix}
$$
- Full QR factorization ($Q$ is $m\times m$, $R$ is $m\times n$; the rows of $R$ below the $n$-th are zero)
$$
(A_1|\cdots|A_n)=(q_1|\cdots|q_m)
\begin{bmatrix}r_{11}&r_{12}&\cdots&r_{1n}\\ &r_{22}&\cdots&r_{2n}\\ &&\ddots&\vdots\\ &&&r_{nn} \end{bmatrix}
$$
- $A = QR$
    - $Q$ is orthogonal $\implies Q^{-1} = Q^T$
    - $R$ is upper-triangular
- Given an $m\times n$ inconsistent system $Ax=b$
    - $Ax=b$
    - $QRx=b$
    - $Q^{-1}QRx=Q^{-1}b=Q^Tb$
    - $Rx=Q^Tb$ *solve directly for $x$ using back substitution*
- QR vs. Normal equations
    - avoids computing $A^TA$

## Gram-Schmidt method

## Gauss-Newton method

### Least squares
- The least squares solution $\tilde{x}$ of a **linear** system $A\tilde{x}=b$ minimizes the Euclidean norm of the residual $\|A\tilde{x}-b\|_2$
- Two methods to find $\tilde{x}$
    - Normal equations
    - QR factorization

### Gauss-Newton iteration
- Consider the system of $m$ (nonlinear) equations in $n$ unknowns **(m > n)**:
$$
r_1(x_1,\cdots,x_n)=0\\
\vdots\\
r_m(x_1,\cdots,x_n)=0
$$
- Minimize the sum of the squares of the residuals
$$
r_1(x)^2+r_2(x)^2+\cdots+r_m(x)^2
$$
- Algorithm (see the sketch at the end of the note):
    1. Set $x_0=$ initial vector
    2. For $k = 0, 1, 2, 3, \dots$
$$
\left(D_r(x_k)^TD_r(x_k)\right)v_k=-D_r(x_k)^Tr(x_k)\\
x_{k+1}=x_k+v_k
$$
    - i.e. $v_k=-\left(D_r(x_k)^TD_r(x_k)\right)^{-1}D_r(x_k)^Tr(x_k)$, but solve the normal equations for $v_k$ instead of inverting

<!-- Abbreviations -->
*[Interpolation]: 內插
*[Extrapolation]: 外插
*[Approximation]: 近似
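To close the least-squares discussion, a minimal Python sketch of the Gauss-Newton iteration above, fitting the exponential model $y = c_1 e^{c_2 t}$ from the survey of models. The data points (generated from $c_1=2$, $c_2=0.5$), the starting guess, and the tolerance are all assumed for the demo:

```python
# Gauss-Newton fit of the assumed exponential model y = c1 * exp(c2 * t)
# to synthetic, noiseless data generated with c1 = 2, c2 = 0.5.
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * np.exp(0.5 * t)              # assumed data for the demo

def r(c):
    c1, c2 = c
    return c1 * np.exp(c2 * t) - y     # residuals r_i(c)

def Dr(c):
    c1, c2 = c
    # Jacobian of the residuals: columns are dr/dc1 and dr/dc2
    return np.column_stack([np.exp(c2 * t), c1 * t * np.exp(c2 * t)])

c = np.array([1.0, 1.0])               # assumed initial vector x_0
for _ in range(30):
    J = Dr(c)
    # solve (J^T J) v_k = -J^T r for the Gauss-Newton step
    v = np.linalg.solve(J.T @ J, -J.T @ r(c))
    c = c + v                          # x_{k+1} = x_k + v_k
    if np.linalg.norm(v, np.inf) < 1e-12:
        break

print(c)    # approx [2.0, 0.5]
```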