---
title: Numerical Methods
---
# Numerical Methods
NTNU Numerical Methods, Fall 2020
##### [Back to Note Overview](https://reurl.cc/XXeYaE)
{%hackmd @sophie8909/pink_theme %}
###### tags: `NTNU` `CSIE` `選修` `Numerical Methods`
<!-- tag order: [school] [department + required/elective] or [program + program name (without the word 學程, e.g. 大師創業)] [course] -->
## Score
- Participation + homework: 50%
    - Attended and completed in class: 100
    - Attended and completed the same day: 90
    - Attended and completed within three days: 80
    - Attended but not completed within three days: 40
    - Did not attend but completed within three days: 40
    - Did not attend and did not complete: 0
- Midterm : 25%
- Final exam: 25%
<!-- organize the notes following this outline -->
## Solving equations
## Solving systems
## Interpolating data
## Least squares
### Normal equation
---
## Horner's Method
## Iterative Method
- Intermediate value theorem
### Stopping an iterative procedure
- Various criteria are used to check (almost) convergence; we terminate iterating after $n$ iterations if:
- $|x_n-x_{n-1}| <$ atol
- $|x_n - x_{n-1}|<$ rtol $|x_n|$
- $|f(x_n)|<$ ftol
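A small sketch of how these three tests might be combined in code; `atol`, `rtol`, and `ftol` mirror the bullets above, everything else is an illustrative assumption:
```python
def should_stop(x_new, x_old, f_new, atol=1e-12, rtol=1e-8, ftol=1e-12):
    """Return True if any of the three (almost-)convergence tests passes."""
    step = abs(x_new - x_old)
    return (step < atol                  # absolute step size
            or step < rtol * abs(x_new)  # relative step size
            or abs(f_new) < ftol)        # residual |f(x_n)|
```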
### Bisection
- Method for finding a root of a scalar equation $f(x) = 0$ in an **interval $[a,b]$** (an initial bracketing interval is required)
- Assumption: $\textbf{f(a)f(b) < 0}$
#### After $n$ bisections...?
- The bracketing interval has length $(b-a)/2^n$; to make it smaller than $\varepsilon$, we need $n > \log_{2}({\frac{b-a}{\varepsilon}})$
#### Properties
- Simple
- Safe
- ...
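A minimal bisection sketch under the assumption $f(a)f(b) < 0$; the example function and tolerance are illustrative:
```python
def bisection(f, a, b, eps=1e-10):
    """Find a root of f in [a, b], assuming f(a) * f(b) < 0."""
    fa = f(a)
    if fa * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while (b - a) / 2 > eps:
        c = (a + b) / 2
        fc = f(c)
        if fc == 0:
            return c
        if fa * fc < 0:   # root lies in [a, c]
            b = c
        else:             # root lies in [c, b]
            a, fa = c, fc
    return (a + b) / 2

# e.g. root of x^3 - 7 in [1, 3]
# bisection(lambda x: x**3 - 7, 1, 3)   # ~ 1.9129
```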
### Fixed Point Iteration
- Definition: The real number $x$ is a fixed point of a function g if $\textbf{g(x) = x}$
- Given a function $f(x)$, select a function $g(x)$ such that $\textbf{f(x) = 0 }\to \textbf{g(x) = x}$
- Then
- $x_0 = \text{initial guess}$
- $x_{i+1} = g(x_i) \text{ for } i = 0, 1, 2, ...$
    - Until $x_{i+1}$ satisfies some stopping criterion (see the criteria above)
#### Convergence
- $S = |g'(r)| < 1$ (compute the slope of the tangent at the fixed point; if its magnitude is less than 1, the iteration converges)
#### Mean Value Theorem
- Let $f$ be a continuous function on the interval $[a,b]$, differentiable on $(a,b)$.
- Then there exists a number $c$ between $a$ and $b$ such that $f'(c) = \frac{f(b)-f(a)}{b-a}$
##### Proof
- Let $x_i$ denote the iterate at step $i$. There exists a number $c_i$ between $x_i$ and $r$ such that
$$\begin{align}
g'(c_i) =& \frac{g(x_i)- g(r)}{x_i - r} \\
=& \frac{x_{i+1} - r}{x_i - r}
\end{align}$$
$$(x_{i+1} - r) = g'(c_i)(x_i - r)$$
$$\text{Define } e_i = |x_i - r| \text{ (the gap between the true solution and the approximation)}$$
$e_{i+1} = |g'(c_i)|\,e_i$, and so $\lim_{i\to \infty}{\frac{e_{i+1}}{e_i}} = |g'(r)| = S$; the error contracts at each step when $S < 1$
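A minimal fixed-point iteration sketch (the function names, tolerances, and the $\cos$ example are illustrative, not from the lecture):
```python
import math

def fixed_point(g, x0, atol=1e-12, max_iter=100):
    """Iterate x_{i+1} = g(x_i) until the step size drops below atol."""
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < atol:
            return x_new
        x = x_new
    return x

# Classic example: x = cos(x); |g'(r)| = |sin(r)| ~ 0.67 < 1, so it converges
# fixed_point(math.cos, 1.0)   # ~ 0.7391
```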
### Newton's Algorithm
1. Start from an initial guess $x_0$
2. For $i = 0, 1, 2, \dots$, compute
$$x_{i+1} = g(x_i) = x_i - \frac{f(x_i)}{f'(x_i)}$$
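A minimal Newton's method sketch (the caller supplies $f$ and $f'$; names and tolerances are illustrative):
```python
def newton(f, df, x0, atol=1e-12, max_iter=50):
    """Newton iteration x_{i+1} = x_i - f(x_i) / f'(x_i)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < atol:
            break
    return x

# Root of x^2 - 2:
# newton(lambda x: x**2 - 2, lambda x: 2*x, 1.0)   # ~ 1.41421356
```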
### Secant method
> Replace the slope of the tangent line used in Newton's Algorithm with an approximation built from previous iterates.[color=pink]
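For reference (not written out in the original note), the standard secant update, which needs two starting guesses $x_0, x_1$, is:
$$
x_{i+1} = x_i - f(x_i)\,\frac{x_i - x_{i-1}}{f(x_i) - f(x_{i-1})}
$$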
## Solve Linear System
### Solving $Ax = b$
- elimination $O(n^3)$
- substitution $O(n^2)$
#### Direct method
- Gaussian elimination
#### Iterative method
### Error estimation
- the residual $r_a = b - Ax_a$
- How sensitive is the computed solution $x_a$ to errors in $A$ and $b$?
### Infinity norm
- The infinity norm, or the maximum norm, of the vector $x = [x_1, ..., x_n]^T$ is
- $||x||_\infty = \max |x_i|,\ i = 1, ...,n$
- The matrix (absolute row sum) norm of an $n\times n$ matrix $A$ is
    - $||A||_\infty = \max_{1\le i\le n} \sum_{j = 1}^n|a_{ij}|$
### Definition
- Residual
- $r_a = b - Ax_a$
- Backward error
- $||b - Ax_a||_\infty$
- Forward error
- $||x - x_a||_\infty$
- Error magnification factor
- $\frac{\text{relative forward error}}{\text{relative backward error}} = \frac{\frac{||x-x_a||_\infty}{||x||_\infty}}{\frac{||r||_\infty}{||b||_\infty}}$
### Condition number
- The condition number of a square matrix $A$, $\text{cond}(A)$, is the maximum possible error magnification factor for solving $Ax = b$, over all right-hand sides $b$.
- $\text{cond}(A) = ||A||_\infty \cdot ||A^{-1}||_{\infty}$
### Summary of error estimation
$\frac{\text{relative forward error}}{\text{relative backward error}} = \frac{\frac{||x-x_a||_\infty}{||x||_\infty}}{\frac{||r||_\infty}{||b||_\infty}} \le \text{cond}(A) = ||A||_\infty \cdot ||A^{-1}||_{\infty}$
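A small numpy illustration of these quantities; the matrix, right-hand side, and approximate solution below are made-up examples, not from the lecture:
```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])
x_exact = np.array([1.0, 1.0])   # true solution of Ax = b
x_a = np.array([2.0, 0.0])       # a (poor) approximate solution

r = b - A @ x_a                               # residual
backward = np.linalg.norm(r, np.inf)          # backward error
forward = np.linalg.norm(x_exact - x_a, np.inf)   # forward error
magnification = (forward / np.linalg.norm(x_exact, np.inf)) / \
                (backward / np.linalg.norm(b, np.inf))
cond = np.linalg.norm(A, np.inf) * np.linalg.norm(np.linalg.inv(A), np.inf)
print(backward, forward, magnification, cond)     # magnification <= cond(A)
```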
### Gaussian elimination with partial pivoting
$PA = LU$
#### Swamping
- When a multiplier is too large, small entries are swamped by rounding error and precision is lost
- Solution:
    - Partial pivoting
        - choose as pivot the entry with the largest absolute value in its column
- Permutation matrix
    - A permutation matrix is an $n \times n$ matrix consisting of all zeros, except for a single 1 in every row and column
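One quick way to look at $PA = LU$ numerically is `scipy.linalg.lu`; note that SciPy returns a $P$ with $A = PLU$, so the $P$ of the convention above is its transpose (the example matrix is an illustrative assumption):
```python
import numpy as np
from scipy.linalg import lu

A = np.array([[2.0, 1.0, 5.0],
              [4.0, 4.0, -4.0],
              [1.0, 3.0, 1.0]])

P, L, U = lu(A)                        # SciPy convention: A = P @ L @ U
print(np.allclose(P.T @ A, L @ U))     # note convention: (P^T) A = L U
print(np.round(U, 3))                  # diagonal of U holds the chosen pivots
```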
## Iterative Methods
- Faster if the input matrix is large
- A good approximation to the solution is already available
- The input matrix is sparse
### Direct vs. iterative Linear solvers
- Direct
- Computation is numerically stable in many relevant cases
    - Can solve economically for several right-hand sides
- But: fill-in limits usefulness
- Iterative
    - Only a rough approximation to $x$ is required.
    - A good $x_0$ approximating $x$ is generally known (warm start).
### Typical scenarios
- Direct solvers
- Many linear systems with the same matrix A
- Application that requires very accurate solutions
- Iterative solvers
    - Many linear systems with "slightly changing" matrices
## Jacobi method
### Strictly diagonally dominant
$|a_{ii}|>\sum_{j\neq i}|a_{ij}|$
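The notes stop at this convergence condition; below is a minimal sketch of the standard Jacobi update $x^{(k+1)} = D^{-1}\big(b - (A - D)x^{(k)}\big)$, with illustrative names and data:
```python
import numpy as np

def jacobi(A, b, x0=None, tol=1e-10, max_iter=500):
    """Jacobi iteration; converges when A is strictly diagonally dominant."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    D = np.diag(A)             # diagonal entries a_ii
    R = A - np.diagflat(D)     # off-diagonal part
    for _ in range(max_iter):
        x_new = (b - R @ x) / D
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

# Strictly diagonally dominant example:
# A = np.array([[4.0, 1.0], [2.0, 5.0]]); b = np.array([1.0, 2.0])
# jacobi(A, b)   # ~ [0.1667, 0.3333]
```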
## Solving Nonlinear Systems of Equations (11/13)
### Multivariate Newton's method
#### Review one-variable Newton's method
1. Start from an initial guess $x_0$
2. For $i = 0, 1, 2, \dots$, compute
$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$$
#### Extending to more variables
- Suppose we have 3 unknowns, 3 nonlinear equations:
$$
f_1(u, v, w) = 0\\
f_2(u, v, w) = 0\\
f_3(u, v, w) = 0
$$
- Define the vector-valued function:
$$
F(x) = F(u,v,w)=(f_1,f_2,f_3)
$$
where
$$
x=(u,v,w)
$$
#### Jacobian matrix
- 3 variables: $x=(u,v,w)$
- 3 functions: $f_1,f_2,f_3$.
$$
D_F(x)=\begin{bmatrix}
\frac{\partial f_1}{\partial u}&\frac{\partial f_1}{\partial v}&\frac{\partial f_1}{\partial w}\\
\frac{\partial f_2}{\partial u}&\frac{\partial f_2}{\partial v}&\frac{\partial f_2}{\partial w}\\
\frac{\partial f_3}{\partial u}&\frac{\partial f_3}{\partial v}&\frac{\partial f_3}{\partial w}\\
\end{bmatrix}
$$
#### Multivariate Newton's method
- The Taylor expansion for $F(x)$ around $x_0$ is
$F(x)=F(x_0)+D_F(x_0)\cdot(x-x_0)+O(\|x-x_0\|^2)$
- Let $r$ be the root, $x_0$ be the current guess
$$
\begin{align}
0=F(r)&\approx F(x_0)+D_F(x_0)\cdot(r-x_0)\\
-F(x_0)&\approx D_F(x_0)\cdot(r-x_0)\\
-D_F(x_0)^{-1}F(x_0)&\approx (r-x_0)\\
x_0-D_F(x_0)^{-1}F(x_0) &\approx r
\end{align}
$$
- $x_{k+1} = x_k-D_F(x_k)^{-1}F(x_k)$
- $s= -D_F(x_k)^{-1}F(x_k)$
$D_F(x_k)s= -F(x_k)$
- $x_{k+1} = x_k+s$
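A minimal sketch of this update loop; the example system, Jacobian, and all names are illustrative assumptions:
```python
import numpy as np

def multivariate_newton(F, DF, x0, tol=1e-12, max_iter=50):
    """Solve F(x) = 0: at each step solve DF(x_k) s = -F(x_k), then x_{k+1} = x_k + s."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        s = np.linalg.solve(DF(x), -F(x))
        x = x + s
        if np.linalg.norm(s, np.inf) < tol:
            break
    return x

# Example: u^2 + v^2 = 1, u - v = 0  ->  root near (1/sqrt(2), 1/sqrt(2))
F  = lambda x: np.array([x[0]**2 + x[1]**2 - 1, x[0] - x[1]])
DF = lambda x: np.array([[2*x[0], 2*x[1]], [1.0, -1.0]])
# multivariate_newton(F, DF, [1.0, 0.5])   # ~ [0.7071, 0.7071]
```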
## Interpolation
### Interpolating data
- We are given a collection of **data samples** $\{(x_i,y_i)\}_{i=0}^n$.
- $x_i$: the abscissas
- $y_i$: the data values
- Interpolation is the reverse of evaluation
- **Evaluation** ( e.g. nested multiplication): given a polynomial, evaluate a y-value for a given x-value.
- **Interpolation**: given these points, compute a polynomial that can generate them.
### Interpolation, extrapolation, approximation
- Interpolation
- The new value $x\neq x_i$ is inside the range of the interpolation points $x_0, x_1, \dots, x_n$.
- Extrapolation
- The new value $z$ is outside this range
- Approximation
    - Some norm $||v-y||$ of the difference of the vectors $v = [v(x_0),\dots,v(x_n)]$ and $y = [y_0,\dots,y_n]$ is minimized.
### Lagrange interpolation 拉格朗日插值法
- Given $n$ data points $(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)$, the polynomial that interpolates the points is:
    - Degree of $P(x)$ is at most $d=n-1$
    - When $d = n-1$, the polynomial through the $n$ points is unique
- $P(x)$: a polynomial in this form (a short evaluation sketch follows the proof below):
$$
P(x) = y_1L_1(x)+y_2L_2(x)+\cdots+y_nL_n(x)\\
L_k(x) = \frac{(x-x_1)\cdots(x-x_{k-1})(x-x_{k+1})\cdots(x-x_n)}{(x_k-x_1)\cdots(x_k-x_{k-1})(x_k-x_{k+1})\cdots(x_k-x_n)}
$$
- Proof sketch(by contradiction)
- Suppose $P(x)$ and $Q(x)$ have degree at most $n-1$ and both interpolate all $n$ points
- Now, define $H(x)=P(x)-Q(x)$
- Degree of $H(x)$?
- $H(x_i)=P(x_i)-Q(x_i)=y_i-y_i=0$
- $0=H(x_1)=H(x_2)=\cdots=H(x_n)$
- $H(x)$ has a degree at most $n-1$ and it has $n$ roots.
- $H(x)$ is the identically zero polynomial, and $P(x)\equiv Q(x)$
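A direct, if not the most efficient, sketch of evaluating the Lagrange form $P(x)$ above (names are illustrative):
```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs[k], ys[k]) at x."""
    n = len(xs)
    total = 0.0
    for k in range(n):
        L_k = 1.0
        for j in range(n):
            if j != k:
                L_k *= (x - xs[j]) / (xs[k] - xs[j])
        total += ys[k] * L_k
    return total

# Three points on y = x^2: the degree-2 interpolant reproduces it exactly
# lagrange_eval([0, 1, 2], [0, 1, 4], 1.5)   # 2.25
```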
### Interpolation error (**will appear in the final exam**)
- Assume we have $y_i = f(x_i),\ i=1,\dots, n$ and an interpolating polynomial $P(x)$. The interpolation error at $x$ is
$$
f(x)-P(x)=\frac{(x-x_1)(x-x_2)\cdots(x-x_n)}{n!}f^{(n)}(c)
$$
where $c$ lies between the smallest and largest of $x, x_1, \dots, x_n$
### Runge Phenomenon
- Polynomials can fit any set of data points
- But there are some shapes that polynomials prefer over others.
- Example
    - Interpolate $f(x)={1\over(1+12x^2)}$ at evenly spaced points in $[-1,1]$
    - The shape: the interpolating polynomial oscillates with growing amplitude near the ends of the interval as the number of evenly spaced points increases
### Newton's divided differences
- $P(x)$: a polynomial in this form (see the sketch at the end of this section):
$$
P(x)=c_0+c_1(x - x_1)+c_2(x-x_1)(x-x_2)+\cdots+c_{n - 1}(x-x_1)\cdots(x-x_{n - 1})\\
= c_0 + (x-x_1)( c_1 + (x - x_2)( c_2 + (x - x_3) \cdots))\\
$$
- Divided difference
Denote by $f[x_1\cdots x_n]$ the coefficient of the $x^{n-1}$ term: $f[x_1\cdots x_n] \equiv c_{n-1}$
$$
\begin{align}
P(x)=f[x_1]&+f[x_1\ x_2](x-x_1)\\
&+f[x_1\ x_2\ x_3](x-x_1)(x-x_2)\\
&+f[x_1\ x_2\ x_3\ x_4](x-x_1)(x-x_2)(x-x_3)\\
&+\cdots\\
&+f[x_1\cdots x_n](x-x_1)\cdots(x-x_{n-1})
\end{align}
$$
- 0-th
$$
f[x_i]=P(x_i)=y_i
$$
- 1-st
$$
f[x_i\ x_{i+1}]=\frac{f[x_{i+1}]-f[x_i]}{x_{i+1}-x_i}
$$
- k-th divided difference
$$
f[x_i\ x_{i+1}\cdots x_{i+k}]=\frac{f[x_{i+1}\cdots x_{i+k}]-f[x_i\cdots x_{i+k-1}]}{x_{i+k}-x_i}
$$
- Degree of $P$?
    - at most $n - 1$
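A minimal sketch of building the divided-difference coefficients and evaluating $P(x)$ in the nested form above (names are illustrative):
```python
def newton_divided_differences(xs, ys):
    """Return the coefficients c_k = f[x_1 ... x_{k+1}] of the Newton form."""
    n = len(xs)
    coef = list(ys)                        # 0-th divided differences f[x_i] = y_i
    for k in range(1, n):
        for i in range(n - 1, k - 1, -1):  # update in place, highest index first
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - k])
    return coef

def newton_eval(xs, coef, x):
    """Evaluate P(x) by nested multiplication (Horner-like form)."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result

# Same data as the Lagrange example: y = x^2 at x = 0, 1, 2
# c = newton_divided_differences([0, 1, 2], [0, 1, 4])   # [0, 1, 1]
# newton_eval([0, 1, 2], c, 1.5)                          # 2.25
```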
# Approximation
- Solving inconsistent systems of equations
$$
x_1 + x_2 = 2\\
x_1 + x_2 = 3\\
x_1 - x_2 = 1
$$
- No solution
- Find the "closest" x instead
## Course contents
- 12/11
- Normal equations for least squares
- Solving an inconsistent system
- Fitting data
- A survey of models
## Least squares approximation
- Fitting model to data
- Linear model $y = at + b$
    - find $a, b$ such that $\sum e_i^2$ is minimized, where $e_i = y_i - (at_i + b)$ is the residual at the $i$-th data point
- Seek to locate the specific instance of the model that best fits the data points
- Normal equations for least squares
- Solving an inconsistent system
- Fitting data
## An inconsistent system
$$
x_1 + x_2 = 2\\
x_1 + x_2 = 3\\
x_1 - x_2 = 1
$$
- The matrix form ($Ax = b$):
$$
\begin{bmatrix} 1&1\\1&-1\\1&1\end{bmatrix}\begin{bmatrix}
x_1\\x_2\end{bmatrix} = \begin{bmatrix} 2\\1\\3\end{bmatrix}
$$
- or
$$
x_1\begin{bmatrix}1\\1\\1\end{bmatrix}_{(v_1)}+x_2\begin{bmatrix}1\\-1\\1\end{bmatrix}_{(v_2)}=\begin{bmatrix}2\\1\\3\end{bmatrix}_{(b)}
$$
- Any $m\times n$ system $Ax = b$ can be viewed as a vector equation
$$x_1v_1+x_2v_2 + \cdots + x_nv_n = b$$
- $b$ is a linear combination of the columns $v_i$ of $A$, with coefficients $x_1,\dots,x_n$
- Has a solution if $b$ lies in the plane (the span of the columns $v_i$)
- Find a point in the plane $Ax$ closest to $b$
- $\widetilde{b} = A\widetilde{x}$
- Residual vector $b - \widetilde{b} = b - A\widetilde{x}$
- $(b - A\widetilde{x})\perp Ax$
- Therefore:
$$
(Ax)^T(b-A\widetilde{x}) = 0 \quad \forall x\\
\implies x^TA^T(b-A\widetilde{x}) = 0\\
\implies A^T(b-A\widetilde{x}) = 0\\
\implies (A^TA)\widetilde{x} = A^Tb
$$
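Solving the normal equations for the small example above with numpy (`lstsq` is included only as a cross-check; this is a sketch, not the course code):
```python
import numpy as np

A = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [1.0,  1.0]])
b = np.array([2.0, 1.0, 3.0])

# Normal equations: (A^T A) x = A^T b
x_tilde = np.linalg.solve(A.T @ A, A.T @ b)
print(x_tilde)                                 # [1.75, 0.75]
print(np.linalg.lstsq(A, b, rcond=None)[0])    # same least squares solution
```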
## Popular methods
- Euclidean length (2-norm)
$$
\|r\|_2=\sqrt{r_1^2+r_2^2+\dots+r_m^2}
$$
- Squared error
$$
\text{SE} = r_1^2+r_2^2+\dots+r_m^2
$$
- Root mean squared error (RMSE)
$$
\text{RMSE}=\sqrt{\text{SE}\over m}
$$
## A survey of models
- Linear: $y= c_1 + c_2x$
- Parabola: $y=c_1+c_2x+c_3x^2$
- others
    - Periodic models
- $y = c_1 + c_2 \cos2\pi t + c_3 \sin2\pi t$
- Exponential data
- $y=c_1e^{c_2t}$
- Applying the natural logarithm
$$\begin{aligned}
\ln y &= \ln(c_1e^{c_2t}) \\
\ln y &= \ln c_1+\ln(e^{c_2t}) = \ln c_1 + c_2t\\
\text{let } k &= \ln c_1\\
\ln y &= k + c_2t
\end{aligned}
$$
- Model linearization changes the least squares problem
    - The original problem minimizes
$$
(c_1e^{c_2t_1}-y_1)^2+\cdots+(c_1e^{c_2t_m}-y_m)^2
$$
    - The **linearized** problem minimizes
$$
(\ln{c_1}+c_2t_1-\ln{y_1})^2+\cdots+(\ln{c_1}+c_2t_m-\ln{y_m})^2
$$
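A sketch of fitting $y = c_1 e^{c_2 t}$ by the linearization above; the data values are made up for illustration:
```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.1, 5.4, 15.0, 40.5])    # roughly 2 * e^t

# Linearize: ln y = k + c2 * t with k = ln c1, then solve the normal equations
A = np.column_stack([np.ones_like(t), t])
k, c2 = np.linalg.solve(A.T @ A, A.T @ np.log(y))
c1 = np.exp(k)
print(c1, c2)    # close to 2 and 1, the parameters behind the made-up data
```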
## QR Factorization
- Orthogonal set
- vectors in which $v_i^Tv_j = 0$ whenever $i \neq j$
- Orthonormal set
- An orthogonal set of **unit vectors**
- Orthogonal matrix
    - The column vectors form an orthonormal set ($Q^TQ = I$)
- Reduced QR factorization ($\hat{Q}$ is $m\times n$ with orthonormal columns; $\hat{R}$ is $n\times n$ upper triangular)
$$
(A_1|\cdots |A_n) = (q_1|\cdots |q_n)
\begin{bmatrix}r_{11}&r_{12}&\cdots&r_{1n}\\
&r_{22}&\cdots&r_{2n}\\
&&\ddots&\vdots\\
&&&r_{nn}
\end{bmatrix}
$$
- Full QR factorization ($Q$ is $m\times m$; $R$ is $m\times n$, upper triangular, with rows of zeros below row $n$)
$$
(A_1|\cdots|A_n)=(q_1|\cdots|q_m)
\begin{bmatrix}r_{11}&r_{12}&\cdots&r_{1n}\\
&r_{22}&\cdots&r_{2n}\\
&&\ddots&\vdots\\
&&&r_{nn}\\
0&\cdots&\cdots&0
\end{bmatrix}
$$
- A = QR
    - $Q$ is orthogonal $\implies Q^{-1} = Q^T$
- R is upper-triangular
- Given an $m\times n$ inconsistent system $Ax=b$
- $Ax=b$
- $QRx=b$
- $Q^{-1}QRx=Q^{-1}b=Q^Tb$
    - $Rx=Q^Tb$, *then solve for $x$ directly using back substitution*
- QR vs. Normal equations
- avoid computing $A^TA$
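The same small least squares example solved through QR instead of the normal equations (a sketch using numpy/scipy; names are illustrative):
```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1.0,  1.0],
              [1.0, -1.0],
              [1.0,  1.0]])
b = np.array([2.0, 1.0, 3.0])

Q, R = np.linalg.qr(A)              # reduced QR: Q is 3x2, R is 2x2 upper triangular
x = solve_triangular(R, Q.T @ b)    # back substitution on R x = Q^T b
print(x)                            # [1.75, 0.75], same as the normal equations
```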
## Gram-Schmidt method
## Gauss-Newton method
### Least squares
- The least squares solution $\tilde{x}$ of a **linear** system $Ax=b$ minimizes the Euclidean norm of the residual $\|A\tilde{x}-b\|_2$
- Two methods to find $\tilde{x}$
- Normal equations
- QR factorization
## Gauss-Newton method
- Consider the system of $m$ (nonlinear) equations in $n$ unknowns: **($m>n$)**
$$
r_1(x_1,\cdots,x_n)=0\\
\vdots\\
r_m(x_1,\cdots,x_n)=0
$$
- Sum of the squares of the residuals (to be minimized):
$$
r_1(x)^2+r_2(x)^2+\cdots+r_m(x)^2
$$
1. Set $x_0=$ initial vector
2. **For** $k = 0, 1, 2, 3, \dots$
$$
x_{k+1}=x_k-\left(D_r(x_k)^TD_r(x_k)\right)^{-1}D_r(x_k)^Tr(x_k)=x_k+v_k
$$
- where $v_k$ is obtained by solving
$$
v_k=-\left(D_r(x_k)^TD_r(x_k)\right)^{-1}D_r(x_k)^Tr(x_k)\\
\left(D_r(x_k)^TD_r(x_k)\right)v_k=-D_r(x_k)^Tr(x_k)
$$
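A minimal Gauss-Newton sketch following the update above; the residual function, its Jacobian, and the data are illustrative assumptions:
```python
import numpy as np

def gauss_newton(r, Dr, x0, tol=1e-10, max_iter=100):
    """Minimize sum r_i(x)^2 by solving (Dr^T Dr) v = -Dr^T r at each step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        J = Dr(x)
        v = np.linalg.solve(J.T @ J, -J.T @ r(x))
        x = x + v
        if np.linalg.norm(v, np.inf) < tol:
            break
    return x

# Example: find the point whose distances to (0,0), (2,0), (0,2) are
# closest (in the least squares sense) to 1.2, 1.2, 1.2
centers = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
r  = lambda x: np.linalg.norm(x - centers, axis=1) - 1.2
Dr = lambda x: (x - centers) / np.linalg.norm(x - centers, axis=1)[:, None]
# gauss_newton(r, Dr, [1.0, 1.0])
```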
<!-- abbreviation definitions -->
*[Interpolation]: 內插
*[Extrapolation]: 外插
*[Approximation]: 近似