Author: , Ph.D. in Electrical Engineering, NCTU, R.O.C.
email: kccddb@gmail.com
Date: 20250304
Copyright: CC BY-NC-SA
Mathematics (logic, calculus, linear algebra, probability, statistics), Data Structure, Algorithm, DSP, Micro-processor and OS 是內功心法
基礎要紮實
network programming, embedded system 是戰鬥工法
所有矩陣運算要注意電腦計算錯誤與overflow的可能性
台灣之心愛護動物協會(Heart of Taiwan Animal Care, HOTAC)
有些圖取至 wiki, 感謝這些圖的創作者
簡單 整合 重要的 metric, norm, inner product, Schwarz inequality, norm 引入 相關性 等工程 物理等重要觀念
Set and define a function
We say is a metric space (賦距空間) with metric .
e.g., Euclidean Space.
Distance between and the subset
Applications:
,
is also a metric. (Minkowski Inequality) for good functions.
inner product on vector space :
vectors
or
Define norm:
triangle inequality
e.g.,
,
, where f, g real function on (interval)
Remark.
: Lebesque integral (如果在 I 區間Riemann integral存在, 則兩個相等)
e.g., Let , , , if , then (Riemann integral) and . Riemann integral 不存在.
Probability Measure and Conditional Expectation
Functions are Vectors
3. , where f, g are complex functions.
, norm
norm=, ,
space (平方可積分函數), space (平方和 有限 的數列) 用在 數位通訊, 量子力學 等
space is also a Hilbert space(特例: ).
這裡 是 Lebesque Integral
Definition. A Hilbert space is a complete inner product space.
科學與工程 最重要的就是 space. 有距離 與角度 的觀念
Schwarz inequality in a unit circle of the Euclidean plane.
抽象空間(甚至是函數) 角度, e.g., space
Define when , via Schwarz inequality
can not imply .
e.g., Gibbs phenomenon
Trace of a matrix
is a real matrix, . Then
complex numbers
The inner product of matrices is given by .
Let be a subfield of (complex numbers). Then is an inner product space.
證明不難, 請自學
Matrices form an inner-product space
e.g.,
(Lebesque integral) is a inner product on , where and provided exists .
For any interval , , the length (i.e., measure) of the interval =.
Schwarz Inequality:
with equality holding in the Cauchy–Schwarz Inequality if and only if and are linearly dependent.
The Schwarz inequality allows one to extend the notion of "angle between two vectors" to any real inner-product space.
Define when
代表 兩向量的相關性 (correlation)
Probability space
random variables, by Swartz inequality, we have the covariance inequality:
, since (Lebesque integral)
random vector
covariance matrix(dispersion matrix)
eigenspace
is a Hibert space, bounded linear operator, there exists an elelmet with the property that , then is the eigenvalue, eigenvector. The eigenvectors belonging to a given eigenvalue form the eigenspace (or the characteristic space of ) corresponding to .
Lemma. is a Hilbert space, , and , then the eigensapce of the eigensapce of .
Proof. Assume , . , which is possible only if .
Any covariance matrix is symmetric and positive semi-definite and its main diagonal contains variances.
A square matrix is called positive definite if it is symmetric and all its eigenvalues are positive, that is .
If is positive definite, then it is invertible and .
A real symmetric matrix is positive definite if and only if for every column in .
e.g.,
is positive-definite.
is not positive-definite in (and not in ), since , where .
is a real-valued symmetric positive-definite matrix.
Let is an inner product.
Rayleigh quotient
where
are respectively the smallest and largest eigenvalues of .
The geometric multiplicity of an eigenvalue is the dimension of the linear space of its associated eigenvectors (i.e., its eigenspace).
characteristic polynomial
, , where
null space
and
range space
geometric multiplicity of .
e.g.,
In multivariate statistics and probability theory, the scatter matrix is a statistic that is used to make estimates of the covariance matrix.
Scatter Matrix S
classification
Given n samples of m-dimensional data, represented as the matrix, , the sample mean is
The scatter matrix is the positive semi-definite matrix
Hermitian 矩陣特徵值的變化界定
Scatter matrix , Covariance and Correlation Explained
Scatter Plot
Applications:
數位通訊
機率統計
檢測與估計
machine learning
Assume , and (i.e., f,g 也有抽象夾角觀念),
dt =0
Define and in .
and
當一 vector space basis 就像 一樣, 有 距離, 內積 與 角度 觀念的 vector space.
Probability space
Ref. Probability measure and conditional expectation for EE
Assume is noise and (random processes, Stochastic Processes or random functions in ). Implicity, we consider (Hilbert Space)
Correlation Receiver ()~Matched Filter Receiver
Assume .
Notice that by Schwarz Inequality.
Assume or (random process,random sequence), .
and by inner product properity and Schwarz Inequality.
Let and
Maximizing SNR(Signal-to-noise ratio) (digital communication) to get good performance.
, the two sides are equal if and only if and are linearly dependent, i.e., , . Let , in
這是 基本的數位通訊Correlation Receiver or matched filter 的數學原理
Lineary Independent
A sequence of vectors from a vector space V is said to be linearly dependent, if there exist scalars not all zero, such that . A set of vectors is linearly dependent if and only if one of them is zero or a linear combination of the others.
A sequence of vectors is said to be linearly independent if it is not linearly dependent, that is, if the equation can only be satisfied by , .
The kernel of a linear map, also known as the null space or nullspace, is the linear subspace of the domain of the map which is mapped to the zero vector.
A linear map between two vector spaces and
the kernel of :
The image of :
is a subspace of , .
e.g.,
eigenvalues of : 1, 1, 1
, then
Why?
Assume , , where is a -dimentional vector space, is an matrix.
rank of = rank .
A sequence of vectors denotes the set of all (finite) linear combinations of the points in , i.e., the linear subspace spanned by (or linear span of A~span(A), 線性生成空間).
e.g., ,
Lemma.
Proof.
Let dim and be a basis of . Then choose such that is a basis of .
Claim is a basis of and dim. . Since , then .
Claim are linearly independent.
Suppose . Then or . There exists a such that . Since are linearly independent then . Hence are linearly independent and dim .
Corollary. Let be a linear transformation of (into itself). Then is one-to-on if and only if is onto.
Proof. . If , then . Hence ono-to-one. Conversly, it is trival. .
Corollary. matrix . Let denote the column vectors of . For , and . the maximal number of linearly independent columns of
Corollary. Let matrix and matrix B, we have , since , and .
Corollary. nonsingular (invertible), then .
Corollary. Let be an by matrix. Then iff rows of are linearly independent.
Corollary. matrices. , since
這就是您 線代 解 rank 的方法之原理.
Let be a Hilbert space, is a subset of . Define , the othogonal complemet of by
.
Closed set
A set in a metric space is a closed set if and only if if every convergent sequence with has it limit in .
e.g., , ,
, however , interrval (0,1) not closed set.
interval [0,1] is a closed set.
Projection theorem
Let be any closed linear subspace of a Hilbert space , Then (linear subspace generated by ). Moreover, each can be expressed uniquely , where and and .
Let , projection operator. Then
Let be a nonempty set in a linear space . The set is lineary independent if and only if for each in , there is one and only one finite subset of , say , and a unique tuple of nonzero scalars, , such that .
: coordinates
: basis
e.g., and .
Operator Norm
,
: projection operator, then and .
Definition. Let , be Hibert spaces, bounded linear operator. The adjoint operator (共軛算子) is defined by the equation
.
Remark (重要).
e.g.,
is normal, since and its eigenvalues are . is neither unitary, Hermitian, nor skew-Hermitian matrix.
is self-adjoint, and it can be checked (e.g., using the characteristic polynomial) that the eigenvalues of are .
is self-adjoint, and it can be checked (e.g., using the characteristic polynomial) that the eigenvalues of are .
Geometric multiplicity of is .
Let , then is the square root of the largest eigenvalue of the symmetric matrix , i.e., .
Why?
e.g.,
Then
, which has eigenvalues , so
The normed space of all bounded linear operators from the into is denoted .
Lemma 1. Hilbert spaces, and , then .
Lemma 2. 類似您學過的矩陣運算 Hilbert spaces.
a Hilbert space, , is said to be self-adjoint operator (自共軛算子) if .
軛:在車衡兩端扼住牛、馬等頸背上的曲木。
e.g.,
e.g.,
, , transpose of .
e.g.,
is a self-adjoint positive definite operator.
e.g.,
Fredholm (Volterrra) integral equation of the first kind
Let and define , and on , where is called kernel function. Check . Then .
FACT(重要):
Every linear operator definited on a finite dimentional normed space is compact(bounded operator, and so continuous).
For any matrix , the eigenvalues of are aways real and
non-negative (in other words, and are positive definite).
Proof. Assume and , then .
Singular Value Decomposition(重要)
Any matrix can be decomposed as and and are unitary matrices. The matrix is “diagonal,” with non-negative elements on the main diagonal. The non-zero elements of are called the singular values of , and satisfy .
Proof. Assume . Since is Hermitian, there exists a diagonal matrix such that .
Let . Then . Construct by choosing the columns in so that is unitaary and . , i.e., .
Proposition.
Let and be Hilbert spaces with inner product and respectively. Let be a boundded linear operator. Define the adjoint so that holds for all and all .
self-adjoint bounded linear operator on a real Hilbert space . is said to be positive semidefinite (positive definite) if , for all .
e.g.,
is a probability space, i.i.d. and
The covariance matrix is semi-positive definite and symmetric.
Proof: Positive semi-definiteness of the covariance matrix
Hence , the eigenvalue if is positive definite.
HOME WORK. Prove it.
Fact. (需要 Hahn-Banach Theorem, 對EE太難)
Sample covariance matrix Q
For a sample of vectors , where is the sample size. Sample meam is
Sample covariance matrix is
Assume and , then .
Sample covariance matrix is always positive semi-definite.
random process 經由 Karhunen–Loève theorem 也可得covariance matrix 用於 Principal component analysis
實務上 會有
MIT OpenCourseWare:
Singular values
Matrix Perturbations
Dynamic Systems And Control
Generalized eigenvalue problem
線代啟示錄
Geometric Analysis of Operator (投影運算子的幾何分析特性)
這觀念一定要懂, 這也是 Hilbert space 的重要 觀念, 也 很多證明 對 你們 太難, 我用 最簡單的 的 來解釋, 沒有 很好 的數學 應是 一堆 聽不懂的 "這很正常" 別 自責
Null space
is a linear transformation,
Range space
is an orthongonal projection on a Hilbert space , then
Hence, .
Let pojection operators, if , , and , then .
Assume , , , if . Let .
寫到 "連續" 學生就不懂了, 至少 有限維度(矩陣) 與 bounded (物理與工程應用) 是對的
Assume continuious projection operator (e.g., finite dimentional projection operator, bounded operator, ODE, …), for any vector , there exits and
Eigenvalue-Eigenvector
has a solution , (eigenvalue), then (eigenvector) and
Intuitively, an eigenvector is a vector whose direction remains unchanged when a linear transformation is applied to it.
Riesz Representation Theory
is a Hilbert space and let be a bounded linear functional () on . Then there is one and only one vector such that
for all .
e.g., , . Then is a linear functional and .
Hilbert Space 的連續線性泛函都可以表示成內積。
Remark. A linear operator is bounded if and only if it is continuous.
以電路而言, 積分器 , 有沒有看出 SNR 的重要性而且用 power?
space
is an inner product on
Probability space
sample space
is the -field of subsets of sample space
probability measure with
, is an event
is a random variable(-measurable function from ).
, are random variables with and
is an inner product
is an inner product
is the norm of
Markov's inequality
If X is a nonnegative random variable and a > 0,
If is a monotonically increasing nonnegative function for the nonnegative reals, X is a random variable, , and , then
e.g.,
Let , then Chebyshev's inequality.
顯示了隨機變數的「幾乎所有」值都會「接近」平均
函數 數列 隨機變數 也可看成 vector
is a norm if .
correlation coefficient
correlation coefficient=
:::
Linear independence
Projection operator
basis
orthonormal basis
Independent random variables
…
請自行看 linear algebra, probability 書~寫起來很長, 若引入 measure theory 可以寫一本書了, 對一般資電又太難, wiki 沒有寫得很好
擴充 為 random vector:
If and ,
and if (orthonormal basis, )
then
.
, one-to-one mapping
e.g., QAM (Quadrature amplitude modulation)
~64 QAM (Quadrature amplitude modulation) 原理
Why?
Wi-Fi 6/6E 1024 QAM
5th generation mobile networks (5G) 512-QAM or 1024-QAM
4G 256-QAM or 64-QAM
Assume if and . Then , where and
one-to-one mapping
Homework:
Hint:
estimation and prediction 基礎
用到機率 covariance matrix
因中文翻譯不同 請用 英文
is a random variable and , if
e.g., if
Jensen's inequality
,
(convex, concave 有些是反過來的 請自行確認)
If is a random variable and is a convex function, then
e.g.,
e.g.,
Let be traffic rate and bandwidth, then denotes the average loss rate. Assume iid samples, .
If the function is concave, then
Jensen's inequality (Random vector )
Let be a convex function on a convex set , is a random vector with , then , exists and .
linear time-invariant (LTI) system and convolotion
Let , i.e., for EE (In fact, this is not the Riemann integral).
Then , by LTI properties.
Notice that is incorrent if the system is not time-invariant.
Nyquist–Shannon sampling theorem— If a function
contains no frequencies higher than hertz, then it can be completely determined from its ordinates at a sequence of points spaced less than , i.e., sample rate
Ref. Principles of Digital Communication, by Robert G. Gallager
這本書 有難度, 沒老師教 不容易懂, 而且數學要有一定程度, 有用到 Lebesgue integration, space, entropy, Markov chain, Viterbi algorithm ( dynamic programming algorithm) …
Let be the impulse responce . Assume exists.
Assume is the band-limited input process of the LTI sysyem,
then the output porcess
,
by LTI properties.
We define as the convolution of and .
Discrete time
Assume
is the discrete time band-limited input process of the LTI sysyem.
Let if and if .
Let
by LTI properties.
Assume discrete time impulse response is given by ,
we define as the convolution of and .
Then is the output sequence of the LTI system by LTI properties.
Discrete Fourier transform (DFT)
complex numbers, DFT of
form an orthogonal basis over the set of -dimensional complex vectors.
Claim , i.e., , where if and if .
Parseval's theorem
If we define DFT via , then we have
In practice, we can choose ~
There are two main types of DFT errors: "aliasing" and “leakage"
Increase sampling rate +Window Function…
183–FFT系列:甚麼是洩漏leakage?窗函數window如何改善洩漏leakage?
n-dimentional DFT
,
2D Convolution Animation(by Michael Plotke)
Image Filtering and Edge Detection
Z-transform
The Z-transform converts a discrete-time signal, which is a sequence of real or complex numbers, into a complex frequency-domain (the z-domain or z-plane) representation.
Bilateral Z-transform
Unilateral Z-transform
inverse Z-transform
where is a counterclockwise closed path encircling the origin and entirely in the region of convergence (ROC).
In control theory, a causal system (also known as a physical or nonanticipative system) is a system where the output depends on past and current inputs but not future inputs—i.e., the output depends only on the i[nput for values .
Image Analysis by Andrew Zisserman
Q:
Assume input signal is and , find a LTI filter to maximize SNR in after the signal to the recever over a AWGN channel. Please see books, wiki or 通訊實驗matched filter結報…
A: matched filter.
Idea: if , where is a white noice. (Amplication is not good, because the noice for LTI system.)
is also the same result.
這是數位通訊 的 基本原理 例如 QAM 等, 原來 你學的乘法器,積分器 是在進行 inner product, 所以很多研究所 入學考試 有 線性代數 ~因為重要
已知, 只要 傳送 接收端 能收到 , 可還原
SNR (Signal-to-noise ratio) Why?
Parseval's Theorem for Hilbert space (e.g., )
Define Fourier Transform of a real function g(x) as
, where means frequency (not angular fequency).
Fourier inversion integral:
F: frequency domain
T: time domain
Then we have
and
.
is a unitary operator.
, since let , .
Time scaling
Let , if , then
Differentiation
multi-dimensional Fourier Transform
Fourier inversion integral:
,
where
Plancherel theory
The Fourier transform on very nice functions is an -isometry: more precisely, for any
, we have
conservation of energy
Parseval's Theorem states that the total energy computed in the time domain must equal the total energy computed in the frequency domain. It is a statement of conservation of energy.
matched filter, SNR, AWGN channel
Assume is a zero meam wide-sense stationary(WSS) random process. The auto-correlation function is given by .
Let , is the power spectral density of .
Let is the output random process and the input process of a LTI system is , which is a WSS random process, then is also a WSS random process.
Principal Component Analysis (PCA) 主成分分析
Karhunen–Loève theorem(Karhunen-Loève Expansion)
is a random process. , , covariance function , . Let be a linear operator .
Since is a linear operator, it makes sense to talk about its eigenvalues and eigenfunctions , which are found solving the homogeneous Fredholm integral equation of the second kind
The covariance function satisfies the definition of a Mercer kernel.
An important special case
Remark. A symmetric real matrix
is said to be positive-definite if for all . Assume , then , we have .
e.g.,
is positive definite.
eigenvalue's geometric multiplicity
The geometric multiplicity of an eigenvalue is the dimension of the linear space of its associated eigenvectors (i.e., its eigenspace).
The dimension of the eigenspace associated with , or equivalently the maximum number of linearly independent eigenvectors associated with , is referred to as the eigenvalue's geometric multiplicity . Because E is also the nullspace of , the geometric multiplicity of is the dimension of the nullspace of , also called the nullity of , which relates to the dimension and rank of as
By Mercer's theorem(對EE 太難), there consequently exists a set of eigenvalues and eigenfunctions of forming an orthonormal basis of , and can be expressed as
.
converge in
. Furthermore,
and
Sorting the eigenvalues in decreasing order , where
Implementing a Principal Component Analysis (PCA) 主成分分析
Discrete Fourier Transform(DFT)
, form an orthogonal basis over the set of N-dimensional complex vectors.
,
is the DFT of .
and mathematical optimization
Proof.
Assume , a hyperplane with a normal vector at is given by .
Hence defines a tanget plane. is perpendicular to the level curve at .
Q.E.D.
您可能要先假設存在性(困難的部分), 方法是對的 (wiki 寫的不一定是對的, 還有最好看英文的)
Lagrange multiplier
. Find the local maxima and minima of via
Let
. Lagrange multiplier
Taylor series with remainder
The Taylor polynomials for only provide accurate approximations in the range . For , Taylor polynomials of higher degree provide worse approximations.
, where .
, if there exists a linear functional and such that , where and .
.
A linear functional on a real vector space is a function , which satisfies the following properties
Taylor Polynomials of Functions of Two Variables
這本好書只適合 數學好的資電人士 (博士班以上)
Gâteaux derivative:
Optimization by Vector Space Methods: Luenberger, David G.
Gateaux derivative
Optimization by Vector Space Methods: Luenberger, David G.
An affine transformation preserves:
collinearity between points: three or more points which lie on the same line (called collinear points) continue to be collinear after the transformation.
parallelism: two or more lines which are parallel, continue to be parallel after the transformation.
convexity of sets: a convex set continues to be convex after the transformation. Moreover, the extreme points of the original set are mapped to the extreme points of the transformed set.
ratios of lengths of parallel line segments
barycenters (center of mass) of weighted collections of points
比較好的書是 (會用到 functional analysis):
Optimization by Vector Space Methods, David G. Luenberger
The Laplacian of : . The Laplacian is invariant under all Euclidean transformations: rotations and translations.
邊緣偵測 - 拉普拉斯算子 ( Laplacian Operator )
OpenCV 基礎篇-邊緣檢測(edge detection)
Statistical Inference, detection and estimation
Least-Squeres Estimate and Projection Theorem
is am vector and matrix with linearly independent columns . Then there is a unique vector which minimize over all and the estimator if the inverse matrix exists.
Proof.
By projection theorem and is linearly independent, is the projection onto the span of , where projection operator. Let , then , .
Hence . If exists, then .
Gauss-Markov estimate
An symmetric real matrix is said to be positive-definite if for all non-zero in .
is a r.v., e.g., ~
, and . Assume is positive definite.
Proof.
. Then is an inner product.
Hence we can find the estimator by projection theorem.
A square matrix
is Hermitian if and only if it is equal to its adjoint, that is, it satisfies , i.e., , .
e.g., Covariance (共變異數, 又稱協方差) matrix
square matrix
is Hermitian if and only if it is unitarily diagonalizable with real eigenvalues.
A matrix is positive-definite iff is symmetric or Hermitian, and all its eigenvalues are real and positive.
HW. Why?
The estimator is said to be unbiased if and only if
Kalman Filter (recursive estimation)
Probability space
n-dimensional dynamic model of a random process on
Ref. Optimization by Vector Space Methods. David G. Luenberger
證明重點 就是 Projection Theorem 式子很長不想打字了
convex, concave 於優化時local minimum, local maximun 存在性 很重要 (注意 有些文章兩個 相反)
distribution function 有些左連續 有些右連續
concave function
convex set
non-convex set
Let f be a function of many variables defined on a convex set . Then is convex if for all , for all and all we have
likelihood: the chance that something will happen
In probability theory, odds provide a measure of the likelihood of a particular outcome, e.g., ,
convex function on
convex function on
log-convex
A function is log-convex if , D is a convet set and , i.e., is convex on .
Key Point:
A function of many variables defined on a convex set is convex if and only if the set of points on or above its graph is convex: is convex.
Jensen's inequality
A function of many variables defined on a convex set is convex if and only if for all for all and all with . Let , then
3.3 Concave and convex functions of many variables
Bernoulli sampling
is a binary random variable with a- probabilities and .
The receiver of a communication system receives a random variable (maybe continuous random variable).
denotes the estimator of .
The conditional densities , are called . . The a- probability of is given by (Bayes' theorem).
Let and threshold .
is called the .
if .
if .
This is called a .
: maximum likelihood estimator
Probability of error
Hence,
實際上 要注意 Arithmetic Overflow
is small,
The log likelihood ratio, is convenient for use with Gaussian noise with .
,
Hence,
space
e.g., and . Assume , where . iid random sequence with distribution.
MAP (Maximum A Posterior Estimation)
e.g., , i.e., at
Assume IID,
, since IID.
Ref. Principles of Digital Communication, by Robert G. Gallager
Sufficient statistic 充分統計量
A statistic is a function of the random sample .
In statistics, a statistic is sufficient with respect to a statistical model and its associated unknown parameter if "no other statistic that can be calculated from the same sample provides any additional information as to the value of the parameter"
Roughly, given a set of independent identically distributed data conditioned on an unknown parameter , a sufficient statistic is a function whose value contains all the information needed to compute any estimate of the parameter (e.g., a maximum likelihood estimate).
A statistic is sufficient for underlying parameter θ precisely if the conditional probability distribution of the data , given the statistic , does not depend on the parameter .
e.g., sample mean
Suppose we have a statistical model, parameterized by a real number , giving rise to a probability distribution for observed data, , and a statistic which serves as an estimator of based on any observed data .
where denotes expected value over the distribution . An estimator is said to be unbiased if its bias is equal to zero for all values of parameter .
e.g.,
sample mean : unbiased estimator.
sample variance I : biased estimator, since
sample variance II unbiased estimator.
statistic
HW. Find , if iid and .
A:
Hint:
if and only
statistic
Nonparametric statistics(無母數統計)
Nonparametric statistics is a type of statistical analysis that does not rely on the assumption of a specific underlying distribution (such as the normal distribution), or any other specific assumptions about the population parameters (such as mean and variance).
Fisher–Neyman factorization theorem
If the probability density function is , then is sufficient for if and only if nonnegative functions and can be found such that i.e. the density can be factored into a product such that one factor, , does not depend on and the other factor, which does depend on , depends on only through .
e.g.,
(a) A statistical hypothesis.
(b) A test of a hypothesis against an alternative hypothesis and the associated concept of the critical region of the test.
© The power of a test.
Definition 1. Let C be that subset of the sample space which, in accordance with a prescribed test, leads to the rejection of the hypothesis under consideration. Then C is called the critical region of the test.
Definition 2. The power function of a test of a statistical hypothesis against an alternative hypothesis is that function, defined for all distributions under consideration, which yields the probability that the sample point falls in the critical region of the test, that is, a function that yields the probability of rejecting the hypothesis under consideration. The value of the power function at a parameter point is called the power of the test at that point.
Definition 3. Let denote a hypothesis that is to be tested against an alternative hypothesis in accordance with a prescribed test. The significance level of the test (or the size of the critical region C) is the maximum value (actually supremum) of the power function of the test when is true.
Type II error (false negative, 偽陰性): the test result says you don’t have coronavirus, but you actually do.
偽陽性(明明沒確診卻誤判為有)和偽陰性(明明確診卻誤判為沒有)
(英語:False positives and false negatives)
type I error , type II error
Neyman-Pearson Theorem
Let denote a random sample from a distribution with density function . Then the joint p.d.f. of is . Let , . Let be a subset of the sample space such that:
Then is the best critical region of size for the simple hypothesis against the
Example
If , then
The ciritical region
Assume samples, , and ~.
(type I error), hence given , one can find and find the probabilty
If , then ~, ~ and the power of of the best test of again is 0.05 when is true. ~ when is true.
Type II error:
Proof.
If there is another critical region of size , then
claim
Since at each point of and at each point of , then
. Hence
If the random variables are of the discrete type, the proof is the same, with integration replaced by summation.
Type I & Type II Errors | Differences, Examples, Visualizations
Neyman-Pearson Lemma: Definition
Introduction to Mathematical Statistics
A First Course on Statistical Inference
Confidence interval(信賴區間)
Confidence Intervals
Understanding the Confusion Matrix (II)
Ref. Introduction to Mathematical Statistics, R. V. Hogg and A. T. Craig
Successive Approximation
Taylor's formula with remainder
TAYLOR’S THEOREM WITH REMAINDER
以前下課時間 很喜歡 問學生一問題 events A, B probability independent . Why?
通常得到 有些說 定義, 很多茫茫然的眼神~
A:
實驗次數
發生 event 的次數, 發生 event 的次數
發生 的次數
,
如果 event 的發生與否 不會 影響 對 event 的統計, i.e.,
, when (i.e., P(A|B)=P(A) conditional probability)
, as
Hence,
events A, B probability independent
Probability measure by CrazyDog
還有 微積分 最重要的定理是什麼? Why?茫茫然…
A:
微積分基本定理(Fundamental theorem of calculus)
orthonormal basis 用在除了線性代數外你們哪門 課? 茫茫然…
is a orthonormal basis of ( or ) space, then
, .
Simulation:
Assumme is a random variable, .
Let (generalized inverse function) , where .
Assume is a uniform random variable on .
Then .
e.g., exponential distribution, .
Successive Approximation, Newton's Method, Fixed Point Theorem
Successive Approximation 重要的是 選擇 的初始值 能否 收斂 到 "正確解" 與 收斂 的效率
Definition. A sequence ${x_i } in a normed space is said to be a Cauchy sequence if as .
Remark. If , then .
Definition. A normed linear vector space is complete if every Cauchy sequence from has a limit in . A complete normed linear vector space is called a Banana space.
Remark. Hilbert space is complete.
contraction mapping
is a normed space, . Then is a contraction mapping if there exists such that for all .
Contraction Mapping Theorem
is a Hilbert (or Banach) space, and is a closed set of . There is a unique satisfying . Furthermore, can be obtained by the method of Successive Approximation starting from an arbitary initial point in .
Proof. Select an arbitrary element .
Let , then . Therefor, . Then
Hence . is a Cauchy sequence. Since is closed and is complete, there exists a such that .
Claim
Proof.
. By appropriate choice of , the above inequality can be made arbitrarily small. Thus ; .
Claim is unique.
Proof.
Assume and are fixed points. Then . Thus .
Newton's Method
is a better approximation than for the root of the function (blue curve)
wiki
Iteration typically improves the approximation
wiki
Solving an equation of the form .
Let . A fixed point of is the solution of .
至於 擴充至 Hilbert (Banana) space, 對 EE 太難
Exterior algebra
Let be a vector space over a field .
Let be a basis of the vector space . Definition. The exterior algebra is the vector space with basis formal symbols
where a ≥ 0 and is a strictly increasing sequence. The exterior product is
where in the first case the sequences and have an element in common. is called exterior product (or wedge product).
e.g.,
The cross product (blue vector) in relation to the exterior product (light blue parallelogram). The length of the cross product is to the length of the parallel unit vector (red) as the size of the exterior product is to the size of the reference parallelogram (light red).
The Cartesian plane is a real vector space equipped with a basis consisting of a pair of unit vectors .
vectors, real numbers.
, , .
平行四邊形 面積 =| ad-bc |
Such an area is called the signed area of the parallelogram.
Let then . Let ,
Determinant
平行六面體的體積= | det |
vectors,.
oriented volume of =
Let be an n-dimentional real vector space.
A form of degree 1 (or a 1-form) is a linear function , i.e., , where and .
2 form is a function on pairs of vectors , which is bilinear and skew symmetric.
e.g., oriented area is a 2 form.
k form is a function of k vectors which is k-linear and antisymmetric.
不懂 permulation 的請回憶 行列式
e.g., oriented volume on
Differentiable manifold (微分流形)
A chart is an open set in the enclidean coordinate space together with a one-to-one mapping of onto some subset of , .
A set is given the structure of a differential manifold if is provided with a finite or countable collection of charts, so that every point is represented in at least one chart.
1-form on the tanget space
Let be a differentiable function on the manifold (imagine a function of many variables ). The differential of at is a linear map of the tangent space to at into the real line.
tangent bundle
Definition. A 1-form on a manifold is a smooth map of the tangent bundle of to the line, linear on each tangent space .
Differential k-form
Definition. A differential k-form at a point of a manifold is an exterior k-form on the tangett space to at , i,e, a k-linear skew-symmetric function of k vectors tanget to at .
Theorem. Every differential k-form on with a given coordinate system can be written uniquely in the form where the are smooth functions on .
In mathematics, a field is a set on which addition, subtraction, multiplication, and division are defined and behave as the corresponding operations on rational and real numbers do.
建構式數學教的是最基本的數學觀念 不是錯~只是 對於 國高中小 是考試與教育走火入魔, 感嘆~ 就像 兩個事件獨立一樣~對學生 叫做 定義
GF(2) GF:Galois field
The modulo-2 addition can be
implemented with an XOR () gate and the
modulo-2 multiplication can be implemented
with an AND () gate.
The set {0,1} together with modulo-2 addition
and multiplication is called a binary field ,
denoted GF(2).
Remark.
Vector Space over GF(2)
A binary n-tuple
is an ordered sequence with components from GF(2).
Define an addition operation for any two binary
vector , where =XOR.
Define a scalar multiplication between an element in GF(2) and a binary n-tuple as
follows: , where = AND.
The set together with the addition defined for any two binary n-tuple in and the scalar multiplication defined between an element in GF(2) and a binary n-tuple in is called a vector space over GF(2).
Hamming distance
HW:
Show that Hamming distance is a metric.
An (n,k) convolutional encoder (CE) will encode a k-bit input block into an n-bit ouput block, which depends on the current input block and the m preceding input blocks
A convolutional encoder is a discrete linear time-invariant system.
Remark. Why LTI system?
input of CE CE
input of CE CE
input of CE CE and
CE ,
since the convolutional encoder is finite state machine (FSM) with initial state . Assume , then CE is a LTI system. Hence we can define impulse response of the CE.
(2,1) convolution encoder:
Assume discrete time impulse response is given by and over GF(2).
input sequence=
output and
output sequence=
Decoding convolutional codes:
Viterbi algorithm…by VLSI or Software
Statistical Inference (統計推論)
Formally, the partial derivative with respect to of the natural logarithm of the likelihood function is called the score.
=score=.
e.g.,
, iid.
In general, , is called the parameter space.
Maximum likelihood estimator (MLE)
Therefore, MLE can be estimated by solving or if it exists, since log(.) is a monotone increasing function and
, .
HW.
The expected value (the first moment) of the score, evaluated at the true parameter value , is 0.
The Fisher information is defined to be the variance of the score.
Proof.
score vector
the log-likelihood function
score vector
Under mild regularity conditions, the expected value of the score is equal to zero:
Score Function and Fisher Information Matrix
Fisher information
The information matrix is the covariance matrix of the score.
Fisher information matrix (FIM)
If is an unbiased estimator of
then the Cramér–Rao bound reduces to
Cramér–Rao bound (Rao-Cramér inequality, Rao-Cramér lower bound)
Score Function and Fisher Information Matrix
Let with . Then is said to be congruent to modulo , denoted (同餘).
Every integer greater than has a prime divisor.
If is a composite number(合成數), there exist such that , e.g.,
Let be a composite number(合成數). Then has a prime divisor with .
不是質數也不是 composite number. Every positive integer can be written as a product of prime numbers. e.g., . In general, , if and prime numbers.
, not all zero. If , are said to be relative prime. If for all pairs $i,j $ with , then are said to be pairwise relatively prime.
A set of integers such that every integer is congruent modulo to exactly one integer of the set is said to be a complete residue(餘數) system modulo .
e.g.,
is a complete residue system modulo .
The set of integers {1,5} is a reduced residue system modulo 6.
,
Definition: A arithmetric function is a function whose domain is the set of positive integer.
An arthmetric function (算術函數) is said to be multiplicate if where and are relative prime positive numbers.
the least common multiple (最小公倍數) of two integers and = the smallest positive integer that is divisible by both and
Definition. Let . A congruence of the form is said to be a linear congruence in the variable (一元線性同餘方程).
Definition. Let with and . The order of a modulo , denoted , is the least positive integer for which .
Definition. Let with and . Then is said to be a primitive root (原根) modulo if .
e.g.,
is a primitive root modulo since .
is a primitive root modulo ,
is not a primitive root modulo , since
Theorem. Let and . If , then the congruence has no solution in ; if , then the congruence has exactly incongruent solutions modulo in .
Proof. Elementary Number Theory, chap 2.2, by STRAYER.
Definition. Any solution of the linear congruence in one variable to be said to be a (multiplicative) inverse of a modulo .
e.g., the inverse 5 mod 16= 13, since
瑞士數學家 Euler 大師就是大師 腦袋不知裝時麼 寫下這 重要的理論 Public key, Priviate key 就是用到這定理
Euler's Theorem.
Let with . If , then , where Euler phi-function (Eulur's totient function), denoted , .
Remark.
1. cardinality of the set (the number of elements of the set ). is the number of positive integers less than or equal to that are relatively prime to . e.g.,
2. , if is a prime number.
4. two prime numbers, for , . Euler phi-finction is multiplicative.
Proof.
Consider the matrix , if , then no entry of th-row is relatively prime to ( ), since if ,then and . Hence . Consider the row with , , each of this intergers is relatively prime to . of these integers are relatively prime to . Since integers are relatively prime to , they are relatively prime to . The residue system modulo is . The residues that are not relatively prime to are the set , the set and 0. Hence
Proof. Let and . Consider . Since , the inverse of =. Hence if with then , . This is impossible. So the least non-negative resdiues modulo of the integers . Then . Hence , since , we have
.
Compute Euler Phi-Function:
Theorem. Let be a prime number and let , then
Proof. The positive integers not exceeding that are not relatively prime to are , hence .
Theorem. Let , then .
e.g.,
,
Proof. Assume and with distinct prime number and postive integers. Then since Euler phi-finction is multiplicative. Thus $
The RSA Encryption/Decryption Scheme
Let and be distinct odd prime number (typically, very large), let and let be a positive integer such that . The ordered pair is the public key. Let , then is the private key.
Encryption
block to a new block
Decryption
block to block
Remark. , , by Euler's theorem.
Example.
select two prime numbers, p=7 and q=17
n=pq=119
select is relative prime to , e.g., .
and , then .
public key = {5,119}
private ket ={77,119}
Diffie-Hellman Key Exchange Algorithm
Global Public Emements
prime number, and is primitive root of
User A Key Generation
Select priviate ,
Calculate Public ,
User B Key Generation
Select priviate ,
Calculate Public , .
Generation of Secret Key by User A
Generation of Secret Key by User B
Ref. Cryptography and Network Security, by William Stallings
Ref. Introduction to Higher Mathematics, by Patrick Keef and David Guichard.
中國餘數定理, 孫子定理, 韓信點兵 (Chinese Remender Theorem)
Let be pairwise relative prime positive numbers and let be any integers. The system of linear congruences in one variables given by
…
has a unigue solution modulo .
Proof.
Existence
Let and let . Then since they are pairwise relative prime positive numbers. Thus has a soultion for each . Form for each .
Uniqueness
Let be another solution. We have . . Then for all . Then , from which .
e.g.,
, , , , , then
Tensor Product of Hilbert Spaces
Let and be Hilbert spaces over a filed , e.g., . For each , , let dentote conjugate (for complex number) bilinear form which acts on (tensor product) by
Let be the set of finite linear conbinations of such conjugate biliear forms; we define an inner product on by defining and extending by linearity to (tensor: a multilinear mapping, multidimensional array…)
Fact 1. is well defined and positive definite.
Fact 2. If and are orthonormal bases for and , then is an orthonormal basis for .
Then we have tensor product: .
Fact 3.
Ref. Functional Analysis, M. Reed, B. Simon
Appendix
如果您數學程度不錯, 大數學家( Soviet and Russian mathematician) Vladimir Arnold 的書不錯(需 數學系程度)
"將數學嚴謹性與物理直覺相結合的清晰的寫作風格"
有助 別人理解
Mathematical Methods of Classical Mechanics, by V. I. Arnold
(此書也要有數學研究所程度, 適合 要念研究所與進攻博士的人)
大學部線代與機率推薦書(好像都是 數學系的~要有觀念不是計算, 計算與應用是幫助理解):
Linear Algebra:
Introduction to Linear Algebra, Gilbert Strang
Probability:
A First Course in Probability, Sheldon Ross
Ref.
Mathematical Methods and Algorithms for Signal Processing, by Todd Moon (Author), Wynn Stirling (Author)
Appendix 有簡潔的 measure theory
這書適合有心念研究所的同學 或 想加強自己能力的 資電人員, 我以前 用過 這本 教 estimation and prediction theory
Stochastic Processes, by Sheldon M. Ross (沒有用到 measure theory, 適合 EE, 資工 統計 應數 研究所)
Linear Operator Theory in Engineering and Science (Applied Mathematical Sciences, 40), by Arch W. Naylor (Author), George R. Sell (Author)
Real Analysis and Probability
Probability and Mathematical Statistics, by Robert B. Ash
(數研所程度, 較深)
有空再來 寫 神奇的 Dirac delta function
當時
Riemann integral, Lebesque integral 是不行的
天才物理學家,量子力學的奠基者之一 Paul Dirac
還有一位 Évariste Galois, Coding 的原理就是 年輕的 他 奠立的~現稱 Évariste Galois
年輕 20歲 鬥劍 過世
後來才有數學理論:
Schwartz distribution or generalized function
這三本書 需要數學系以上的基礎
Applied Functional Analysis (Dover Books on Mathematics) Reprint 版本
作者 D.H. Griffel (Author)
這對 EE 比較容易懂, Dover Books 的書 很多是 沒有 版權問題的 好書 重印, 不貴, 這本就是 評價 很高的書, 要修 數研所 的課 沒那麼簡單