
Metric Space, Vector Space, Normed Space, Inner Product Space, Hilbert Space

Author:

CrazyDog, Ph.D. in Electrical Engineering, NCTU, R.O.C.
email: kccddb@gmail.com
Date: 20250304

Copyright: CC BY-NC-SA




Standing on the shoulders of giants

With a library of true books in your head, you command half the world of letters and arms.

Mathematics (logic, calculus, linear algebra, probability, statistics), data structures, algorithms, DSP, microprocessors, and OS are the "internal" foundations.

The fundamentals must be solid.

Network programming and embedded systems are the "combat" skills.

In all matrix computations, watch for numerical error and the possibility of overflow.

家扶基金會

台灣之心愛護動物協會(Heart of Taiwan Animal Care, HOTAC)

Some figures are taken from the wiki; thanks to their creators.

Learning without thought is labor lost; thought without learning is perilous; learning followed by thought brings understanding.

A simple, integrated introduction to the key concepts of metric, norm, inner product, the Schwarz inequality, the $L^p$ norm, $\cos(\theta)$, and correlation: ideas that matter throughout engineering and physics.

Metric Space (賦距空間)

Let $X$ be a set and define a function $d : X \times X \to \mathbb{R}$ such that

  1. $d(x,y) \ge 0$
  2. $d(x,y) + d(y,z) \ge d(x,z)$
  3. $d(x,y) = d(y,x)$
  4. $d(x,y) = 0$ iff $x = y$


We say $(X,d)$ is a metric space (賦距空間) with metric $d(\cdot,\cdot)$, e.g., Euclidean space.

Distance between $x \in X$ and a subset $A \subseteq X$:

$d(x,A) = \inf\{d(x,y) : y \in A\}$

Applications:

$d_p(x,y) = \left[\int |x(t)-y(t)|^p \, dt\right]^{1/p}$, $p = 1, 2, 3, \ldots$

is also a metric (by the Minkowski inequality) for sufficiently well-behaved functions.
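As a numerical illustration (my addition, not part of the original note), here is a minimal numpy sketch that approximates the $L^p$ distance between two sampled functions on $[0,1]$ and checks the triangle inequality; the grid size and test functions are arbitrary choices.

```python
import numpy as np

def lp_distance(x, y, p, dt):
    """Approximate d_p(x, y) = (integral |x - y|^p dt)^(1/p) by a Riemann sum."""
    return (np.sum(np.abs(x - y) ** p) * dt) ** (1.0 / p)

t = np.linspace(0.0, 1.0, 1001)
dt = t[1] - t[0]
f, g, h = np.sin(2 * np.pi * t), t ** 2, np.cos(2 * np.pi * t)

for p in (1, 2, 3):
    lhs = lp_distance(f, h, p, dt)
    rhs = lp_distance(f, g, p, dt) + lp_distance(g, h, p, dt)
    print(f"p={p}: d(f,h)={lhs:.4f} <= d(f,g)+d(g,h)={rhs:.4f}")
```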

Inner product

$\langle\cdot,\cdot\rangle$ on a vector space $X$: for vectors $x, y, z \in X$,

$\langle\cdot,\cdot\rangle : X \times X \to \mathbb{R}$ or $\mathbb{C}$

  1. $\langle x+y, z\rangle = \langle x,z\rangle + \langle y,z\rangle$
  2. $\langle \alpha x, y\rangle = \alpha\langle x,y\rangle$
  3. $\langle x,y\rangle = \overline{\langle y,x\rangle}$
  4. $\langle x,x\rangle > 0$ when $x \ne 0$


Define norm:

$||x|| = \sqrt{\langle x,x\rangle} \ge 0$

$||x+y||_p \le ||x||_p + ||y||_p$ (triangle inequality)

e.g.,

  1. $x, y \in \mathbb{R}^n$, $\langle x,y\rangle = \sum_{i=1}^n x_i y_i$

  2. $\langle f,g\rangle \doteq \int_I fg$, where $f$, $g$ are real functions on $I=[a,b]$ (an interval).
    Remark.

    $\int_I fg$ is a Lebesgue integral (if the Riemann integral exists on $I$, the two integrals are equal).
    e.g., Let $I=[0,1]$, $f(x)=x^2$, and $g(x)=0$ if $x \in A=\{0, 1/2, 1/2^2, \ldots\}$, $g(x)=1$ if $x \notin A$ (i.e., $g = 1_{A^c}$).
    Then $\int_I f = \int_I f(x)\,dx$ (Riemann integral) and $\int_I g = 1$. The Riemann integral $\int g(x)\,dx$ does not exist.

Probability Measure and Conditional Expectation
Functions are Vectors

  3. $\langle f,g\rangle \doteq \int f\bar{g}$, where $f$, $g$ are complex functions.

$||f|| \doteq \sqrt{\langle f,f\rangle} < \infty$: the $L^2$ norm.

$L^p$ norm $= \left(\int |f|^p\right)^{1/p}$, $1 \le p < \infty$, $p \in \mathbb{N}$

The $L^2$ space (square-integrable functions) and the $l^2$ space (square-summable sequences) are used in digital communication, quantum mechanics, etc.

The $L^2$ space is also a Hilbert space (special cases: $L^2$, $l^2$).

Here $\int$ denotes the Lebesgue integral.

Definition. A Hilbert space is a complete inner product space.

For science and engineering the most important spaces are $L^2$ and $l^2$: they carry both a notion of distance and a notion of angle.


Schwarz inequality in a unit circle of the Euclidean plane.

Angles in abstract spaces (even spaces of functions), e.g., the $L^2$ space.

Define

$\cos(\theta) = \dfrac{\langle x,y\rangle}{||x||\,||y||} \le 1$ when $||x||\,||y|| > 0$, via the Schwarz inequality.

$||f_n - f||_{L^2} \to 0$ does not imply $f_n \to f$ pointwise.

e.g., Gibbs phenomenon


Trace of a matrix

$A$ is an $n \times n$ real matrix, $\operatorname{tr}(A) \doteq \sum_{i=1}^n A[i,i]$. Then

$\operatorname{tr}(A+B) = \operatorname{tr}(A) + \operatorname{tr}(B)$
$\operatorname{tr}(\alpha A) = \alpha\,\operatorname{tr}(A)$

$\operatorname{tr}(A^T) = \operatorname{tr}(A)$

For complex matrices, the inner product of matrices is given by

$\langle A, B\rangle = \operatorname{tr}(B^{*}A)$.

Let $F$ be a subfield of the complex numbers. Then $M_{m,n}(F)$ is an inner product space.

The proof is not hard; work it out yourself.

Matrices form an inner-product space

e.g.,

$\langle f,g\rangle = \int_I f(t)g(t)\,dt$ (Lebesgue integral) is an inner product on $L^2(I)$, where $I=[a,b]$, provided $\int_I fg$ exists.

For any interval $I = [a, a+\delta t] \subseteq \mathbb{R}$, $\delta t > 0$, the length (i.e., the measure) of the interval is $\delta t$.
.

Schwarz Inequality:

$|\langle x,y\rangle| \le ||x||\,||y||$

with equality holding in the Cauchy–Schwarz inequality if and only if $x$ and $y$ are linearly dependent.

The Schwarz inequality allows one to extend the notion of "angle between two vectors" to any real inner-product space.

Define

$\cos(\theta) = \dfrac{\langle x,y\rangle}{||x||\,||y||} \le 1$ when $||x||\,||y|| > 0$.

$\cos(\theta)$ represents the correlation between the two vectors.
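A small numpy sketch (my own illustration, not from the note) that computes $\cos(\theta)$ for two sampled signals after removing their means and compares it with numpy.corrcoef; the test signals are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
x = np.sin(2 * np.pi * 5 * t) + 0.3 * rng.standard_normal(t.size)
y = np.sin(2 * np.pi * 5 * t + 0.4) + 0.3 * rng.standard_normal(t.size)

# cos(theta) = <x - E x, y - E y> / (||x - E x|| ||y - E y||)
xc, yc = x - x.mean(), y - y.mean()
cos_theta = np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

print(cos_theta, np.corrcoef(x, y)[0, 1])  # the two values agree
```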

Random Variables

Probability space $(\Omega, \mathcal{F}, P)$

$X, Y$ random variables. By the Schwarz inequality we have the covariance inequality:
$\langle X,Y\rangle \doteq E[XY]$, since $E[XY] \doteq \int X(\omega)Y(\omega)\,dP$ (Lebesgue integral);
$\operatorname{cov}(X,Y) = \langle X - E[X],\, Y - E[Y]\rangle$

$\operatorname{var}(X) = E[(X-E[X])^2]$

$\operatorname{var}(X)\operatorname{var}(Y) \ge \operatorname{cov}(X,Y)^2$

$X = [X_1, \ldots, X_n]$ random vector

covariance matrix (dispersion matrix)

$\mathrm{COV}_X(i,j) = E\big[(X_i - E[X_i])(X_j - E[X_j])\big]$

eigenspace

$H$ is a Hilbert space, $U : H \to H$ a bounded linear operator. If there exists an element $x_0 \ne 0$ with the property that $Ux_0 = \lambda x_0$, then $\lambda$ is an eigenvalue and $x_0$ an eigenvector. The eigenvectors belonging to a given eigenvalue $\lambda$ form the eigenspace (or characteristic space of $U$) $H_\lambda$ corresponding to $\lambda$.

Lemma.

$H$ is a Hilbert space, $U : H \to H$ self-adjoint, $Ux = \lambda_1 x$ and $Uy = \lambda_2 y$, $\lambda_1 \ne \lambda_2 \ne 0$; then the eigenspace of $\lambda_1$ is orthogonal to the eigenspace of $\lambda_2$.

Proof. Assume $Ux = \lambda_1 x$, $Uy = \lambda_2 y$.
$\langle x,y\rangle = \dfrac{1}{\lambda_1}\langle Ux, y\rangle = \dfrac{\lambda_2}{\lambda_1}\langle x,y\rangle$, which is possible only if $\langle x,y\rangle = 0$.

Rayleigh quotient

Any covariance matrix is symmetric and positive semi-definite and its main diagonal contains variances.

A square matrix is called positive definite if it is symmetric and all its eigenvalues

λ are positive, that is
λ>0
.

If

A is positive definite, then it is invertible and
detA>0
.

A real symmetric matrix

A is positive definite if and only if
xTAx>0
for every column
x0
in
Rn
.

e.g.,

$A = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}$ is positive-definite.

$N = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$ is not positive-definite in $\mathbb{C}^2$ (and not in $\mathbb{R}^2$), since $z^{*}Nz = 2 + 2i$, where $z = [1, i]^T$.

$M$ is a real-valued $n\times n$ symmetric positive-definite matrix.
Then $\langle x,y\rangle_M \doteq \langle x, My\rangle_{\mathbb{R}^n} = x^T M y$ is an inner product.

Rayleigh quotient

$R(M,x) = \dfrac{\langle x, Mx\rangle}{||x||^2}$

$R(M,x) \in [\lambda_{\min}, \lambda_{\max}]$, where $\lambda_{\min}, \lambda_{\max}$ are respectively the smallest and largest eigenvalues of $M$.
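A quick numerical check (my addition): for a random symmetric positive-definite matrix, the Rayleigh quotient of random vectors always lands between the smallest and largest eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
M = B @ B.T + 4 * np.eye(4)          # symmetric positive definite

eigvals = np.linalg.eigvalsh(M)
lam_min, lam_max = eigvals[0], eigvals[-1]

for _ in range(5):
    x = rng.standard_normal(4)
    R = x @ M @ x / (x @ x)          # Rayleigh quotient <x, Mx> / ||x||^2
    assert lam_min - 1e-9 <= R <= lam_max + 1e-9
    print(f"{lam_min:.3f} <= {R:.3f} <= {lam_max:.3f}")
```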

The geometric multiplicity of an eigenvalue is the dimension of the linear space of its associated eigenvectors (i.e., its eigenspace).

characteristic polynomial

$f_A(\lambda) = \det(\lambda I - A)$

$A : \mathbb{R}^n \to \mathbb{R}^n$, $\dim(N(\lambda I - A)) + \dim(R(\lambda I - A)) = n$, where the
null space is
$N(L) = \{x \in X : Lx = 0\}$

and the
range space is
$R(L) = \{y = Lx : x \in X\}$

geometric multiplicity of $\lambda = \dim(N(\lambda I - A))$.

e.g.,
In multivariate statistics and probability theory, the scatter matrix is a statistic used to estimate the covariance matrix.

$M = \mathrm{COV}_X$

Scatter Matrix S

classification

Given $n$ samples of $m$-dimensional data, represented as the $m \times n$ matrix $x = [x_1, \ldots, x_n]$, the sample mean is $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$.

The scatter matrix is the $m \times m$ positive semi-definite matrix

$S = \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T$
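A minimal numpy sketch (my own, not from the note) computing the scatter matrix of a small sample and confirming it is symmetric positive semi-definite; S/(n-1) is the usual unbiased sample covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 100                        # m-dimensional data, n samples
X = rng.multivariate_normal([0, 0, 0],
                            [[2, 1, 0], [1, 2, 0], [0, 0, 1]], size=n).T  # m x n

x_bar = X.mean(axis=1, keepdims=True)
D = X - x_bar
S = D @ D.T                          # scatter matrix, m x m

print(np.allclose(S, S.T))                        # symmetric
print(np.all(np.linalg.eigvalsh(S) >= -1e-9))     # positive semi-definite
print(np.allclose(S / (n - 1), np.cov(X)))        # relation to sample covariance
```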

Hermitian 矩陣特徵值的變化界定
Scatter matrix , Covariance and Correlation Explained
Scatter Plot


Applications:

數位通訊
機率統計
檢測與估計
machine learning


Assume

$I = [0, 2\pi]$, $\langle f,g\rangle = \int_I f(t)g(t)\,dt$ and $\cos(\theta) = \dfrac{\langle f,g\rangle}{||f||\,||g||} \le 1$ (i.e., $f, g$ also have an abstract notion of angle between them).

$\int_0^{2\pi} \cos(\omega t)\sin(\omega t)\,dt = 0$

Define

$e_x = \cos(\omega t)/||\cos(\omega t)||$ and $e_y = \sin(\omega t)/||\sin(\omega t)||$ on $[0, 2\pi]$.

$\cos(\theta) = \langle e_x, e_y\rangle = 0$ and $||e_x|| = ||e_y|| = 1$

Taking $e_x, e_y$ as a vector-space basis gives a space just like $\mathbb{R}^2$: a vector space with notions of distance, inner product, and angle.

Probability space $(\Omega, \mathcal{F}, P)$
Ref. Probability measure and conditional expectation for EE
Assume $n$ is noise and $\langle f,g\rangle = 0$, $f, g \in L^2(\Omega \times [0,T])$ (random processes, stochastic processes, or random functions on $[0,T]$). Implicitly, we consider $f, g \in L^2$ or $l^2$ (Hilbert space).

Correlation Receiver ($\langle f,g\rangle = \int_0^T fg$) ~ Matched Filter Receiver

Assume $\langle f,g\rangle = 0$.

  1. $\langle f+n, f\rangle = \langle f,f\rangle + \langle n,f\rangle = ||f||^2 + \langle n,f\rangle$
  2. $\langle f+n, g\rangle = \langle f,g\rangle + \langle n,g\rangle = \langle n,g\rangle$

Notice that

$\langle f,h\rangle \le ||f||\,||h||$ by the Schwarz inequality.

Assume

$f, g \in L^2$ or $l^2$ (random process, random sequence), $\langle f,g\rangle \doteq E\left[\int_0^T fg\,dt\right] = E[\text{sample-path integral}]$.
$\langle n,f\rangle \le ||n||\,||f||$ and $\dfrac{\langle n,f\rangle}{||n||} \le ||f||$ by the inner-product properties and the Schwarz inequality.

Let

n=<n,h> and
||n||=N=E<

Maximize the SNR (signal-to-noise ratio)
$\dfrac{|\langle f,h\rangle|^2}{||n||^2} = \dfrac{\text{average signal power} \times T}{\text{average noise power} \times T}$
(digital communication) to get good performance.

$|\langle f,h\rangle| \le ||f||\,||h||$; the two sides are equal if and only if $f$ and $h$ are linearly dependent, i.e., $h = \alpha f$, $\alpha \ne 0$. Let $g(t) = f(T-t)$; then $f * g$ evaluated at $t = T$ equals $\langle f,f\rangle$ on $[0,T]$.

This is the mathematical principle behind the basic correlation receiver (matched filter) in digital communication.
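A minimal matched-filter sketch (my own illustration, with an arbitrary pulse and noise level): correlating the received signal with $f$ (equivalently convolving with $f(T-t)$ and sampling at $t=T$) yields the same decision statistic.

```python
import numpy as np

rng = np.random.default_rng(3)
T, fs = 1.0, 1000
t = np.arange(0, T, 1 / fs)
f = np.where(t < 0.5, 1.0, -1.0)             # transmitted pulse on [0, T]
r = f + 0.8 * rng.standard_normal(t.size)     # received = signal + white noise

# correlation receiver: <r, f> approximated by a Riemann sum
corr_out = np.sum(r * f) / fs

# matched filter: convolve r with g(t) = f(T - t); its value at t = T is <r, f>
g = f[::-1]
mf_out = np.convolve(r, g)[len(f) - 1] / fs

print(corr_out, mf_out)   # the two receivers give the same statistic
```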


Linear Independence
A sequence of vectors $v_1, v_2, \ldots, v_k$ from a vector space $V$ is said to be linearly dependent if there exist scalars $a_1, a_2, \ldots, a_k$, not all zero, such that $\sum_{i=1}^k a_i v_i = 0$. A set of vectors is linearly dependent if and only if one of them is zero or a linear combination of the others.

A sequence of vectors $v_1, v_2, \ldots, v_k$ is said to be linearly independent if it is not linearly dependent, that is, if the equation $\sum_{i=1}^k a_i v_i = 0$ can only be satisfied by $a_i = 0$, $i = 1, 2, \ldots, k$.

The kernel of a linear map, also known as the null space or nullspace, is the linear subspace of the domain of the map which is mapped to the zero vector.
A linear map $L : V \to W$ between two vector spaces $V$ and $W$:

the kernel of $L$: $\ker(L) = \{v \in V \mid L(v) = 0\} = L^{-1}(0)$

The image of $L$: $\operatorname{img}(L) \doteq \{L(v) : v \in V\}$

If $W$ is a subspace of $V$, $\operatorname{img}(L(W)) = \{L(v) : v \in W\}$.

e.g.,

$B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

$\ker(B - 1\cdot I)^3 = \ker\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \mathbb{R}^3$
$\operatorname{img}(B) = \mathbb{R}^3$

eigenvalues of $B$: 1, 1, 1

$K_l \doteq \ker(A - \lambda I)^l$; then $0 \subseteq K_1 \subseteq K_2 \subseteq K_3 \subseteq \ldots$

Why?

rank

Assume

f:VmVn,
f:xAx,xVm
, where
Vi
is a
i
-dimentional vector space,
A
is an
n×m
matrix.

rank of
f
= rank
Adim f(Vm)
.

A sequence of vectors

V(A) denotes the set of all (finite) linear combinations of the points in
A
, i.e., the linear subspace spanned by
A
(or linear span of A~span(A), 線性生成空間).

e.g.,

A={x1,x2,...xn},
span(A)={x:x=i=1nλixi, λiR}

Lemma.

dimf(Vm)+dim ker(f)=m

Proof.

Let dim

f1(0)=mr and
{ur+1,...,um}
be a basis of
f1(0)
. Then choose
ui,1ir
such that
{u1,...ur,...,um}
is a basis of
Vm
.
Claim
f(u1),...f(ur)
is a basis of
f(Vm)
and dim
f(Vm)=r
.
f(Vm)=span{f(u1),...,f(ur),...,f(um)}
. Since
f(ur+1)=...=f(um)=0
, then
f(Vm)=span{f(u1),...,f(ur)}
.

Claim

f(u1),...,f(ur) are linearly independent.

Suppose

i=1αif(ui)=0. Then
f(i=1αiui)=f(αiui)=0
or
i=1rαiuif1(0)
. There exists a
αi,r+1im
such that
i=1rαiui+i=r+1mαiui=0
. Since
ui,1im
are linearly independent then
αi=0,i=1,...,m
. Hence
f(u1),...,f(ur)
are linearly independent and dim
f(Vm)=r
.

Corollary. Let

f be a linear transformation of
Vn
(into itself). Then
f
is one-to-on if and only if
f
is onto.

Proof.

f(x1)=f(x2)f(x1x2)=0. If
f1(0)={0}
, then
x1=xx
. Hence
f
ono-to-one. Conversly, it is trival.
f1(0)={0}
dim f1(0)=0dim f(Vn)=nf(Vn)=Vn
.

Corollary.

n×m matrix
A=(aij)
. Let
a1,...,am
denote the column vectors of
A
. For
xVm,x=[x1,...xm]T
,
f(x)=Ax=ixiai
and
f(Vm)=span{a1,...,am}
.
rank A=
the maximal number of linearly independent columns of
A

Corollary. Let

n×m matrix
A
and
m×l
matrix B, we have
rankABrank A,rank B
, since
rank AB=dim ABVldim AVm=rank A
, and
rank ABBVl=rank B
.

Corollary.

P,Q nonsingular (invertible), then
rank PAQ=rank PA= rank AQ=rankA
.

Corollary. Let

A be an
m
by
n
matrix. Then
rank(A)=m
iff rows of
A
are linearly independent.

Corollary.

A,B
n×m
matrices.
rank (A+B)rank A+rank B
, since
rank (A+B)=dim(A+B)Vmdim(AVm+BVm)dimAVm+dimBVm=rank A+rank B

這就是您 線代 解 rank 的方法之原理.

Let

$H$ be a Hilbert space and $M$ a subset of $H$. Define $M^{\perp}$, the orthogonal complement of $M$, by

$M^{\perp} = \{x \in H : \langle x,y\rangle = 0 \text{ for all } y \in M\}$.

Closed set
A set $A$ in a metric space $(X,d)$ is a closed set if and only if every convergent sequence $\{x_n\}$ with $x_n \in A$ has its limit in $A$.

e.g.,

$a_n = 1 - \dfrac{1}{n+1}$, $n = 1, 2, \ldots$, $\lim_{n\to\infty} a_n = 1$

$a_n \in (0,1)$; however $1 \notin (0,1)$, so the interval $(0,1)$ is not a closed set.

The interval $[0,1]$ is a closed set.

Projection theorem
Let $M$ be any closed linear subspace of a Hilbert space $H$. Then $H = M + M^{\perp}$ (the linear subspace generated by $\{M, M^{\perp}\}$). Moreover, each $x \in H$ can be expressed uniquely as $x = m + n$, where $m \in M$, $n \in M^{\perp}$, and $||x||^2 = ||m||^2 + ||n||^2$.

Let $\hat{P}x = m$, with $\hat{P}$ the projection operator. Then $\langle \hat{P}x, n\rangle = 0$.

Let

$A$ be a nonempty set in a linear space $X$. The set is linearly independent if and only if for each $x \ne 0$ in $V(A)$, there is one and only one finite subset of $A$, say $\{x_1, x_2, \ldots, x_n\}$, and a unique $n$-tuple of nonzero scalars $\{a_1, a_2, \ldots, a_n\}$, such that $x = a_1 x_1 + \ldots + a_n x_n$.

$\{a_1, a_2, \ldots, a_n\}$: coordinates
$\{x_1, x_2, \ldots, x_n\}$: basis
: basis

Hilbert projection theorem

e.g.,

L2 and
l2
.

Operator Norm

$A : X \to Y$
$||A||_p = \sup_{x \ne 0} \dfrac{||Ax||_p}{||x||_p}$, $1 \le p \le \infty$

  1. $||A+B|| \le ||A|| + ||B||$
  2. $||AB|| \le ||A||\,||B||$
  3. $||A^{*}|| = ||A||$

$P$: projection operator; then $P^2 = P$ and $||P|| = 1$.

Recall $\langle x,y\rangle = \overline{\langle y,x\rangle}$.

Definition. Let $X$, $Y$ be Hilbert spaces and $A : X \to Y$ a bounded linear operator. The adjoint operator (共軛算子) $A^{*} : Y \to X$ is defined by the equation

$\langle x, A^{*}y\rangle_X = \langle Ax, y\rangle_Y$.

Taxonomy_of_Complex_Matrices.svg

Remark (important).

  1. The adjoint may also be called the Hermitian conjugate.
  2. Hermitian matrix (or self-adjoint matrix): $A = [a_{ij}] = [\overline{a_{ji}}]$
  3. A square matrix $A$ is Hermitian if and only if it is unitarily diagonalizable with real eigenvalues, i.e., $A = Q\Lambda Q^{*}$ (the spectral decomposition of $A$), where $\Lambda$ is a diagonal matrix.
  4. $A : H \to H$ is called self-adjoint if $A = A^{*}$.
  5. $A : H \to H$ is a bounded self-adjoint operator iff $\langle x, Ax\rangle = \langle Ax, x\rangle = \overline{\langle x, Ax\rangle}$ is real valued for all $x \in H$, since $\langle x,y\rangle = \overline{\langle y,x\rangle}$.
  6. $T : H \to H$ is a bounded linear transformation. $T$ is said to be normal if $TT^{*} = T^{*}T$.
  7. If $T$ is self-adjoint, then it is normal.
  8. A square matrix $U \in \mathbb{C}^{n\times n}$ is unitary if $U^{*}U = UU^{*} = I$.
  9. A square matrix $U \in \mathbb{R}^{n\times n}$ is orthogonal if $U^TU = UU^T = I$.

e.g.,

$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}$

is normal, since $AA^{*} = A^{*}A$, and its eigenvalues are $2,\ \dfrac{1 \pm i\sqrt{3}}{2}$.
$A$ is neither unitary, Hermitian, nor skew-Hermitian.

$A = \begin{pmatrix} 2 & 1+i \\ 1-i & 3 \end{pmatrix}$

$A$ is self-adjoint, and it can be checked (e.g., using the characteristic polynomial) that the eigenvalues of $A$ are $1,\ 4$.

$B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$
$B$ is self-adjoint, and it can be checked (e.g., using the characteristic polynomial) that the eigenvalues of $B$ are $1,\ 1,\ 2$.
The geometric multiplicity of $\lambda = 1$ is $2$.

Let

$T(x) = Ax$; then $||T||$ is the square root of the largest eigenvalue of the symmetric matrix $A^TA$, i.e., $||T|| = \sup\{\sqrt{\lambda} : \lambda \text{ an eigenvalue of } A^{*}A\}$.
Why?

e.g.,

$A = \begin{pmatrix} 2 & 0 & 0 \\ 3 & 0 & -2 \end{pmatrix}$
Then
$A^TA = \begin{pmatrix} 13 & 0 & -6 \\ 0 & 0 & 0 \\ -6 & 0 & 4 \end{pmatrix}$,

which has eigenvalues $\{0, 1, 16\}$, so $||A|| = 4$.
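A one-line numpy check of the example above (my addition): the spectral norm returned by numpy.linalg.norm(A, 2) equals the square root of the largest eigenvalue of $A^TA$.

```python
import numpy as np

A = np.array([[2.0, 0.0, 0.0],
              [3.0, 0.0, -2.0]])

eigs = np.linalg.eigvalsh(A.T @ A)        # eigenvalues of A^T A: {0, 1, 16}
print(eigs, np.sqrt(eigs.max()))          # -> 4.0
print(np.linalg.norm(A, 2))               # spectral norm, also 4.0
```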

The normed space of all bounded linear operators from $X$ into $Y$ is denoted $B(X,Y)$.
Lemma 1. $X, Y, Z$ Hilbert spaces, $S \in B(X,Y)$ and $T \in B(Y,Z)$; then $||TS|| \le ||T||\,||S||$.

Lemma 2. (Analogous to the matrix operations you already know.)

$X, Y, Z$ Hilbert spaces.

  1. If $A_1, A_2 \in B(X,Y)$, then $(A_1 + A_2)^{*} = A_1^{*} + A_2^{*}$.
  2. If $\alpha \in \mathbb{R}$, $(\alpha A)^{*} = \alpha A^{*}$.
  3. $A_1 \in B(X,Y)$, $A_2 \in B(Y,Z)$; then $(A_2 A_1)^{*} = A_1^{*}A_2^{*}$.
  4. If $A^{-1}$ exists, then $(A^{-1})^{*} = (A^{*})^{-1}$.
  5. $(A^{*})^{*} = A$

$H$ a Hilbert space, $A \in B(H,H)$. $A$ is said to be a self-adjoint operator (自共軛算子) if $A = A^{*}$.
$||A^{*}A|| \le ||A^{*}||\,||A|| = ||A||^2$

$A^{*}Ax = \lambda x$

軛:在車衡兩端扼住牛、馬等頸背上的曲木。

e.g.,

Covariance matrix

e.g.,

$A : \mathbb{R}^n \to \mathbb{R}^n$, $A^T = A = A^{*}$, with $A^T$ the transpose of $A$.

e.g.,

$B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$
$B$ is a self-adjoint positive definite operator.

e.g.,
Fredholm (Volterra) integral equation of the first kind
Let

$X = Y = L^2[0,1]$ and define $Ax = \int_0^1 k(t,s)x(s)\,ds$, $t \in [0,1]$, with $||k(t,s)|| < \infty$ on $[0,1]\times[0,1]$, where $k(t,s)$ is called the kernel function. Check $\langle Ax, y\rangle = \int\!\!\int x(t)\,k(s,t)\,y(s)\,ds\,dt$. Then $A^{*}y = \int k(s,t)\,y(s)\,ds$.

FACT (important):

  1. Every linear operator defined on a finite-dimensional normed space is compact (a bounded operator, and hence continuous).

  2. For any matrix $A$, the eigenvalues of $A^{*}A$ are always real and non-negative (in other words, $A^{*}A$ and $AA^{*}$ are positive semi-definite).
    Proof. Assume $A^{*}Ax = \lambda x$ with $\langle x,x\rangle = 1$; then $\lambda = \langle A^{*}Ax, x\rangle = \langle Ax, Ax\rangle \ge 0$, and $\lambda$ is real since $A^{*}A$ is Hermitian.

  3. Singular Value Decomposition (important)
    Any matrix $A \in \mathbb{C}^{m\times n}$ can be decomposed as $A = U\Sigma V^{*}$, where $U \in \mathbb{C}^{m\times m}$ and $V \in \mathbb{C}^{n\times n}$ are unitary matrices. The matrix $\Sigma \in \mathbb{C}^{m\times n}$ is "diagonal," with non-negative elements on the main diagonal. The non-zero elements of $\Sigma$ are called the singular values of $A$ and satisfy $\sigma_i = \sqrt{i\text{th eigenvalue of } AA^{*}}$.
    Proof (sketch). Assume $\operatorname{rank}(A) = m$. Since $AA^{*}$ is Hermitian, there exist a unitary $U$ and a diagonal matrix $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_m) = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_m^2) = \Sigma_1^2 > 0$ such that $U\Lambda U^{*} = AA^{*}$.
    Let $V_1 = \Sigma_1^{-1}U^{*}A$. Then $V_1V_1^{*} = I$. Construct a unitary $V \in \mathbb{C}^{n\times n}$ with $V^{*} = \begin{pmatrix} V_1 \\ V_2 \end{pmatrix}$ by choosing the rows of $V_2$ appropriately, and let $\Sigma = [\Sigma_1, 0] \in \mathbb{R}^{m\times n}$.
    Then $\Sigma V^{*} = \Sigma_1 V_1 = U^{*}A$, i.e., $A = U\Sigma V^{*}$.
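A short numpy check (my addition): the singular values returned by numpy.linalg.svd agree with the square roots of the eigenvalues of $AA^{*}$, and $U\Sigma V^{*}$ reconstructs $A$.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))

U, s, Vh = np.linalg.svd(A)                      # A = U @ Sigma @ Vh
Sigma = np.zeros(A.shape, dtype=complex)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(U @ Sigma @ Vh, A))            # reconstruction
print(np.allclose(np.sort(s**2),
                  np.sort(np.linalg.eigvalsh(A @ A.conj().T))))
```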

Proposition.
Let $H_1$ and $H_2$ be Hilbert spaces with inner products $\langle\cdot,\cdot\rangle_1$ and $\langle\cdot,\cdot\rangle_2$ respectively. Let $L : H_1 \to H_2$ be a bounded linear operator. Define the adjoint $L^{*} : H_2 \to H_1$ so that $\langle y, Lx\rangle_2 = \langle L^{*}y, x\rangle_1$ holds for all $x \in H_1$ and all $y \in H_2$.

  1. $L^{*}$ is a bounded linear operator.
  2. $||L^{*}|| = ||L||$
  3. $L^{**} = L$
  4. $U : H_1 \to H_2$ is unitary iff $U^{*}U = I_1$ and $UU^{*} = I_2$, where $I_i$ is the identity operator on $H_i$.

A self-adjoint bounded linear operator $A$ on a real Hilbert space $H$ is said to be positive semidefinite (positive definite) if $\langle x, Ax\rangle \ge 0$ ($> 0$) for all $x \in H$ (for all $x \ne 0$ in the definite case).

e.g.,

$(\Omega, \mathcal{F}, P)$ is a probability space, $X_i \in L^2(P)$ i.i.d. with $E[X_i^2] > 0$ and $E[X_i] = 0$.

(sample) covariance matrix (important)

The covariance matrix $[K_{X_iX_j}] = [E[X_iX_j]]$ is positive semi-definite and symmetric.

Proof: Positive semi-definiteness of the covariance matrix

Hence for $[K_{X_iX_j}]V = \lambda V$, the eigenvalue $\lambda > 0$ if $[K_{X_iX_j}]$ is positive definite.

HOME WORK. Prove it.

Fact.

$||A|| = ||A^{*}||$ (the general proof needs the Hahn–Banach theorem, which is too advanced for EE)

Sample covariance matrix Q

For a sample of vectors $x_i = (x_{i1}, x_{i2}, \ldots, x_{ik})^T$, $i = 1, \ldots, n$, where $n$ is the sample size, the sample mean is $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$.

The sample covariance matrix is $Q = [q_{ij}] = \dfrac{1}{n}\sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T$.

Assume $y \ne 0 \in \mathbb{R}^k$; then $y^TQy = \dfrac{1}{n}\sum_{i=1}^n ((x_i - \bar{x})^Ty)^2 \ge 0$.

The sample covariance matrix is always positive semi-definite.
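A small numpy sketch (mine) forming the sample covariance matrix and confirming $y^TQy \ge 0$ numerically for random $y$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 200, 4
X = rng.standard_normal((n, k))               # n samples of dimension k

x_bar = X.mean(axis=0)
D = X - x_bar
Q = D.T @ D / n                                # sample covariance matrix (1/n convention)

for _ in range(5):
    y = rng.standard_normal(k)
    print(y @ Q @ y >= 0)                      # always True up to round-off
```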

For an $L^2$ random process, the Karhunen–Loève theorem also yields a covariance matrix/operator, which is used in principal component analysis.

實務上 會有
MIT OpenCourseWare:
Singular values
Matrix Perturbations
Dynamic Systems And Control
Generalized eigenvalue problem
線代啟示錄


Geometric Analysis of Operator (投影運算子的幾何分析特性)

You must understand this concept; it is one of the key ideas of Hilbert space. Many of the proofs are too hard at this level, so I use the simplest possible explanation. Without a solid mathematical background much of this will sound incomprehensible; that is normal, so don't blame yourself.

Null space $N(L)$:
$L : X \to Y$ is a linear transformation, $N(L) \doteq \{x \in X : Lx = 0\}$

Range space $R(L)$:

$R(L) \doteq \{y = Lx : x \in X\}$

$P : H \to H$ is an orthogonal projection on a Hilbert space $H$; then

  1. $P$ is the identity operator on $R(P)$, and
  2. $P$ is the zero operator on $N(P)$.

Hence,

$H = R(P) + N(P)$.

Let

$\{P_1, P_2, \ldots, P_m\}$ be projection operators with $P_iP_j = 0$ if $i \ne j$, $P_i \ne 0$, and $I = \sum_{i=1}^m P_i$; then $H = \sum_{i=1}^m R(P_i)$.

Assume

$\lambda_i \in \mathbb{C}$, $i = 1, 2, \ldots, m$, $\lambda_i \ne \lambda_j$ if $i \ne j$. Let $L = \sum_i \lambda_i P_i$.

Once "continuity" enters, students get lost; at least in finite dimensions (matrices) and for bounded operators (the cases that matter in physics and engineering) the statements below are correct.

Assume

each $P_i$ is a continuous projection operator (e.g., a finite-dimensional projection operator, a bounded operator, as in ODEs). For any vector $x \in H$ there exist $x_i \in R(P_i)$ such that $Lx = \sum_{i=1}^m \lambda_i x_i$.

Eigenvalue-Eigenvector

$\lambda x - Lx = 0$ has solutions $\lambda_i$, $i = 1, \ldots, m$ (eigenvalues); then $x_i \in R(P_i)$ (eigenvectors) and $R(P_i) = N(\lambda_i I - L)$.

$Lx = \lambda x$
Intuitively, an eigenvector is a vector whose direction remains unchanged when a linear transformation is applied to it.

Riesz Representation Theorem

$H$ is a Hilbert space and $l$ is a bounded linear functional ($||l|| < \infty$) on $H$. Then there is one and only one vector $y \in H$ such that

$l(x) = \langle x,y\rangle$ for all $x \in H$.

e.g.,

$x \in L^2[a,b]$, $y \in L^2[a,b]$, $f(x) = \int_{[a,b]} y(t)x(t)\,dt$. Then $f$ is a linear functional and $f = \langle x,y\rangle$.

Every continuous linear functional on a Hilbert space can be represented as an inner product.
Remark. A linear operator is bounded if and only if it is continuous.


In circuit terms, $\langle f,g\rangle$ is an integrator. Can you see why the SNR matters, and why it is expressed in terms of power?

$l^2$ space:
$x = \{x_k\}$, $y = \{y_k\}$, $k = 1, 2, \ldots, n$

$\langle x,y\rangle = \sum x_i \overline{y_i}$
is an inner product on $l^2$.

Probability space

$(\Omega, \mathcal{F}, P)$

$\Omega \ne \emptyset$: sample space
$\mathcal{F}$ is the $\sigma$-field of subsets of the sample space $\Omega$

$P : \mathcal{F} \to [0,1]$: probability measure with $P(\emptyset) = 0$

$A \in \mathcal{F}$, $A \subseteq \Omega$, is an event

$X$ is a random variable (an $\mathcal{F}$-measurable function from $\Omega \to \mathbb{R}$).

$P(X \le x) = \operatorname{Prob}\{\omega \in \Omega \mid X(\omega) \le x\}$

$X, Y \in L^2(\Omega, \mathcal{F}, P)$ are random variables with $E[X^2] < \infty$ and $E[Y^2] < \infty$

$\langle X,Y\rangle \doteq E[XY]$ is an inner product
$\langle X - EX, Y - EY\rangle \doteq E[(X - E[X])(Y - E[Y])]$ is an inner product

$||X|| \doteq \sqrt{\langle X,X\rangle}$ is the norm of $X$

Markov's inequality

If $X$ is a nonnegative random variable and $a > 0$,

$P(X \ge a) \le \dfrac{E[X]}{a}$

If

$\phi$ is a monotonically increasing nonnegative function on the nonnegative reals, $X$ is a random variable, $a \ge 0$, and $\phi(a) > 0$, then

$P(X \ge a) \le \dfrac{E[\phi(X)]}{\phi(a)}$

e.g.,

$P(|X| \ge a) \le \dfrac{E|X|^n}{a^n}$

Let $Y = X - E[X]$; this gives Chebyshev's inequality.

It shows that "almost all" of the probability mass of a random variable lies "close to" its mean.

Functions, sequences, and random variables can all be viewed as vectors.

$\sqrt{E[X^2]}$ is a norm if $E[X^2] < \infty$.

correlation coefficient

$X - E[X]$, $Y - E[Y]$

correlation coefficient $= \dfrac{\langle X - EX, Y - EY\rangle}{||X - EX||\,||Y - EY||}$

Linear independence
Projection operator
basis
orthonormal basis
Independent random variables


請自行看 linear algebra, probability 書~寫起來很長, 若引入 measure theory 可以寫一本書了, 對一般資電又太難, wiki 沒有寫得很好

Extension to random vectors:

If

$f \in L^2$ and $f = a_1e_1 + a_2e_2 + \ldots + a_ne_n$, with $||e_i|| = 1$ and $\langle e_i, e_j\rangle = 0$ if $i \ne j$ (orthonormal basis, $i = 1, 2, 3, \ldots$),
then $\langle f, e_i\rangle = a_i$.
$f \leftrightarrow a = (a_1, a_2, \ldots, a_n)$: a one-to-one mapping.

QAM16_Demonstration
Rectangular_constellation_for_QAM.svg

e.g., QAM (Quadrature amplitude modulation)

$g_i = a_{i1}e_1 + a_{i2}e_2$
$g_i \leftrightarrow (a_{i1}, a_{i2})$

$g = (g_1, g_2, \ldots, g_{64})$ ~ the principle of 64-QAM (quadrature amplitude modulation).
Why?

Wi-Fi 6/6E: 1024-QAM
5th generation mobile networks (5G): 512-QAM or 1024-QAM
4G: 256-QAM or 64-QAM


Assume

$f = (f_1, f_2, \ldots, f_m)$ with $f_i \in L^2$ and $f_i = \sum_{j=1}^n a_{ij}e_j$. Then $f^T = Ae^T$, where $e = (e_1, e_2, \ldots, e_n)$ and $A = [a_{ij}]$.

$f \leftrightarrow [a_{ij}]$: a one-to-one mapping.

Homework:

$a_{ij} = ?$
Hint: $\langle f_i, e_k\rangle$

This is the foundation of estimation and prediction.

It uses the probability covariance matrix.
Because the Chinese translations of these terms vary, please use the English terminology.

$X_i$ is a random variable with $E[X_i] = 0$, $X = (X_1, X_2, \ldots, X_m)$, with $X_i \in L^2$.

e.g.,

use $X - E[X]$ if $E[X] \ne 0$.


Concave vs. Convex

Jensen's inequality

$t_1 \in [0,1]$: $f(t_1x_1 + (1-t_1)x_2) \le t_1f(x_1) + (1-t_1)f(x_2)$

(Some sources swap the terms convex and concave; please check the convention yourself.)

If

$X$ is a random variable and $\phi$ is a convex function, then

$\phi(E[X]) \le E[\phi(X)]$

e.g.,

  1. ϕ(x)=(xC)+=max{xC,0}
    ,
    C0
    .
    ϕ(x)
    is a increasing convex function on
    [0,)

e.g.,
Let

X be traffic rate and
C
bandwidth, then
E[ϕ(X)]
denotes the average loss rate. Assume
X1,...,Xn
iid samples,
i=1nϕ(Xi)ΔtnΔtE[ϕ(X)]
.

geogebra-export

  1. ϕ(x)=ln(1x)
    increasing convex function on
    [0,1)

geogebra-export

  1. ϕ(x)=ln(x)
    decreasing convex function on
    (0,1]

If the function

ϕ is concave, then

ϕ(E[X])E[ϕ(X)]

Jensen's inequality (random vector $X$)
Let $\phi$ be a convex function on a convex set $C \subseteq \mathbb{R}^n$, and let $X$ be a random vector with $P(X \in C) = 1$. Then $E[X] \in C$, $E[\phi(X)]$ exists, and $\phi(E[X]) \le E[\phi(X)]$.


linear time-invariant (LTI) system and convolution

Let

$\langle \delta(t), f(t)\rangle \doteq f(0)$, i.e., $f(0) = \int f(t)\delta(t)\,dt$ for EE (in fact, this is not a Riemann integral).

Then

$f(t) = \langle \delta(\tau - t), f(\tau)\rangle$, by LTI properties.

Notice that

$f(t) = \langle \delta(\tau - t), f(\tau)\rangle$ is incorrect if the system is not time-invariant.


Nyquist–Shannon sampling theorem: If a function $x(t)$ contains no frequencies higher than $W$ hertz, then it can be completely determined from its ordinates at a sequence of points spaced less than $\frac{1}{2W}$ apart, i.e., sample rate $f_s > 2W$.

Ref. Principles of Digital Communication, by Robert G. Gallager

This book is demanding; it is hard to follow without a teacher and requires a certain level of mathematics. It uses Lebesgue integration, $L^2$ spaces, entropy, Markov chains, and the Viterbi algorithm (a dynamic programming algorithm).


Let

$h(t)$ be the impulse response to $\delta(t)$. Assume $h(t)$ exists.

Assume

$f(t)$ is the band-limited input process of the LTI system;
then the output process is
$g(t) = \langle h(t-\tau), f(\tau)\rangle = \langle h(\tau), f(t-\tau)\rangle$

$= \int h(\tau)f(t-\tau)\,d\tau = \int h(t-\tau)f(\tau)\,d\tau$,
by LTI properties.

We define

$f*h\,(t) = \langle h(t-\tau), f(\tau)\rangle$ as the convolution of $f$ and $h$.

Convolution3.svg

Convolution_of_spiky_function_with_box2

$f \to$ LTI system: $h(t) \to f*h$

$(f*g)*h = f*(g*h)$
$f*h = h*f$

$f*(g+h) = f*g + f*h$

Discrete time

Assume

$f(n)$ is the discrete-time band-limited input process of the LTI system.

Let

$\delta(n) = 1$ if $n = 0$ and $\delta(n) = 0$ if $n \ne 0$.

Let

$\langle \delta(n), f(n)\rangle \doteq f(0)$

$f(n) = \langle \delta(m-n), f(m)\rangle$ by LTI properties.

$\delta(n) \to$ LTI system $\to h(n)$
$f(n) \to$ LTI system: $h(n) \to f*h(n)$

Assume the discrete-time impulse response is given by $h(n)$;
we define $f*h\,(n) = \langle h(n-m), f(m)\rangle$ as the convolution of $f$ and $h$.

Then

$f*h\,(n)$ is the output sequence of the LTI system, by LTI properties.
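A minimal discrete-time sketch (mine): the output of an LTI system is the convolution of the input with the impulse response, and numpy.convolve reproduces the inner-product definition above; the sequences are arbitrary.

```python
import numpy as np

h = np.array([1.0, 0.5, 0.25])           # impulse response h(n)
f = np.array([1.0, 2.0, 0.0, -1.0])       # input sequence f(n)

y = np.convolve(f, h)                      # output f*h(n)

# same thing written as <h(n-m), f(m)> = sum_m f(m) h(n-m)
y_manual = np.array([sum(f[m] * h[n - m]
                         for m in range(len(f)) if 0 <= n - m < len(h))
                     for n in range(len(f) + len(h) - 1)])

print(np.allclose(y, y_manual))
```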

Discrete Fourier transform (DFT)

$x = \{x_0, x_1, \ldots, x_{N-1}\}$ complex numbers; the DFT of $x$ is $X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi jkn/N}$

$x_n = \dfrac{1}{N}\sum_{k=0}^{N-1} X_k e^{2\pi jkn/N}$

$u_k = e^{2\pi jkn/N}$, $k = 0, \ldots, N-1$, form an orthogonal basis over the set of $N$-dimensional complex vectors.
Claim: $\langle u_k, u_j\rangle = N\delta_{kj}$, i.e., $\left\langle \dfrac{u_k}{\sqrt{N}}, \dfrac{u_j}{\sqrt{N}}\right\rangle = \delta_{kj}$, where $\delta_{kj} = 1$ if $k = j$ and $\delta_{kj} = 0$ if $k \ne j$.

DFT_2sin(t)_+_cos(4t)_25_points.svg

Parseval's theorem

$\langle x,y\rangle = \dfrac{1}{N}\langle X, Y\rangle = \dfrac{1}{N}\sum_k X_k\overline{Y_k}$

If instead we define the DFT with the unitary normalization $X_k = \dfrac{1}{\sqrt{N}}\sum_{n=0}^{N-1} x_n e^{-2\pi jkn/N}$, then we have $\langle x,y\rangle = \langle X, Y\rangle$.

In practice we choose $N \approx 2^n$.
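A quick numpy.fft check of Parseval's relation (my addition), using the 1/N convention stated above.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 64
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)

X, Y = np.fft.fft(x), np.fft.fft(y)

lhs = np.vdot(y, x)                 # <x, y> = sum x_n conj(y_n)
rhs = np.vdot(Y, X) / N             # (1/N) <X, Y>
print(np.allclose(lhs, rhs))        # Parseval's theorem
```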

There are two main types of DFT errors: "aliasing" and “leakage"

Increase sampling rate +Window Function

183FFT系列:甚麼是洩漏leakage?窗函數window如何改善洩漏leakage?

n-dimensional DFT

$\mathbf{x} = [x_1, x_2, \ldots, x_n]$
$\mathbf{n} = [n_1, n_2, \ldots, n_n]$

$\mathbf{k} = [k_1, k_2, \ldots, k_n]$

$\mathbf{N} = N_1, N_2, \ldots, N_n$, $0 \le k_i < N_i$

$F_x(\mathbf{k}) = \sum_{n_1=0}^{N_1-1}\cdots\sum_{n_n=0}^{N_n-1} f_x(n_1, n_2, \ldots, n_n)\exp\left(-2\pi j\left(\dfrac{n_1k_1}{N_1} + \ldots + \dfrac{n_nk_n}{N_n}\right)\right).$

$f_x(\mathbf{n}) = \dfrac{1}{N_1\cdots N_n}\sum_{k_1=0}^{N_1-1}\cdots\sum_{k_n=0}^{N_n-1} F_x(\mathbf{k})\exp\left(2\pi j\sum_{i=1}^n n_ik_i/N_i\right)$
2D_Convolution_Animation

2D Convolution Animation(by Michael Plotke)

Image Filtering and Edge Detection

Z-transform

The Z-transform converts a discrete-time signal, which is a sequence of real or complex numbers, into a complex frequency-domain (the z-domain or z-plane) representation.

Bilateral Z-transform

$X(z) = \sum_{n=-\infty}^{\infty} x_n z^{-n}$

Unilateral Z-transform

$X(z) = \sum_{n=0}^{\infty} x_n z^{-n}$

inverse Z-transform

$x_n = \dfrac{1}{2\pi j}\oint_C X(z)z^{n-1}\,dz$,
where $C$ is a counterclockwise closed path encircling the origin and lying entirely in the region of convergence (ROC).

In control theory, a causal system (also known as a physical or nonanticipative system) is a system where the output depends on past and current inputs but not future inputs, i.e., the output $y(t_0)$ depends only on the input $x(t)$ for values $t \le t_0$.

Cauchy's integral formula


Image Analysis by Andrew Zisserman


Vector Integral Calculus


Q:
Assume the input signal is $f(t)$, $t \in [0,T]$, with $f(t) \in L^2[0,T]$. Find an LTI filter that maximizes the SNR on $[0,T]$ after the signal reaches the receiver over an AWGN channel. Please see textbooks, the wiki, or the matched-filter report from the communications lab.

A: matched filter.
Idea:

$\langle f+n, g\rangle$ with $g(t) = f(T-t)$, where $n$ is white noise. (Simple amplification does not help, because the LTI system amplifies the noise as well.)
$l^2[0, N-1]$ gives the same result.


These are the basic principles of digital communication (e.g., QAM). The multipliers and integrators you learned about are really computing inner products; this is why so many graduate entrance exams include linear algebra: it matters.

The $e_i$ are known; as long as the $a_i$ are transmitted and the receiver recovers the $a_i$, $f$ can be reconstructed.

SNR (Signal-to-noise ratio) Why?

Parseval's Theorem for Hilbert space (e.g., $L^2$, $l^2$)

Define the Fourier transform of a real function $g(x)$ as

$S(f) \doteq (\Phi g)(f) \doteq \int g(\xi)e^{-2\pi jf\xi}\,d\xi$, where $f$ means frequency (not angular frequency).

Fourier inversion integral:

$g(t) = \int S(f)e^{2\pi jft}\,df$

F: frequency domain
T: time domain

Then we have

$||S(f)||_F^2 = ||g||_T^2$

and

$\langle g_1, g_2\rangle_T = \langle \Phi g_1, \Phi g_2\rangle_F = \int S_1(f)\overline{S_2(f)}\,df$.

$\Phi : L^2_T \to L^2_F$ is a unitary operator.
$\Phi^{-1} = \Phi^{*}$, since, letting $f_i = \Phi(g_i)$, $\langle g_1, \Phi^{-1}(f_2)\rangle_T = \langle \Phi(g_1), f_2\rangle_F$.

Time scaling

Let

aR,a0, if
h(x)=g(ax)
, then
Φh(f)=1|a|(Φg)(fa)

Differentiation

Φ(f)=2πjfΦ(f)
Φ(f(n))=(2πjf)nΦ(f)

multi-dimensional Fourier Transform

S(f)=(Φg)(f)g(ξ)e2πjfξdξ

Fourier inversion integral:

g(x)=S(f)e2πjfxdf,

where

ab=a1b1+a2b2+...anbn

Plancherel theory
The Fourier transform on very nice functions

G(Rn) is an
L2
-isometry: more precisely, for any
f,gG(Rn)
, we have
<f,g>=<S(f),S(g)¯>

conservation of energy

Parseval's Theorem states that the total energy computed in the time domain must equal the total energy computed in the frequency domain. It is a statement of conservation of energy.

Parseval’s Theorem

matched filter, SNR, AWGN channel

Assume

$X(t)$ is a zero-mean wide-sense stationary (WSS) random process. The auto-correlation function is given by $R_{XX}(\tau) = R_{XX}(t_1 - t_2) = E[X(t_1)X(t_2)]$.

Let

$P_X(f) = \Phi(R_{XX})(f)$; $P_X(f)$ is the power spectral density of $X(t)$.

If

$Y(t)$ is the output random process of an LTI system whose input process $X(t)$ is WSS, then $Y(t)$ is also a WSS random process.

Principal Component Analysis (PCA) 主成分分析

Karhunen–Loève theorem (Karhunen–Loève expansion)

$X(t)$ is an $L^2$ random process: $X(t) \in L^2(\Omega, \mathcal{F}, P)$, $E[X(t)] = 0$, with covariance function $K_X(s,t) = E[X(s)X(t)]$, $s, t \in [a,b]$. Let $T_{K_X} : f \mapsto T_{K_X}f = \int_a^b K_X(s,\cdot)f(s)\,ds$ be a linear operator $L^2[a,b] \to L^2[a,b]$.
Since $T_{K_X}$ is a linear operator, it makes sense to talk about its eigenvalues $\lambda_k$ and eigenfunctions $e_k$, which are found by solving the homogeneous Fredholm integral equation of the second kind

$\int_a^b K_X(s,t)e_k(s)\,ds = \lambda_k e_k(t)$

The covariance function

$K_X$ satisfies the definition of a Mercer kernel.
An important special case

  1. KX(s,t)=KX(t,s)
    symmetric continuous function
  2. KX
    is said to be positive-definite if and only if
    K(xi,xj)cicj>0
    or all finite sequences of points
    x1,...,xn
    of
    [a,b]
    and all choices of real numbers
    c1,...,cn
    .

Remark. A

N×N symmetric real matrix
A
is said to be positive-definite if
xTAx>0
for all
xRN0
. Assume
Ax=λx
, then
xTλx>0
, we have
λ>0
.

e.g.,

I=|100010001| is positive definite.

eigenvalue's geometric multiplicity

The geometric multiplicity of an eigenvalue is the dimension of the linear space of its associated eigenvectors (i.e., its eigenspace).

The dimension of the eigenspace

E associated with
λ
, or equivalently the maximum number of linearly independent eigenvectors associated with
λ
, is referred to as the eigenvalue's geometric multiplicity
γA(λ)
. Because E is also the nullspace of
(AλI)
, the geometric multiplicity of
λ
is the dimension of the nullspace of
(AλI)
, also called the nullity of
(AλI)
, which relates to the dimension and rank of
(AλI)
as
γA(λ)=nrank(AλI)

By Mercer's theorem (too hard for EE), there consequently exists a set $\{\lambda_k, e_k(t)\}$ of eigenvalues and eigenfunctions of $T_{K_X}$ forming an orthonormal basis of $L^2([a,b])$, and $K_X$ can be expressed as

  1. $K_X(s,t) = \sum_{k=1}^{\infty} \lambda_k e_k(s)e_k(t)$.

  2. $X(t) = \sum_{k=1}^{\infty} Z_k e_k(t)$, converging in $L^2$.

  3. $Z_k = \int_a^b X(t)e_k(t)\,dt$. Furthermore,

  4. $E[Z_k] = 0$ and $E[Z_iZ_j] = \delta_{ij}\lambda_j$.

  5. Sort the eigenvalues in decreasing order: $\alpha_1 \ge \alpha_2 \ge \ldots \ge \alpha_n \ge \ldots$, where $\alpha_i \in \{\lambda_1, \ldots, \lambda_n, \ldots\}$.

Implementing a Principal Component Analysis (PCA) 主成分分析

Discrete Fourier Transform(DFT)

uk=e2jπNkn,
n=0,1,...N1
form an orthogonal basis over the set of N-dimensional complex vectors.

x=[x0,x1,...xN1]T,
u=[u0,u1,...uN1]T

Xk=<x,u>=xTu is the DFT of
x
.


f(x) and mathematical optimization

Proof.
Assume

xRn, a hyperplane with a normal vector
n
at
n0
is given by
<xn0,n>=0
.

Hence

<xn0,f(x)>=0 defines a tanget plane.
f(x)
is perpendicular to the level curve
f(x)=c
at
n0
.
Q.E.D.

您可能要先假設存在性(困難的部分), 方法是對的 (wiki 寫的不一定是對的, 還有最好看英文的)
Lagrange multiplier

x=[x1,x2,...,xn],λ=[λ1,...,λm],gi(x)=0,i=1,...,m, g(x)=[g1,...,gm]. Find the local maxima and minima of
f(x)
via

Let

L(x,λ)=f(x)<λ,g(x)>

x,λL(x,λ)=0.
λ1,...,λm
Lagrange multiplier

Lagrange Multipliers~examples

Karush–Kuhn–Tucker conditions


Taylor series with remainder

LogTay.svg

The Taylor polynomials for $\ln(1+x)$ only provide accurate approximations in the range $-1 < x \le 1$. For $x > 1$, Taylor polynomials of higher degree provide worse approximations.

$f(x) = f(a) + f'(a)(x-a) + f''(a)(x-a)^2/2! + R(h)$, where $\lim_{h\to 0} R(h) = 0$.

$f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $a$ if there exist a linear functional $L : \mathbb{R}^n \to \mathbb{R}$ and $h : \mathbb{R}^n \to \mathbb{R}$ such that $f(x) = f(a) + L(x-a) + h(x)||x-a||$, where $\lim_{x\to a} h(x) = 0$; then $L = df(a)$.

$df(a)(v) = \sum_{i=1}^n \dfrac{\partial f}{\partial x_i}v_i = \langle \nabla f, v\rangle$.

A linear functional on a real vector space $V$ is a function $T : V \to \mathbb{R}$ which satisfies the following properties:

  1. $T(v+w) = T(v) + T(w)$, and
  2. $T(\alpha v) = \alpha T(v)$.

Taylor Polynomials of Functions of Two Variables

這本好書只適合 數學好的資電人士 (博士班以上)
Gâteaux derivative:
Optimization by Vector Space Methods: Luenberger, David G.


Affine transformation 仿射轉換,

y=Ax+b

200px-Fractal_fern_explained
Geometric_affine_transformation_example

Gateaux derivative
Optimization by Vector Space Methods: Luenberger, David G.

An affine transformation preserves:

  1. collinearity between points: three or more points which lie on the same line (called collinear points) continue to be collinear after the transformation.

  2. parallelism: two or more lines which are parallel, continue to be parallel after the transformation.

  3. convexity of sets: a convex set continues to be convex after the transformation. Moreover, the extreme points of the original set are mapped to the extreme points of the transformed set.

  4. ratios of lengths of parallel line segments

  5. barycenters (center of mass) of weighted collections of points

A better book (it uses functional analysis) is:
Optimization by Vector Space Methods, David G. Luenberger

The Laplacian of $f$:
$\Delta f = \sum_i \dfrac{\partial^2}{\partial x_i^2}f$. The Laplacian is invariant under all Euclidean transformations: rotations and translations.

邊緣偵測 - 拉普拉斯算子 ( Laplacian Operator )

OpenCV 基礎篇-邊緣檢測(edge detection)


Statistical Inference, detection and estimation

Linear regression(線性迴歸)


Least-Squares Estimate and Projection Theorem

$y$ is an $m$-vector and $W = [w_1, w_2, \ldots, w_n]$ an $m \times n$ matrix with linearly independent columns ($m \ge n$). Then there is a unique $n$-vector $\hat{\beta}$ which minimizes $||y - W\beta||$ over all $\beta$, and the estimator is $\hat{\beta} = (W^TW)^{-1}W^Ty$ if the inverse matrix exists.

Proof.

By the projection theorem and the fact that $\{w_1, w_2, \ldots, w_n\}$ is linearly independent, $z = \hat{p}y$ is the projection onto the span of $\{w_1, w_2, \ldots, w_n\}$, where $\hat{p}$ is the projection operator. Let $y = x + z$; then $\langle w_i, y\rangle = \langle w_i, z\rangle$, $i = 1, 2, \ldots, n$.
Hence $W^Tz = W^Ty = W^TW\hat{\beta}$. If $(W^TW)^{-1}$ exists, then $\hat{\beta} = (W^TW)^{-1}W^Ty$.
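A small numpy sketch (my addition) computing the least-squares estimate $\hat\beta = (W^TW)^{-1}W^Ty$ from the statement above and comparing it with numpy.lstsq; the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 50, 3
W = rng.standard_normal((m, n))                 # design matrix with independent columns
beta_true = np.array([1.0, -2.0, 0.5])
y = W @ beta_true + 0.1 * rng.standard_normal(m)

beta_hat = np.linalg.solve(W.T @ W, W.T @ y)    # (W^T W)^{-1} W^T y, via a linear solve
beta_lstsq, *_ = np.linalg.lstsq(W, y, rcond=None)

print(beta_hat, np.allclose(beta_hat, beta_lstsq))
```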

Gauss-Markov estimate
An $n \times n$ symmetric real matrix $M$ is said to be positive-definite if $x^TMx > 0$ for all non-zero $x$ in $\mathbb{R}^n$.

$\epsilon$ is a random variable, e.g., $\epsilon \sim N(0, \sigma^2)$.

$y = W\beta + \epsilon$, $E[\epsilon] = 0$ and $Q = E[\epsilon\epsilon^T]$. Assume $Q$ is positive definite.

Proof (sketch).

$\langle x,y\rangle_Q := x^TQy = \langle x, Qy\rangle$. Then $\langle x,y\rangle_Q$ is an inner product.
Hence we can find the estimator $\hat{\beta}$ by the projection theorem.

A square matrix

A
A
is Hermitian if and only if it is equal to its adjoint, that is, it satisfies
<Ax,y>=<x,Ay>
, i.e.,
A=AH
,
x,yCn
.
e.g., Covariance (共變異數, 又稱協方差) matrix

A square matrix

A is Hermitian if and only if it is unitarily diagonalizable with real eigenvalues.

A matrix

Q is positive-definite iff
Q
is symmetric or Hermitian, and all its eigenvalues are real and positive.
HW. Why?

The estimator is said to be unbiased if and only if

E[β^]=E[β]

Kalman Filter (recursive estimation)
Probability space

(Ω,F,P)
n-dimensional dynamic model of a random process on
L2(P)

  1. x(k):ΩRn,u(k):ΩRn
    ,
    x(k+1)=Φ(k)x(k)+u(k)
    ,
    Φ(k)n×n
    matrix,
    E[u(k)]=0
    and
    E[u(k)u(l)T]=Q(k)δkl
    .
  2. x(0)^
    initial estimate of the initial random vector
    x(0)
  3. P(0)=E[(x^(0)x(0))(x^(0)x(0))T]
  4. Measurement of the process, assumed to be of the form
    v(k)=M(k)x(k)+w(k)
    , where
    M(k)m×n
    matrix and
    w(k)
    m-dimentsional measurement error having zero mean and
    E[w(k)w(j)T]=R(k)δkj
    , where
    R(k)
    positive definite.
  5. x^(k|j)
    the optimal estimate of
    x(k)
    given the observation
    v(0),v(1),...v(j)
    . Let
    V(j)=
    span
    {v(0),v(1),...,v(j)}
  6. covariance matrix
    P(k)=E[x^(k|k1)x[k]](x^(k|k1)x[k])T]
  7. Assume
    u(k)
    is orthogonal to
    v(k)
    and
    x(k)
    .
  8. If
    Y1,Y2L2(P)
    , the projection of a vector
    β
    onto
    Y1+Y2=Y1Y^2
    , where
    Y2^
    is orthogonal to
    Y1
    . Hence we can find the projection
    β
    onto the subspace of
    Y1+Y2
    , denoted
    β^
    .
  9. The optimal estimate of
    Φ(k)x(k)
    is
    Φ(k)x^(k|k)
    , since
    u(k)
    is orthogonal to
    v(k)
    and
    x(k)
    .
  10. Get the optimal estimate
    x^(k+1)|k)
    of
    x(k+1)=Φ(k)x^(k)+u(k)
    , given the oservation
    v(k)
    .

Ref. Optimization by Vector Space Methods. David G. Luenberger

證明重點 就是 Projection Theorem 式子很長不想打字了


Convexity and concavity matter a great deal in optimization for the existence of local minima and maxima (note that some articles swap the two terms).

Some texts define the distribution function to be left-continuous, others right-continuous.

ConcaveDef

concave function


convex function

Convex_polygon_illustration1.svg
convex set

330px-Convex_polygon_illustration2.svg
non-convex set

Let $f$ be a function of many variables defined on a convex set $S$. Then $f$ is convex if for all $x \in S$, all $y \in S$, and all $\alpha \in (0,1)$ we have
$f(\alpha x + (1-\alpha)y) \le \alpha f(x) + (1-\alpha)f(y)$

300px-ConvexFunction.svg

Convex_vs._Not-convex
Grafico_3d_x2+xy+y2

maximum likelihood test

likelihood: the chance that something will happen

In probability theory, odds provide a measure of the likelihood of a particular outcome, e.g.,

odds=p1p,
p(0,1)

desmos-graph (1)

ln(1x)=x+x22+x33+... convex function on
[0,1)

desmos-graph (1)

ln(x) convex function on
(0,1]

log-convex
A function

f(x):DRnR is log-convex if
f(θx+(1θ)y)f(x)θf(y)1θ
, D is a convet set and
0<θ<1
, i.e.,
logf(x)
is convex on
D
.

Key Point:

A function

f of many variables defined on a convex set
S
is convex if and only if the set of points on or above its graph is convex:
{(x,y):xS and yf(x)}
is convex.

Jensen's inequality
A function

f of many variables defined on a convex set
S
is convex if and only if for all
n2
f(λ1x1+...+λnxn)λ1f(x1)+...+λnf(xn)
for all
x1S,...,xnS
and all
λ10,...,λn0
with
λi=1
. Let
yi=f(xi)
, then
f(<λ,x>)≤<λ,y>

3.3 Concave and convex functions of many variables

Bernoulli sampling

B(k)=pk(1p)nk
f(k)=ln(B(k))=kln(p)+(nk)ln(1p)

p=0.1,n=100,f(k)

desmos-graph (2)

g(x)=xk(1x)nk

n=50,k=6,g(x)
desmos-graph (2)


S is a binary random variable with a-
priori
probabilities
P(S=a0)=p0
and
P(S=a1)=p1
.
The receiver of a communication system receives a random variable
S
(maybe continuous random variable).

S^ denotes the estimator of
S
.

The conditional densities

fR|S(R=r|ai),i{0,1}, are called
likelihoods
.
fR(r)=p0fR|S(r|a0)+p1fR|S(r|a1)
. The a-
posteriori
probability of
S
is given by
pS|R(ai|r)=pifR|S(r|ai)fR(r)
(Bayes' theorem).

Let

Λ(r)=fR|S(r|a0)fR|S(r|a1) and threshold
η=p1/p0
.

Λ(r) is called the
likelihood
ratio
.

S^=a0 if
Λ(r)η
.
S^=a1
if
Λ(r)<η
.
This is called a
maximum likelihood test
.
S^
: maximum likelihood estimator

Probability of error

P(e)

P(e|S=a0)=P(S^=a1|S=a0)=P(Λ(r)<η|S=a0)
P(e|S=a1)=P(S^=a0|S=a1)=P(Λ(r)η|S=a1)

Hence,

P(e)=p0P(Λ(r)<η|S=a0)+p1P(Λ(r)<η|S=a1)

In practice, beware of arithmetic overflow/underflow:

$\Lambda(r)$ can be very small, e.g., $10^{-9}$.

The log likelihood ratio,

$LLR(r) = \ln(\Lambda(r))$, is convenient for use with Gaussian noise $Z \sim N(0, N_0/2)$.

$V = S + Z$,
$f_{V|S}(v|a_0) = \dfrac{1}{\sqrt{\pi N_0}}\exp\left(-\dfrac{(v-a_0)^2}{N_0}\right)$

$\Lambda(v) = \exp\left(\dfrac{-(v-a_0)^2 + (v-a_1)^2}{N_0}\right)$

Hence,
$LLR(v) = \dfrac{(v-a_1)^2 - (v-a_0)^2}{N_0} = \dfrac{||v-a_1||^2 - ||v-a_0||^2}{N_0} = \dfrac{2va_0 - 2va_1 + a_1^2 - a_0^2}{N_0}$
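A tiny numerical sketch (my own) of the Gaussian LLR decision rule above, using arbitrary symbol values $a_0$, $a_1$ and noise level; it checks that the LLR threshold test coincides with the minimum-distance rule.

```python
import numpy as np

rng = np.random.default_rng(8)
a0, a1, N0 = 1.0, -1.0, 0.5            # symbols and noise spectral density (assumed values)
p0 = p1 = 0.5                           # equal priors -> threshold ln(p1/p0) = 0

s = rng.choice([a0, a1], size=10_000)
v = s + rng.normal(0.0, np.sqrt(N0 / 2), size=s.size)

llr = ((v - a1) ** 2 - (v - a0) ** 2) / N0
s_hat = np.where(llr >= np.log(p1 / p0), a0, a1)

print("error rate:", np.mean(s_hat != s))
# same decision as picking the nearer symbol (minimum distance)
print(np.array_equal(s_hat, np.where(np.abs(v - a0) <= np.abs(v - a1), a0, a1)))
```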

RN space
e.g.,
S=[S0,S1,...SN1]
and
R=[R0,R1,...RN1]
. Assume
(R)=S+Z
, where
Z=[Z0,Z1,...,ZN1]
.
Zi
iid random sequence with
N(0,σ2)
distribution.

MAP (Maximum A Posterior Estimation)

arg maxxSf(x){xS:f(s)f(x) for all sS}

e.g.,

arg maxxR(1|x|)={0}, i.e., at
x=0

Assume

X1,X2,...,Xn IID,
θ^MAP=arg maxθf(θ|X1,X2,...,Xn)=arg maxθf(X1,X2,...,Xn|θ)g(θ)/h(X1,X2,...,Xn)

=arg maxθf(Xi|θ)g(θ)
, since IID.

Ref. Principles of Digital Communication, by Robert G. Gallager


Sufficient statistic 充分統計量
A statistic is a function

T=f(X1,X2,···,Xn) of the random sample
X1,X2,···,Xn
.

In statistics, a statistic is sufficient with respect to a statistical model and its associated unknown parameter if "no other statistic that can be calculated from the same sample provides any additional information as to the value of the parameter"

Roughly, given a set

X of independent identically distributed data conditioned on an unknown parameter
θ
, a sufficient statistic is a function
T(X)
whose value contains all the information needed to compute any estimate of the parameter (e.g., a maximum likelihood estimate).

A statistic

t=T(X) is sufficient for underlying parameter θ precisely if the conditional probability distribution of the data
X
, given the statistic
t=T(X)
, does not depend on the parameter
θ
.

e.g., sample mean

$\hat{\mu} = T(X) = \dfrac{\sum_{i=1}^N X_i}{N}$

Suppose we have a statistical model, parameterized by a real number

θ, giving rise to a probability distribution for observed data,
Pθ(x)=P(xθ)
, and a statistic
θ^
which serves as an estimator of
θ
based on any observed data
x
.

$\operatorname{bias}(\hat{\theta}) = E_{x|\theta}[\hat{\theta}] - \theta$,
where $E_{x|\theta}[\cdot]$ denotes the expected value over the distribution $P(x\mid\theta)$. An estimator is said to be unbiased if its bias is equal to zero for all values of the parameter $\theta$.

e.g.,
sample mean $\hat{\mu}$: unbiased estimator.
sample variance I: $S^2 = \dfrac{1}{N}\sum_{i=1}^N (X_i - \hat{\mu})^2$: biased estimator, since $E[S^2] = \left(1 - \dfrac{1}{N}\right)\sigma^2 < \sigma^2$

sample variance II: $V^2 = \dfrac{1}{N-1}\sum_{i=1}^N (X_i - \hat{\mu})^2$: unbiased estimator.
statistic $T_{\max} = \max\{X_1, \ldots, X_N\}$

HW. Find $E[T_{\max}]$ if the $X_i$ are iid and uniform$(0, \theta)$.
A: $\dfrac{N}{N+1}\theta$

Hint:

$\{\omega \in \Omega \mid T_{\max}(\omega) \le x\}$ if and only if $\{\omega \in \Omega \mid X_i(\omega) \le x \text{ for all } i = 1, \ldots, N\}$

statistic

$T_{\min} = \min\{X_1, \ldots, X_N\}$

Nonparametric statistics(無母數統計)

Nonparametric statistics is a type of statistical analysis that does not rely on the assumption of a specific underlying distribution (such as the normal distribution), or any other specific assumptions about the population parameters (such as mean and variance).

Fisher–Neyman factorization theorem

If the probability density function is

ƒθ(x), then
T
is sufficient for
θ
if and only if nonnegative functions
g
and
h
can be found such that
fθ(x)=h(x)gθ(T(x))
i.e. the density
ƒ
can be factored into a product such that one factor,
h
, does not depend on
θ
and the other factor, which does depend on
θ
, depends on
x
only through
T(x)
.

e.g.,

f(x1,x2,...,xn;λ)=f(x1;λ)...f(xn;λ)

Statistical Hypothesis Testing (統計假說檢定)

(a) A statistical hypothesis.
(b) A test of a hypothesis against an alternative hypothesis and the associated concept of the critical region of the test.
(c) The power of a test.

Definition 1. Let C be that subset of the sample space which, in accordance with a prescribed test, leads to the rejection of the hypothesis under consideration. Then C is called the critical region of the test.

Definition 2. The power function of a test of a statistical hypothesis

H0 against an alternative hypothesis
H1
is that function, defined for all distributions under consideration, which yields the probability that the sample point falls in the critical region
C
of the test, that is, a function that yields the probability of rejecting the hypothesis under consideration. The value of the power function at a parameter point is called the power of the test at that point.

Definition 3. Let

H0 denote a hypothesis that is to be tested against an alternative hypothesis
H1
in accordance with a prescribed test. The significance level of the test (or the size of the critical region C) is the maximum value (actually supremum) of the power function of the test when
H0
is true.

Type II error (false negative, 偽陰性): the test result says you don’t have coronavirus, but you actually do.

偽陽性(明明沒確診卻誤判為有)和偽陰性(明明確診卻誤判為沒有)
(英語:False positives and false negatives)
type I error , type II error

Neyman-Pearson Theorem
Let

$X = X_1, X_2, \ldots, X_n$ denote a random sample from a distribution with density function $f(x;\theta)$. Then the joint p.d.f. of $X = X_1, X_2, \ldots, X_n$ is $L(x;\theta) = f(x_1;\theta)\cdots f(x_n;\theta)$. Let $k > 0$, $\Omega = \{\theta : \theta = \theta_1, \theta_2\}$. Let $C$ be a subset of the sample space such that:

  1. $\Lambda(x;\theta_1,\theta_2) = \dfrac{L(x;\theta_1)}{L(x;\theta_2)} \le k$ for $x \in C$
  2. $\dfrac{L(x;\theta_1)}{L(x;\theta_2)} \ge k$ for $x \notin C$
  3. power function 檢定力: $\alpha = P(X \in C; H_0)$ (the probability of falsely rejecting the $H_0$ hypothesis is $\alpha$: type I error, false positive, 偽陽性)
  4. level of significance (significance level) $\alpha$ (顯著水準)

Then

$C$ is the best critical region of size $\alpha$ for testing the simple hypothesis $H_0 : \theta = \theta_1$ against $H_1 : \theta = \theta_2$.

Example

$f(x;\theta) = \dfrac{1}{\sqrt{2\pi}}\exp(-(x-\theta)^2/2)$
$H_0 : \theta = \theta_1 = 0$

$H_1 : \theta = \theta_2 = 1$

$\Lambda(x;0,1) = \exp\left(-\sum x_i + n/2\right)$
If $k > 0$, then $\Lambda \le k$ iff $\sum x_i \ge n/2 - \ln k = c$.

The critical region is $C = \{x : \sum x_i \ge c\}$.

Assume

$n$ samples, $c_1 = c/n$, $\bar{X} = \sum_{i=1}^n X_i/n$, and $\bar{X} \sim n(0, 1/n)$ under $H_0$.

$P(\bar{X} \ge c_1; H_0) = \alpha$ (type I error); hence, given $\alpha$, one can find $c_1$ and then the probability
$P(\bar{X} \ge c_1; H_1) = \dfrac{1}{\sqrt{2\pi}\sqrt{1/n}}\displaystyle\int_{c_1}^{\infty}\exp\left(-\dfrac{(\bar{x}-1)^2}{2\times 1/n}\right)d\bar{x}$

  1. density function
  2. H0
    H1
  3. level of significance
    α
    (顯著水準)
  4. Type I error, Type II error
  5. sample size

If

n=25,α=0.05, then
c1
~
0.329
,
k
~
exp(n/2c1)
and the power of of the best test of
H0
again
H1
is 0.05 when
H0
is true.
P(X¯c1;H1)
~
0.999
when
H1
is true.

Type II error:

β=P(Fail to Reject H0|H0 is False)=1P(X¯c1;H1)


Proof.
If there is another critical region

A of size
α
, then

claim

CL(θ2)AL(θ2)0

CL(θ2)AL(θ2)
=CAL(θ2)+CAL(θ2)ACL(θ2)ACL(θ2)

=CAL(θ2)ACL(θ2)

Since

L(θ2)1kL(θ1) at each point of
C
and
L(θ2)1kL(θ1)
at each point of
C
, then

CAL(θ2)ACL(θ2)1k(CAL(θ!)ACL(θ1))
=1k(CAL(θ1)CAL(θ1)+CAL(θ1)+ACL(θ1))

=1k(CL(θ1)AL(θ1))=0
. Hence
CL(θ2)AL(θ2)0

If the random variables are of the discrete type, the proof is the same, with integration replaced by summation.

Type I & Type II Errors | Differences, Examples, Visualizations

Neyman-Pearson Lemma: Definition

Introduction to Mathematical Statistics

Neyman–Pearson lemma

A First Course on Statistical Inference

Confidence interval(信賴區間)
Confidence Intervals
Understanding the Confusion Matrix (II)

Ref. Introduction to Mathematical Statistics, R. V. Hogg and A. T. Craig


Successive Approximation

Taylor's formula with remainder

TAYLOR’S THEOREM WITH REMAINDER


During breaks I used to enjoy asking students a question: events A, B are probabilistically independent when

$P(A\cap B) = P(A)P(B)$. Why?

Usually some would say "by definition", and many would just stare blankly.


A:

$n$: number of trials;
$n_A$: number of times event $A$ occurs; $n_B$: number of times event $B$ occurs;
$n_{AB}$: number of times $A\cap B$ occurs.

$\dfrac{n_A}{n} \to P(A)$, $\dfrac{n_B}{n} \to P(B)$,

$\dfrac{n_{AB}}{n} \to P(A\cap B)$.

If whether event $B$ occurs or not has no influence on the statistics of event $A$, i.e.,

$\dfrac{n_{AB}}{n_B} \to P(A)$ as $n_B \to \infty$ (i.e., $P(A|B) = P(A)$, the conditional probability), then

$\dfrac{n_{AB}}{n} = \dfrac{n_{AB}}{n_B}\cdot\dfrac{n_B}{n} \to P(A)P(B)$ as $n_A, n_B, n \to \infty$.

Hence,

$P(A\cap B) = P(A)P(B)$: events A, B are probabilistically independent.

Probability measure by CrazyDog


Another question: what is the most important theorem in calculus, and why? Blank stares.
A:
the Fundamental Theorem of Calculus.

In which of your courses, besides linear algebra, is an orthonormal basis used? Blank stares.

If $B$ is an orthonormal basis of $H$ (an $L^2$ or $l^2$ space), then for

$x \in H$, $x = \sum_{e\in B}\langle x, e\rangle e$.

Simulation:

Assume

$X$ is a random variable, $F_X(x) = P(X \le x)$.
Let (generalized inverse function) $F_X^{-1}(u) = \inf_x\{x \mid F_X(x) \ge u\}$, where $0 < u < 1$.

Assume

$U$ is a uniform random variable on $[0,1]$.
Then $P(F_X^{-1}(U) \le x) = P(U \le F_X(x)) = F_X(x)$.

e.g.,

$X$ exponential distribution, $F_X(x) = 1 - e^{-\lambda x}$.

$F_X^{-1}(u) = -\ln(1-u)/\lambda$

Inverse transform sampling
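A short sketch of inverse transform sampling for the exponential example above (mine): transform uniform samples through $F^{-1}$ and compare the empirical mean with $1/\lambda$.

```python
import numpy as np

rng = np.random.default_rng(9)
lam = 2.0
u = rng.uniform(size=100_000)

x = -np.log(1.0 - u) / lam            # F^{-1}(u) for the exponential distribution

print(x.mean(), 1.0 / lam)            # empirical mean vs. theoretical mean 1/lambda
```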


Successive Approximation, Newton's Method, Fixed Point Theorem

In successive approximation the important issues are whether the chosen initial value converges to the "correct solution" and how efficient the convergence is.

Definition. A sequence $\{x_i\}$ in a normed space is said to be a Cauchy sequence if $||x_n - x_m|| \to 0$ as $n, m \to \infty$.

Remark. If $x_n \to x$, then $||x_n - x_m|| \le ||x_n - x|| + ||x - x_m|| \to 0$.

Definition. A normed linear vector space $X$ is complete if every Cauchy sequence from $X$ has a limit in $X$. A complete normed linear vector space is called a Banach space.

Remark. A Hilbert space is complete.

contraction mapping

$X$ is a normed space, $S \subseteq X$, $T : S \to S$. Then $T$ is a contraction mapping if there exists $\alpha \in [0,1)$ such that $||T(x_1) - T(x_2)|| \le \alpha||x_1 - x_2||$ for all $x_1, x_2 \in S$.

Contraction Mapping Theorem

$X$ is a Hilbert (or Banach) space, $S \subseteq X$, $S$ is a closed subset of $X$, and $T : S \to S$ is a contraction. There is a unique $x_0 \in S$ satisfying $x_0 = T(x_0)$. Furthermore, $x_0$ can be obtained by the method of successive approximation starting from an arbitrary initial point in $S$.

Proof. Select an arbitrary element

$x_1 \in S$.
Let $x_{n+1} = T(x_n)$; then $||x_{n+1} - x_n|| = ||T(x_n) - T(x_{n-1})|| \le \alpha||x_n - x_{n-1}||$. Therefore, $||x_{n+1} - x_n|| \le \alpha^{n-1}||x_2 - x_1||$. Then $||x_{n+p} - x_n|| \le ||x_{n+p} - x_{n+p-1}|| + \ldots + ||x_{n+1} - x_n||$.

Hence
$||x_{n+p} - x_n|| \le \dfrac{\alpha^{n-1}}{1-\alpha}||x_2 - x_1||$,
so $\{x_n\}$ is a Cauchy sequence. Since $S$ is closed and $X$ is complete, there exists $x_0 \in S$ such that $x_n \to x_0$.

Claim

x0=T(x0)
Proof.
||x0T(x0)||=||x0xn+xnT(x0)||||x0xn||+||xnT(x0)||||x0xn||+α||xn1x0||
. By appropriate choice of
n
, the above inequality can be made arbitrarily small. Thus
||x0T(x0)||=0
;
x0=T(x0)
.

Claim

x0 is unique.
Proof.
Assume
x0
and
y0
are fixed points. Then
||x0y0||=||T(x0)T(y0)||α||x0y0||
. Thus
x0=y0
.

Newton's Method

$x_{n+1}$ is a better approximation than $x_n$ for the root $x$ of the function $f$ (blue curve)

wiki

Iteration typically improves the approximation.
wiki

Solving an equation of the form $P(x) = 0$:

$x_{n+1} = x_n - \dfrac{P(x_n)}{P'(x_n)}$

Let

$T(x) = x - [P'(x)]^{-1}P(x)$. A fixed point of $T$ is a solution of $P(x) = 0$.

As for the extension to Hilbert (Banach) spaces: it is too hard for EE.
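A minimal Newton's-method sketch (my addition) of the iteration $x_{n+1} = x_n - P(x_n)/P'(x_n)$ above, for $P(x) = x^2 - 2$; the starting point is arbitrary.

```python
def newton(P, dP, x0, tol=1e-12, max_iter=50):
    """Fixed-point iteration for T(x) = x - P(x)/P'(x)."""
    x = x0
    for _ in range(max_iter):
        x_new = x - P(x) / dP(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)   # ~1.41421356..., i.e., sqrt(2)
```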


Exterior algebra

Let

V be a vector space over a field
K
.
Let
e1,...,en
be a basis of the vector space
V
. Definition. The exterior algebra
ΛV
is the vector space with basis formal symbols

ei1Λ...Λeia
where a ≥ 0 and
i1<...<ia
is a strictly increasing sequence. The exterior product is

(ei1Λ...Λeia)Λ(ej1Λ...Λejb)=(0 or ±ek1Λ...Λeka+b)
where in the first case the sequences
i
and
j
have an element in common.
Λ
is called exterior product (or wedge product).

e.g.,

ΛR3=KRΛ2RΛ3R
Grassmannian

The cross product (blue vector) in relation to the exterior product (light blue parallelogram). The length of the cross product is to the length of the parallel unit vector (red) as the size of the exterior product is to the size of the reference parallelogram (light red).

The Cartesian plane

R2 is a real vector space equipped with a basis consisting of a pair of unit vectors
ex,ey
.

u,v,w vectors,
α,s
real numbers.
vΛv=0

vΛu=uΛv

(u+v)Λw=uΛw+uΛw

αvΛsu=αs(vΛu)

v=aex+bey,
w=cex+dey
,
vΛw=acexΛex+bceyΛex+adexΛey+bdeyΛey=bcexΛey+adexΛey=
|abcd|exΛey
.
Alternating1

平行四邊形 面積 =| ad-bc |

Such an area is called the signed area of the parallelogram.

Let

u=u1e1+u2e2+u3e3,v=v1e1+v2e2+v3e3 then
uΛv=(u1v2u2v1)e1Λe2+(u2v3v3u2)e2Λe3+(u3v1u1v3)e3Λe1
. Let
w=w1e1+w2e2+w3e3
,

Determinant

det=|u1u2u3v1v2v3w1w2w3|

uΛvΛw=det e1Λe2Λe3

平行六面體的體積= | det |

uiRn vectors,
ui=aijej
.
u1Λu2...Λun=|a11a12...a21......an1...|e1Λe2...Λen

oriented volume of

ui,i=1,2,...n=
|a11a12...a21......an1...|

Let

Rn be an n-dimentional real vector space.

A form of degree 1 (or a 1-form) is a linear function

ω:RnR, i.e.,
ω(λ1η1+λ2η2)=λ1ω(η1)+λ2ω(η2)
, where
λiR
and
ηiRn
.

2 form is a function on pairs of vectors

ω2: Rn×RnR, which is bilinear and skew symmetric.

e.g., oriented area

S(η1,η2)=|η11η12η21η22| is a 2 form.

k form is a function of k vectors which is k-linear and antisymmetric.

ω(ηi1,...ηik)=(1)vω(η1,...ηk)

v={0  if the permulation i1,...,ik is even;1  if the permulation i1,...,ik is odd.

不懂 permulation 的請回憶 行列式

e.g., oriented volume on

Rn

Differentiable manifold (微分流形)

A chart is an open set

U in the enclidean coordinate space
q=(q1,...,qn)
together with a one-to-one mapping
ϕ
of
U
onto some subset of
M
,
ϕ:UϕUM
.

A set

M is given the structure of a differential manifold if
M
is provided with a finite or countable collection of charts, so that every point is represented in at least one chart.

1-form

dfx on the tanget space
TMx

Let

f:MR be a differentiable function on the manifold
M
(imagine a function of many variables
f:RnR
). The differential
df|x
of
f
at
x
is a linear map
dfx:TMxR
of the tangent space to
M
at
x
into the real line.

tangent bundle

TM=xTMx

Definition. A 1-form on a manifold is a smooth map

ω:TMR of the tangent bundle of
M
to the line, linear on each tangent space
TMx
.

Differential k-form

Definition. A differential k-form

ωk|x at a point
x
of a manifold
M
is an exterior k-form on the tangett space
TMx
to
M
at
x
, i,e, a k-linear skew-symmetric function of k vectors
η1,...,ηk
tanget to
M
at
x
.

Theorem. Every differential k-form on

Rn with a given coordinate system
x1,...,xn
can be written uniquely in the form
ωk=i1<...<ikai1,...ik(x)dxi1Λ...Λxik
where the
ai1,...ik(x)
are smooth functions on
Rn
.


GF(2) Galois Field

In mathematics, a field is a set on which addition, subtraction, multiplication, and division are defined and behave as the corresponding operations on rational and real numbers do.

建構式數學教的是最基本的數學觀念 不是錯~只是 對於 國高中小 是考試與教育走火入魔, 感嘆~ 就像 兩個事件獨立一樣~對學生 叫做 定義

GF(2) GF:Galois field
The modulo-2 addition can be
implemented with an XOR (

) gate and the
modulo-2 multiplication can be implemented
with an AND (
)
gate.

The set {0,1} together with modulo-2 addition
and multiplication is called a binary field ,
denoted GF(2).

Remark.

  1. Every element
    x
    of GF(2) satisfies
    xx=0
    and therefore
    x=x
  2. x,yGF(2)
    ,
    xy=xy
  3. 11=1
    , the only invertible element is 1. (e.g., rational number
    4×(1/4)=1,1/4:=41,a/b:=a×b1,b0
    )
  4. Hence, GF(2) is a field.

Vector Space over GF(2)
A binary n-tuple

a=(a1,a2,...an)
is an ordered sequence with components from GF(2).
Define an addition operation for any two binary
vector
a+b=(a1b1,...,anbn)
, where
=XOR.
Define a scalar multiplication between an element
c
in GF(2) and a binary n-tuple
a=(a1,a2,,an)
as
follows:
c×a=(ca1,ca2,...,can)
, where
= AND.

The set

Vn together with the addition defined for any two binary n-tuple in
Vn
and the scalar multiplication defined between an element in GF(2) and a binary n-tuple in
Vn
is called a vector space over GF(2).

Hamming distance

d(a,b)=|aibi|=|aibi|

HW:
Show that Hamming distance is a metric.

An (n,k) convolutional encoder (CE) will encode a k-bit input block into an n-bit ouput block, which depends on the current input block and the m preceding input blocks
A convolutional encoder is a discrete linear time-invariant system.

Remark. Why LTI system?
input of CE

0 CE
0

input of CE
δ(n)
CE
g

input of CE
δ(nm)
CE
g(nm)
and
aδ(n)
CE
ag(nm)
,
aGF(2)

since the convolutional encoder is finite state machine (FSM) with initial state
0
. Assume
gVn
, then CE is a LTI system. Hence we can define impulse response of the CE.

(2,1) convolutional encoder:

Assume the discrete-time impulse responses are given by

$g_0(n) = [1,0,1,1]$ and $g_1(n) = [1,1,1,1]$ over GF(2).

input sequence $= U$
outputs $V_0 = g_0 * U$ and $V_1 = g_1 * U$

output sequence $= (V_{00}, V_{01}, V_{10}, V_{11}, \ldots)$

Decoding convolutional codes:
Viterbi algorithm, by VLSI or software


Statistical Inference (統計推論)

Formally, the partial derivative with respect to

θ of the natural logarithm of the likelihood function is called the score.
s(X;θ)
=score=
θlogL(X|θ)
.
e.g.,

L(X|θ)=i=1NfX(x|θ),
X1,...,XN
iid.

p(xi|μ,σ)=12πσ2e12σ2(xiμ)2
L(X|μ,σ)=ip(xi|μ,σ)

In general,

{f(x|θΘ)},
Θ
is called the parameter space.
290px-Log_(2).svg

Maximum likelihood estimator (MLE)

θ^=arg maxθL(x|θ)
Therefore, MLE can be estimated by solving
θlogL(X|θ)
or
θlog(L(X|θ))
if it exists, since log(.) is a monotone increasing function and

ddxln(f(x))=f(x)f(x),
f(x)>0
.

HW.

arg maxx[0,2π]cos(x)=?

The expected value (the first moment) of the score, evaluated at the true parameter value

θ , is 0.

  1. E[s(X;θ)|θ]=0
  2. I(θ)=E[s(X;θ)2|θ]

The Fisher information

I(θ)is defined to be the variance of the score.

Proof.

d lnf(x)dx=f(x)f(x)

E[s(X;θ)|θ]

=θlnL(x|θ)L(x|θ)dx

= L(x|θ)θL(x|θ)L(x|θ)dx=θL(x|θ)dx=0

score vector

θ=[θ1,...,θn]T
the log-likelihood function
L(x;θ)

score vector
I(θ)=θL(x;θ)

Under mild regularity conditions, the expected value of the score is equal to zero:

E(I(θ)|θ)=0

Score Function and Fisher Information Matrix
Fisher information

The information matrix is the covariance matrix of the score.
Fisher information matrix (FIM)

θ=[θ1,...,θn]T
I(θ)=(I[θ])ij=E[θlogL(X|θ)θlogL(X|θ)|θ]T]

If

Tθ(X) is an unbiased estimator of
θ

then the Cramér–Rao bound reduces to
COVθ(Tθ(X))I(θ)1

Cramér–Rao bound (Rao-Cramér inequality, Rao-Cramér lower bound)
Score Function and Fisher Information Matrix


Number Theory

  1. Let

    a, b, mZ with
    m>0
    . Then
    a
    is said to be congruent to
    b
    modulo
    m
    , denoted
    ab mod m
    (同餘).

  2. Every integer greater than

    1 has a prime divisor.

  3. If

    a is a composite number(合成數), there exist
    a,bZ
    such that
    n=ab
    ,
    1<a<n,1<b<n,
    e.g.,
    14=2×7.

  4. Let

    n be a composite number(合成數). Then
    n
    has a prime divisor
    p
    with
    pn
    .

  5. 1 不是質數也不是 composite​ number. Every positive integer can be written as a product of prime numbers. e.g.,
    60=(22)(3)(5)
    . In general,
    n=p1a1...pnan
    , if
    n>1
    and
    pi
    prime numbers.

  6. aiZ,
    a1,...,an
    not all zero. If
    (a1,...,an)=1
    ,
    a1,...,an
    are said to be relative prime. If
    (ai,aj)=1
    for all pairs $i,j $ with
    ij
    , then
    a1,...,an
    are said to be pairwise relatively prime.

  7. A set of integers such that every integer is congruent modulo

    m to exactly one integer of the set is said to be a complete residue(餘數) system modulo
    m
    .
    e.g.,
    {0,1,...,m1}
    is a complete residue system modulo
    m
    .
    The set of integers {1,5} is a reduced residue system modulo 6.

  8. 6a=6b mod 3a=b mod 3

  9. a,b,cZ,
    ca=cb mod mab mod (m/(c,m))

  10. Definition: A arithmetric function is a function whose domain is the set of positive integer.

  11. An arthmetric function (算術函數)

    f is said to be multiplicate if
    f(mn)=f(m)f(n)
    where
    m
    and
    n
    are relative prime positive numbers.

  12. the least common multiple (最小公倍數) of two integers

    a and
    b
    =
    lcm(a,b)=
    the smallest positive integer that is divisible by both
    a
    and
    b

Definition. Let

a,bZ. A congruence of the form
axbmod m
is said to be a linear congruence in the variable
x
(一元線性同餘方程).

Definition. Let

a,mZ with
m>0
and
(a,m)=1
. The order of a modulo
m
, denoted
ordm a
, is the least positive integer
n
for which
an1 mod m
.

Definition. Let

r,mZ with
m>0
and
(r,m)=1
. Then
r
is said to be a primitive root (原根) modulo
m
if
ordmr=ϕ(m)
.

e.g.,

3 is a primitive root modulo
7
since
ord73=6=ϕ(7)
.
2
is a primitive root modulo
13
,
ord132=ϕ(13)

2
is not a primitive root modulo
7
, since
ord7=3ϕ(7)

Theorem. Let

axbmod m and
d=(a,m)
. If
db
, then the congruence has no solution in
Z
; if
d|b
, then the congruence has exactly
d
incongruent solutions modulo
m
in
Z
.
Proof. Elementary Number Theory, chap 2.2, by STRAYER.

Definition. Any solution of the linear congruence in one variable

ax1mod m to be said to be a (multiplicative) inverse of a modulo
m
.

e.g., the inverse of 5 mod 16 is 13, since

$5 \times 13 \equiv 1 \pmod{16}$

The Swiss mathematician Euler: a master is a master; who knows what was in that head when he wrote down this important theorem. Public-key/private-key cryptography uses exactly this theorem.

Euler's Theorem.

Let

a,mZ with
m>0
. If
(a,m)=1
, then
aϕ(m)1 mod m
, where Euler phi-function (Eulur's totient function), denoted
ϕ(n)
,
ϕ(n)=|{xZ:1xn;(x,n)=1}|
.

Remark.
1.

|S| cardinality of the set
S
(the number of elements of the set
S
).
ϕ(n)
is the number of positive integers less than or equal to
n
that are relatively prime to
n
. e.g.,
ϕ(1)=1, ϕ(3)=2, ϕ(5)=4, ϕ(6)=2, ϕ(8)=4

2.
ϕ(p)=p1
, if
p
is a prime number
.
4.
p,q
two prime numbers,
for
n=pq
,
ϕ(n)=ϕ(pq)=ϕ(p)ϕ(q)=(p1)(q1)
. Euler phi-finction
ϕ(n)
is multiplicative.

Proof.
Consider the matrix
[kp+i],1ip,0kq1
, if
(p,i)=d>1
, then no entry of
i
th-row is relatively prime to
p
(
pq
), since if
d>1
,then
d|p
and
d|i
. Hence
(p,i)=1
. Consider the row
i
with
(p,i)=1
,
[i,p+i,2p+i,...,(q1)p+i]
, each of this intergers is relatively prime to
p
.
ϕ(q)
of these integers are relatively prime to
q
. Since
ϕ(q)
integers are relatively prime to
p
, they are relatively prime to
pq
. The residue system modulo
ϕ(n)
is
[0,1,...,pq1]
. The residues that are not relatively prime to
n
are the set
{p,2p,...,(q1)p}
, the set
{q,2q,...,(p1)q}
and 0. Hence
ϕ(n)=(pq)[(q1)+(p1)+1]=ϕ(p)ϕ(q)

  1. aϕ(m)+1a mod m

Proof. Let

0<r1,r2,...rϕ(m)m and
(ri,m)=1,i=1,...,ϕ(m)
. Consider
ria,r2a,...rϕ(m)a
. Since
(a,m)=1
, the inverse of
a
=
a
. Hence if
riarjamod m
with
ij
then
riaa=rjaamodm
,
rirj mod m
. This is impossible. So the least non-negative resdiues modulo
m
of the integers
r1a,r2a,...,rϕ(m)
. Then
(r1a)(r2a)...(rϕ(m)a)r1...rϕ(m) mod m
. Hence
m|aϕ(m)r1...rϕ(m)r1...rϕ(m)
, since
(r1,...,rϕ(m),m)=1
, we have
m|aϕ(m)1
.

Compute Euler Phi-Function:
Theorem. Let

p be a prime number and let
0<aZ
, then
ϕ(pa)=papa1

Proof. The positive integers not exceeding
pa
that are not relatively prime to
pa
are
p,2p,3p,...pa1p
, hence
ϕ(pa)=papa1
.

Theorem. Let

0<zZ, then
ϕ(n)=np|n,p prime(11p)
.

e.g.,

ϕ(12)=12p|12,p prime(11p)=12(11/2)(11/3)

504=23327,
ϕ(504)=504(112)(113)(117)=144

Proof. Assume

n>1 and
n=p1a!...prar
with
p1,...,pr
distinct prime number and
a1...ar
postive integers. Then
ϕ(n)=ϕ(p1a1...prar)=ϕ(p1a1...ϕ(prar)
since Euler phi-finction is multiplicative. Thus
ϕ(n)=(p1a1p1a11)...(prarprar1)=np|n,p prime(11p)
$


The RSA Encryption/Decryption Scheme

Let

$p$ and $q$ be distinct odd prime numbers (typically very large), let $m = pq$, and let $e$ be a positive integer such that $(e, \phi(m)) = 1$. The ordered pair $\{e, m\}$ is the public key. Let $de \equiv 1 \pmod{\phi(m)}$; then $\{d, m\}$ is the private key.

Encryption:
block

$P$ to a new block $C$

$P^e \equiv C \pmod{m}$, $0 \le C < m$

Decryption:

block

$C$ to block $P$

$C^d \equiv P \pmod{m}$, $0 \le P < m$

Remark.

$ed \equiv 1 \pmod{\phi(m)}$, so $ed = k\phi(m) + 1$ and
$C^d \equiv (P^e)^d \equiv P^{ed} \equiv P^{k\phi(m)+1} \equiv P^{k\phi(m)}P \equiv P \pmod{m}$, by Euler's theorem.
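A toy Python sketch of the scheme (my addition), using the small primes from the worked example that follows (p=7, q=17, e=5, d=77); real RSA uses very large primes and padding.

```python
from math import gcd

p, q = 7, 17                 # toy primes (from the example below)
m = p * q                    # modulus, 119
phi = (p - 1) * (q - 1)      # Euler phi, 96

e = 5
assert gcd(e, phi) == 1
d = pow(e, -1, phi)          # modular inverse, d = 77

P = 42                       # plaintext block, 0 <= P < m
C = pow(P, e, m)             # encryption: C = P^e mod m
P_recovered = pow(C, d, m)   # decryption: P = C^d mod m
print(C, P_recovered == P)
```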

Example.

  1. Select two prime numbers, p = 7 and q = 17.

  2. $n = pq = 119$

  3. $\phi(n) = (p-1)(q-1) = 96$

  4. Select $e$ relatively prime to $\phi(n) = 96$, e.g., $e = 5$.

  5. $de \equiv 1 \pmod{96}$ and $d < 96$; then $d = 77$.
    public key = {5, 119}
    private key = {77, 119}

    Diffie-Hellman Key Exchange Algorithm

    Diffie-Hellman-Schlüsselaustausch.svg
    DiffieHellman

    1. Global Public Emements

      q prime number,
      α<q
      and
      α
      is primitive root of
      q

    2. User A Key Generation
      Select priviate

      XA,
      XA<q

      Calculate Public
      YA
      ,
      YAαXA mod q

    3. User B Key Generation
      Select priviate

      XB,
      XB<q

      Calculate Public
      YB
      ,
      YBαXB mod q
      .

    4. Generation of Secret Key by User A

      KYBXA mod q

    5. Generation of Secret Key by User B

      KYAXB mod q

Ref. Cryptography and Network Security, by William Stallings

Ref. Introduction to Higher Mathematics, by Patrick Keef and David Guichard.

Chinese Remainder Theorem (中國餘數定理, 孫子定理, 韓信點兵)

Let

$m_1, m_2, \ldots, m_n$ be pairwise relatively prime positive numbers and let $b_1, \ldots, b_n$ be any integers. The system of linear congruences in one variable given by

$x \equiv b_1 \pmod{m_1}$
$x \equiv b_2 \pmod{m_2}$

$\vdots$

$x \equiv b_n \pmod{m_n}$

has a unique solution modulo $m_1m_2\cdots m_n$.
Proof.
Existence.
Let $M = m_1m_2\cdots m_n$ and let $M_i = M/m_i$. Then $(M_i, m_i) = 1$ since the $m_i$ are pairwise relatively prime. Thus $M_ix_i \equiv 1 \pmod{m_i}$ has a solution $x_i$ for each $i$. Form $x = \sum_{i=1}^n b_iM_ix_i \equiv 0 + 0 + \ldots + b_i + 0 + \ldots + 0 \equiv b_i \pmod{m_i}$ for each $i$.

Uniqueness.
Let

$x'$ be another solution. We have $x' \equiv b_i \pmod{m_i}$, so $x \equiv x' \pmod{m_i}$. Then $m_i \mid (x - x')$ for all $i$, hence $M \mid (x - x')$, from which $x \equiv x' \pmod{M}$.

e.g.,

$x \equiv 2 \pmod 3$
$x \equiv 1 \pmod 4$

$x \equiv 3 \pmod 5$

$M = (3)(4)(5) = 60$, $M_1 = 20, M_2 = 15, M_3 = 12$,
$x_1 \equiv 2 \pmod 3$, $x_2 \equiv 3 \pmod 4$, $x_3 \equiv 3 \pmod 5$; then
$x = b_1M_1x_1 + b_2M_2x_2 + b_3M_3x_3 \equiv 233 \equiv 53 \pmod{60}$
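A small Python sketch of the constructive proof (mine), reproducing the worked example $x \equiv 53 \pmod{60}$.

```python
from math import prod

def crt(b, m):
    """Solve x = b_i (mod m_i) for pairwise coprime moduli, via the constructive proof."""
    M = prod(m)
    x = 0
    for b_i, m_i in zip(b, m):
        M_i = M // m_i
        x_i = pow(M_i, -1, m_i)      # inverse of M_i modulo m_i
        x += b_i * M_i * x_i
    return x % M

print(crt([2, 1, 3], [3, 4, 5]))     # -> 53
```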

Tensor Product of Hilbert Spaces

Let

$H_1$ and $H_2$ be Hilbert spaces over a field $K$, e.g., $\mathbb{R}$. For each $\phi_1 \in H_1$, $\phi_2 \in H_2$, let $\phi_1 \otimes \phi_2$ denote the conjugate (for complex scalars) bilinear form which acts on $H_1 \times H_2$ by

$(\phi_1 \otimes \phi_2)\langle\psi_1, \psi_2\rangle = \langle\psi_1, \phi_1\rangle_{H_1}\langle\psi_2, \phi_2\rangle_{H_2}$.

Let

$E$ be the set of finite linear combinations of such conjugate bilinear forms; we define an inner product $\langle\cdot,\cdot\rangle_E$ on $E$ by defining $\langle\phi\otimes\psi, \eta\otimes\mu\rangle_E = \langle\phi,\eta\rangle\langle\psi,\mu\rangle$ and extending by linearity to $E$.
(A tensor: a multilinear mapping, a multidimensional array.)
Fact 1. $\langle\cdot,\cdot\rangle_E$ is well defined and positive definite.
Fact 2. If $\{\phi_i\}$ and $\{\psi_j\}$ are orthonormal bases for $H_1$ and $H_2$, then $\{\phi_i \otimes \psi_j\}$ is an orthonormal basis for $H_1 \otimes H_2$.

Then we have the tensor product $H_1 \otimes H_2 \otimes \ldots \otimes H_n$.
Fact 3. $V \otimes (W \otimes U) \cong (V \otimes W) \otimes U$

$V \otimes K \cong K \otimes V \cong V$

Ref. Functional Analysis, M. Reed, B. Simon


Appendix

如果您數學程度不錯, 大數學家( Soviet and Russian mathematician) Vladimir Arnold 的書不錯(需 數學系程度)

"將數學嚴謹性與物理直覺相結合的清晰的寫作風格"
有助 別人理解

Mathematical Methods of Classical Mechanics, by V. I. Arnold
(此書也要有數學研究所程度, 適合 要念研究所與進攻博士的人)

大學部線代與機率推薦書(好像都是 數學系的~要有觀念不是計算, 計算與應用是幫助理解):

Linear Algebra:
Introduction to Linear Algebra, Gilbert Strang

Probability:
A First Course in Probability, Sheldon Ross


Ref.
Mathematical Methods and Algorithms for Signal Processing, by Todd Moon (Author), Wynn Stirling (Author)
Appendix 有簡潔的 measure theory

這書適合有心念研究所的同學 或 想加強自己能力的 資電人員, 我以前 用過 這本 教 estimation and prediction theory


Stochastic Processes, by Sheldon M. Ross (沒有用到 measure theory, 適合 EE, 資工 統計 應數 研究所)

Linear Operator Theory in Engineering and Science (Applied Mathematical Sciences, 40), by Arch W. Naylor (Author), George R. Sell (Author)

Real Analysis and Probability
Probability and Mathematical Statistics, by Robert B. Ash
(數研所程度, 較深)

有空再來 寫 神奇的 Dirac delta function

δ(t)

當時

δ
(t)dt=1
Riemann integral, Lebesque integral 是不行的

天才物理學家,量子力學的奠基者之一 Paul Dirac

還有一位 Évariste Galois, Coding 的原理就是 年輕的 他 奠立的~現稱 Évariste Galois
年輕 20歲 鬥劍 過世

後來才有數學理論:

Schwartz distribution or generalized function

這三本書 需要數學系以上的基礎

Applied Functional Analysis (Dover Books on Mathematics) Reprint 版本
作者 D.H. Griffel (Author)
這對 EE 比較容易懂, Dover Books 的書 很多是 沒有 版權問題的 好書 重印, 不貴, 這本就是 評價 很高的書, 要修 數研所 的課 沒那麼簡單