Link to the HackMD note

Introduction

Hyperspectral images

Acquisitions in the spectral domain (the bands) are sampled much more finely.

The image is in false color; we are dealing with 3-way tensors:

\(x\),
\(y\)
and the spectral bands. In this image, 3 bands have been mapped to the RGB channels; it is a partial reconstruction, but it makes visualization possible.

These are aerial images of the Mont Blanc massif.

Which physical variable are we interested in?

Here, if we want a continuous analysis of the scene, the physical variable of interest is the reflectance.

If there is no transmission, the reflectance is directly related to the absorbance.

We want exploitable images, i.e. a sufficient signal-to-noise ratio. There is typically a single sensor with an optical diffraction system (prism, etc.): the light arrives and is diffracted and reflected into the different wavelengths.

We cannot capture an entire block at once during one acquisition.

If we want 600 bands, we have to make a trade-off between spatial and spectral resolution.

Some imagers therefore have a low spatial resolution (

\(\sim 30\,m\)), but a second, associated sensor acquires a panchromatic band.

We manage to obtain fairly precise information on the reflectance of the different materials, with

\(\sim 600\) samples for the reflectance.

As soon as we move into the infrared, the reflectance becomes higher, due to the presence of chlorophyll.

All the bands around the "red edge" (

\(\sim 0.7\mu m\)), where the reflectance spectrum of vegetation rises steeply, make it possible to discriminate certain species.

This is the physical variable of interest that we try to extract.

Example

What is the point of acquiring beyond the visible domain?

Let us look at different bands for some plants:

In the near IR:

We notice a difference for the 3rd plant (it is made of plastic).

Applications

In contexts not necessarily related to remote sensing:

  • Detection of hydrocarbons in water

Oil sitting on top of water has a spectrum relatively close to that of water.

If we process an image with more acquisitions:

We extract "hidden" information.

  • Monitoring and characterization of different minerals

  • Biomedical
    • Detection of skin tumors

  • Astronomy
    • The "Muse" telescope

  • Art
    • Some works have transmittance properties that vary with wavelength
    • It is possible to detect layers invisible to the naked eye

  • Non-destructive testing
    • Evolution of a fish over time
    • Early detection of sample spoilage

Spectral Unmixing

A potential limitation of this imaging modality, encountered quite often: the low spatial resolution

\(\to\) some objects are not completely resolved

We measure combinations of the spectra of the elements that make up the scene.

We want samples in reflectance, so a conversion from radiance has to be performed.

If we process an RGB image, each pixel is a vector with

\(3\) components. Here, we have
\(600\)
components: this is a problem tied to the high dimensionality of the data.

What to mine?

We can use a library/catalog of spectra of different materials for the unmixing.

There are 2 possible processing approaches:

Spectral processing

  • Information resides in the spectral signature of the pixels
  • Pixels can be processed independently
  • Approaches stemming from multivariate statistics and linear algebra
  • Objects of interest can be of sub-pixel size
  • Analysis done on the full image

Spatial processing

  • Information resides in the spatial organization of pixels
  • Pixels are processed together (analysis done on local parts of the image)
  • Use image processing tools
  • Objects of interest are fully resolved

Analysis of the spectral domain

HSI scene classification

Spectral classification

High number of features?

When the dimensionality of the problem is high:

  • How does classification accuracy depend on the dimensionality (and the amount of training data)?
  • What is the computational complexity of designing the classifier?

Classification accuracy

  • Bayes error depends on the number of statistically independent features
  • Example: consider a binary classification problem with
    \(p(x\vert \omega_j)\sim\mathcal N(\mu_j,\Sigma_j)\)
    \((j=1,2)\)
    , equal covariances
    \(\Sigma_1=\Sigma_2=\Sigma\)
    and priors
    \(P(\omega_{1,2})=0.5\)
    :

\[ P(e) = \frac{1}{\sqrt{2\pi}}\int_{r/2}^{+\infty}e^{-\frac{u^2}{2}}du \]

with

\(r^2=(\mu_1-\mu_2)^T\Sigma^{-1}(\mu_1-\mu_2)\) the squared Mahalanobis distance

  • \(P(e)\searrow\)
    as
    \(r\nearrow\)
  • In the case of conditionally independent features,
    \(\Sigma = \text{diag}(\sigma_1^2,\dots,\sigma_d^2)\), so that
  • \(r^2 = \sum_{i=1}^d\left(\frac{\mu_{i,1}-\mu_{i,2}}{\sigma_i}\right)^2\)
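
As a quick numerical illustration (a minimal sketch, all values synthetic): \(P(e)\) is just the upper tail of the standard normal evaluated at \(r/2\), so adding an informative, independent feature increases \(r\) and lowers the error.

```python
import numpy as np
from scipy.stats import norm

def bayes_error(mu1, mu2, sigma):
    """Bayes error for two equiprobable Gaussians with shared
    covariance sigma: P(e) = Q(r/2), r the Mahalanobis distance."""
    d = np.asarray(mu1) - np.asarray(mu2)
    r = np.sqrt(d @ np.linalg.inv(sigma) @ d)
    return norm.sf(r / 2)  # upper-tail integral of N(0, 1)

mu1, mu2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])
print(bayes_error(mu1, mu2, np.eye(2)))  # r = sqrt(2), P(e) ~ 0.24
# One extra independent, discriminative feature: r grows, P(e) drops
print(bayes_error(np.append(mu1, 0.0), np.append(mu2, 1.0), np.eye(3)))
```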

There are regions where the classes overlap.
We can increase the dimensionality by adding a descriptor.
But beware of the curse of dimensionality.

  • Intuition fails in high dimensions
  • Curse of dimensionality (Bellman, 1961): many algorithms working fine in low dimensions become intractable when the input is high-dimensional
  • Generalizing correctly becomes exponentially harder as the dimensionality grows, because a fixed-size training set covers an ever smaller fraction of the input space
  • In high dimensional space, the concept of proximity, distance or nearest neighbor may not even be qualitatively meaningful
  • Similarity measures based on
    \(l_k\)
    norms lose their meaning with respect to
    \(k\)
    in high dimensions
  • The \(l_1\)
    norm (Manhattan distance metric) is preferable to the Euclidean distance metric
    \((l_2)\)
    for high-dimensional data mining

HSI in high dimensions

Volume of a hypersphere

  • The volume of a hypersphere of radius
    \(r\)
    in a
    \(p\)
    -dimensional space:

\[ V_s(r)=\frac{r^p\pi^{\frac{p}{2}}}{\Gamma(\frac{p}{2}+1)} \]

  • Volume of a hypercube
    \([-r, r]^p\)

\[ V_c(r) = (2r)^p \]

  • The fraction of the volume contained in the inscribed hypersphere

\[ f_{p_1}=\frac{V_s(r)}{V_c(r)}=\frac{\pi^{\frac{p}{2}}}{2^p\Gamma(\frac{p}{2}+1)} \]

  • Fraction of the volume of a thin spherical shell, defined by a sphere of radius
    \((r-\varepsilon)\)
    inscribed inside a sphere of radius
    \(r\)
    , relative to the volume of the entire sphere:

\[ \begin{aligned} f_{p_2}&=\frac{V_s(r)-V_s(r-\varepsilon)}{V_s(r)}\\ &=\frac{r^p-(r-\varepsilon)^p}{r^p}\\ &= 1-\left(1-\frac{\varepsilon}{r}\right)^p \end{aligned} \]

We want to look at the ratio between the volume of the sphere and that of the hypercube circumscribing it.
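
A minimal numerical sketch of these two fractions (standard library only): the inscribed sphere becomes negligible inside the cube, while a thin shell ends up holding almost all of the sphere's volume as \(p\) grows.

```python
from math import pi, gamma

def sphere_in_cube_fraction(p):
    """f_p1: fraction of the hypercube [-r, r]^p filled by the inscribed sphere."""
    return pi ** (p / 2) / (2 ** p * gamma(p / 2 + 1))

def shell_fraction(p, eps_over_r=0.01):
    """f_p2: fraction of a sphere's volume in an outer shell of thickness eps."""
    return 1 - (1 - eps_over_r) ** p

for p in (2, 10, 50, 200):
    print(p, sphere_in_cube_fraction(p), shell_fraction(p))
# f_p1 -> 0 and f_p2 -> 1: high-dimensional volume sits in corners and shells
```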

  • Small sample size: number of samples required for accurate classification

If we do not have enough samples for our estimation, the estimate will not be robust.

  • Curse of dimensionality !
  • Computational complexity

The blessing of non-uniformity

  • In most applications, examples are not spread uniformly throughout the instance space, but are concentrated on or near a lower-dimensional manifold
  • The intrinsic dimensionality might be difficult to estimate on real data

Dimensionality reduction

Dimension reduction aims at representing data in a reduced number of dimensions

Reasons:

  • Easier data analysis
  • Improved classification accuracy
  • More stable representation
  • Removal of redundant or irrelevant information
  • Attempt to discover underlying structure by obtaining a graphical representation

Dimensionality reduction is usually obtained by feature selection or extraction

Feature selection keeps only some of the features according to a criterion, leading to a new subset of features with lower dimensionality:

\[ x=[x_1,x_2,x_3,x_4,\dots,x_d]^T,\qquad x'=A^Tx \]

with

\[ A=\begin{pmatrix} \color{red}{1}&0&0&0&\dots&0\\ 0&\color{red}{0}&0&0&\dots&0\\ 0&0&\color{red}{1}&0&\dots&0\\ 0&0&0&\color{red}{0}&\dots&0\\ \vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\ 0&0&0&0&\dots\color{red}{1} \end{pmatrix} \]

Feature extraction transforms the data into a space of lower dimensionality with an arbitrary function

\(f\)

\[ x'=f(x)\quad\text{with } f:\mathbb R^d\to\mathbb R^n, n\lt d \]
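
A small NumPy sketch contrasting the two operations (shapes and names are illustrative): selection multiplies by a 0/1 matrix \(A\), extraction applies an arbitrary map \(f\), here a linear projection.

```python
import numpy as np

d, n = 6, 4
X = np.random.rand(d, n)  # d features, n samples (one sample per column)

# Feature selection: keep features 0, 2 and 5 through a d x k 0/1 matrix A
keep = [0, 2, 5]
A = np.zeros((d, len(keep)))
A[keep, np.arange(len(keep))] = 1.0
X_sel = A.T @ X           # shape (3, n): rows are original features

# Feature extraction: any f from R^d to R^k, here a random linear map
W = np.random.randn(2, d)
X_ext = W @ X             # shape (2, n): new synthetic features
```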

Example: Color composite

The pigment in plant leaves, chlorophyll, strongly absorbs visible light (from

\(0.4\) to
\(0.7\mu m\)
) for use in photosynthesis. The cell structure of the leaves, on the other hand, strongly reflects near-infrared light (from
\(0.7\)
to
\(1.1\mu m\)
). The more leaves a plant has, the more these wavelengths of light are affected, respectively.

Normalized Difference Vegetation Index (NDVI)

\[ \text{NDVI} = \frac{b_{NIR}-b_{RED}}{b_{NIR}+b_{RED}} \]
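
A minimal sketch of computing this index per pixel (the band indices are assumptions to be adapted to the sensor):

```python
import numpy as np

def ndvi(cube, red_band, nir_band, eps=1e-12):
    """NDVI from a (rows, cols, bands) reflectance cube; red_band and
    nir_band are indices of a red (~0.66 um) and near-IR (~0.83 um)
    band, and eps avoids division by zero on dark pixels."""
    red = cube[..., red_band].astype(float)
    nir = cube[..., nir_band].astype(float)
    return (nir - red) / (nir + red + eps)

# Values close to 1 indicate dense, healthy vegetation; bare soil and
# water give values near 0 or below.
```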

Example

Exploratory analysis

Covariance matrix

As soon as the bands are highly correlated, we can reduce the number of dimensions while keeping part of the information.

What is the definition of the covariance matrix?

It is what allows us to visualize the dependence between the bands.
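
For reference, with \(n\) pixel spectra \(x_i \in \mathbb R^b\) of empirical mean \(\mu\), the covariance matrix is

\[ \Sigma = \frac{1}{n}\sum_{i=1}^n (x_i-\mu)(x_i-\mu)^T \in \mathbb R^{b\times b} \]

and entry \(\Sigma_{jk}\) measures how bands \(j\) and \(k\) co-vary.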

In the image above, the variables roughly between

\(80\) and
\(100\)
have a relatively high correlation.

Correlation matrix

If we display the values on the diagonal of the matrix:

Feature extraction

Eigen decomposition of the covariance matrix:

\[ \Sigma = \phi \Lambda \phi^T \]

with

\(\Lambda\) the diagonal matrix of eigenvalues and
\(\phi\)
the matrix of eigenvectors
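
A minimal NumPy sketch of this decomposition and of the PCA projection it yields (toy data; \(X\) has shape \((b, n)\) as in the test below):

```python
import numpy as np

b, n, d = 50, 1000, 5
X = np.random.randn(b, n)           # toy data: b bands, n pixels
X -= X.mean(axis=1, keepdims=True)  # center each band

Sigma = (X @ X.T) / n               # b x b covariance matrix
lam, Phi = np.linalg.eigh(Sigma)    # Sigma = Phi diag(lam) Phi^T
order = np.argsort(lam)[::-1]       # sort by decreasing eigenvalue
lam, Phi = lam[order], Phi[:, order]

Phi_d = Phi[:, :d]                  # top-d eigenvectors (principal axes)
Y = Phi_d.T @ X                     # PCA scores, shape (d, n)
X_hat = Phi_d @ Y                   # low-rank reconstruction
```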

Principal Component Analysis

Application

Denoising

Test

Let us consider the data

\(X\in\mathbb R^{b\times n}\) with
\(n\)
samples of
\(b\)
bands and centered at the origin. Matrix
\(\Phi=[\phi_1,\dots,\phi_d]\)
is composed of
\(d\lt b\)
eigenvectors extracted from the
\(b\times b\)
covariance matrix
\(\Sigma\)
of the data
\(X\)

Which transformation would you apply to the data for denoising, based on the concepts seen so far?

  • \(Y=X_{[1:d,:]}\)
  • \(Y=\Sigma X\)
  • \(Y=\Phi X\)
  • \(Y=\Phi^T X\)
  • \(Y=\Phi\Phi^T X\)
  • \(Y=\Phi^T\Phi\Phi^T X\)

Spectral Mixture Analysis

Spectral mixing

Linear mixing model

\[ x=\sum_{k=1}^ma_ks_k+e=Sa+e \]

  • \(x\)
    : Spectrum of a pixel
  • \(a\)
    : Coefficients in the mixture (abundance)
  • \(S\)
    : Spectra of the sources of the mixture (endmembers)
  • \(e\)
    : Noise

Constraints:

  • Sum to
    \(1\)

\[ \sum_{k=1}^ma_k=1 \]

  • Non-negativity

\[ \begin{aligned} a_k&\ge 0\\ S_{k,\lambda}&\ge 0 \end{aligned} \quad \forall k,\lambda \]
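
A small sketch simulating this model (all values synthetic): abundances are drawn from a Dirichlet distribution so that both constraints hold by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
b, m, n = 100, 3, 500               # bands, endmembers, pixels

S = rng.random((b, m))              # non-negative endmember spectra
A = rng.dirichlet(np.ones(m), n).T  # abundances: >= 0, columns sum to 1
E = 0.01 * rng.standard_normal((b, n))

X = S @ A + E                       # mixed pixels, one column per pixel
```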

Geometrical interpretation

Consider a very simple case:

\[ \begin{cases} x = a_1s_1 + a_2s_2\\ a_1+a_2 = 1 \end{cases} \]

From a representation standpoint, if we consider the vectors

\(s_1\) and
\(s_2\)
, all the values of
\(x\)
defined by the equation above lie on the segment
\(s_1\leftrightarrow s_2\)

Endmember determination technique

Principles:

  • Endmembers are the vertices of the simplex
    \(\to\)
    find extrema when projecting the data onto a line (see the sketch after this list)
  • The convex hull of the data encloses the simplex
    \(\to\)
    find endmembers so as to maximize the volume
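
A minimal sketch of the first principle, in the spirit of the Pixel Purity Index (only the projection-extrema idea, not a full implementation):

```python
import numpy as np

def ppi_scores(X, n_proj=1000, seed=0):
    """Count how often each pixel is an extremum when the data
    (bands x pixels) are projected onto random lines ("skewers");
    frequent extrema are endmember candidates."""
    rng = np.random.default_rng(seed)
    b, n = X.shape
    counts = np.zeros(n, dtype=int)
    for _ in range(n_proj):
        w = rng.standard_normal(b)   # random projection direction
        proj = w @ X                 # 1-D projection of all pixels
        counts[np.argmin(proj)] += 1
        counts[np.argmax(proj)] += 1
    return counts
```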

Abundance

  • If the endmembers are available: Solve a minimization problem of the form (a sketch follows this list):

\[ \hat A = \text{arg}\min_A\Vert X-AS\Vert^2_F\quad \text{s.t. Constraints} \]

  • If the endmembers are not available: Use alternating minimization techniques (e.g., Non-negative matrix factorization)
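
For the known-endmember case, here is a sketch using the column-pixel convention \(X \approx SA\) from the mixing model above: non-negativity is handled by NNLS, and the sum-to-one constraint is enforced softly by appending an extra equation weighted by a large constant \(\delta\) (a classical trick, not a reference implementation).

```python
import numpy as np
from scipy.optimize import nnls

def abundances(S, X, delta=100.0):
    """Per-pixel abundances for known endmembers S (b x m) and
    pixels X (b x n): solve NNLS on an augmented system whose
    last row encodes delta * sum(a) = delta (soft sum-to-one)."""
    b, m = S.shape
    S_aug = np.vstack([S, delta * np.ones((1, m))])
    A = np.empty((m, X.shape[1]))
    for j in range(X.shape[1]):
        A[:, j], _ = nnls(S_aug, np.append(X[:, j], delta))
    return A
```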

Hyperspectral in nature

Mantis shrimp visual system

  • \(12\) different types of color photoreceptors
  • sees in the UV, VIS and NIR spectral domains
  • \(3\) focal points per eye (\(6\) in total; we have \(2\))
  • sees polarized light (linear vs circular)

Bonus