Probability and Statistics for Images

---
title: Probability and Statistics for Images
date:
tags: Image, PRSTIM
---

Email: edwin.grappin@screenseed.com

## Glossary

**CLT** Central limit theorem (*TCL* in French)
**LLN** Law of large numbers (*LGN* in French)

# PRSTIM: Probability and Statistics for Images

Outline:
- Probability refresher
- Multivariate probability
- Independence
- Gaussian vectors (normal and Gaussian distributions)
- Convergence, CLT (central limit theorem), LLN (law of large numbers)

## Refresher: $\mathbb{P}$

:::info
* $\mathbb{P}$: probability
* $\Omega$: a set (the sample space)
* $\omega \in \Omega$: an element of this set
* $A \subset \Omega$: $A$ is a subset of $\Omega$
* $A \cup B$: the union of $A$ and $B$
* $A \cap B$: the intersection of $A$ and $B$
* $A^c := \bar{A} = \Omega \backslash A$
* $X$ *(uppercase)*: a random variable
* $x$ *(lowercase)*: a realization of this variable
:::

Let $\Omega$, $X$, $A \subset \Omega$. In the discrete case:
$$\mathbb{P}_X(A) = \sum_{\omega \in A}\mathbb{P}_X(\omega) = \sum_{e_i \in A}p_i$$
$$\sum_{\omega \in \Omega}\mathbb{P}_X(\omega) = 1$$

In the continuous case:
$$\mathbb{P}_X(A) = \int_{\omega \in A}f(\omega)d\omega$$
$$\int_{\omega \in \Omega}f(\omega)d\omega = 1$$
$$\forall \omega \in \Omega , f(\omega) \geq 0$$

**Random variable:** a function from a set $\Omega$ to $X(\Omega)$
$$
\begin{array}{ccccc}
X & : & \Omega & \to & X(\Omega) \\
 & & \omega & \mapsto & X(\omega) = x \\
\end{array}
$$

++Example:++ two dice
$$(X_1, X_2) \in [\![1, 6]\!]^2 \quad \Omega = \{ (1, 1), (1, 2), ..., (1, 6), (2, 1), ..., (6, 6)\} \\ Y = X_1 + X_2$$

**Cumulative distribution function (CDF)**: ==$F : t \mapsto \mathbb{P}(X \leq t)$==

Discrete case:
$$F(t) = \sum_{x \in X(\Omega), x \leq t} \mathbb{P}_X(x)$$
Continuous case:
$$F(t) = \int_{x \in X(\Omega), x \leq t}f(x)dx$$

The distribution of $X$ maps events to probabilities:
$$
\begin{array}{ccccc}
\mathbb{P}_X & : & \mathbb{B}(\Omega) & \to & [0, 1] \\
 & & A & \mapsto & \mathbb{P}_X(A) \\
\end{array}
$$
*(Didn't catch the end of this part in class, we're in trouble.)*

The **CDF** characterizes the distribution of the random variable $X$: knowing it is enough to describe how $X$ behaves.

Like any probability, $\mathbb{P}_X$ satisfies:
$$\forall E \subset \Omega, \mathbb{P}(E) \geq 0 \\ \mathbb{P}(\Omega) = 1 \\ E\subset A \Rightarrow \mathbb{P} (E) \leq \mathbb{P}(A)$$
$$A \cap B = \emptyset \Rightarrow \mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) \\ \mathbb{P}(\Omega) = 1, \; A \cup A^c = \Omega, \; A \cap A^c = \emptyset \Rightarrow \mathbb{P}(A) = 1 - \mathbb{P}(A^c)$$

:::info
++Property:++
$$\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B)$$
:::spoiler Proof
$$A \cup B = (A \backslash B) \cup (B \backslash A) \cup (A \cap B)$$
Since $A \backslash B$, $B \backslash A$ and $A \cap B$ are disjoint, and $\mathbb{P}(A \backslash B) = \mathbb{P}(A) - \mathbb{P}(A \cap B)$:
$$\begin{aligned} \mathbb{P}(A \cup B) & = \mathbb{P}(A \backslash B) + \mathbb{P}(B \backslash A) + \mathbb{P}(A \cap B) \\ & = \mathbb{P}(A) - \mathbb{P}(A \cap B) + \mathbb{P}(B) - \mathbb{P}(A \cap B) + \mathbb{P}(A \cap B) \\ & = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B) \end{aligned}$$
:::
:::

### Expectation & variance

**Expectation:** for a real random variable $X$,
$$\text{Discrete: } \mathbb{E}(X) = \sum_{\omega \in X(\Omega)}\omega\,\mathbb{P}(X = \omega) \qquad \text{Continuous: } \mathbb{E}(X) = \int_{X(\Omega)}\omega f(\omega) d\omega$$

**Variance:**
$$\begin{aligned} \mathbb{V}(X) & = \mathbb{E}((X - \mathbb{E}(X))^2) \\ & = \mathbb{E}(X^2) - \mathbb{E}(X)^2 \quad \text{(König-Huygens)} \end{aligned}$$

:::info
++Properties:++ Let $a, b \in \mathbb{R}$.
* $\mathbb{E}(aX + b) = a\mathbb{E}(X) + b$
* $\mathbb{V}(a) = 0$
* $\mathbb{V}(aX +b) = a^2\mathbb{V}(X)$
:::spoiler Proof
* Linearity of the sum/integral.
* $\mathbb{E}(a) = a \Rightarrow \mathbb{V}(a) = \mathbb{E}(a^2) - \mathbb{E}(a)^2 = a^2 - a^2 = 0$
* $\mathbb{V}(aX+b) = \mathbb{E}((aX + b - \mathbb{E}(aX + b))^2) = \mathbb{E}(a^2(X - \mathbb{E}(X))^2) = a^2\mathbb{V}(X)$
:::
:::

[Common discrete distributions](https://fr.wikipedia.org/wiki/Loi_de_probabilit%C3%A9#Loi_uniforme_discr%C3%A8te)
[Common continuous distributions](https://fr.wikipedia.org/wiki/Loi_de_probabilit%C3%A9#Lois_absolument_continues)

## Multivariate probability

### Joint and marginal distributions

Let $X, Y$ be random variables with values in $\mathcal{X}, \mathcal{Y}$.
$\qquad \mathbb{P}_{(X, Y)}(X \in A, Y \in B) \rightarrow$ ==joint== probability
$\qquad \mathbb{P}_X(X \in A) = \sum_{y \in \mathcal{Y}}\mathbb{P}_{(X, Y)}(X \in A, Y = y) \rightarrow$ ==marginal== probability

$\mathbb{P}_{X,Y}(X=x_i, Y=y_j) := p_{i,j}$
$\mathbb{P}_X(X = x_i) = p_{i \cdot} = \sum_j p_{i,j}$

### Conditional distribution

$X|Y \rightarrow$ distribution of $X$ **conditionally** on $Y$: $\mathbb{P}(X \in A | Y \in B)$
$$\mathbb{P}(X = x_i | Y = y_j) := \mathbb{P}_{Y = y_j}(X = x_i)$$

:::info
++Property:++
$$\mathbb{P}(X = x_i | Y = y_j) = \frac{\mathbb{P}(X = x_i \cap Y = y_j)}{\mathbb{P}(Y = y_j)}$$
(if $\mathbb{P}(Y = y_j) \neq 0$)
:::

## Independence

++Definition:++ $X \amalg Y$
Two random variables $X$ and $Y$ are **independent** if
$$\forall x \in \mathcal{X}, \forall y \in \mathcal{Y}, \mathbb{P}(X = x, Y = y) = \mathbb{P}_X(X = x)\mathbb{P}_Y(Y = y)$$

:::warning
Applying a function to random variables yields a new random variable, called an **image** of these random variables:
$$X, Y \rightarrow \varphi(X, Y)$$
:::

:::info
**Sum of random variables** $\varphi(X, Y) = X + Y$
* $\mathbb{E}(X+Y) = \mathbb{E}(X) + \mathbb{E}(Y)$
* $\mathbb{V}(X+Y) = \mathbb{V}(X) + \mathbb{V}(Y) + 2cov(X, Y)$
* $cov(X, Y) = \mathbb{E}((X - \mathbb{E}(X))(Y - \mathbb{E}(Y)))$
:::spoiler Proof
* $\mathbb{E}(X+Y) = \mathbb{E}(X) + \mathbb{E}(Y)$:
$$\begin{aligned} \int_{x \in \mathcal{X}, y \in \mathcal{Y}} (x+y) f_{X, Y}(x, y)d(x, y) & = \int_{x \in \mathcal{X}}\int_{y \in \mathcal{Y}}(x + y)f_{X, Y}(x, y)dydx \\ & = \int_{x \in \mathcal{X}}x\int_{y \in \mathcal{Y}}f_{X, Y}(x, y)dydx + \int_{x \in \mathcal{X}}\int_{y \in \mathcal{Y}}yf_{X, Y}(x, y)dydx \\ & = \int_{x \in \mathcal{X}}xf_X(x)dx + \int_{y \in \mathcal{Y}}y\int_{x \in \mathcal{X}}f_{X, Y}(x, y)dxdy \\ & = \mathbb{E}(X) + \int_{y \in \mathcal{Y}}yf_Y(y)dy \\ & = \mathbb{E}(X) + \mathbb{E}(Y) \end{aligned}$$
* $\mathbb{V}(X+Y) = \mathbb{V}(X) + \mathbb{V}(Y) + 2cov(X, Y)$:
$$\begin{aligned} \mathbb{V}(X+Y) & = \mathbb{E}((X+Y - \mathbb{E}(X+Y))^2) \\ & = \mathbb{E}([(X - \mathbb{E}(X)) + (Y - \mathbb{E}(Y))]^2) \\ & = \mathbb{E}((X - \mathbb{E}(X))^2) + \mathbb{E}((Y - \mathbb{E}(Y))^2) + \mathbb{E}(2(X-\mathbb{E}(X))(Y-\mathbb{E}(Y))) \\ & = \mathbb{V}(X) + \mathbb{V}(Y) + 2cov(X, Y) \end{aligned}$$
:::
:::

++Example:++
Let $X(\Omega) = \{-1, 1\}$, $Y(\Omega) = \{-1, 1\}$ with
$P_X(X = 1) = P_X(X=-1) = \frac{1}{2}$
$P_Y(Y = 1) = P_Y(Y=-1) = \frac{1}{2}$
Then $\mathbb{E}(X) = \mathbb{E}(Y) = 0$, so $cov(X, Y) = \mathbb{E}(XY)$.

++Case 1:++ $X = Y \rightarrow P(Y = 1 | X = 1) = 1 \quad \& \quad P(Y = -1 | X = -1) = 1$
$\mathbb{E}(XY) = P(X = 1, Y = 1) + P(X = -1, Y = -1)$
$= P(X = 1)P(Y = 1 | X = 1) + P(X = -1)P(Y = -1 | X = -1)$
$= P(X = 1) + P(X = -1) = 1$

++Case 2:++ $X = -Y \rightarrow P(Y = 1 | X = -1) = 1 \quad \& \quad P(Y = -1 | X = 1) = 1$
$\mathbb{E}(XY) = P(X = 1, Y = -1) \times (-1) + P(X = -1, Y = 1) \times (-1) = -\frac{1}{2} - \frac{1}{2} = -1$

++Definition:++ _Conditional expectation_
$$\mathbb{E}(X | Y=y) := \int_{\mathcal{X} } x f_{Y=y}(x)\text{d}x = \varphi(y)$$

:::info
++Property:++
$$\text{If} \quad X \amalg Y \Rightarrow cov(X, Y) = 0$$
(The converse is false.)
:::

:::info
++Remark:++ Covariance can be seen as a scalar product:
$$cov(X, Y) = \mathbb{E}((X - \mathbb{E}(X))(Y - \mathbb{E}(Y))) = \langle X - \mathbb{E}(X), Y - \mathbb{E}(Y) \rangle$$
because covariance is:
* bilinear
* symmetric
* positive (and definite on centered variables, up to almost-sure equality)
:::

Let $X, Y \in \mathbb{R}^p$:
$$cov(X, Y) = \mathbb{E}((X - \mathbb{E}(X))(Y - \mathbb{E}(Y))^T) \in \mathcal{M}_{p}(\mathbb{R})$$
The covariance matrix gathers the pairwise covariances of the components:
$$X\in \mathbb{R}^p: \quad \Sigma_X := cov(X, X) = \mathbb{E}((X - \mathbb{E}(X))(X - \mathbb{E}(X))^T), \quad (\Sigma_X)_{i,j} = cov(X_i, X_j)$$

## Conditional expectations: the return

$$\text{Discrete}: \varphi(y) = \mathbb{E}(X | Y = y) = \sum_{x \in \mathcal{X}}xP(X = x | Y = y) \\ \text{Continuous}: \varphi(y) = \mathbb{E}(X | Y = y) = \int_{x \in \mathcal{X}}xf_{X|Y}(x | y)dx$$

:::info
++Law of total expectation:++
$$\mathbb{E}_X(X) = \mathbb{E}_Y(\mathbb{E}_X(X | Y))$$
Remark: $\mathbb{E}(X|Y)$ is itself a random variable, which depends on $Y$, hence the $\mathbb{E}_Y$.
:::

$S_{++}^p(\mathbb{R})$: the set of symmetric positive definite matrices of $\mathbb{R}^{p \times p}$, i.e. the symmetric matrices satisfying:
$$A\in S_{++}^p \iff x^TAx > 0, \quad \forall x \in \mathbb{R}^p \backslash \{0\}$$

## Multivariate normal distributions

**Definition of a normal distribution**
Let $X$ be a random variable in $\mathbb{R}^{p}$, $X \sim \mathcal{N} (\mu, \Sigma)$ with $\mu \in \mathbb{R}^{p}$ and $\Sigma \in S_{++}^p$:
$$f_X(x) = \frac{1}{\sqrt{(2\pi)^p|\Sigma |}}\times e^{-\frac{1}{2}\times(x - \mu)^T\times\Sigma^{-1}\times(x - \mu)}$$

**Definition:**
$$\text{If } X(\Omega) \subset \mathbb{R}^p, \: X \sim \mathcal{N}(\mu, \Sigma), \text{ then } \forall a \in \mathbb{R}^p, \; a^TX \sim \mathcal{N}(a^T\mu, a^T\Sigma a)$$
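The density formula above can be checked numerically. Below is a minimal pure-Python sketch that uses only the standard library (no NumPy). It assumes $\Sigma$ is given as a list of rows and lies in $S_{++}^p$. It relies on a Cholesky factorization $\Sigma = LL^T$: then $(x-\mu)^T\Sigma^{-1}(x-\mu) = \|z\|^2$ where $Lz = x - \mu$, and $|\Sigma| = \prod_i L_{ii}^2$. The names `cholesky`, `normal_pdf` and `normal_pdf_1d` are our own and do not come from any library.

```python
import math


def cholesky(sigma):
    """Return the lower-triangular L such that sigma = L L^T.

    sigma must be symmetric positive definite (an element of S_++^p).
    """
    p = len(sigma)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = sigma[i][i] - s
                if d <= 0:
                    raise ValueError("sigma is not positive definite")
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def normal_pdf(x, mu, sigma):
    """Density f_X(x) of N(mu, sigma) in dimension p = len(mu)."""
    p = len(mu)
    L = cholesky(sigma)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    # Forward substitution for L z = x - mu; then (x-mu)^T sigma^{-1} (x-mu) = |z|^2
    z = []
    for i in range(p):
        z.append((diff[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(zi * zi for zi in z)
    det_sigma = math.prod(L[i][i] for i in range(p)) ** 2  # |sigma| = (prod of diag(L))^2
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** p * det_sigma)


def normal_pdf_1d(x, mu, var):
    """Univariate N(mu, var) density, used for comparison."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


if __name__ == "__main__":
    # Standard 2-D normal evaluated at its mean: should equal 1 / (2 pi)
    print(normal_pdf([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
    # Diagonal sigma: joint density vs product of the 1-D marginal densities
    joint = normal_pdf([1.0, 2.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 9.0]])
    product = normal_pdf_1d(1.0, 0.0, 4.0) * normal_pdf_1d(2.0, 0.0, 9.0)
    print(joint, product)
```

With a diagonal $\Sigma$, the two printed numbers should agree, because the joint density then factorizes into the product of the 1-D marginal densities. Working with the Cholesky factor also avoids forming $\Sigma^{-1}$ explicitly.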
**Definition:**
$$W \sim \mathcal{N}(0_p, I_p) \Rightarrow w_i \amalg w_j \quad \text{if } i \neq j$$
$$X \sim \mathcal{N} \text{ if } \exists\, C \in \mathbb{R}^{p \times p}, \; X = CW + \mu$$

:::info
Unlike the general case, for a Gaussian vector $(X_1, X_2)$ (jointly normal, not merely each $X_i \sim \mathcal{N}$):
$$cov(X_1, X_2) = 0 \Rightarrow X_1 \amalg X_2$$
> (zero covariance implies independence)

The converse ($X_1 \amalg X_2 \Rightarrow cov(X_1, X_2) = 0$) holds, as for any distribution.
:::

:::info
++NB:++ Stability under **linear projection** characterizes the normal distribution.
:::

:::info
++Property++
* $X \sim \mathcal{N} (\mu, \Sigma), \quad C \in \mathbb{R}^{k \times p}, d \in \mathbb{R}^k$
$$Y = CX + d \Rightarrow Y \sim \mathcal{N}(C\mu + d, C\Sigma C^T)$$
* $X \in \mathbb{R}^{p_x}, Y \in \mathbb{R}^{p_y}, \begin{pmatrix} X \\ Y \end{pmatrix} \in \mathbb{R}^{p_x + p_y}$. If $\begin{pmatrix} X \\ Y \end{pmatrix}$ is a Gaussian vector with $X \sim \mathcal{N}(\mu_X, \Sigma_X)$, $Y \sim \mathcal{N}(\mu_Y, \Sigma_Y)$ and $\Sigma_{XY} = cov(X, Y)$:
$$\begin{aligned} \begin{pmatrix} X \\ Y \end{pmatrix} & \sim \mathcal{N}\left(\begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \begin{pmatrix} \Sigma_X & \Sigma_{XY} \\ \Sigma_{XY}^T & \Sigma_Y \end{pmatrix}\right) \\ & \Rightarrow (X | Y) \sim \mathcal{N} \end{aligned}$$
:::

## Convergence

:::success
++Types of convergence:++
* a.s.: almost surely
* $\mathbb{P}$: in probability
* $\mathcal{L}$: in distribution
:::

Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of random variables in $\mathbb{R}$.

**Definition** Almost-sure convergence
$(X_n) \rightarrow_\infty^{a.s.} X \quad \text{if} \quad \exists A \subset \Omega \quad \text{such that}:$
* $\forall a \in A, X_n(a) \rightarrow_{\infty} X(a)$
* $\mathbb{P}(A) = 1$

**Definition** Convergence in probability ($\mathbb{P}$)
$(X_n) \rightarrow_\infty^{\mathbb{P}} X \quad \text{if}$
* $\forall \varepsilon > 0, \quad \lim_{n \to \infty} \mathbb{P}(|X_n - X| > \varepsilon) = 0$

**Definition** Convergence in probability to a real number $c$
$(X_n) \rightarrow_\infty^{\mathbb{P}} c \quad \text{if}$
* $\forall \varepsilon > 0, \quad \lim_{n \to \infty} \mathbb{P}(|X_n - c| > \varepsilon) = 0$

**Definition** Convergence in distribution ($\mathcal{L}$)
$X_n \rightarrow_\infty^{\mathcal{L}} X \quad \text{if}$
* $\lim\limits_{n \rightarrow +\infty} \, F_n(x) = F(x) \quad \forall x \in \mathcal{C}_F$, where $F_n$ and $F$ are the CDFs of $X_n$ and $X$, and $\mathcal{C}_F$ is the set of points where $F$ is continuous.

:::info
++Properties:++
* Almost-sure convergence $\Rightarrow$ convergence in $\mathbb{P}$ $\Rightarrow$ convergence in distribution

Sketch (a.s. $\Rightarrow \mathbb{P}$): take $A$ with $\mathbb{P}(A) = 1$ and $X_n(a) \rightarrow X(a)$ for all $a \in A$. For $a \in A$ and $n$ large enough, $|X_n(a) - X(a)| \leq \varepsilon$, so $\mathbb{1}_{\{|X_n - X| \leq \varepsilon\}}(a) = 1$. By dominated convergence:
$$\mathbb{P}(|X_n - X| \leq \varepsilon) = \mathbb{E}(\mathbb{1}_{\{|X_n - X| \leq \varepsilon\}}) \rightarrow_{n \to \infty} 1$$
:::

## Inequalities and convergence theorems

**Markov**
Let $X$ be a random variable such that $P(X \geq 0) = 1$. Then, for all $a > 0$:
$$P(X \geq a) \leq \frac{\mathbb{E}(X)}{a}$$
:::spoiler Proof
Let $I = 1_{\{X \geq a\}}$. Since $X \geq 0$, we have $aI \leq X$, hence:
$$\mathbb{E}(aI) \leq \mathbb{E}(X) \\ \mathbb{E}(I) \leq \frac{\mathbb{E}(X)}{a} \\ 0 \times P(I = 0) + 1 \times P(I = 1) \leq \frac{\mathbb{E}(X)}{a} \\ P(X \geq a) \leq \frac{\mathbb{E}(X)}{a}$$
:::

**(Bienaymé-)Chebyshev inequality**
$$\mathbb{P}(|X - \mathbb{E}(X)| \geq \varepsilon) \leq \frac{\mathbb{V}(X)}{\varepsilon^2}$$
:::spoiler Proof
Let $Y = |X - \mathbb{E}(X)|^2 \geq 0$. By Markov:
$$P(Y \geq a) \leq \frac{\mathbb{E}(Y)}{a}$$
In particular, for $a = \varepsilon^2$:
$$P(|X - \mathbb{E}(X)|^2 \geq \varepsilon^2) \leq \frac{\mathbb{E}((X - \mathbb{E}(X))^2)}{\varepsilon^2} \\ P(|X - \mathbb{E}(X)| \geq \varepsilon) \leq \frac{\mathbb{V}(X)}{\varepsilon^2}$$
:::

## Law of large numbers

> We speak of the weak law of large numbers when the convergence is in probability ($\rightarrow^\mathbb{P}$).
> We speak of the strong law of large numbers when the convergence is almost sure ($\rightarrow^{a.s.}$).

**iid**: independent and identically distributed

:::info
**Theorem** Weak *(resp. strong)* law of large numbers
Let $(X_n)_{n \in \mathbb{N}_*}$ be a sequence of **iid** random variables with $\mathbb{E}(|X_1|) < \infty$, and let $S_n := \sum^n_{i=1}X_i$.
Then $\frac{S_n}{n} \rightarrow^{\mathbb{P} \,(\text{resp. a.s.})}_{n \to \infty} \mathbb{E}(X_1)$
:::

## Central limit theorem (CLT)

$(X_n)_{n \in \mathbb{N}_*}$ **iid**, $\quad \mathbb{E}(X_1) = \mu \quad \text{and} \quad \mathbb{V}(X_1) = \sigma^2$ with $0 < \sigma^2 < \infty$
$S_n = \sum^n_{i=1}X_i$
$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \rightarrow_{n \to \infty}^\mathcal{L} \mathcal{N}(0, 1)$$
Roughly: for large $n$, $S_n \approx \mathcal{N}(n\mu, n\sigma^2)$.
In other words, $S_n$ fluctuates around $n\mu$ with standard deviation $\sigma\sqrt{n}$. Equivalently, $\frac{S_n}{n}$ deviates from $\mu$ by about $\frac{\sigma}{\sqrt{n}}$, which gives the speed of convergence of the LLN.

## Parameter estimation

> cf. Jupyter notebook (Guillaume Tochon)

* We observe $x$.
  We know that $x$ is a particular realization of $X \sim \mathcal{N}(\mu, \sigma^2)$ with $\mu$ unknown and $\sigma^2 = 4$.
  We want to estimate $\mu$ from $x$;
  $\hookrightarrow$ Hard, but $\hat{\mu} = x$ is the value of $\mu$ that makes the observation $x$ most likely.
  $\hookrightarrow$ The estimate is more precise if we have observed $n$ particular realizations $x_1, ..., x_n$
  $\rightarrow$ We use the **empirical mean** $\overline{x_n} = \frac{1}{n}\sum_{i=1}^nx_i$

Parameter estimation: we have observed particular realizations $x_1, x_2, ..., x_n$, which all come from the same random phenomenon, modeled by the probability density function (pdf) $f_X(x, \theta)$.

++Question:++ How can we estimate $\theta$ from the observed data $x_1, ..., x_n$?
$\hookrightarrow$ We try to "invert" the pdf to recover the hidden parameter $\theta$.
$\hookrightarrow$ This is an inverse problem.

For the empirical mean: $\overline{x_n} = \frac{1}{n}\sum_{i=1}^{n}{x_i}$
General case: we combine the observed values so that the resulting estimate $\hat{\theta}$ is **as close as possible** to $\theta$:
$$\hat{\theta} = g(x_1, ..., x_n)$$
$\hat{\theta}$ is a point estimate ($\equiv$ a number).

:::warning
The probability that the empirical mean lands **exactly** on the true parameter is zero, because the random variable is continuous.
:::

++Questions:++
* 1: How do we find the function $g$?
  $\hookrightarrow$ It depends on the pdf and on the parameter in question.
  > e.g. $x_1, ..., x_n$ particular realizations of $X \sim \mathcal{R}(\alpha)$ (Rayleigh distribution) with pdf $f_X(x, \alpha) = \frac{x}{\alpha^2}e^{-\frac{x^2}{2\alpha^2}}, \quad x \gt 0$.
  >
  > $E(X) = \alpha\sqrt{\frac{\pi}{2}} \qquad var(X) = (\frac{4 - \pi}{2})\alpha^2$
* 2: How do we choose between two estimators $g_1$ and $g_2$?
  > $\hat{\alpha} = \sqrt{\frac{1}{2n}\sum_{i=1}^{n}{x_i^2}}$
  > $\hookrightarrow$ Maximum likelihood estimator
  > $\alpha = \sqrt{\frac{2}{\pi}}E[X]$, so $\sqrt{\frac{2}{\pi}}\,\overline{x_n}$ (with $\overline{x_n} = \frac{1}{n}\sum_{i=1}^n x_i$) is also an estimator of $\alpha$.
* 3: How do we quantify the performance of an estimator?

++Particular relations:++
$x_1 \rightarrow X_1 \sim f_X(x_1, \theta)$
$x_2 \rightarrow X_2 \sim f_X(x_2, \theta)$
$...$
$x_i \rightarrow X_i \sim f_X(x_i, \theta)$
$...$
$x_n \rightarrow X_n \sim f_X(x_n, \theta)$

From these relations, we can write the associated random variable $\hat{\Theta}$:
$\hat{\Theta} = g(X_1, ..., X_n) \rightarrow$ statistical estimator $\equiv$ random variable.

:::info
$\hat{\theta} = g(x_1, ..., x_n)$ is a particular realization of $\hat{\Theta} = g(X_1, ..., X_n)$
:::

### Quality of an estimator $\hat{\Theta}$

$\hookrightarrow x_1^{(1)}, ..., x_n^{(1)} \rightarrow \hat{\theta}_1$
$\hookrightarrow x_1^{(2)}, ..., x_n^{(2)} \rightarrow \hat{\theta}_2$
$\hookrightarrow ...$
$\hookrightarrow x_1^{(N)}, ..., x_n^{(N)} \rightarrow \hat{\theta}_N$
On average, we want $\hat{\theta} \approx \theta$, i.e. $\mathbb{E}(\hat{\Theta}) = \theta$.

**Bias of an estimator**
* $b(\hat{\Theta}) = \mathbb{E}(\hat{\Theta}) - \theta$
* $b(\hat{\Theta}) = 0 \iff$ unbiased estimator.

### Empirical mean

$X_i \sim^{iid} f_X(x, \theta) \quad E[X_i]=\mu \quad var(X_i)=\sigma^2 \quad \text{iid}\equiv\text{independent and identically distributed.}$
$\overline{x_n} = \frac{1}{n}\sum_{i = 1}^nx_i$
$\hookrightarrow \overline{X_n} = \frac{1}{n}\sum_{i = 1}^nX_i$

++Expectation:++ $E[\overline{X_n}] = E[\frac{1}{n}\sum_{i = 1}^nX_i] = \frac{1}{n}E[\sum_{i = 1}^nX_i] = \frac{1}{n}\sum_{i = 1}^n\underbrace{E[X_i]}_{\mu}$
$E[\overline{X_n}] = \frac{1}{n}\sum_{i = 1}^n\mu = \mu \rightarrow E[\overline{X_n}] =\mu, \quad b(\overline{X_n}) = 0$
The empirical mean is therefore unbiased.
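The repeated-experiment picture above ($N$ datasets giving $\hat{\theta}_1, ..., \hat{\theta}_N$) can be simulated to observe this absence of bias. Here is a minimal sketch. It assumes Gaussian data with $\mu = 1$ (an arbitrary choice), $\sigma^2 = 4$ as in the earlier example, $n = 10$ and $N = 20\,000$ datasets. `simulate_empirical_means` is a hypothetical helper name.

```python
import random
import statistics


def simulate_empirical_means(mu, sigma2, n, n_experiments, seed=0):
    """Draw n_experiments independent datasets of size n from N(mu, sigma2)
    and return the empirical mean of each one (one estimate per dataset)."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(n_experiments)
    ]


if __name__ == "__main__":
    mu, sigma2, n = 1.0, 4.0, 10
    estimates = simulate_empirical_means(mu, sigma2, n, n_experiments=20_000)
    # Average of the N estimates: should be close to mu (unbiased estimator)
    print("mean of estimates:", statistics.fmean(estimates))
    # Spread of the N estimates: should be close to sigma^2 / n
    print("variance of estimates:", statistics.pvariance(estimates), "vs", sigma2 / n)
```

The variance of the estimates should come out close to $\sigma^2 / n = 0.4$, which matches the variance computation of the empirical mean.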
++Variance:++ $\mathbb{V}(\overline{X_n}) = \mathbb{V}(\frac{1}{n}\sum_{i = 1}^nX_i) = \frac{1}{n^2}\mathbb{V}(\sum_{i = 1}^nX_i) = \frac{1}{n^2}\sum_{i=1}^{n}{\underbrace{\mathbb{V}(X_i)}_{\sigma^2}} = \frac{1}{n^2}\sum_{i=1}^n{\sigma^2}$ (by independence, the variance of the sum is the sum of the variances)
$\mathbb{V}(\overline{X_n}) = \frac{\sigma^2}{n} \rightarrow_{n \to \infty} 0$
> cf. CLT: the results are consistent.

### Empirical variance

We have observed $x_1, ..., x_n \rightarrow X_i$ iid, $\quad E[X_i] = \mu, \quad \mathbb{V}(X_i) = \sigma^2$

++Case 1:++ $\mu$ is known
$S_n^2 = \frac{1}{n}\sum_{i=1}^{n}{(X_i - \mu)^2}$
$\hookrightarrow \mathbb{E}[S_n^2] = \mathbb{E}[\frac{1}{n}\sum_{i=1}^{n}{(X_i - \mu)^2}] = \frac{1}{n}\sum_{i=1}^{n}{\mathbb{E}[(X_i - \mu)^2]}$
$\mathbb{E}[S_n^2] = \frac{1}{n}\sum_{i=1}^{n}{\sigma^2} = \sigma^2 \rightarrow S_n^2 \text{ unbiased}$.
$\hookrightarrow \mathbb{V}(S_n^2) \rightarrow_{n\rightarrow+\infty}0 \rightarrow S_n^2$ consistent. It is unbiased and its variance tends to 0 as the sample size tends to infinity, so by Chebyshev $S_n^2 \rightarrow^{\mathbb{P}} \sigma^2$.

++Case 2:++ $\mu$ is unknown
$S_n^2 = \frac{1}{n}\sum_{i=1}^n(X_i - \overline{X_n})^2$
$S_n^2 = \frac{1}{n}\sum_{i=1}^n(X_i^2 - 2X_i\overline{X_n} + \overline{X_n}^2)$
$S_n^2 = \frac{1}{n}\sum_{i=1}^nX_i^2 - 2\overline{X_n}\underbrace{\frac{1}{n}\sum_{i=1}^nX_i}_{\overline{X_n}} +\overline{X_n}^2$ (we should get donations for writing formulas like this) amen to that
$S_n^2 = \frac{1}{n}\sum_{i=1}^nX_i^2 - \overline{X_n}^2$
$S_n^2$ is still consistent, but it is biased (*foreshadowing*).
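Before computing $E[S_n^2]$, the bias can be observed numerically. Here is a minimal sketch. It assumes Gaussian data with $\mu = 0$, $\sigma^2 = 4$ and $n = 5$; these are arbitrary choices, and a small $n$ makes the bias visible. The helper name `average_variance_estimates` is hypothetical. For each simulated dataset, it computes two quantities:

- the $\frac{1}{n}$ estimator $S_n^2$, taken around the empirical mean because $\mu$ is treated as unknown;
- the $\frac{1}{n-1}$ version of the same sum of squared deviations.

```python
import random


def average_variance_estimates(mu, sigma2, n, n_experiments, seed=0):
    """Average, over n_experiments datasets of size n drawn from N(mu, sigma2), of
    - the 1/n empirical variance S_n^2 (computed around the empirical mean), and
    - the 1/(n-1) corrected version."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    total_biased = 0.0
    total_corrected = 0.0
    for _ in range(n_experiments):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n                     # empirical mean replaces the unknown mu
        ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
        total_biased += ss / n
        total_corrected += ss / (n - 1)
    return total_biased / n_experiments, total_corrected / n_experiments


if __name__ == "__main__":
    sigma2, n = 4.0, 5
    biased, corrected = average_variance_estimates(0.0, sigma2, n, n_experiments=50_000)
    print("average of S_n^2:", biased, "  sigma^2 (n-1)/n =", sigma2 * (n - 1) / n)
    print("average of corrected estimator:", corrected, "  sigma^2 =", sigma2)
```

The first average should land near $\sigma^2\frac{n-1}{n} = 3.2$ rather than $\sigma^2 = 4$. The second should land near $4$.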
$E[S_n^2] = E[\frac{1}{n}\sum_{i=1}^nX_i^2 - \overline{X_n}^2] = \frac{1}{n}\sum_{i=1}^nE[X_i^2] - E[\overline{X_n}^2]$
By the König-Huygens theorem, $\mathbb{V}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$, hence $E[X^2] = \mathbb{V}(X) + (E[X])^2$:
$E[S_n^2] = \frac{1}{n}\sum_{i=1}^n(\mathbb{V}(X_i) + (E[X_i])^2) - (\mathbb{V}(\overline{X_n}) + (E[\overline{X_n}])^2)$
$E[S_n^2] = \frac{1}{n}\sum_{i=1}^n(\sigma^2 + \mu^2) - (\frac{\sigma^2}{n} + \mu^2)$
$E[S_n^2] = \sigma^2 + \mu^2 - \frac{\sigma^2}{n} - \mu^2 = \sigma^2(1 - \frac{1}{n}) = \sigma^2(\frac{n - 1}{n})$
$$E[S_n^2] = \sigma^2(\frac{n - 1}{n}) \rightarrow \text{Biased estimator}$$

We then **remove the bias**:
$E[S_n^2] = \sigma^2(\frac{n - 1}{n})$
$\frac{n}{n-1}E[S_n^2] = \sigma^2$
$E[\frac{n}{n-1}\frac{1}{n}\sum_{i=1}^{n}{(X_i - \overline{X_n})^2}] = \sigma^2$
$E[\frac{1}{n-1}\sum_{i=1}^{n}{(X_i - \overline{X_n})^2}] = \sigma^2$
For an **unbiased** estimate, we use:
$$\frac{1}{n-1}\sum_{i=1}^{n}{(X_i - \overline{X_n})^2}$$

### Maximum likelihood estimator (MLE)

Two equivalent points of view:
* We have observed $x_1, ..., x_n$, with $x_i$ a particular realization of $X_i$, the $X_i$ being iid.
* We have observed $(x_1, ..., x_n)$, a particular vector realization of the random vector $(X_1, ..., X_n)$.

If $X$ and $Y$ are independent random variables, the pdf of the vector $(X, Y)$ factorizes:
$f_{XY}(x, y)=\underbrace{f_X(x)f_Y(y)}_{\text{marginal pdfs}}$
$f_X(x) = \int{f_{XY}(x,y)}dy$
$f_Y(y) = \int{f_{XY}(x,y)}dx$

By independence, the joint pdf of $(X_1, ..., X_n)$ is
$\underbrace{f_{X_1, X_2, ..., X_n}(x_1, x_2, ..., x_n, \theta)}_{\mathcal{L}(x_1,...,x_n,\theta)} = \prod_{i=1}^nf_X(x_i, \theta)$

**Likelihood function**
With $x_1, ..., x_n$ fixed, $\theta \longmapsto \mathcal{L}(x_1, ..., x_n, \theta) \equiv$ probability (a density, in the continuous case) of observing $(x_1, ..., x_n)$ given $\theta$.
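The sketch below makes the likelihood function concrete. It fixes one simulated sample from the Rayleigh distribution $\mathcal{R}(\alpha)$ introduced earlier. It then evaluates $\alpha \mapsto \ln\mathcal{L}(x_1, ..., x_n, \alpha) = \sum_i \ln f_X(x_i, \alpha)$ on a grid and keeps the best grid value. All of the following choices are ours:

- true $\alpha = 2$ and $n = 500$;
- a grid with step $0.01$ on $[0.5, 5]$;
- Rayleigh draws obtained by inverting the CDF $F(x) = 1 - e^{-x^2/(2\alpha^2)}$, i.e. $x = \alpha\sqrt{-2\ln(1-u)}$ with $u$ uniform on $[0, 1)$.

The function names are hypothetical.

```python
import math
import random


def rayleigh_sample(alpha, n, seed=0):
    """Draw n values from R(alpha) by inverting F(x) = 1 - exp(-x^2 / (2 alpha^2))."""
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so 1 - u is in (0, 1] and the log is defined
    return [alpha * math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)]


def log_likelihood(xs, alpha):
    """ln L(x_1, ..., x_n, alpha) = sum of ln f_X(x_i, alpha) for the Rayleigh pdf."""
    return sum(math.log(x / alpha**2) - x * x / (2 * alpha**2) for x in xs)


if __name__ == "__main__":
    xs = rayleigh_sample(alpha=2.0, n=500)
    grid = [0.5 + 0.01 * k for k in range(451)]  # candidate alpha values in [0.5, 5.0]
    best = max(grid, key=lambda a: log_likelihood(xs, a))
    closed_form = math.sqrt(sum(x * x for x in xs) / (2 * len(xs)))
    print("grid maximizer:", best, "  closed form:", closed_form)
```

The grid maximizer should fall within one grid step of $\sqrt{\frac{1}{2n}\sum x_i^2}$, the expression given earlier as the maximum likelihood estimator of $\alpha$. A grid search only scales to one or two parameters. It is used here purely to visualize which $\alpha$ the likelihood favors.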
We have indeed observed $(x_1, ..., x_n)$ $\rightarrow$ we look for the value of $\theta$ that **maximizes** the likelihood function $\equiv$ that maximizes the probability of observing what we actually observed.

:::info
++Property:++ In many usual cases, $\theta \mapsto \mathcal{L}(x_1, ..., x_n, \theta)$ (or its logarithm) is concave. Its maximum is then located where the derivative of $\mathcal{L}$ with respect to $\theta$ vanishes:
$$
\frac{\partial \mathcal{L}(x_1, ..., x_n, \theta)}{\partial \theta}\bigg|_{\theta^*} = 0 \\
\frac{\partial^2 \mathcal{L}(x_1, ..., x_n, \theta)}{\partial \theta^2}\bigg|_{\theta^*} < 0
$$
In practice, we look for the maximum of the logarithm of the likelihood (log-likelihood). Since $\ln$ is increasing, it has the same maximizer, and it turns products into sums, which simplifies differentiation.
:::

For the Rayleigh distribution, $X \sim \mathcal{R}(\alpha)$ with $f_X(x, \alpha) = \frac{x}{\alpha^2}e^{-\frac{x^2}{2\alpha^2}}, x \gt 0$:
$$\begin{aligned} \mathcal{L}(x_1, ..., x_n, \alpha) & = \prod_{i=1}^nf_X(x_i, \alpha) \\ & = \prod_{i=1}^n\frac{x_i}{\alpha^2}e^{-\frac{x_i^2}{2\alpha^2}} \\ & = \frac{\prod\limits_{i=1}^nx_i}{\alpha^{2n}}e^{-\frac{1}{2\alpha^2}\sum\limits_{i=1}^n x_i^2} \end{aligned}$$

$$\ln(\mathcal{L}(x_1,...,x_n,\alpha)) = \sum_{i=1}^n\ln(x_i) - 2n \ln(\alpha) - \frac{1}{2\alpha^2} \sum_{i=1}^n x_i^2$$
$\hookrightarrow$ easier to differentiate

$$\begin{aligned} \frac{\partial \ln\mathcal{L}}{\partial \alpha} & = -\frac{2n}{\alpha} - \frac{1}{2}\sum_{i = 1}^nx_i^2 \times (-\frac{2}{\alpha^3}) \\ & = -\frac{2n}{\alpha} + \frac{1}{\alpha^3}\sum_{i = 1}^nx_i^2 \end{aligned}$$

We look for $\alpha^*$ such that $\frac{\partial \ln\mathcal{L}}{\partial \alpha}\big|_{\alpha^*} = 0$:
$$\begin{aligned} & -\frac{2n}{\alpha^*} + \frac{1}{(\alpha^*)^3}\sum_{i=1}^n x_i^2 = 0 \\ \Rightarrow \; & \frac{2n}{\alpha^*} = \frac{1}{(\alpha^*)^3} \sum_{i=1}^n x_i^2 \\ \Rightarrow \; & (\alpha^*)^2 = \frac{1}{2n} \sum_{i=1}^n x_i^2 \\ \Rightarrow \; & \alpha^* = \underbrace{\hat{\alpha}_{MLE}}_{\text{point estimate}} = \sqrt{\frac{1}{2n}\sum_{i=1}^n x_i^2} \end{aligned}$$

In theory, we must also check that the second derivative is negative at $\hat{\alpha}_{MLE}$. Here $\frac{\partial^2 \ln\mathcal{L}}{\partial\alpha^2} = \frac{2n}{\alpha^2} - \frac{3}{\alpha^4}\sum_{i=1}^n x_i^2$. Substituting $\sum_{i=1}^n x_i^2 = 2n\hat{\alpha}_{MLE}^2$ gives $\frac{\partial^2 \ln\mathcal{L}}{\partial\alpha^2}\big|_{\hat{\alpha}_{MLE}} = -\frac{4n}{\hat{\alpha}_{MLE}^2} < 0$.

$\rightarrow$ The statistical maximum likelihood estimator for the Rayleigh distribution is $\hat{A}_{MLE} = \sqrt{\frac{1}{2n}\sum_{i=1}^nX_i^2}$

Bias? Consistency?

$\mathfrak{THE \; END}$
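The two open questions about $\hat{A}_{MLE}$ (bias, consistency) can be explored by simulation. Here is a minimal sketch. It assumes $\alpha = 2$ and sample sizes $n = 5$ and $n = 200$, with 20 000 and 2 000 repetitions respectively. It uses the inverse-CDF sampler $x = \alpha\sqrt{-2\ln(1-u)}$, and the helper names are hypothetical.

What to expect: for a Rayleigh variable, $E[X^2] = var(X) + E(X)^2 = \frac{4-\pi}{2}\alpha^2 + \frac{\pi}{2}\alpha^2 = 2\alpha^2$. So $\hat{A}_{MLE}^2 = \frac{1}{2n}\sum X_i^2$ is an unbiased estimator of $\alpha^2$. Since $\sqrt{\cdot}$ is concave, Jensen's inequality then shows that $\hat{A}_{MLE}$ itself slightly underestimates $\alpha$ on average. The simulated bias should therefore be negative and shrink as $n$ grows.

```python
import math
import random


def rayleigh_mle(xs):
    """Closed-form MLE derived above: sqrt((1 / 2n) * sum of x_i^2)."""
    return math.sqrt(sum(x * x for x in xs) / (2 * len(xs)))


def mean_of_mle(alpha, n, n_experiments, seed=0):
    """Average of the MLE over n_experiments simulated R(alpha) samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_experiments):
        # Inverse-CDF sampling of the Rayleigh distribution
        xs = [alpha * math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)]
        total += rayleigh_mle(xs)
    return total / n_experiments


if __name__ == "__main__":
    alpha = 2.0
    for n, reps in [(5, 20_000), (200, 2_000)]:
        bias = mean_of_mle(alpha, n, reps) - alpha
        print(f"n = {n:3d}   estimated bias of the MLE: {bias:+.4f}")
```

A bias that shrinks toward 0 is consistent with $\hat{A}_{MLE}$ being asymptotically unbiased. It does not by itself prove consistency; for that, the variance of the estimator must also vanish.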
