---
tags: metric, memo, public
---

# Likelihood Ratio Test

## Notation

- $H_0$: null hypothesis.
- $H_1$: alternative hypothesis.
- $\Theta$: parameter space, a subset of $\mathbb{R}^l$.
- $\theta$: parameter, $\theta \in \Theta$.
- $\Theta_0$: subset of $\Theta$ corresponding to the null hypothesis.
- $\Theta_1$: subset of $\Theta$ corresponding to the alternative hypothesis.
- $X$: random variable with a known distribution family and an unknown parameter.
- $L(\theta)$: $X$'s pdf/pmf/likelihood function, which depends on $\theta$.
- $S_1$: rejection region/critical region; loosely, $S = S_1$.
- $d$: decision, either $1$ (reject) or $0$ (accept).
- $\delta: X \to \{0,1\}$: decision function. With a randomized test, it becomes $\delta: X \to [0,1]$.

## Problem

There is a random variable $X$. Before observation, we know it follows a distribution from a family indexed by a parameter $\theta \in \Theta$, but we do not know the exact $\theta$. The null hypothesis is that $\theta \in \Theta_0$, and the alternative hypothesis is that $\theta \in \Theta_1$. Our task is to use the observation of $X$ to accept or reject the null hypothesis. That is, we need to construct a decision function $\delta: X \to [0,1]$ to make the decision.

Equivalently, the decision function is based on a critical region $S$, a subset of $X$'s support:

\begin{align*}
\delta(x) &= 1, \text{ if } x \in S,\\
\delta(x) &= 0, \text{ if } x \notin S.
\end{align*}

## Errors

There are two kinds of errors:

- Type I: reject $H_0$ when $H_0$ is true.
- Type II: accept $H_0$ when $H_1$ is true.

We assume a type I error is more serious than a type II error.

## $\alpha$ and $\beta$

Two terms refer to the probability of a type I error.

### Size

Given a test $\delta$, the size is the largest (supremum) probability of a type I error over the null hypothesis,
$$
\text{size} = \sup_{\theta \in \Theta_0} P(\delta(X)=1).
$$

### Significance level

Alternatively, we can set an $\alpha \in [0,1]$ as the significance level, i.e., the probability of a type I error cannot exceed $\alpha$:
$$
P(\delta(X)=1) \le \alpha \quad \forall \theta \in \Theta_0.
$$

### Power function

For the same test $\delta$, the probability of rejecting $H_0$ may vary with $\theta$. The power function is the probability that $\delta$ rejects $H_0$ given $\theta$,
$$
\beta(\theta) = P(\delta(X) = 1; \theta).
$$

## Likelihood

Suppose $X$ is a random variable; it (usually) has a probability density function (pdf) or probability mass function (pmf), $f(x; \theta)$. Viewed as a function of $\theta$ for a fixed observation, we call it the likelihood function,
$$
L(\theta) = f(x; \theta).
$$

Suppose we observe a realization $x$. The intuition is that, given a pdf/pmf, we can evaluate how "likely" a specific realization $x$ is,
$$
L(x; \theta).
$$

### Maximum Likelihood Estimator

Suppose we know $X$ follows a distribution with an unknown parameter $\theta \in \Theta$. With the observation $x$, we can estimate $\theta$ as
$$
\hat{\theta}_{MLE} = \arg\max_{\theta \in \Theta} L(x; \theta).
$$

## Likelihood Ratio Test

Following the same idea as in the MLE, we can use the likelihood function to construct a test. Given the observation $x$, $\Theta_0$, and $\Theta_1$, we can calculate
$$
r = \frac{\sup_{\theta \in \Theta_1} L(x; \theta)}{\sup_{\theta \in \Theta_0} L(x; \theta)},
$$
with the decision rule
$$
\delta(x) =
\begin{cases}
1 & \text{ if } r > k,\\
0 & \text{ if } r \le k,
\end{cases}
$$
where $k$ is a threshold to be calibrated by $\alpha$/size.
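As a concrete illustration (not part of the derivation above), here is a minimal Python sketch under an assumed model: i.i.d. observations from $N(\mu, 1)$ with $\Theta_0 = \{0\}$ and $\Theta_1 = \mathbb{R} \setminus \{0\}$. The function name `likelihood_ratio_test` and the calibration of $k$ via the chi-square (Wilks) approximation are illustrative assumptions, not something the memo prescribes.

```python
import numpy as np
from scipy import stats

def likelihood_ratio_test(x, alpha=0.05):
    """LRT sketch for H0: mu = 0 vs H1: mu != 0, with X_i ~ N(mu, 1) i.i.d.

    r = sup_{theta in Theta_1} L(x; theta) / sup_{theta in Theta_0} L(x; theta).
    Under H0, 2*log(r) is approximately chi-square with 1 degree of freedom,
    which is one way to calibrate the threshold k at significance level alpha.
    """
    x = np.asarray(x)
    # Sup over Theta_0 = {0}: likelihood evaluated at mu = 0.
    loglik_null = np.sum(stats.norm.logpdf(x, loc=0.0, scale=1.0))
    # Sup over Theta_1 (almost surely equals the sup over all of R): MLE is the sample mean.
    mu_hat = x.mean()
    loglik_alt = np.sum(stats.norm.logpdf(x, loc=mu_hat, scale=1.0))
    log_r = loglik_alt - loglik_null                      # log of the ratio r
    k = np.exp(0.5 * stats.chi2.ppf(1 - alpha, df=1))     # threshold on r itself
    return {"r": np.exp(log_r), "k": k, "reject": np.exp(log_r) > k}

rng = np.random.default_rng(0)
print(likelihood_ratio_test(rng.normal(loc=0.3, scale=1.0, size=100)))
```

In this model $2\log r = n\bar{x}^2$, so rejecting when $r > k$ is the same as rejecting when $n\bar{x}^2$ exceeds the $\chi^2_1$ quantile.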
## Neyman-Pearson lemma

### Statement

Consider a test with hypotheses $H_0: \theta = \theta_0$ and $H_1: \theta = \theta_1$, with pdf/pmf $L(x; \theta_0)$ and $L(x; \theta_1)$, respectively. Let $k$ be a positive number and $S$ be a subset of the sample space such that

1. $$\frac{L(x; \theta_0)}{L(x; \theta_1)} \le k, \quad \forall x \in S.$$
2. $$\frac{L(x; \theta_0)}{L(x; \theta_1)} \ge k, \quad \forall x \in S^c.$$
3. $$\alpha = P[X \in S \mid \theta = \theta_0].$$

Then this test is the most powerful test of size $\alpha$ for testing the simple hypothesis $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$.

<details>
<summary>
Proof:
</summary>

Consider the case of a continuous variable. Let $A$ be any other critical region with size $\alpha$. We want to show that
$$
P[X \in S \mid \theta = \theta_1] - P[X \in A \mid \theta = \theta_1] = \int_S L(\theta_1) - \int_A L(\theta_1) \ge 0.
$$

We can decompose the two regions as unions of disjoint sets,
\begin{align*}
S &= \left(S \cap A\right) \cup \left(S \cap A^c\right),\\
A &= \left(A \cap S\right) \cup \left(A \cap S^c\right).
\end{align*}

Hence,
\begin{align*}
\int_S L(\theta_1) - \int_A L(\theta_1)
&= \int_{S \cap A} L(\theta_1) + \int_{S \cap A^c} L(\theta_1) - \int_{A \cap S} L(\theta_1) - \int_{A \cap S^c} L(\theta_1)\\
&= \int_{S \cap A^c} L(\theta_1) - \int_{A \cap S^c} L(\theta_1).
\end{align*}

By condition 1, $L(\theta_1) \ge k^{-1} L(\theta_0)$ at each point of $S$, and hence at each point of $S \cap A^c$; thus
$$
\int_{S \cap A^c} L(\theta_1) \ge \frac{1}{k} \int_{S \cap A^c} L(\theta_0).
$$

By condition 2, $L(\theta_1) \le k^{-1} L(\theta_0)$ at each point of $S^c$, and hence at each point of $A \cap S^c$; thus
$$
\int_{A \cap S^c} L(\theta_1) \le \frac{1}{k} \int_{A \cap S^c} L(\theta_0).
$$

Combining these two inequalities, we get
$$
\int_{S \cap A^c} L(\theta_1) - \int_{A \cap S^c} L(\theta_1) \ge \frac{1}{k} \int_{S \cap A^c} L(\theta_0) - \frac{1}{k} \int_{A \cap S^c} L(\theta_0),
$$
which implies
$$
\int_{S} L(\theta_1) - \int_{A} L(\theta_1) \ge \frac{1}{k} \left[ \int_{S \cap A^c} L(\theta_0) - \int_{A \cap S^c} L(\theta_0) \right].
$$

However,
\begin{align*}
\int_{S \cap A^c} L(\theta_0) - \int_{A \cap S^c} L(\theta_0)
&= \int_{S \cap A^c} L(\theta_0) + \int_{S \cap A} L(\theta_0) - \int_{S \cap A} L(\theta_0) - \int_{A \cap S^c} L(\theta_0)\\
&= \int_{S} L(\theta_0) - \int_{A} L(\theta_0)\\
&= P[X \in S \mid \theta = \theta_0] - P[X \in A \mid \theta = \theta_0]\\
&= \alpha - \alpha = 0.
\end{align*}

Hence,
$$
\int_{S} L(\theta_1) - \int_{A} L(\theta_1) \ge \frac{1}{k} \left[ \int_{S \cap A^c} L(\theta_0) - \int_{A \cap S^c} L(\theta_0) \right] = \frac{1}{k} \cdot 0 = 0.
$$

For discrete random variables, replace the integrals with sums.

</details>

### Monotone likelihood ratio

The real-parameter family of densities $f(x; \theta)$ is said to have a monotone likelihood ratio in a statistic $T(x)$ if, for any $\theta < \theta'$, the distributions $F(x; \theta)$ and $F(x; \theta')$ are distinct and the ratio
$$
\frac{f(x; \theta')}{f(x; \theta)}
$$
is a nondecreasing function of $T(x)$.
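A small numerical illustration of the lemma, under an assumed model: a single observation $X \sim N(\theta, 1)$ with $\theta_0 = 0$, $\theta_1 = 1$, and $\alpha = 0.05$. The specific competing region used for comparison is only an example.

```python
import numpy as np
from scipy import stats

# Neyman-Pearson for a single observation X ~ N(theta, 1),
# H0: theta = 0 vs H1: theta = 1, at significance level alpha.
alpha = 0.05

# The ratio L(x; theta_0) / L(x; theta_1) = exp(1/2 - x) is decreasing in x,
# so {ratio <= k} is a right tail {x >= c}; calibrate c so the size equals alpha.
c = stats.norm.ppf(1 - alpha, loc=0.0, scale=1.0)
power_np = 1 - stats.norm.cdf(c, loc=1.0, scale=1.0)      # power of the NP region

# Any other critical region of the same size should be no more powerful,
# e.g. the left-tail region {x <= c_left} with P(X <= c_left | theta = 0) = alpha.
c_left = stats.norm.ppf(alpha, loc=0.0, scale=1.0)
power_left = stats.norm.cdf(c_left, loc=1.0, scale=1.0)

print(f"NP region {{x >= {c:.3f}}}: power = {power_np:.3f}")
print(f"Left-tail region of the same size: power = {power_left:.3f}")
```

Since $L(x; \theta_1)/L(x; \theta_0) = \exp(x - 1/2)$ is increasing in $T(x) = x$, this family also illustrates the monotone likelihood ratio property.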
## Reference

- https://en.wikipedia.org/wiki/Likelihood-ratio_test
- https://en.wikipedia.org/wiki/Neyman%E2%80%93Pearson_lemma
- Hogg, R. V., & Craig, A. T. (1995). *Introduction to Mathematical Statistics* (5th ed.). Englewood Cliffs, New Jersey.