---
tags: metric, memo, public
---
# Likelihood Ratio Test
## Notation
- $H_0$: null hypothesis.
- $H_1$: alternative hypothesis.
- $\Theta$: parameter space, a subset of $\mathbb{R}^l$.
- $\theta$: parameter, $\theta \in \Theta$.
- $\Theta_0$: subset of $\Theta$ corresponding to the null hypothesis.
- $\Theta_1$: subset of $\Theta$ corresponding to the alternative hypothesis.
- $X$: random variable, with known distribution family and unknown parameter.
- $L(\theta)$: $X$'s pdf/pmf viewed as a function of $\theta$; the likelihood function.
- $S_1$: rejection region/critical region; loosely, $S = S_1$.
- $d$: decision, can be $1$ (reject) or $0$ (accept).
- $\delta: X \to \{0,1\}$: decision function. With a randomized test, it becomes $\delta: X \to [0,1]$.
## Problem
There is a random variable $X$. Before observation, we know its distribution belongs to a family indexed by a parameter $\theta \in \Theta$, but we do not know the exact $\theta$.
The null hypothesis is that $\theta \in \Theta_0$, and the alternative hypothesis is that $\theta \in \Theta_1$.
Our task is to use the observation of $X$ to accept or reject the null hypothesis. That is, we need to construct a decision function $\delta: X \to \{0,1\}$ (or $[0,1]$ with randomization) to make the decision.
The decision function is based on a critical region $S$, a subset of $X$'s support:
$$
\delta(x) = \begin{cases} 1 & \text{ if } x \in S,\\
0 & \text{ if } x \notin S.
\end{cases}
$$
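As a small, hypothetical illustration (not from the references): suppose $X = (X_1, \dots, X_n)$ are i.i.d. $N(\theta, 1)$ and the critical region is $S = \{x : \bar{x} > c\}$ for some cutoff $c$. The Python sketch below implements $\delta$ as the indicator of $S$; the cutoff and sample size are arbitrary placeholders.

```python
import numpy as np

def delta(x, c=0.5):
    """Decision function: return 1 (reject H0) iff the observation lies in
    the critical region S = {x : mean(x) > c}; otherwise return 0 (accept)."""
    return 1 if np.mean(x) > c else 0

# One hypothetical observation of X = (X_1, ..., X_n).
rng = np.random.default_rng(0)
x = rng.normal(loc=0.8, scale=1.0, size=25)   # data drawn under some theta
print(delta(x))                               # 1 -> reject H0, 0 -> accept H0
```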
## Errors
There are two kinds of errors,
- Type I: reject $H_0$ when $H_0$ is true.
- Type II: accept $H_0$ when $H_1$ is true.
We assume a type I error is more serious than a type II error.
## $\alpha$ and $\beta$
Two terms refer to the probability of a type I error.
### Size
Given a test $\delta$, the size is the "highest" probability of a type I error,
$$
\text{size} = \sup_{\theta \in \Theta_0} P(\delta(X)=1; \theta).
$$
### Significance level
On the other hand, we can set an $\alpha \in [0,1]$ as the significance level, that is, the probability of a type I error cannot exceed $\alpha$,
$$
P(\delta(X)=1; \theta) \le \alpha \quad \forall \theta \in \Theta_0.
$$
### Power function
For the same test $\delta$, the probability of rejecting $H_0$ may vary with $\theta$. The power function is the probability that $\delta$ rejects $H_0$ as a function of $\theta$,
$$
\beta(\theta) = P(\delta(X) = 1; \theta).
$$
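To make size, significance level, and power concrete, here is a hedged sketch for the same hypothetical one-sided normal-mean test used above ($X_1, \dots, X_n$ i.i.d. $N(\theta, 1)$, $\Theta_0 = (-\infty, 0]$, reject when $\bar{X} > c$). In this model $\beta(\theta) = 1 - \Phi(\sqrt{n}(c - \theta))$, where $\Phi$ is the standard normal cdf; the size is the supremum of $\beta$ over $\Theta_0$ (attained at $\theta = 0$), and $c$ can be calibrated so the size equals a chosen $\alpha$. The sample size and $\alpha$ are my own placeholders.

```python
import numpy as np
from scipy.stats import norm

n = 25          # sample size (assumption for this example)
alpha = 0.05    # chosen significance level

def power(theta, c):
    """beta(theta) = P(delta(X) = 1; theta) when we reject iff mean(X) > c
    and mean(X) ~ N(theta, 1/n)."""
    return 1 - norm.cdf(np.sqrt(n) * (c - theta))

# Calibrate c so that the size, sup_{theta <= 0} beta(theta) = beta(0), equals alpha.
c = norm.ppf(1 - alpha) / np.sqrt(n)

print(power(0.0, c))   # size: ~0.05, the worst case over Theta_0
print(power(0.5, c))   # power at a particular theta in Theta_1, ~0.80
```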
## Likelihood
Suppose $X$ is a random variable; it (mostly) has a probability density function (pdf) or probability mass function (pmf), $f(x; \theta)$. Viewed as a function of $\theta$, we rename it the likelihood function,
$$
L(\theta) = f(x; \theta).
$$
Suppose we observe a realization $x$. The intuition is that, given a pdf/pmf, we can evaluate how "likely" a specific realization $x$ is,
$$
L(x; \theta).
$$
### Maximum Likelihood Estimator
Suppose we know $X$ follows a distribution with unknown parameter $\theta \in \Theta$. With the observation $x$, we can estimate $\theta$ as
$$
\hat{\theta}_{MLE} = \arg\max_{\theta \in \Theta} L(x; \theta).
$$
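As a hedged illustration (my own example, not from the references): the sketch below computes $\hat{\theta}_{MLE}$ for i.i.d. Bernoulli($\theta$) observations by numerically maximizing the log-likelihood, and compares it with the closed form $\hat{\theta} = \bar{x}$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.3, size=200)   # hypothetical Bernoulli(0.3) sample

def neg_log_lik(theta):
    """Negative log-likelihood of i.i.d. Bernoulli(theta) data."""
    return -np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

# Maximizing L is minimizing -log L; the bounds keep theta strictly inside (0, 1).
res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)       # numerical MLE
print(x.mean())    # closed-form MLE: the sample mean
```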
## Likelihood Ratio Test
Following the same idea as the MLE, we can use the likelihood function to construct a test. Given the observation $x$ and the sets $\Theta_0$, $\Theta_1$, we calculate
$$
r=\frac{\sup_{\theta \in \Theta_1} L(x; \theta)}{\sup_{\theta \in \Theta_0} L(x; \theta) }
$$
with a decision rule,
$$
\delta(x) = \begin{cases} 1 & \text{ if } r > k\\
0 & \text{ if } r \le k\\
\end{cases}
$$
where $k$ is a threshold calibrated so that the test has the desired size/significance level $\alpha$.
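A hedged numerical sketch (my own example): for i.i.d. $N(\theta, 1)$ data with $\Theta_0 = \{0\}$ and $\Theta_1 = \mathbb{R} \setminus \{0\}$, the numerator supremum is attained (up to a null set) at the MLE $\bar{x}$, so $\log r = \frac{n}{2}\bar{x}^2$. The threshold $k$ is calibrated by Monte Carlo under $\theta = 0$ so that the size is approximately a chosen $\alpha$; the sample size, $\alpha$, and the data-generating $\theta$ are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
n, alpha = 20, 0.05

def log_ratio(xbar):
    """log r for i.i.d. N(theta, 1) data with Theta_0 = {0}: the sup over
    Theta_1 is (up to a null set) at x_bar, giving log r = n * x_bar**2 / 2."""
    return 0.5 * n * xbar**2

# Calibrate k: simulate log r under H0 (theta = 0) and take the (1 - alpha)
# quantile, so that P(r > k; theta = 0) is approximately alpha.
null_xbar = rng.normal(0.0, 1.0, size=(100_000, n)).mean(axis=1)
log_k = np.quantile(log_ratio(null_xbar), 1 - alpha)

# Apply the test to one hypothetical data set drawn with theta = 0.6.
x = rng.normal(0.6, 1.0, size=n)
print(1 if log_ratio(x.mean()) > log_k else 0)   # 1 = reject H0, 0 = accept
```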
## Neyman-Pearson lemma
### Statement
Consider a test with hypotheses $H_0 : \theta = \theta_0$ and $H_{1}:\theta =\theta_{1}$, with pdf/pmf $L(x ; \theta_0), L(x;\theta_1)$, respectively. Let $k$ be a positive number and $S$ be a subset of the sample space such that
1. $$\frac{ L(x; \theta_0)}{ L(x; \theta_1) }\le k, \quad \forall x \in S.$$
2. $$\frac{ L(x; \theta_0)}{ L(x; \theta_1) } \ge k, \quad \forall x \in S^c.$$
3. $$\alpha = P[X \in S | \theta = \theta_0]. $$
Then this test is the most powerful test of size $\alpha$ for testing the simple hypothesis $H_0: \theta=\theta_0$ against $H_1: \theta=\theta_1$.
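To see the lemma in action, here is a hedged sketch (my own example, assuming i.i.d. $N(\theta, 1)$ data with $\theta_0 = 0$ and $\theta_1 = 1$): the ratio $L(x; \theta_0)/L(x; \theta_1) = \exp(n/2 - \sum_i x_i)$ is small exactly when $\bar{x}$ is large, so the Neyman-Pearson region is $S = \{x : \bar{x} > c\}$ with $c$ chosen to give size $\alpha$. The simulation compares its power with a two-sided region of the same size; the sample size and $\alpha$ are placeholders.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, alpha, reps = 5, 0.05, 100_000

# Neyman-Pearson region for H0: theta = 0 vs H1: theta = 1 with i.i.d. N(theta, 1):
# L(x; 0)/L(x; 1) = exp(n/2 - sum(x)) <= k  <=>  x_bar >= c, so reject for large x_bar.
c_np = norm.ppf(1 - alpha) / np.sqrt(n)

# A competing size-alpha test: the two-sided rule, reject when |x_bar| > c_two.
c_two = norm.ppf(1 - alpha / 2) / np.sqrt(n)

xbar = rng.normal(1.0, 1.0, size=(reps, n)).mean(axis=1)   # x_bar under theta_1
print((xbar > c_np).mean())             # power of the NP test, about 0.72
print((np.abs(xbar) > c_two).mean())    # power of the competitor, about 0.61 (smaller)
```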
<details>
<summary>
Proof:
</summary>
Consider the case of a continuous random variable.
Let $A$ be any other critical region with size $\alpha$. We want to show that
$$
P[X \in S | \theta = \theta_1]- P[X \in A | \theta = \theta_1] = \int_S L(\theta_1) - \int_A L(\theta_1) \ge 0
$$
We can decompose the two regions as the union of two disjoint sets,
\begin{align*}
S&= \left (S \cap A \right) \cup \left (S \cap A^c \right)\\
A&= \left (A \cap S \right) \cup \left (A \cap S^c \right)\\
\end{align*}
Hence,
\begin{align*}
\int_S L(\theta_1) - \int_A L(\theta_1) &= \int_{S \cap A} L(\theta_1) + \int_{S \cap A^c} L(\theta_1) - \int_{A \cap S} L(\theta_1) -\int_{A \cap S^c} L(\theta_1) \\
&= \int_{S \cap A^c} L(\theta_1) -\int_{A \cap S^c} L(\theta_1).
\end{align*}
However, by condition 1, $L(\theta_1)\ge k^{-1} L(\theta_0)$ at each point of $S$, and hence at each point of $S \cap A^c$, thus
$$
\int_{S \cap A^c} L(\theta_1) \ge \frac{1}{k} \int_{S \cap A^c} L(\theta_0).
$$
By condition 2, $L(\theta_1)\le k^{-1} L(\theta_0)$ at each point of $S^c$, and hence at each point of $A \cap S^c$, thus,
$$
\int_{A \cap S^c} L(\theta_1) \le \frac{1}{k} \int_{A \cap S^c} L(\theta_0).
$$
Combining these two inequalities, we get
$$
\int_{S \cap A^c} L(\theta_1) -\int_{A \cap S^c} L(\theta_1) \ge \frac{1}{k} \int_{S \cap A^c} L(\theta_0) - \frac{1}{k} \int_{A \cap S^c} L(\theta_0).
$$
It also implies,
$$
\int_{S} L(\theta_1) -\int_{A } L(\theta_1) \ge \frac{1}{k} \left[ \int_{S \cap A^c} L(\theta_0) - \int_{A \cap S^c} L(\theta_0) \right].
$$
However,
\begin{align*}
\int_{S \cap A^c} L(\theta_0) - \int_{A \cap S^c} L(\theta_0) &= \int_{S \cap A^c} L(\theta_0) + \int_{S \cap A} L(\theta_0) - \int_{S \cap A} L(\theta_0)- \int_{A \cap S^c} L(\theta_0)\\
&= \int_{S } L(\theta_0) - \int_{A } L(\theta_0)\\
&= P[X \in S | \theta = \theta_0]- P[X \in A | \theta = \theta_0]\\
&= \alpha -\alpha =0.
\end{align*}
Hence,
$$
\int_{S} L(\theta_1) -\int_{A } L(\theta_1) \ge \frac{1}{k} \left[ \int_{S \cap A^c} L(\theta_0) - \int_{A \cap S^c} L(\theta_0) \right] = \frac{1}{k} \cdot 0 = 0.
$$
For discrete random variables, replace the integrals with sums.
</details>
### Monotone Likelihood Ratio
We say that the likelihood $L(x; \theta)$ has a monotone likelihood ratio in the statistic $T(x)$ if, for any $\theta_1 < \theta_2$, the ratio
$$
\frac{L(x; \theta_2)}{L(x; \theta_1)}
$$
is a nondecreasing function of $T(x)$.
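As a hedged numerical check (my own example): the Binomial$(n, \theta)$ family has a monotone likelihood ratio in $T(x) = x$, since for $\theta_1 < \theta_2$ the ratio $L(x; \theta_2)/L(x; \theta_1) = (\theta_2/\theta_1)^x \left((1-\theta_2)/(1-\theta_1)\right)^{n-x}$ is nondecreasing in $x$. The sketch below verifies this on a grid; the particular $n$, $\theta_1$, $\theta_2$ are placeholders.

```python
import numpy as np
from scipy.stats import binom

n, theta1, theta2 = 10, 0.3, 0.6     # any theta1 < theta2 would do
x = np.arange(n + 1)                 # the statistic T(x) = x

# Ratio of the two pmfs as a function of T(x) = x.
ratio = binom.pmf(x, n, theta2) / binom.pmf(x, n, theta1)
print(np.all(np.diff(ratio) >= 0))   # True: the ratio is nondecreasing in x
```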
## References
- https://en.wikipedia.org/wiki/Likelihood-ratio_test
- https://en.wikipedia.org/wiki/Neyman%E2%80%93Pearson_lemma
- Hogg, R. V., & Craig, A. T. (1995). *Introduction to Mathematical Statistics* (5th ed.). Englewood Cliffs, NJ: Prentice Hall.