# Statistics notes
### Poisson distribution formula
A discrete random variable X is said to have a Poisson distribution, with parameter $\lambda >0$, if it has a probability mass function given by
$f(k; \lambda) = Pr(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$
where
- k is the number of occurrences (k = 0, 1, 2, ...)
- e is Euler's number (e = 2.71828...)
- ! is the factorial function
The positive real number λ is equal to the expected value of X and also to its variance: $\lambda = E(X) = Var(X)$. (**lambda is mean**)
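A quick numerical sketch of this (assuming NumPy and SciPy are available; $\lambda = 3.5$ is an arbitrary illustrative value):
```python
import numpy as np
from scipy.stats import poisson

lam = 3.5                    # parameter lambda > 0
k = np.arange(0, 60)         # truncated support; tail mass beyond k = 60 is negligible here
pmf = poisson.pmf(k, lam)    # Pr(X = k) = lambda^k e^{-lambda} / k!

mean = np.sum(k * pmf)                  # E(X)
var = np.sum((k - mean) ** 2 * pmf)     # Var(X) = E((X - mu)^2)
print(mean, var)                        # both ~= 3.5 = lambda
```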
### Expectation and Variance
#### Mean
X: a discrete random variable with p.m.f. p
If $\sum_x |x| \, p(x) < \infty$, then the **expectation of X** exists and is given by:
$E(X) = \sum_x x \cdot p(x)$
The **mean** of X is defined to be $\mu = E(X)$.
#### Variance
X: a random variable with finite mean $\mu$ such that $E\left((X - \mu)^2\right)$ exists.
The **variance** of X is defined to be $\sigma^2 = E( (X - \mu)^2 )$.
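A small numerical sketch of both definitions for an illustrative p.m.f. (plain Python, no dependencies):
```python
# X takes values x with probabilities p (illustrative numbers; p must sum to 1)
x = [0, 1, 2, 3]
p = [0.1, 0.2, 0.3, 0.4]

mu = sum(xi * pi for xi, pi in zip(x, p))                    # E(X) = sum x * p(x)
sigma2 = sum((xi - mu) ** 2 * pi for xi, pi in zip(x, p))    # E((X - mu)^2)
print(mu, sigma2)  # 2.0, 1.0
```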
### Likelihood
**likelihood function** (likelihood)
- measures the goodness of fit of a statistical model to a sample of data for given values of the unknown parameters
- formed from the joint probability distribution of the sample, but viewed and used as a function of the parameters only -> treating the random variables as fixed at the observed values
Let $X$ be a discrete random variable with probability mass function $p$ depending on a parameter $\theta$. Then the function
$\mathcal{L} (\theta \mid x) = p_{\theta }(x) = P_{\theta}(X = x),$
considered as a function of $\theta$, is the likelihood function, given the outcome $x$ of the random variable $X$.
- "likelihood of a parameter $\theta$ given the data x" = $P(x \mid \theta)$
- likelihoods do not have to integrate (or sum) to 1, unlike probabilities.
- Given a parameterized family of probability density functions (or probability mass functions)
$x\mapsto f(x\mid \theta )$,
where $\theta$ is the parameter, the likelihood function is
$\theta \mapsto f(x\mid \theta )$,
written
${\mathcal {L}}(\theta \mid x)=f(x\mid \theta )$,
where $x$ is the observed outcome of an experiment.
In other words, when $f(x\mid \theta )$ is viewed as a function of $x$ with $\theta$ fixed, it is a probability density function, and when viewed as a function of $\theta$ with $x$ fixed, it is a likelihood function.
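A sketch of these two readings using the Poisson p.m.f. from above (assumes NumPy/SciPy; the observed value $k = 4$ and the $\lambda$ scan range are illustrative):
```python
import numpy as np
from scipy.stats import poisson

k_obs = 4                                   # observed outcome x (fixed)
lambdas = np.linspace(0.1, 10.0, 200)       # scan of the parameter theta = lambda

# As a function of lambda with k_obs fixed, the same pmf is the likelihood L(lambda | k_obs)
likelihood = poisson.pmf(k_obs, lambdas)

# Likelihoods need not integrate to 1 over lambda; the maximum sits near lambda = k_obs
lam_hat = lambdas[np.argmax(likelihood)]
print(lam_hat)                              # ~= 4
```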
### Test statistic & p-values
- A **statistic** (singular) or sample statistic is any quantity computed from values in a sample that is used for a statistical purpose.
- A **test statistic** is a statistic used in statistical hypothesis testing.
- a test statistic is selected or defined in such a way as to quantify behaviours that would distinguish the null from the alternative hypothesis (in observed data)
- its sampling distribution under the null hypothesis must be calculable (exactly or approximately; p-values to be calculated)
- In null hypothesis significance testing, the **p-value** is the probability of obtaining test results at least as extreme as the results actually observed (under the assumption that the null hypothesis is correct).
- A very small p-value means that such an extreme observed outcome would be very unlikely under the null hypothesis.
- **statistical hypothesis**: any conjecture concerning the unknown probability distribution of a collection of random variables representing the observed data $X$ in some study
- If we state one hypothesis only and the aim of the statistical test is to see whether this hypothesis is tenable, but not, at the same time, to investigate other hypotheses, then such a test is called a **significance test**; often the data are reduced to a single numerical statistic $T$ whose marginal probability distribution is closely connected to the main question of interest in the study.
- The **p-value** is used in the context of null hypothesis testing in order to quantify the idea of statistical significance of evidence, the evidence being the observed value of the chosen statistic $T$.
- Thus, the only hypothesis that needs to be specified in this test and which embodies the counterclaim is referred to as the null hypothesis; that is, the hypothesis to be nullified.
- :bulb: when calculating a limit on a signal, the null hypothesis is the signal + background hypothesis
- A result is said to be **statistically significant** if it allows us to reject the null hypothesis. (The result being statistically significant is highly improbable if the null hypothesis is assumed to be true.)
- rejection of the null hypothesis says nothing about which alternative hypothesis is viable
*(A rejection of the null hypothesis implies that the correct hypothesis lies in the logical complement of the null hypothesis. But no specific alternatives need to have been specified. The rejection of the null hypothesis does not tell us which of any possible alternatives might be better supported. However, the user of the test chose the test statistic $T$ in the first place probably with particular alternatives in mind; such a test is often used precisely in order to convince people that those alternatives are viable because what was actually observed was extremely unlikely under the null hypothesis.)*
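A minimal sketch of the p-value recipe above, assuming a statistic $T$ that is standard normal under the null hypothesis and a one-sided "large $T$ is extreme" alternative (the observed value 2.3 is illustrative):
```python
from scipy.stats import norm

t_obs = 2.3                 # observed value of the test statistic T
p_value = norm.sf(t_obs)    # one-sided p-value: P(T >= t_obs | H0), survival function of N(0, 1)
print(p_value)              # ~= 0.011 -> null hypothesis rejected at the 5% level
```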
---
### Monte Carlo
$I = \int f(\vec{x}) \, d\vec{x} \ \overset{MC}{\approx} \ \frac{V}{N} \sum_{i=1}^{N} f(\vec{x}_i) \quad$ with $\quad V = \int d\vec{x}$,
where the $\vec{x}_i$ are $N$ points sampled uniformly over the integration volume $V$.
The statistical uncertainty of the MC estimate scales as $1/\sqrt{N}$.
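A minimal sketch of the estimator (assumes NumPy; the 2-dimensional integrand $f(\vec{x}) = e^{-(x_1^2 + x_2^2)}$ over the unit square, so $V = 1$, is illustrative):
```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
V = 1.0                                     # volume of the unit square [0, 1]^2
x = rng.random((N, 2))                      # N points sampled uniformly in the volume
f = np.exp(-np.sum(x**2, axis=1))           # f(x) = exp(-(x1^2 + x2^2))

I_mc = V / N * np.sum(f)                    # (V/N) * sum_i f(x_i)
err = V * np.std(f, ddof=1) / np.sqrt(N)    # statistical uncertainty, falls like 1/sqrt(N)
print(I_mc, "+/-", err)                     # ~= 0.558 +/- 0.0007
```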