# Journal Club: Higgs Combination Statistics
- decided to have this note instead of a presentation
- talking points for the discussion
- can take more notes during the discussion
- focus on the general limit setting procedure
---
### Note
Procedure for the LHC Higgs boson search combination in Summer 2011
https://cds.cern.ch/record/1379837/files/NOTE2011_005.pdf
---
### 1 Introduction
### 2 Limit setting procedure for the summer 2011
- based on the modified frequentist method, often referred to as CLs
- denote signal and background counts as $s(\theta)$ and $b(\theta)$ (functions of the nuisance parameters $\theta$)
- The systematic error pdfs $ρ(θ|\tilde θ)$, where $\tilde θ$ is the default value of the nuisance parameter, reflect our degree of belief on what the true value of θ might be -> different distributions are possible, see Section 5.1
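> Next, we take a conceptual step to re-interpret systematic error pdfs $ρ(θ|\tilde θ)$ as posteriors arising from some real or imaginary measurements $\tilde θ$, as given by the Bayes' theorem:
> $$\rho(\theta|\tilde\theta) \sim p(\tilde\theta|\theta)\cdot\pi_\theta(\theta)$$
> where $\pi_\theta(\theta)$ are hyper-priors for such measurements.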
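
A one-line worked example, assuming a Gaussian systematic-error pdf of width $\sigma$ (only one possible choice) and a flat hyper-prior $\pi_\theta$. Because the Gaussian is symmetric in $\theta$ and $\tilde\theta$, the frequentist pdf of the auxiliary measurement is the same function, read with $\theta$ fixed and $\tilde\theta$ random:

$$\rho(\theta|\tilde\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(\theta-\tilde\theta)^2/2\sigma^2} \quad\Longrightarrow\quad p(\tilde\theta|\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(\tilde\theta-\theta)^2/2\sigma^2}$$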
- we start with $\rho$ (Bayesian approach) and want to express it in terms of $p(\tilde\theta|\theta)$ (frequentist approach)
- this trick takes us into the frequentist framework
- -> then use the asymptotic formulae... but they can also be used with $\rho$ :question: :question: :question:
:question: Bayes vs Frequentist approach?
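- Bayes makes statements about the model given the data, $P(\text{model}|\text{data})$; frequentists make statements about the data given the model, $P(\text{data}|\text{model})$
- Bayes combines a **prior** with the current data to obtain a posterior
- Frequentist uses only the current data (no prior)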
#### 2.1 Observed limits
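1) Construct the likelihood function $\mathcal L(\text{data}|\mu,\theta)$

$$\mathcal L(\text{data}|\mu,\theta) = \text{Poisson}(\text{data}|\mu\cdot s(\theta)+b(\theta))\cdot p(\tilde\theta|\theta)$$

2) Construct the test statistic $\tilde q_\mu$

$$\tilde q_\mu = -2\ln\frac{\mathcal L(\text{data}|\mu,\hat\theta_\mu)}{\mathcal L(\text{data}|\hat\mu,\hat\theta)},\quad 0\le\hat\mu\le\mu$$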
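- the likelihood ratio is the most powerful discriminator for simple hypotheses (Neyman-Pearson lemma): at a fixed type-I error rate, its type-II error (failing to reject a false null hypothesis) is smallest
- this is a **profile log-likelihood ratio**: the denominator is the global maximum of the likelihood (best-fit $\hat\mu$ and $\hat\theta$, the best you can do), the numerator is the maximum likelihood for the given $\mu$ (best-fit $\hat\theta_\mu$ at that $\mu$)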
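
Steps 1 and 2 can be sketched for a deliberately tiny model. This is a minimal sketch under stated assumptions, not the combination code: one counting bin, made-up yields `S = 5` and `B0 = 10`, a log-normal background uncertainty `KAPPA = 1.2` driven by a single nuisance parameter with a unit-Gaussian constraint, and a hand-rolled golden-section minimiser standing in for a real fitter. The conditional fit at fixed μ also returns $\hat\theta_\mu$, which is what step 4 needs; for example `q_tilde(8, 0.0, 1.0)` evaluates $\tilde q_1$ for an observed count of 8 and auxiliary measurement $\tilde\theta = 0$.

```python
import math

# Hypothetical single-bin counting model (illustrative assumption, not the note's
# analysis): signal S, background B0 with a log-normal uncertainty KAPPA, and one
# nuisance parameter theta with a unit-Gaussian constraint p(theta_tilde | theta).
S, B0, KAPPA = 5.0, 10.0, 1.2


def nll(n, theta_tilde, mu, theta):
    """-ln L(data|mu, theta) = -ln[Poisson(n | mu*s + b(theta)) * p(theta_tilde|theta)]."""
    lam = mu * S + B0 * KAPPA ** theta
    log_poisson = n * math.log(lam) - lam - math.lgamma(n + 1)
    log_constraint = -0.5 * (theta_tilde - theta) ** 2 - 0.5 * math.log(2.0 * math.pi)
    return -(log_poisson + log_constraint)


def golden_min(f, lo, hi, tol=1e-7):
    """Minimise a unimodal 1D function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        x1 = hi - inv_phi * (hi - lo)
        x2 = lo + inv_phi * (hi - lo)
        if f(x1) < f(x2):
            hi = x2  # minimum lies in [lo, x2]
        else:
            lo = x1  # minimum lies in [x1, hi]
    x = 0.5 * (lo + hi)
    return x, f(x)


def profile_theta(n, theta_tilde, mu):
    """Conditional fit at fixed mu: (theta_hat_mu, minimal NLL). Range [-5, 5] is an assumption."""
    return golden_min(lambda t: nll(n, theta_tilde, mu, t), -5.0, 5.0)


def q_tilde(n, theta_tilde, mu):
    """q~_mu = -2 ln[L(mu, theta_hat_mu) / L(mu_hat, theta_hat)] with 0 <= mu_hat <= mu."""
    theta_hat_mu, nll_cond = profile_theta(n, theta_tilde, mu)
    # Global fit: profile theta at each trial mu_hat, scanning mu_hat only over [0, mu].
    _, nll_glob = golden_min(lambda m: profile_theta(n, theta_tilde, m)[1], 0.0, mu)
    # Clip tiny negative values caused by the finite minimiser tolerance.
    return max(0.0, 2.0 * (nll_cond - nll_glob)), theta_hat_mu
```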
3) Find the observed value of the test statistic $\tilde q_{\mu}^{obs}$ for the given signal strength modifier μ under test
- just put the data into the formulas (scan over $\mu$)
4) Find values of the nuisance parameters $\hat \theta_0^{obs}$ and $\hat \theta_\mu^{obs}$ best describing the experimentally observed data (i.e. maximising the likelihood as given in Eq. 2), for the background-only and signal+background hypotheses, respectively.
- this is done using step 3) for $\mu = 0$, too :question:
- $\hat \theta_0^{obs}$ and $\hat \theta_\mu^{obs}$ come out of the conditional fits (the numerator of $\tilde q$) performed at $\mu = 0$ and at the tested $\mu$, respectively
5) Generate toy Monte Carlo pseudo-data to construct pdfs $f(\tilde q_{\mu}|\mu,\hat \theta^{obs}_\mu)$ and $f(\tilde q_{\mu}|0,\hat \theta^{obs}_0)$ assuming a signal with strength μ in the signal + background hypothesis and for the background-only hypothesis (μ=0)

- draw pseudo-data from $\text{Poisson}_{pdf}(\mu \cdot s(\theta) + b(\theta))$ with $\theta$ set to $\hat \theta_\mu^{obs}$ (signal+background) or $\hat \theta_0^{obs}$ (background-only), as determined on the observed data; let $\theta$ float in the fits that determine the test statistic
- one set of drawn pseudo-data = one toy
>Note that for the purposes of generating a pseudo-dataset, the nuisance parameters are fixed to the values $\hat\theta^{obs}_\mu$ or $\hat\theta^{obs}_0$ obtained by fitting the observed data, but are allowed to float in fits needed to evaluate the test statistic
- $\mathcal L(\text{pseudo-data} | \mu, \theta) = \text{Poisson}(\text{pseudo-data} | \mu\cdot s+b)\cdot p(\tilde \theta| \theta)$; should the pseudo-data be drawn from Poisson$_{pdf}(\mu\cdot s+b)$ or from Poisson$_{pdf}(\text{data})$? -> from Poisson$_{pdf}(\mu\cdot s+b)$, "because it gives better coverage" :question:
- :question: how would $\tilde q_0^{obs}$ and $\tilde q_\mu^{obs}$ look in the plot?
- *Martin's hypothesis: we'd have a different plot for $\tilde q_0^{obs}$*
- Ben's hypothesis (can we write a test statistic for that? :P): blue = tests how compatible background-like data are with an s+b model, red = tests the compatibility of s+b-like data with the s+b model
- Judith's notes: each test statistic is for one hypothesis (bkg-only or sig+bkg); toys can be generated under both of these hypotheses -> 4 pdfs $f(\tilde q_{X}|Y,\hat \theta^{obs}_Y)$ with $X, Y \in \{0, \mu\}$
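
A minimal sketch of the toy generation in step 5, reusing the same hypothetical single-bin model (all names and numbers are illustrative assumptions). Each toy draws the count from $\text{Poisson}(\mu s + b(\hat\theta^{obs}))$ and redraws the auxiliary measurement $\tilde\theta$ around $\hat\theta^{obs}$. Calling `generate_toys` with `(mu, theta_hat_mu_obs)` gives signal+background toys, with `(0.0, theta_hat_0_obs)` background-only toys; $\tilde q_\mu$ is then evaluated on every toy with $\theta$ floating again.

```python
import math
import random

# Same kind of hypothetical single-bin model (assumption): signal S, background B0
# with log-normal uncertainty KAPPA, unit-Gaussian constraint on theta.
S, B0, KAPPA = 5.0, 10.0, 1.2


def poisson_draw(lam, rng):
    """Knuth's Poisson sampler; adequate for the small means used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def generate_toys(mu, theta_hat_obs, n_toys, seed=1):
    """Toys (n, theta_tilde) generated with theta fixed to theta_hat_obs.

    theta_tilde (the auxiliary 'measurement') is redrawn around theta_hat_obs;
    the fits that evaluate q~_mu on each toy must let theta float again.
    """
    rng = random.Random(seed)
    lam = mu * S + B0 * KAPPA ** theta_hat_obs
    return [(poisson_draw(lam, rng), rng.gauss(theta_hat_obs, 1.0)) for _ in range(n_toys)]
```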
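6) Having constructed $f(\tilde q_{\mu}|\mu,\hat \theta^{obs}_\mu)$ and $f(\tilde q_{\mu}|0,\hat \theta^{obs}_0)$ distributions, we define two p-values to be associated with the actual observation for the signal+background and background-only hypotheses, $p_\mu$ and $p_b$:

$$p_\mu = P(\tilde q_\mu \ge \tilde q_\mu^{obs}\,|\,\text{signal+background}) = \int_{\tilde q_\mu^{obs}}^{\infty} f(\tilde q_\mu|\mu,\hat\theta_\mu^{obs})\,d\tilde q_\mu$$
$$1-p_b = P(\tilde q_\mu \ge \tilde q_\mu^{obs}\,|\,\text{background-only}) = \int_{\tilde q_\mu^{obs}}^{\infty} f(\tilde q_\mu|0,\hat\theta_0^{obs})\,d\tilde q_\mu$$

and calculate CLs(μ) as a ratio of these two probabilities

$$CL_s(\mu) = \frac{p_\mu}{1-p_b}$$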
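- written as $1-p_b$ for integration-boundary reasons: $p_b$ is the background-only probability of $\tilde q_\mu < \tilde q_\mu^{obs}$, so $1-p_b$ is the same upper tail ($\tilde q_\mu \ge \tilde q_\mu^{obs}$) that is integrated for $p_\mu$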
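
Once the two toy distributions of $\tilde q_\mu$ exist, both p-values are tail fractions of the toys. A minimal sketch; the function names are hypothetical and the toy values one would pass are made up purely to exercise the arithmetic, e.g. `cls(3.0, [0.0]*50 + [1.0]*30 + [4.0]*20, [2.0]*20 + [5.0]*60 + [9.0]*20)` gives $p_\mu = 0.2$, $1-p_b = 0.8$ and CLs $= 0.25$.

```python
def tail_fraction(toy_q, q_obs):
    """Fraction of toy test-statistic values with q >= q_obs (integral from q_obs to infinity)."""
    return sum(1 for q in toy_q if q >= q_obs) / len(toy_q)


def cls(q_obs, toys_sb, toys_b):
    """Return (CLs, p_mu, 1 - p_b) from toys generated under s+b and background-only."""
    p_mu = tail_fraction(toys_sb, q_obs)
    one_minus_pb = tail_fraction(toys_b, q_obs)
    if one_minus_pb == 0.0:
        raise ValueError("no background-only toy reaches q_obs; generate more toys")
    return p_mu / one_minus_pb, p_mu, one_minus_pb
```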
7) If, for $\mu = 1$, CLs ≤ α, we would state that the SM Higgs boson is excluded with (1−α) CLs confidence level (C.L.)
- low CLs means good exclusion; limit gets better if the red tail gets smaller or when the blue tail gets bigger
8) To quote the 95% Confidence Level upper limit on μ, to be further denoted as $\mu^{95\%CL}$, we adjust μ until we reach CLs = 0.05
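
Step 8 is a one-dimensional root search. The sketch below bisects on μ, assuming CLs(μ) decreases monotonically. As a hypothetical stand-in for the full toy-based CLs, it uses the classic single-bin CLs with the event count as the ordering variable and no nuisance parameters, $CL_s(\mu) = P(n \le n_{obs}|\mu s+b)/P(n \le n_{obs}|b)$; `cls_counting` and `upper_limit` are illustrative names.

```python
import math


def poisson_cdf(n, lam):
    """P(N <= n) for N ~ Poisson(lam)."""
    term, total = math.exp(-lam), 0.0
    for k in range(n + 1):
        total += term
        term *= lam / (k + 1)
    return total


def cls_counting(mu, n_obs, s, b):
    """Stand-in CLs(mu) for one bin, ordering outcomes by the observed count."""
    return poisson_cdf(n_obs, mu * s + b) / poisson_cdf(n_obs, b)


def upper_limit(cls_of_mu, alpha=0.05, hi=100.0, tol=1e-8):
    """Bisect for mu with CLs(mu) = alpha, assuming CLs falls monotonically in mu."""
    lo = 0.0
    if cls_of_mu(hi) > alpha:
        raise ValueError("increase hi: CLs has not dropped below alpha")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls_of_mu(mid) > alpha:
            lo = mid  # mid is not yet excluded
        else:
            hi = mid  # mid is excluded at (1 - alpha) CLs
    return 0.5 * (lo + hi)
```

For $n_{obs} = 0$ this stand-in gives $CL_s = e^{-\mu s}$ independently of $b$, so the limit is $\mu s = \ln 20 \approx 3$ events, the familiar counting-experiment result.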
#### 2.2 Expected limits
The most straightforward way to define the expected median upper limit and the ±1σ and ±2σ bands for the background-only hypothesis:
- generate a large set of background-only pseudo-data
- calculate CLs and $\mu^{95\%CL}$ for each of them, as if they were real data
- then, one can build a cumulative probability distribution of results by starting integration from the side corresponding to low event yield
- the point at which the cumulative probability distribution crosses the 50% quantile is the median expected value. The ±1σ (68%) band is defined by the crossings of the 16% and 84% quantiles; crossings at 2.5% and 97.5% define the ±2σ (95%) band.
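
The quantile step can be sketched as follows, assuming the $\mu^{95\%CL}$ values of the background-only pseudo-data have already been computed. Sorting ascending plays the role of integrating from the low-yield side, since downward-fluctuating pseudo-data typically give the lowest limits; each quantile is taken as the first value at which the empirical cumulative distribution reaches the target probability. The function names are hypothetical.

```python
import math


def quantile(sorted_values, q):
    """Smallest value at which the empirical cumulative distribution reaches q."""
    idx = max(0, math.ceil(q * len(sorted_values)) - 1)
    return sorted_values[idx]


def expected_bands(limits):
    """Median and +-1/2 sigma bands from mu95 values of background-only pseudo-data."""
    values = sorted(limits)  # cumulate starting from the lowest limits
    return {
        "-2sigma": quantile(values, 0.025),
        "-1sigma": quantile(values, 0.16),
        "median": quantile(values, 0.50),
        "+1sigma": quantile(values, 0.84),
        "+2sigma": quantile(values, 0.975),
    }
```

For `expected_bands(range(1, 101))` the median is 50 and the ±1σ band spans 16 to 84.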

- a slightly different method is used to reduce CPU consumption: the distributions of the test statistic for a given μ do not depend on the pseudo-data, so they need to be computed only once
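
One way to exploit this, sketched under the assumption that the toy values of $\tilde q_\mu$ for each tested μ are kept in memory: sort each distribution once and answer every tail query by binary search, so evaluating CLs for many background-only pseudo-datasets costs $O(\log N)$ per query instead of a fresh toy run. `ToyDistribution` and `cls_from_cache` are hypothetical names.

```python
import bisect


class ToyDistribution:
    """Toy values of q~_mu for one mu and one hypothesis, sorted once and reused."""

    def __init__(self, toy_values):
        self.values = sorted(toy_values)

    def tail(self, q_obs):
        """Fraction of toys with q >= q_obs, found by binary search."""
        i = bisect.bisect_left(self.values, q_obs)  # first index with value >= q_obs
        return (len(self.values) - i) / len(self.values)


def cls_from_cache(q_obs, dist_sb, dist_b):
    """CLs = p_mu / (1 - p_b) from precomputed s+b and background-only distributions."""
    one_minus_pb = dist_b.tail(q_obs)
    if one_minus_pb == 0.0:
        raise ValueError("no background-only toy reaches q_obs; generate more toys")
    return dist_sb.tail(q_obs) / one_minus_pb
```

One pair of `ToyDistribution` objects is kept per μ value, and `cls_from_cache` is called with the $\tilde q_\mu$ of each background-only pseudo-dataset.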
### 3 Quantifying an excess of events for summer 2011
#### 3.1 Fixed Higgs boson mass mH
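test statistic for a fixed Higgs boson mass:

$$q_0 = -2\ln\frac{\mathcal L(\text{data}|0,\hat\theta_0)}{\mathcal L(\text{data}|\hat\mu,\hat\theta)},\quad \hat\mu \ge 0$$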
> Following the frequentist convention for treatment of nuisance parameters as discussed in Section 2, we build the distribution $f(q_0|0,\hat\theta_0^{obs})$ by generating pseudo-data for nuisance parameters around $\hat\theta_0^{obs}$ and event counts following Poisson probabilities under the assumption of the background-only hypothesis.
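
For intuition, a sketch of the fixed-mass procedure for a single bin with known background and no nuisance parameters (simplifying assumptions). Maximising $\text{Poisson}(n|\mu s + b)$ gives $\hat\mu s + b = n$, so $q_0 = 2[n\ln(n/b) - (n-b)]$ for $n > b$ and $q_0 = 0$ otherwise. The local p-value $p_0$ is the fraction of background-only toys with $q_0 \ge q_0^{obs}$; converting it to a one-sided Gaussian significance $Z$ via $p_0 = 1 - \Phi(Z)$ is the usual convention and is included here as an assumption about how the result is quoted. Function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def q0_counting(n, b):
    """q0 = -2 ln[L(0)/L(mu_hat)] for one bin with known background b, mu_hat >= 0."""
    if n <= b:
        return 0.0  # no excess: the constraint mu_hat >= 0 sets mu_hat = 0
    return 2.0 * (n * math.log(n / b) - (n - b))


def poisson_draw(lam, rng):
    """Knuth's Poisson sampler; adequate for small means."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def local_p0(n_obs, b, n_toys=50_000, seed=1):
    """Fraction of background-only toys with q0 >= q0_obs."""
    rng = random.Random(seed)
    q_obs = q0_counting(n_obs, b)
    hits = sum(q0_counting(poisson_draw(b, rng), b) >= q_obs for _ in range(n_toys))
    return hits / n_toys


def significance(p0):
    """One-sided Gaussian significance Z with p0 = 1 - Phi(Z)."""
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - p0)
```

Here `local_p0(8, 3.0)` should approach $P(n \ge 8\,|\,b=3) \approx 0.012$, i.e. $Z \approx 2.3$.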

#### 3.2 Estimating the look-elsewhere effect
### 4 Higgs mass points

### 5 Systematic Uncertainties
#### 5.1 Systematic uncertainty probability density functions
#### 5.2 Uncertainties correlated between experiments
##### 5.2.1 Naming convention
##### 5.2.2 Total cross sections
##### 5.2.3 Acceptance uncertainties
##### 5.2.4 Cross section times acceptance uncertainties for gg→H+ 0/1/2-jets
##### 5.2.5 Uncertainties in modelling underlying event and parton showering
##### 5.2.6 Instrumental uncertainties
### 6 Format of presenting results
### 7 Technical combination exercises (validation and synchronisation)
#### 7.1 H→WW→ℓνℓν + 0 jets
#### 7.2 H→WW→ℓνℓν + 0/1/2 jets
#### 7.3 (H→WW) + (H→γγ) + (H→ZZ→4l)
### 8 Summary