[**:house: Home**](https://hackmd.io/s/rkkDP_l4M) | [:boy: **About**](https://hackmd.io/s/B149Z8v7b) | [**:microscope: Researches**](https://hackmd.io/s/rJPFNKlVz) | [**:rocket: Side projects**](https://hackmd.io/s/H1aS2qe4G) | [**:airplane: Life gallery**](https://hackmd.io/s/HJN4JslNM)
---
# [Validation] b-tagging commissioning
<div style="text-align: center;" markdown="1"><a href="https://cds.cern.ch/record/2138504">CMS-PAS-BTV-15-001</a></div>
*<div style="text-align: center;" markdown="1">`likelihood` `classification` `Monte Carlo` `statistics` `collaboration`</div>*
## Introduction
The raw LHC data consist of purely digital signals from the individual detector components. For physics analysis, these signals have to be reconstructed into particle-level information, e.g. vertices, tracks, energy and charge, using vertexing, tracking and clustering algorithms, among others.
The $\text{b}$ quark, the third-generation quark carrying bottom flavour in the Standard Model (SM), is a heavy particle that cannot be observed in isolation: it is always bound to other quarks (a *bound state*), which then decays rapidly into further particles. The mechanism by which quarks form such states is called [*hadronization*](https://en.wikipedia.org/wiki/Hadronization). The result is a *jet*, an object containing several particles whose summed energy and momentum should match those of the originating particle. A $\text{b}$ jet is a jet resulting from the hadronization and decay of a $\text{b}$ quark. The distinctive feature of the $\text{b}$ quark is its long *decay length*: its lifetime is relatively long, so it travels a measurable distance before decaying. With sufficiently sensitive detectors, this makes $\text{b}$ jets identifiable.
LHC data contain many $\text{b}$ quarks, produced in high-energy hadron collisions. To identify $\text{b}$ jets, $\text{b}$-tagging algorithms are applied in many data analyses. The algorithms differ in their vertexing algorithm, topological assumptions and classification model; in general, the classification model is built on the topological assumptions. The algorithms are validated on a data-driven sample rich in $\text{b}$ jets.
## Techniques
### 1. Basic classification algorithm
The best-known property of $\text{b}$ jets is the relatively long lifetime of the $\text{b}$ hadrons inside them, about 1.5 ps ($c\tau \approx 450\ \mu \rm m$), which is nevertheless shorter than that of $\pi^\pm$ or $\rm K^\pm$ particles. This property can be observed with high-resolution tracking detectors: the *tracks of charged particles* are incompatible with the *primary vertex (PV)*, and a *secondary vertex (SV)* is displaced from the PV, as in the figure below. These reconstructed objects can therefore be used to build observables that discriminate between $\text{b}$ jets and other jets. Several simple and robust algorithms use just a single observable, while others combine several of them to achieve higher discrimination power. Each algorithm yields a single discriminator value per jet, using *impact parameters* with a particular ***likelihood function*** based on a physics model.
<div style="text-align: center;" markdown="1"><img src="http://hep1.phys.ntu.edu.tw/~alpha78718/cv/btagging.png" height="250"></div>
<br>
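The lifetime quoted above can be turned into a length scale with a few lines of arithmetic. The sketch below is only an illustration: the B-meson mass (about 5.28 GeV) and the 50 GeV momentum are assumptions chosen for the example, and the function names are hypothetical.

```python
# Illustrative arithmetic only; the momentum below is a made-up example value.
C_M_PER_S = 299_792_458.0   # speed of light in m/s
TAU_B_S = 1.5e-12           # b-hadron lifetime quoted in the text, in seconds
M_B_GEV = 5.28              # approximate B-meson mass in GeV (assumption)


def ctau_um(tau_s):
    """Proper decay length c*tau in micrometres."""
    return C_M_PER_S * tau_s * 1e6


def mean_flight_um(p_gev, m_gev, tau_s):
    """Mean lab-frame flight distance beta*gamma*c*tau = (p/m)*c*tau, in micrometres."""
    return (p_gev / m_gev) * ctau_um(tau_s)


print(f"c*tau = {ctau_um(TAU_B_S):.0f} um")
flight_mm = mean_flight_um(50.0, M_B_GEV, TAU_B_S) / 1000.0
print(f"mean flight distance at p = 50 GeV: {flight_mm:.1f} mm")
```

Because of the relativistic boost, the mean flight distance in the detector is several millimetres for such a momentum, which is what makes the displaced secondary vertex resolvable.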
Two popular algorithms in CMS are the ***Combined Secondary Vertex (CSV)*** and the ***Combined Multivariate Algorithm (CMVA)***. They use different techniques to combine the discriminator values. *CSV* uses a ***neural network (NN)*** with one hidden layer to merge all single discriminators into one more powerful discriminator value, while *CMVA* uses a ***Boosted Decision Tree (BDT)*** trained on the variables. A top-quark pair ($t\bar{t}$) ***Monte Carlo (MC)*** sample serves as the training sample for both classification algorithms.
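To make the idea of combining several single discriminators with a one-hidden-layer network concrete, here is a minimal forward-pass sketch. It is not the CSV implementation: the three input observables, the two hidden units and all weights are hypothetical, hand-picked values rather than trained ones.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def combine(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a network with one hidden layer (tanh) and one sigmoid output."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical weights for three inputs and two hidden units (not trained).
W_HIDDEN = [[0.8, 0.5, 0.3], [-0.4, 0.9, 0.6]]
B_HIDDEN = [0.0, -0.2]
W_OUT = [1.5, 1.2]
B_OUT = -0.5

# Made-up inputs: (impact-parameter significance, SV flight significance, SV mass).
b_like = combine([3.0, 5.0, 2.0], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
light_like = combine([0.2, 0.0, 0.0], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
print(f"b-like jet: {b_like:.2f}, light-like jet: {light_like:.2f}")
```

In a real tagger the weights would come from training on the $t\bar{t}$ MC sample; the point here is only that many observables collapse into one number per jet.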
### 2. Validation method
The validation is done with a data sample selected for leptonically decaying $\text{t}\bar{\text{t}}$ events, since the top quark decays to a $\text{b}$ quark and a $\text{W}$ boson, and the $\text{W}$ decays to leptons ($e+\nu_e,\,\mu+\nu_{\mu},\,\tau+\nu_{\tau}$). This makes it a good source of $\text{b}$ jets. We check the variables used for training and compare their behaviour in data and MC samples.
<div style="text-align: center;" markdown="1"><img src="https://i.imgur.com/3JoC43d.png" width="350"><img src="https://i.imgur.com/yrm7iOZ.png" width="350"></div>
<br>
Overall, data and MC agree well, i.e. the classification algorithms work on real data.
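A data/MC comparison of this kind can be sketched as a per-bin ratio plus a simple $\chi^2$. The snippet below is an assumption-laden illustration, not the code used for the plots: the bin contents are invented, the MC is normalised to the data yield, and only the Poisson uncertainty of the data is considered.

```python
import math


def normalise_to(mc, target_yield):
    """Scale the MC histogram so that its integral equals target_yield."""
    scale = target_yield / sum(mc)
    return [m * scale for m in mc]


def compare(data, mc):
    """Return per-bin (ratio, ratio_error) pairs and a chi2 using data Poisson errors only."""
    mc_norm = normalise_to(mc, sum(data))
    ratios = []
    chi2 = 0.0
    for d, m in zip(data, mc_norm):
        ratios.append((d / m, math.sqrt(d) / m))  # assumes every MC bin is non-empty
        if d > 0:
            chi2 += (d - m) ** 2 / d
    return ratios, chi2


# Invented bin contents of one input variable, for illustration only.
DATA = [120, 340, 560, 310, 95]
MC = [1150.0, 3500.0, 5500.0, 3200.0, 900.0]

ratios, chi2 = compare(DATA, MC)
for i, (r, err) in enumerate(ratios):
    print(f"bin {i}: data/MC = {r:.2f} +- {err:.2f}")
print(f"chi2 / nbins = {chi2 / len(DATA):.2f}")
```

Ratios compatible with one within their uncertainties in every bin correspond to the good agreement described above.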
Since the variables are tied to the detectors, they are sensitive to the collision conditions and the detector electronics, so discrepancies between data and MC sometimes appear. This happened once in the 2016 data: at the new, higher collision energy, the large number of produced particles saturated the detector. The detectors had a large *deadtime*, i.e. some signal was lost, and the b-tagging performance became time-dependent.
<div style="text-align: center;" markdown="1"><img src="https://i.imgur.com/bIw5vep.png" height="300"></div>
## Results
My responsibility was to validate all algorithms. As mentioned above, I experienced poor data-taking conditions. The validation was challenging and required continuous communication with the developers. In the end, we solved the problem by providing scale factors that depend on the data-taking period, for use by all data analyses in CMS. This was a chance to demonstrate my problem-solving ability. I was invited by this international group to give the official summary at CERN, see the [publication](https://cds.cern.ch/record/2160345?ln=en).
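The idea of a period-dependent scale factor can be illustrated as the ratio of the tagging efficiency in data to that in MC, computed separately for each data-taking period. This is a minimal sketch under stated assumptions: the period names and jet counts are invented, and the actual CMS derivation is documented in the linked publication rather than reproduced here.

```python
def efficiency(n_tagged, n_total):
    """Fraction of b jets that pass the tagger's working point."""
    return n_tagged / n_total


def scale_factors(data_counts, mc_counts):
    """Scale factor per period: efficiency in data divided by efficiency in MC."""
    eff_mc = efficiency(*mc_counts)
    return {
        period: efficiency(n_tagged, n_total) / eff_mc
        for period, (n_tagged, n_total) in data_counts.items()
    }


# Invented (tagged, total) jet counts per hypothetical data-taking period.
DATA_COUNTS = {"period_1": (680, 1000), "period_2": (612, 1000)}
MC_COUNTS = (6800, 10000)

sf = scale_factors(DATA_COUNTS, MC_COUNTS)
for period, value in sf.items():
    print(f"{period}: SF = {value:.2f}")
```

An analysis would then weight simulated tagged jets by the scale factor of the period they are meant to represent, so that the MC follows the time-dependent performance seen in data.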
## References
- [Identification of b-quark jets with the CMS experiment](http://iopscience.iop.org/article/10.1088/1748-0221/8/04/P04013/meta)
- [Identification of b quark jets at the CMS Experiment in the LHC Run 2](https://cds.cern.ch/record/2138504)
- Github : https://github.com/juifa-tsai/BTaggingCommission
<br>
---
[:ghost: Github](https://github.com/juifa-tsai) | [:busts_in_silhouette: Linkedin ](https://www.linkedin.com/in/jui-fa-tsai-08ba0a93)