# Benchmark $B^+\rightarrow K^+J/\psi \mu^+\mu^-$ analysis
## Work status
### Datasets
* [x] THEORETICAL MOTIVATION
* [ ] CHOICE OF DATASET
* [ ] DATA SAMPLES, STRIPPED
* [ ] 2015 ?
* [x] 2016
* [x] 2017
* [x] 2018
* [ ] SIGNAL SIMULATION SAMPLES B->K CHIC MM
* [ ] 2015 ?
* [x] 2016
* [x] 2017
* [x] 2018
* [x] Truth-matching
* [ ] PEAKING BACKGROUND SIMULATION SAMPLES
* [ ] 2015 ?
* [x] 2016
* [x] 2017
* [x] 2018
* [x] Truth-matching
* [x] Stripping line
### Selection
* [x] TRIGGER CUTS
* [x] OFFLINE SELECTION
* [ ] COMBINATORIAL BACKGROUND BOOSTED DECISION TREE
* [x] Variables list
* [x] Performance
* [x] Train-test
* [x] Correlations
* [ ] Hyperparameters choice
* [x] Manual optimization
* [ ] Proper systematic optimisation
* [ ] Epochs plot
* [ ] MISIDENTIFIED BACKGROUND STUDY
* [x] D0->Kpi MC gen-level study, kink study
* [ ] ...
* [ ] MISIDENTIFIED BACKGROUND BOOSTED DECISION TREE
* [x] Variables list
* [x] Performance
* [x] Train-test
* [x] Correlations
* [ ] Hyperparameters choice
* [x] Manual optimisation
* [ ] Proper systematic optimisation
* [ ] Epochs plot
* [ ] k folding
* [ ] OPTIMISATION OF THE MULTIVARIATE SELECTION
* [ ] Method using FoM and inverted cuts -> validated, final ?
* [x] Combinatorial fit of the upper sideband + extrapolation
* [x] MisID fit including decay in flight and punch-through contributions
* [x] Inverted cut transfer factor extraction (redo for each cut pair)
* [x] Final 2D scan of the figure of merit and result on the optimal cut
* [ ] Other way?
* [ ] PID CORRECTIONS (!!)
* [ ] Correcting PID distributions in (P,PT, ntrack) bins using PIDGen package
* [ ] SIMULATION CORRECTIONS
* [ ] EXPECTED BACKGROUND CONTRIBUTION USING DATA-DRIVEN METHOD
* [x] D0->pipi results and choice not to keep it (Xiafei's studies on old setup)
* [ ] Ks->pipi fake rate maps (ongoing)
* [x] prompt Ks selection choice
* [x] sWeights recomputation using the sPlot method
* [x] Fake rate maps per-year computation
* [x] Closure tests on data
* [x] Total background evaluation
* [ ] Real muons pollution in signal selection using normalisation channel (ongoing)
* [ ] EFFICIENCIES
* [x] Signal efficiency
* [ ] Control data muons efficiency
* [x] @optimal cuts
* [ ] @cuts pair to ensure flat efficiency (--> what to do with this information)
* [ ] Resonances vetoing
* [x] Chic yield evaluation
* [ ] Chic vetoing?
* [ ] Include $\rho/\omega \to \mu \mu$ resonances in the analysis? Does it make sense for dimuon tagging?
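The multivariate-selection optimisation outlined above (a 2D scan of a figure of merit over pairs of BDT cuts) can be sketched as follows. The yield models and the grid ranges here are toy stand-ins, not the analysis inputs, which come from the sideband extrapolation and the misID transfer factors:

```python
import numpy as np

def fom(s, b):
    """Significance-style figure of merit S / sqrt(S + B)."""
    return s / np.sqrt(s + b)

# Scan a grid of (BDT1, BDT2) cut-value pairs.
cuts1 = np.linspace(0.0, 0.9, 19)
cuts2 = np.linspace(0.0, 0.9, 19)

# Toy expected yields after the cuts: signal efficiency falls slowly,
# background is rejected much faster (hypothetical stand-ins for the
# data-driven yield estimates).
S = np.outer(100.0 * (1.0 - 0.5 * cuts1), 1.0 - 0.5 * cuts2)
B = np.outer(500.0 * (1.0 - cuts1) ** 3, (1.0 - cuts2) ** 3)

grid = fom(S, B)
i, j = np.unravel_index(np.argmax(grid), grid.shape)
best_cuts = (cuts1[i], cuts2[j])  # optimal (BDT1, BDT2) cut pair on this toy
```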
### Mass fits
* [ ] $B\to KJ/\psi \mu\mu$ DATA BLINDED FIT
* [ ] MisID bkg "shape" from data-driven method
* [ ] NORMALISATION CHANNEL $B\to KJ/\psi \pi\pi$ FIT
* [ ] Corrections to the MC (Parzival)
* [ ] Efficiency computation (Parzival)
### Branching ratio
* [ ] Result
### Systematics
* [ ] Fit model, pull distributions (toys)
* [ ] Efficiencies
* [ ] BDT
* [ ] PID
## Repositories:
* Sonia: https://gitlab.cern.ch/sbouchib/multilepton
* Matthieu (TP4b): https://c4science.ch/diffusion/12329/
* Elena: https://gitlab.cern.ch/egraveri/multilepton-eg
## Preselections
### Simulation models and particle codes
* [**Evttype `12145067`**](https://gitlab.cern.ch/lhcb-datapkg/Gen/DecFiles/-/blob/v30r76/dkfiles/Bu_KOmegaJpsi,mm=LSFLAT,DecProdCut.dec): $B \xrightarrow[]{\text{PHSP}}K^+J/\psi( \xrightarrow[]{\text{VLL}} \mu^+ \mu^-)\omega( \xrightarrow[]{\text{VLL}} \mu^+ \mu^-)$ with flat $\omega$ mass profile.
* **Code**:`B`$\rightarrow$`Kaon Jpsi`$(\rightarrow$`mu1 mu2`$)$`omega`$(\rightarrow$`mu3 mu4`$)$
* [**Evttype `12145068`**](https://gitlab.cern.ch/lhcb-datapkg/Gen/DecFiles/-/blob/v30r76/dkfiles/Bu_KOmegaJpsi,mm=PHSP,DecProdCut.dec): same as before with usual $\omega$ mass profile.
* **Code**:`B`$\rightarrow$`Kaon Jpsi`$(\rightarrow$`mu1 mu2`$)$`omega`$(\rightarrow$`mu3 mu4`$)$
* [**Evttype `12145069`**](https://gitlab.cern.ch/lhcb-datapkg/Gen/DecFiles/-/blob/v30r76/dkfiles/Bu_chic1K,Jpsimumu=DecProdCut.dec): $B \xrightarrow[]{\text{PHSP}}K^+\chi_{c1}( \xrightarrow[]{\text{SVS}} J/\psi \mu^+ \mu^-)$.
* **Code**:`B`$\rightarrow$`Kaon chic`$(\rightarrow$`Jpsi`$(\rightarrow$`mu1 mu2`$)$ `mu3 mu4`$)$
### Explicit selections
* **Precuts** when getting back DST files from the grid:
```
TCut AllCuts = " Kaon_PIDK>4 && mu1_ProbNNmu>0.1 && mu2_ProbNNmu>0.1
&& abs(Jpsi_M-3097)<50 && mu1_TRACK_GhostProb<0.3
&& mu2_TRACK_GhostProb<0.3 && mu3_TRACK_GhostProb < 0.3
&& mu4_TRACK_GhostProb < 0.3 && Kaon_ProbNNk >0.2
&& Kaon_TRACK_GhostProb<0.3
&& (B_L0MuonDecision_TOS || B_L0DiMuonDecision_TOS)
&& mu3_isMuon==1 && mu4_isMuon==1";
```
* **Stripping lines**
* **Inclusive Jpsi**: FullDSTDiMuonJpsi2MuMuDetachedLine. Currently used line, coded as `inclusiveJpsi`. https://lhcbdoc.web.cern.ch/lhcbdoc/stripping/config/stripping34/dimuon/strippingfulldstdimuonjpsi2mumudetachedline.html
* **JpsiKmumu**: the line used previously for feasibility studies, coded as `stripped`. https://lhcbdoc.web.cern.ch/lhcbdoc/stripping/config/stripping34r0p2/dimuon/strippingb2mumumumub2jpsikmumuline.html
* **Preselection**: code = `preselected`
```
"Kaon_PIDK>-2
&& mu1_ProbNNmu>0.05 && mu2_ProbNNmu>0.05 && abs(Jpsi_M-3097)<50
&& mu1_TRACK_GhostProb<0.3 && mu2_TRACK_GhostProb<0.3
&& mu3_TRACK_GhostProb < 0.3 && mu4_TRACK_GhostProb < 0.3
&& Kaon_TRACK_GhostProb<0.3
&& (B_L0MuonDecision_TOS || B_L0DiMuonDecision_TOS)
&& B_DIRA_OWNPV > 0.9999"
# && B_LOKI_DTF_CHI2NDOF<5
```
* **Signal cuts**
$M(B) = 5279.70$ MeV
$\sigma(B) = 8.08554$ MeV
For 1$\sigma$ window: `B_LOKI_MASS_JpsiConstr>5271.61446&&B_LOKI_MASS_JpsiConstr<5287.78554`
For 2$\sigma$ window: `B_LOKI_MASS_JpsiConstr>5263.52892&&B_LOKI_MASS_JpsiConstr<5295.87108`
* **Upper sideband**: for background training sample
```
B_LOKI_MASS_JpsiConstr>5400
```
* **Truth-matched** : for signal MC
```
(B_BKGCAT==10 || B_BKGCAT==50)
```
### Samples for $B \rightarrow K J/\psi \mu^+\mu^-$
#### LPHE cluster
Latest versions
* **MC2018 signal ntuple**, Evttype `12145069`
* Location: `/panfs/bouchiba/TP4b_spring22/NTUPLES/BDT1update_ntuples/BDT1added_truthmatched_cleanChi2Vtx234_preselectedDimuon_inclusiveJpsi_gangaAll_2018_MC69_MM_220412.root`
* Cuts applied: precuts, inclusive Jpsi stripping, preselection, cleanChi2Vtx234. Branch `BDT1` added: BDTG1 response, including Kmu3mu4 vertex $\chi^2$.
* **Data 2018 ntuple**, latest version:
* Location: `/panfs/bouchiba/TP4b_spring22/NTUPLES/BDT1update_ntuples/BDT1added_preselectedDimuon_inclusiveJpsi_gangaAll_2018_Data_MD_220412.root`
* Cuts applied: precuts, inclusive Jpsi stripping, preselection. BDTG1 including Kmu3mu4 vertex $\chi^2$ applied in branch `BDT1`
### Collection
Branches:
```
{ "B_LOKI_MASS_JpsiConstr",
"B_M", "B_PX", "B_PY", "B_PZ", "B_PE", "B_PT",
//"B_BKGCAT",
"B_IPCHI2_OWNPV", "B_DIRA_OWNPV", "B_LOKI_DTF_CHI2NDOF", "B_ENDVERTEX_CHI2",
"Kaon_M", "Kaon_PX", "Kaon_PY", "Kaon_PZ", "Kaon_PE", "Kaon_PT", "Kaon_ETA",
"Jpsi_M", "Jpsi_PX", "Jpsi_PY", "Jpsi_PZ", "Jpsi_PE", "Jpsi_PT",
"Jpsi_ETA", "Jpsi_FDCHI2_OWNPV",
"mu1_M", "mu1_P", "mu1_PX", "mu1_PY", "mu1_PZ", "mu1_PE", "mu1_PT", "mu1_isMuon",
"mu2_M", "mu2_P", "mu2_PX", "mu2_PY", "mu2_PZ", "mu2_PE", "mu2_PT", "mu2_isMuon",
"phi_M", "phi_PX", "phi_PY", "phi_PZ", "phi_PE", "phi_PT",
"phi_ETA", "phi_ENDVERTEX_CHI2",
"mu3_M", "mu3_P", "mu3_PX", "mu3_PY", "mu3_PZ", "mu3_PE", "mu3_PT",
"mu3_ProbNNmu", "mu3_ProbNNk", "mu3_PIDmu", "mu3_PIDK", "mu3_TRACK_CHI2",
"mu3_hasCalo", "mu3_EcalPIDmu", "mu3_HcalPIDmu", "mu3_RichDLLmu", "mu3_RichDLLk",
"mu3_MuonMuLL", "mu3_MuonChi2Corr", "mu3_TRACK_MatchCHI2", "mu3_isMuon",
"mu4_M", "mu4_P", "mu4_PX", "mu4_PY", "mu4_PZ", "mu4_PE", "mu4_PT",
"mu4_ProbNNmu", "mu4_ProbNNk", "mu4_PIDmu", "mu4_PIDK", "mu4_TRACK_CHI2",
"mu4_hasCalo", "mu4_EcalPIDmu", "mu4_HcalPIDmu", "mu4_RichDLLmu", "mu4_RichDLLk",
"mu4_MuonMuLL", "mu4_MuonChi2Corr", "mu4_TRACK_MatchCHI2", "mu4_isMuon",
});
```
## Soft muon BDT
### Variables
```
'muX_P',
'muX_PT',
#'muX_TRACK_CHI2', -> technical issue being fixed
'log10(abs(muX_ProbNNmu))',
'log10(abs(muX_ProbNNk))',
'muX_PIDmu - muX_PIDK',
'muX_RichDLLmu',
'muX_RichDLLk',
'muX_TRACK_MatchCHI2',
'muX_CaloEcalE',
'muX_MuonMuLL'
```
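As an illustration, the variable list above can be assembled into a feature matrix for one muon leg. This is a sketch assuming the branches are available as NumPy arrays keyed by name; the `soft_muon_features` helper is hypothetical, not the analysis code:

```python
import numpy as np

def soft_muon_features(branches):
    """Build the soft-muon BDT input matrix for one muon leg (muX).

    `branches` is assumed to map branch names to equal-length NumPy arrays.
    """
    return np.column_stack([
        branches['muX_P'],
        branches['muX_PT'],
        # 'muX_TRACK_CHI2' omitted, as in the variable list above
        np.log10(np.abs(branches['muX_ProbNNmu'])),
        np.log10(np.abs(branches['muX_ProbNNk'])),
        branches['muX_PIDmu'] - branches['muX_PIDK'],
        branches['muX_RichDLLmu'],
        branches['muX_RichDLLk'],
        branches['muX_TRACK_MatchCHI2'],
        branches['muX_CaloEcalE'],
        branches['muX_MuonMuLL'],
    ])
```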
### Hyperparams/algo
```
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Split into development/evaluation sets, then train/test within development.
X_dev, X_eval, y_dev, y_eval = train_test_split(X, y,
                                                test_size=0.33, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X_dev, y_dev,
                                                    test_size=0.33, random_state=492)
print("X_test", X_test.shape)

# Shallow decision trees boosted with AdaBoost (SAMME algorithm).
dt = DecisionTreeClassifier(max_depth=3, min_samples_leaf=1)
bdt = AdaBoostClassifier(dt, algorithm='SAMME', n_estimators=450, learning_rate=0.5)
```
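The train-test overtraining check from the list above can be sketched end to end. The real training samples are the truth-matched signal MC and the upper-sideband data; `make_classification` is a toy stand-in for them here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the signal-MC / upper-sideband training samples.
X, y = make_classification(n_samples=2000, n_features=11, random_state=42)
X_dev, X_eval, y_dev, y_eval = train_test_split(X, y, test_size=0.33,
                                                random_state=42)

# Same classifier configuration as in the hyperparameter block above.
dt = DecisionTreeClassifier(max_depth=3, min_samples_leaf=1)
bdt = AdaBoostClassifier(dt, n_estimators=450, learning_rate=0.5)
bdt.fit(X_dev, y_dev)

# Compare train vs. held-out AUC: a large gap would indicate overtraining.
auc_train = roc_auc_score(y_dev, bdt.decision_function(X_dev))
auc_eval = roc_auc_score(y_eval, bdt.decision_function(X_eval))
```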
### Training samples
## Ongoing studies
### $B^+\rightarrow K^+ J/\psi \pi^+ \pi^-$
Normalization channel. Evaluate preselection effect.