# Paper XX
---
## Ensemble selection from libraries of models
---
## Brief Introduction
- Authors: Rich Caruana, Alexandru Niculescu-Mizil, Geoff Crew, Alex Ksikes (Cornell)
- Published at ICML 2004
- \# of citations: 819
----
## Links
- Paper link: https://www.cs.cornell.edu/~alexn/papers/shotgun.icml04.revised.rev2.pdf
- Implementation on github: https://github.com/automl/auto-sklearn/blob/master/autosklearn/ensembles/ensemble_selection.py
---
## Main Idea
- Proposes an ensemble selection method that uses **forward stepwise selection** over libraries of thousands of models to build **ensembles**.
- Ensemble selection's most important feature is that it can optimize ensemble performance for **any easily computed performance metric**.
----
## Main Idea (cont.)
- Experiments on **7 test problems** with **10 performance metrics** show that ensemble selection consistently finds ensembles that **outperform** all other models and ensemble methods.
---
## Forward stepwise selection

---
### Issue of simple forward selection
- The simple forward model selection procedure presented in the Introduction is fast and effective, but sometimes overfits to the hillclimbing (validation) set, reducing ensemble performance.
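For reference, a minimal sketch of that simple greedy loop (hypothetical helper names; assumes each model's predictions on the hillclimbing set are precomputed as a NumPy array):

```python
import numpy as np

def forward_selection(preds, y_val, metric, n_steps=50):
    """Simple greedy forward selection WITHOUT replacement.

    preds  : (n_models, n_val) array of each model's hillclimbing-set predictions
    y_val  : (n_val,) hillclimbing-set labels
    metric : callable(y_true, y_pred) -> score, higher is better
    """
    remaining = set(range(len(preds)))
    ensemble_sum = np.zeros_like(preds[0], dtype=float)
    selected = []
    while remaining and len(selected) < n_steps:
        # Try each remaining model; keep the one that improves the ensemble most.
        best_m = max(remaining,
                     key=lambda m: metric(y_val, (ensemble_sum + preds[m]) / (len(selected) + 1)))
        selected.append(best_m)
        remaining.remove(best_m)      # without replacement: each model used at most once
        ensemble_sum += preds[best_m]
    return selected                    # ensemble prediction = average over these models
```

Since hillclimbing performance typically peaks and then declines, the ensemble actually kept is the one with the best hillclimbing-set score seen during the loop.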
---
## Proposed Method Overview
- The authors made 3 enhancements to the simple forward selection procedure to reduce overfitting:
1. Selection with Replacement
2. Sorted Ensemble Initialization
3. Bagged Ensemble Selection
---
## Selection with Replacement
- Selection with replacement allows models to be added to the ensemble multiple times.

Note:
- Situation:
  - With selection without replacement, performance improves as the best models are added to the ensemble, peaks, and then quickly declines.
  - Performance drops because the best models in the library have been used up, and selection must then add models that hurt the ensemble.
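With replacement, the only change to the sketch above is that selected models stay in the candidate pool (same hypothetical names as before):

```python
import numpy as np

def forward_selection_with_replacement(preds, y_val, metric, n_steps=50):
    """Greedy forward selection WITH replacement: a model can be picked many
    times, which effectively gives strong models a larger weight in the average."""
    ensemble_sum = np.zeros_like(preds[0], dtype=float)
    selected = []
    for _ in range(n_steps):
        scores = [metric(y_val, (ensemble_sum + p) / (len(selected) + 1)) for p in preds]
        best_m = int(np.argmax(scores))   # candidates are never removed from the pool
        selected.append(best_m)
        ensemble_sum += preds[best_m]
    return selected
```

This also removes the sharp decline: once the best models have been added, selection can re-add them instead of being forced to add models that hurt the ensemble.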
---
## Sorted Ensemble Initialization
- Instead of starting with an empty ensemble, initialize the ensemble with the top N models from the library.
- N is chosen by looking at performance on the hillclimbing (validation) set.
- This typically adds 5-25 of the best models to an ensemble before greedy stepwise selection begins.
Note:
The authors train roughly 2000 models to build the library that ensemble selection draws from.
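A sketch of the initialization step, continuing the hypothetical helpers above: rank models by their individual hillclimbing-set score and seed the ensemble with the best N.

```python
import numpy as np

def sorted_initialization(preds, y_val, metric, max_n=25):
    """Seed the ensemble with the N individually best models, where N itself is
    chosen by hillclimbing-set performance of the seeded ensemble."""
    order = np.argsort([-metric(y_val, p) for p in preds])   # best model first
    best_n, best_score = 1, -np.inf
    for n in range(1, max_n + 1):
        score = metric(y_val, preds[order[:n]].mean(axis=0))
        if score > best_score:
            best_n, best_score = n, score
    return list(order[:best_n])    # greedy stepwise selection continues from here
```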
---
## Bagged Ensemble Selection
- Bagged ensemble selection: draw a random sample of models from the library and run selection on that sample; repeat over many bags and combine the resulting ensembles.
- Issue this fixes: as the number of models in a library increases, the chance of finding combinations of models that overfit the hillclimbing set also increases.
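A sketch of the bagging wrapper, reusing `forward_selection_with_replacement` from the earlier slide (the bag count and sampling fraction here are illustrative, not the paper's settings):

```python
import numpy as np

def bagged_ensemble_selection(preds, y_val, metric, n_bags=20, frac=0.5, seed=0):
    """Run selection on random subsets ("bags") of the library and pool the
    selected models; each run only sees part of the library, so overfitting
    combinations of models are harder to find."""
    rng = np.random.default_rng(seed)
    pooled = []
    for _ in range(n_bags):
        bag = rng.choice(len(preds), size=int(frac * len(preds)), replace=False)
        chosen = forward_selection_with_replacement(preds[bag], y_val, metric)
        pooled.extend(int(bag[c]) for c in chosen)   # map bag-local indices back
    return pooled    # final prediction = average over the pooled models
```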
---
## Experiment
- 7 datasets, each converted to a binary classification problem.
- Training sets of 5000 points were used. Each training sample was split into a train set of 4000 points and a hillclimbing (validation) set of 1000 points. The final test sets for most of these problems contain 20,000 points.
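For reference, a minimal sketch of this split (stand-in random data; real dataset loading is problem-specific):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(5000, 20), np.random.randint(0, 2, 5000)   # stand-in data
X_train, X_hill, y_train, y_hill = train_test_split(
    X, y, train_size=4000, test_size=1000, random_state=0)       # 4000 train / 1000 hillclimb
```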
---
## Metrics
- 10 metrics are used: accuracy (ACC), root-mean-squared error (RMS), mean cross-entropy (MXE), lift (LFT), precision/recall break-even point (BEP), precision/recall F-score (FSC), average precision (APR), ROC area (ROC), and a measure of probability calibration (CAL). The tenth metric is SAR = (ACC + ROC + (1 − RMS)) / 3.
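As a concrete example, SAR can be computed directly from standard metrics (a minimal sketch with scikit-learn; assumes binary labels and predicted probabilities):

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

def sar(y_true, y_prob, threshold=0.5):
    """SAR = (ACC + ROC + (1 - RMS)) / 3, combining three of the base metrics."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    acc = accuracy_score(y_true, (y_prob >= threshold).astype(int))
    roc = roc_auc_score(y_true, y_prob)
    rms = np.sqrt(np.mean((y_true.astype(float) - y_prob) ** 2))  # RMS error
    return (acc + roc + (1.0 - rms)) / 3.0
```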
---
## Results

---
## Normalized Score
- Performances on each metric are scaled so that a score of 0 corresponds to baseline performance and a score of 1 corresponds to the best performance achieved on that problem and metric.
{"metaMigratedAt":"2023-06-16T12:41:15.951Z","metaMigratedFrom":"Content","title":"Paper XX","breaks":true,"contributors":"[{\"id\":\"dbcb44b8-adf7-4237-8e1e-ec7e7618f77b\",\"add\":3725,\"del\":451}]"}