---
tags: Master, Simulation
---
# Simulation and Performance Evaluation
[TOC]
## 1. Methodology introduction
**Syllabus:**
- Methodology introduction
- Review of basic probability concepts
- Discrete-Event Simulation
- Input modeling
- Random number generation
- Output analysis
- Variance reduction techniques
**Course structure:**
- frontal **lectures**
- some **homework** assignments (with some extra hours dedicated to them) (30%)
- final **project** + **presentation** (max 3 people) (40%)
- written **exam** (30%)
- around 90 mins
- multiple choice + open answers
**Simulation:** `imitation` of a real-world system based on a model
- many levels of abstraction (from detailed replicas of the real world to purely mathematical abstractions)
### Performance evaluation
CHECKLIST:
- [x] Know your **goals:**
- [x] Identify **factors:** elements that may affect performance
1. **Desired factors:** intensity of load, number of servers, …
2. **Nuisance factors:** time of the day, denial-of-service attacks, …
- :warning: `Hidden factors` will fool you ❗❗
- :warning: `correlation` and `causality` may be misunderstood!
- **Simpson’s paradox:** (_same examples as in the Algoritmi Avanzati course_)
- phenomenon in which `a trend` appears in several different groups of data but `disappears or reverses when these groups are combined`
- [x] Define **metrics:**
- measures of performance, often `multi-dimensional`
- `sampling method:` operational conditions under which the metric is measured
- **Kiviat diagram** (radar diagram)
- `visual representation` of multi-dimensional metrics
- arrange the metrics so that the closer a point is to the center, the worse the performance
- [x] Define the offered **load** under which your system operates:
- `benchmarks:` artificial load generators that statistically mimic real traffic
- [x] Know common **patterns:** _very common traits_ found in different situations
- 👁🗨 knowing some of them `may save a lot of time` and often greatly simplifies the evaluation task
1. `Bottlenecks:` common nuisance factors
- there can be many at once
- their role is not always trivial; they may hide other factors
- in many cases they are the only nuisance factors to consider, which simplifies the performance analysis
2. `Congestion collapse`
- as the offered load increases, the work done decreases
- can be `latent`
3. `Competition` side effect
- [x] Know your system well
## 2. Probability Review
Classic statistics revision...
**Markov’s inequality** ([Wiki](https://en.wikipedia.org/wiki/Markov%27s_inequality)), for a non-negative random variable $X$ and any $a > 0$:
$$
P(X \ge a) \le \frac{E(X)}{a}
$$
**Chebyshev's inequality** ([Wiki](https://en.wikipedia.org/wiki/Chebyshev%27s_inequality)), for any $k > 0$:
$$
P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}
$$
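As a quick sanity check (my own example, not from the course material), the sketch below draws exponential samples and compares the empirical tail probabilities against both bounds; NumPy is assumed to be available.
```python
# Sanity check (my example): empirical tail probabilities vs. the bounds.
import numpy as np

rng = np.random.default_rng(42)
x = rng.exponential(scale=1.0, size=1_000_000)   # E[X] = 1, Var[X] = 1

a = 3.0
print("P(X >= a) empirical:", np.mean(x >= a))   # ~ e^(-3) ≈ 0.050
print("Markov bound E[X]/a:", x.mean() / a)      # 1/3 ≈ 0.333

k = 3.0
mu, sigma = x.mean(), x.std()
print("P(|X - mu| >= k*sigma) empirical:", np.mean(np.abs(x - mu) >= k * sigma))
print("Chebyshev bound 1/k^2:", 1 / k**2)        # ≈ 0.111
```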
**Weak law of large numbers** ([Wiki](https://en.wikipedia.org/wiki/Law_of_large_numbers))
…
**Strong law of large numbers** ([Wiki](https://en.wikipedia.org/wiki/Law_of_large_numbers))
…
### Random variables:
- Binomial
- Poisson
## 3. DES (Discrete Event Simulation)
Why simulations? many reasons...
Taxonomy of **simulation types**:
- `Deterministic` vs `Stochastic`
- (Deterministic ones are not of interest for this course)
- `Static` vs `Dynamic` (e.g. time)
- `Discrete` vs `Continuous` (e.g. time)

👁🗨 **Monte Carlo Simulation** is an example of a `static stochastic` one.
👁🗨 **Discrete-event Simulation** is an example of a `discrete dynamic stochastic` one.
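To make the `static stochastic` case concrete, here is a minimal Monte Carlo sketch (my own illustration, not taken from the slides) that estimates π from uniform points in the unit square:
```python
# Monte Carlo (static, stochastic): estimate pi from the fraction of uniform
# points in the unit square that fall inside the quarter circle.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
points = rng.random((n, 2))                  # uniform samples in [0, 1]^2
inside = (points ** 2).sum(axis=1) <= 1.0    # x^2 + y^2 <= 1
print("pi estimate:", 4 * inside.mean())     # close to 3.1416
```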
### Discrete-event simulation:
- within any interval of time, one can find a subinterval in which the state of the system does not change
- the number of events is `countable`
**System state:** set of variables that store the whole state of the simulation, so it can be interrupted and resumed.
**Event:** occurrence that may change the state of the system.
### M/M/1 Queue
...
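These notes leave the details to the slides; as a reference, here is a minimal event-driven M/M/1 sketch I wrote (an assumed structure, not the course's code): exponential interarrival and service times, a single FIFO server, and a future-event list ordered by timestamp.
```python
# Minimal discrete-event simulation of an M/M/1 queue (a sketch, not the
# course's reference implementation). Events are (time, type) pairs kept in a
# priority queue; the system state is the number of customers in the system.
import heapq
import random

def mm1(lam=0.8, mu=1.0, horizon=100_000, seed=1):
    rng = random.Random(seed)
    events = [(rng.expovariate(lam), "arrival")]   # first arrival
    n_in_system = 0                                # system state
    area = 0.0                                     # integral of n(t) dt
    last_t = 0.0
    while events:
        t, kind = heapq.heappop(events)
        if t > horizon:
            break
        area += n_in_system * (t - last_t)         # time-average bookkeeping
        last_t = t
        if kind == "arrival":
            n_in_system += 1
            heapq.heappush(events, (t + rng.expovariate(lam), "arrival"))
            if n_in_system == 1:                   # server was idle: start service
                heapq.heappush(events, (t + rng.expovariate(mu), "departure"))
        else:                                      # departure
            n_in_system -= 1
            if n_in_system > 0:                    # next customer starts service
                heapq.heappush(events, (t + rng.expovariate(mu), "departure"))
    return area / last_t                           # mean number in system

# For lam=0.8, mu=1.0 theory gives rho/(1-rho) = 4 customers on average.
print(mm1())
```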
### Computing Confidence Interval
- depends on assumptions about the data
- fairly simple if the data come from an iid model
#### Theorem 2.1: Confidence Interval for the Median and Other Quantiles (iid)
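The theorem itself is not copied into these notes. The standard statement is that, for iid data, the interval between two order statistics $[X_{(j)}, X_{(k)}]$ covers the quantile $x_q$ with probability at least $F_B(k-1) - F_B(j-1)$, where $F_B$ is the Binomial$(n, q)$ CDF, so $j$ and $k$ can be read off that CDF. A possible sketch (mine, assuming SciPy is available):
```python
# Order-statistic confidence interval for a quantile of iid data (my sketch).
# Indices j, k are chosen from the Binomial(n, q) CDF so that [X_(j), X_(k)]
# has coverage >= 1 - alpha.
import numpy as np
from scipy.stats import binom

def quantile_ci(data, q=0.5, alpha=0.05):
    x = np.sort(np.asarray(data))
    n = len(x)
    j = max(int(binom.ppf(alpha / 2, n, q)), 1)          # F(j-1) <= alpha/2
    k = min(int(binom.ppf(1 - alpha / 2, n, q)) + 1, n)  # F(k-1) >= 1 - alpha/2
    return x[j - 1], x[k - 1]                            # X_(j), X_(k)

rng = np.random.default_rng(3)
sample = rng.exponential(scale=2.0, size=200)  # true median = 2*ln(2) ~ 1.39
print(quantile_ci(sample))                     # interval around ~1.39
```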
## 4. Statistics on Data
...
### Summarizing data
from p. 6 ...
- Traditional measures in **engineering** are the `mean`, `stdDev`, and `CoV` (coefficient of variation)
- Traditional measures in **computer science** are the `mean` and the `JFI` (Jain's fairness index)
- **JFI:** carries essentially the `same information as the std dev or the CoV` (JFI $= 1/(1 + \text{CoV}^2)$)
- If the CoV is high (heavy-tailed distributions) comparing JFI/CoVs does not bring a lot of information
- **Lorenz curve gap:** `more robust`; it remains defined as long as the distribution has a finite mean (see the sketch after this list)
- To be preferred if possible
- The **Gini coefficient** can essentially be predicted from the Lorenz curve gap, so there is no need to use it as well
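For reference, a hedged sketch of these summaries using the standard textbook formulas (my own code, not the course's); the JFI comment also shows why it carries the same information as the CoV:
```python
# Summarisation indices (standard formulas, my own sketch).
import numpy as np

def cov(x):                                  # coefficient of variation: stdDev / mean
    x = np.asarray(x, float)
    return x.std() / x.mean()

def jfi(x):                                  # Jain's fairness index = 1 / (1 + CoV^2)
    x = np.asarray(x, float)
    return x.sum() ** 2 / (len(x) * (x ** 2).sum())

def lorenz_gap_and_gini(x):
    x = np.sort(np.asarray(x, float))
    n = len(x)
    p = np.arange(1, n + 1) / n              # fraction of the population
    L = np.cumsum(x) / x.sum()               # Lorenz curve at those fractions
    gap = np.max(p - L)                      # max vertical distance from the diagonal
    gini = 2 * np.mean(p - L)                # twice the area between them (discrete approx.)
    return gap, gini

data = np.random.default_rng(7).pareto(2.5, size=1000) + 1   # heavy-ish tail
print(cov(data), jfi(data), lorenz_gap_and_gini(data))
```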
### Confidence Intervals
from p. 17 ...
- Confidence interval for the **median (or other quantiles):** is easy to get from the `Binomial distribution`
- requires `iid` and `no other assumption`
- Confidence interval for the **mean:**
- requires `iid` and
- either if `data sample` is **normal** and `n` is **small**
- or `data sample` is **not wild** and `n` is **large enough**
- **Bootstrap:** is `more robust` and `more general`, but it is more than a simple formula to apply (see the sketch after this list)
- **Confidence interval for success probability:** requires special attention when successes or failures are rare
- check the `"rule of three"` case: with zero events observed in $n$ trials, an approximate 95% upper confidence bound on the probability is $3/n$
- 👁🗨 We need to **verify the assumptions!**
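As an illustration of the bootstrap mentioned above, here is a percentile-bootstrap interval for the mean (one common variant; my sketch, not the course's reference code):
```python
# Percentile bootstrap CI for the mean (one common variant; my own sketch).
import numpy as np

def bootstrap_ci_mean(data, alpha=0.05, n_boot=10_000, seed=0):
    rng = np.random.default_rng(seed)
    data = np.asarray(data, float)
    # resample the data with replacement and recompute the mean each time
    means = np.array([rng.choice(data, size=len(data), replace=True).mean()
                      for _ in range(n_boot)])
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

sample = np.random.default_rng(1).lognormal(mean=0.0, sigma=1.0, size=300)
print(bootstrap_ci_mean(sample))   # CI for the mean of a skewed sample
```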
### About the Independence Assumption
from p. 64 ...
### Prediction Intervals, Outliers, Rescaling
from p. 56 ...
#### Take home (standard deviation)
- Use **standard deviation** as a measure of `variability` only if
- you can reasonably assume that `your data is normally distributed`
- you have sufficiently `many data points from a well-behaved distribution`
- Otherwise, standard deviation is misleading
- **data is best rescaled**
### Which summarisation to use
- Use `methods` that you understand
- **Mean** and **standard deviation** make sense `when datasets are not wild`
- **close to normal**, or **not heavy tailed** and **large data sample**
- Use `quantiles` and `order statistics` if you have the choice
- a prediction interval based on **order statistics is robust** (and probably the best); see the sketch after this list
- **Rescale** if this produces a version of the data set whose properties are closer to the above
- note that computing fairness indices for rescaled data is meaningless (why?)
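For the order-statistic prediction interval, the distribution-free result for iid data is that a new sample falls in $[X_{(j)}, X_{(k)}]$ with probability $(k-j)/(n+1)$; below is a minimal sketch of a symmetric choice of $j$ and $k$ (my own code, not the course's):
```python
# Distribution-free prediction interval from order statistics (my sketch):
# for iid data, P(X_new in [X_(j), X_(k)]) = (k - j) / (n + 1).
import numpy as np

def prediction_interval(data, alpha=0.05):
    x = np.sort(np.asarray(data, float))
    n = len(x)
    j = max(int(np.floor(alpha * (n + 1) / 2)), 1)   # drop ~alpha/2 on each side
    k = n + 1 - j                # coverage (k-j)/(n+1), >= 1-alpha for n large enough
    return x[j - 1], x[k - 1]    # X_(j), X_(k)

data = np.random.default_rng(5).lognormal(mean=0.0, sigma=1.5, size=999)  # "wild" data
print(prediction_interval(data))    # robust despite the heavy tail
```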
## 5. Some Proofs and Formal Results
...
### Sample Mean
...
### Sample Variance
...
### Confidence Intervals
...
## 6. Fitting Distributions to Data
...
### Test for Goodness of Fit
...
### Trend Removal (through Least Squares)
...
### Fitting Mixtures of Distributions via Expectation Maximization
...