---
tags: Master, Simulation
---

# Simulation and Performance Evaluation

[TOC]

## 1. Methodology introduction

**Syllabus:**
- Methodology introduction
- Review of basic probability concepts
- Discrete-Event Simulation
- Input modeling
- Random number generation
- Output analysis
- Variance reduction techniques

**Course structure:**
- frontal **lessons**
- some **homeworks** (some extra hours dedicated to them) (30%)
- final **project** + **presentation** (max 3 people) (40%)
- written **exam** (30%)
    - around 90 mins
    - multiple choice + open answers

**Simulation:** `imitation` of a real-world system based on a model
- many levels of abstraction (from faithful real-world simulations to abstract mathematical ones)

### Performance evaluation CHECKLIST:
- [x] Know your **goals**
- [x] Identify **factors:** elements that may affect performance
    1. **Desired factors:** intensity of load, number of servers, …
    2. **Nuisance factors:** time of the day, denial-of-service attacks, …
    - :warning: `Hidden factors` will fool you ❗❗
    - :warning: `correlation` and `causality` may be misunderstood!
        - **Simpson's paradox:** (_same examples as in Algoritmi Avanzati_)
            - phenomenon in which `a trend` appears in several different groups of data but `disappears or reverses when these groups are combined`
- [x] Define **metrics:**
    - measures, often `multi-dimensional`
    - `sampling method:` operational conditions under which the metric is measured
    - **Kiviat diagram** (radar diagram)
        - `visual representation` of multi-dimensional metrics
        - arrange metrics so that the closer to the center, the lower the performance
- [x] Define the offered **load** under which your system operates:
    - `benchmarks:` artificial load generators that statistically mimic real traffic
- [x] Know common **patterns:** _very common traits_ found in different situations
    - 👁‍🗨 knowing some of them `may save a lot of time` and often greatly simplifies the evaluation task
    1. 
`Bottlenecks:` common nuisance factors
        - there can be many at once
        - role not always trivial, may hide other factors
        - in many cases they are the only nuisance factors to consider, which simplifies the performance analysis
    2. `Congestion collapse`
        - offered load increases, work done decreases
        - can be `latent`
    3. `Competition` side effect
- [x] Know your system well

## 2. Probability Review

Classic statistics revision...

**Markov's inequality** ([Wiki](https://en.wikipedia.org/wiki/Markov%27s_inequality))
$$
P(X \ge a) \le \frac{E[X]}{a}
$$

**Chebyshev's inequality** ([Wiki](https://en.wikipedia.org/wiki/Chebyshev%27s_inequality))
$$
P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}
$$

**Weak law of large numbers** ([Wiki](https://en.wikipedia.org/wiki/Law_of_large_numbers)) …

**Strong law of large numbers** ([Wiki](https://en.wikipedia.org/wiki/Law_of_large_numbers)) …

### Random variables:
- binomial
- Poisson

## 3. DES (Discrete Event Simulation)

Why simulations? many reasons...

Taxonomy of **simulation types**:
- `Deterministic` vs `Stochastic`
    - (deterministic ones are not of interest for this course)
- `Static` vs `Dynamic` (e.g. time)
- `Discrete` vs `Continuous` (e.g. time)

![](https://i.imgur.com/Iz8X74K.png)

👁‍🗨 **Monte Carlo Simulation** is an example of a `static stochastic` one.
👁‍🗨 **Discrete-event Simulation** is an example of a `discrete dynamic stochastic` one.

### Discrete-event simulation:
- within any interval of time, one can find a subinterval in which the state of the system does not change
- the number of events is `countable`

**System state:** set of variables that store the whole state of the simulation, so it can be interrupted and resumed.

**Event:** occurrence that may change the state of the system.

### M/M/1 Queue
...

### Computing Confidence Intervals
- depend on assumptions about the data
- fairly simple if data comes from an iid model

#### Theorem 2.1: Confidence Interval for Median and other Quantiles (iid)

## 4. Statistics on Data
...
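As a running example for the summaries discussed in this section, the traditional measures (`mean`, `stdDev`, `CoV`, and Jain's fairness index `JFI`) can be computed on a synthetic iid sample. This is an illustrative sketch, not course material: the exponential sample below is an assumed stand-in for a skewed but well-behaved dataset.

```python
import random
import statistics

random.seed(42)
# Hypothetical iid sample: exponential "service times" (skewed, not heavy-tailed)
data = [random.expovariate(1.0) for _ in range(1000)]

mean = statistics.fmean(data)
std = statistics.stdev(data)    # sample standard deviation
cov = std / mean                # coefficient of variation
# Jain's fairness index: (sum x)^2 / (n * sum x^2)
jfi = sum(data) ** 2 / (len(data) * sum(x * x for x in data))

print(f"mean={mean:.3f}  stdDev={std:.3f}  CoV={cov:.3f}  JFI={jfi:.3f}")
```

Note that on a sample, JFI ≈ 1 / (1 + CoV²), which is why the notes call the JFI "essentially the same as the stdDev or the CoV".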
### Summarizing data
from p. 6
...
- Traditional measures in **engineering** are `mean`, `stdDev`, `CoV`
- Traditional measures in **computer science** are `mean` and `JFI`
    - **JFI:** essentially the `same as the stdDev or the CoV`
    - if the CoV is high (heavy-tailed distributions), comparing JFIs/CoVs does not bring a lot of information
- **Lorenz curve gap:** `more robust`, remains defined as long as the distribution has a finite mean
    - to be preferred if possible
- the **Gini coefficient** can basically be predicted from the Lorenz curve gap, no need to use it further

### Confidence Intervals
from p. 17
...
- Confidence interval for the **median (or other quantiles):** easy to get from the `Binomial distribution`
    - requires `iid` and `no other assumption`
- Confidence interval for the **mean:**
    - requires `iid` and
        - either the `data sample` is **normal** and `n` is **small**
        - or the `data sample` is **not wild** and `n` is **large enough**
    - **Bootstrap:** `more robust` and `more general`, but is more than a simple formula to apply
- **Confidence interval for a success probability:** requires special attention when success or failure is rare
    - check the `"rule of three"` case
- 👁‍🗨 We need to **verify the assumptions!**

### About the Independence Assumption
from p. 64
...

### Prediction Intervals, Outliers, Rescaling
from p. 56
...
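The binomial-based confidence interval for the median described above can be sketched directly from order statistics: the number of samples below the true median is Binomial(n, 1/2), so two order-statistic indices bracketing the median with the desired coverage can be read off its CDF. The function `median_ci` and its exact index choices are an illustrative sketch (assuming a reasonably large iid sample), not the course's reference implementation.

```python
import math
import random

def median_ci(data, conf=0.95):
    """Nonparametric CI for the median of an iid sample (n reasonably large).

    If B ~ Binomial(n, 1/2) counts samples below the true median m, then
    P(X_(j) <= m < X_(k)) = F(k-1) - F(j-1), so we pick the largest j with
    F(j-1) <= alpha/2 and the smallest k with F(k-1) >= 1 - alpha/2.
    """
    xs = sorted(data)
    n = len(xs)
    # CDF of Binomial(n, 1/2): F[i] = P(B <= i)
    F, acc = [], 0.0
    for i in range(n + 1):
        acc += math.comb(n, i) * 0.5 ** n
        F.append(acc)
    alpha = 1 - conf
    j = max(i for i in range(1, n + 1) if F[i - 1] <= alpha / 2)
    k = min(i for i in range(1, n + 1) if F[i - 1] >= 1 - alpha / 2)
    return xs[j - 1], xs[k - 1]  # order statistics are 1-based

random.seed(3)
sample = [random.expovariate(1.0) for _ in range(1000)]
lo, hi = median_ci(sample)
print(f"95% CI for the median: [{lo:.3f}, {hi:.3f}]")
```

Only the `iid` assumption is used, which is what makes this interval robust compared to mean-based intervals.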
#### Take home (standard deviation)
- Use the **standard deviation** as a measure of `variability` only if
    - you can reasonably assume that `your data is normally distributed`, or
    - you have sufficiently `many data points from a well-behaved distribution`
- Otherwise, the standard deviation is misleading
    - **data is best rescaled**

### Which summarisation to use
- Use `methods` that you understand
- **Mean** and **standard deviation** make sense `when datasets are not wild`
    - **close to normal**, or **not heavy-tailed** with a **large data sample**
- Use `quantiles` and `order statistics` if you have the choice
    - a prediction interval based on **order statistics is robust** (and probably the best)
- **Rescale** if this produces a version of the data set whose properties are closer to the above
    - note that computing fairness indices for rescaled data is meaningless (why?)

## 5. Some Proofs and Formal Results
...

### Sample Mean
...

### Sample Variance
...

### Confidence Intervals
...

## 6. Fitting Distributions to Data
...

### Test for Goodness of Fit
...

### Trend Removal (through Least Squares)
...

### Fitting Mixtures of Distributions via Expectation Maximization
...
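The EM fitting of mixtures mentioned in the last section can be sketched for the simplest case, a two-component 1D Gaussian mixture. The function name, quartile-based initialization, and fixed iteration count below are illustrative choices under simplifying assumptions (no convergence test, no numerical safeguards), not the course's algorithm.

```python
import math
import random
import statistics

def em_two_gaussians(data, iters=200):
    """EM sketch for a two-component 1D Gaussian mixture.

    Returns (weights, means, variances)."""
    xs = data
    n = len(xs)
    # crude but usually adequate initialization: lower/upper quartiles
    xs_sorted = sorted(xs)
    mu = [xs_sorted[n // 4], xs_sorted[3 * n // 4]]
    var = [statistics.pvariance(xs)] * 2   # shared initial variance
    w = [0.5, 0.5]

    def pdf(x, m, v):
        return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

    for _ in range(iters):
        # E-step: responsibilities r[i][k] = P(component k | x_i)
        r = []
        for x in xs:
            p = [w[k] * pdf(x, mu[k], var[k]) for k in range(2)]
            s = p[0] + p[1]
            r.append([p[0] / s, p[1] / s])
        # M-step: re-estimate weights, means, variances from responsibilities
        for k in range(2):
            nk = sum(ri[k] for ri in r)
            w[k] = nk / n
            mu[k] = sum(ri[k] * x for ri, x in zip(r, xs)) / nk
            var[k] = sum(ri[k] * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / nk
    return w, mu, var

# demo on synthetic data: equal mixture of N(0, 1) and N(5, 1)
random.seed(7)
data = [random.gauss(0, 1) for _ in range(500)] + [random.gauss(5, 1) for _ in range(500)]
w, mu, var = em_two_gaussians(data)
print("weights:", w, "means:", mu, "variances:", var)
```

On well-separated synthetic data like this, the E-step/M-step alternation recovers the component means; with overlapping components, initialization matters much more.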