# Basic Methods of Data Analysis Exam Questions

###### tags: `Exam`

## These questions have come from different sources

Exam questions: resit exam (Nachklausur), 6.2.2015

- The "basic" questions were quite similar: centers (mean, median, mode, etc.); calculate, recalculate with one outlier, and give the formula
- The rest: no "calculations"
- Instead of the histogram, a different topic was obligatory
- Box plot and violin plot
- Fuzzy k-means clustering
- Hierarchical vs. k-means clustering: performance
- Oja's rule (from PCA)
- Some questions about bivariate/multivariate data
- iid: how distributed
- How to calculate the covariance matrix
- ANOVA: additional properties
- No super-basic questions about linear models
- Distribution of the error variance (standard^2 = t? said another student)

---

Basic Methods of Data Analysis (14/19), exam questions from 29.01.2015

Partially (badly) answered - don't trust the answers, just the questions

- Population mean vs. empirical mean (2 points)
  - Empirical: the average of the n observed samples. Population: the expected value over the whole population, with each value weighted by its probability p (0 < p < 1).
- Histogram: what is it? Which properties of the data can be read off? (4 points)
  - Diagram of the data distribution: the number of samples per value x or per interval. Shows spread and distribution, peaks, and low-density regions.
- Estimation of the covariance matrix
  - Calculate its eigenvalues and check how many directions are really linearly independent?
- MVLUE (Minimum Variance Linear Unbiased Estimator): unbiased means the mean of the error is 0; the variance should be minimal.
- iid: 3 assumptions -> exogeneity, linear independence, homoscedasticity (12 points)
  - Exogeneity: the average of the noise is zero, so the data are distributed around the correct value
  - Linear independence: as in linear algebra, full rank
  - Homoscedasticity: the variance of every error is the same
- Agglomerative clustering + different distance measures (linkage)
  - Spanning tree, minimization by different values of edges
  - Agglomerative: bottom-up, merge. Divisive: top-down, split.
  - Distance measures: single, complete, average. Single: nearest distance; complete: farthest distance; average: average distance.
- Properties of k-means
  - Fast and simple, but not robust to outliers (the cluster centers are means); finds k clusters with minimal squared distance
- 2 ways of computing PCA (non-iterative)
  - ICA and PCA?
  - Eigenvalue decomposition vs. singular value decomposition: with the covariance matrix or with the singular values
- Write down the model of a linear model + name the variables and parameters
  - $y_n = x_n^T \beta + \epsilon_n$
  - $y = \beta_0 + \sum_j x_j \beta_j + \epsilon$ (for one data point)
- Parameters of the linear model (a separate question; one could refer to the previous one)
  - There should be one beta for every variable
- ANOVA properties
  - In addition to the normal parameters: grouping parameters
- Goal of linear models? How distributed?
  - x + betas; errors ignored. beta0 = intercept, beta1 = slope. Normally distributed?
- Are the confidence intervals the same for all parameters?
  - No, because they depend on the parameter.
- What are the [in]dependent variables in bivariate data?
  - x is independent, y is dependent
- What distributions are used for testing linear models?
  - normal vs. mean

---

Suggested questions (WS 2014/15)

- Important: assumptions, why they are needed
- Mean/median, robustness (outliers), when to use what
- Normal distribution, chi-square distribution
- Confidence intervals, why it is called the hat matrix, what confidence on what
- (x^T*x)^-1 = ...
- ANOVA questions

---

Test questions 2014 (outdated; originally in German)

1) Center of data: how can it be determined? Example with given numbers, calculate everything. What happens if the number 105 (an outlier in this case) is included as well (and calculate everything again)?
2) Experimental vs. population mean: again with numbers and a practical calculation
3) Spread of data: description, work through a numerical example. What changes when the number "105" (an outlier) is added? Calculate again.
4) Box plot: what can be read off, how is it drawn?
5) Histogram: what can be read off, how is it drawn?
6) Violin plot: what can be read off, how is it drawn?
7) Pearson correlation coefficient: assumption about the distribution? (Student's t)
8) Outer product: what is it (with formula)?
9) Scatter plot: what can be read off, how is it drawn?
10) Factor analysis: what is it (with objective), and vs. PCA and ICA
11) Multidimensional scaling: what is it? + objective
12) NMF: what is it? + objective
13) LLE: what is it? + objective
14) t-distributed stochastic neighbor embedding (t-SNE): what is it? + objective
15) Mixture models: what is it? + objective
16) k-means: what is it? + update rule
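
Questions 1 and 3 above (center and spread of data, recomputed after adding the outlier 105) can be practiced with a few lines of Python. The following is only a minimal sketch using the standard library `statistics` module. The data values and the helper name `center_and_spread` are made up for illustration and are not taken from any exam; only the outlier value 105 comes from the questions.

```python
import statistics


def center_and_spread(values):
    """Return common measures of center and spread for a list of numbers."""
    # Quartiles via statistics.quantiles (Python 3.8+, default "exclusive" method).
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance, divides by n - 1
        "std": statistics.stdev(values),          # square root of the sample variance
        "iqr": q3 - q1,                           # interquartile range
    }


# Hypothetical numbers, chosen only for illustration (not from any exam).
data = [2, 3, 4, 5, 5, 5, 6, 7, 8]
with_outlier = data + [105]  # 105 is the outlier named in the questions

clean = center_and_spread(data)
dirty = center_and_spread(with_outlier)

# Print each measure without and with the outlier, side by side.
for name in clean:
    print(f"{name:>8}: {clean[name]:8.2f} -> {dirty[name]:8.2f}")
```

With these illustrative numbers the mean moves from 5 to 15, while the median and the mode stay at 5. The standard deviation grows by more than a factor of ten, whereas the interquartile range changes only slightly. This is the usual argument for preferring median and IQR when outliers are expected.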