# ML Exam Questions
### 1 Using multi-dimensional scaling, higher dimensionality of the output space (visualization space) generally results in a lower error (badness-of-fit measure) of the data projection.
- [x] Correct
- [ ] Wrong
- [ ] indecisive
---
### 2 The most common neighborhood kernel functions used in self-organizing maps are linear functions.
- [ ] Correct
- [x] Wrong
- [ ] indecisive
---
### 3 is missing
---
### 4 Recurrent neural networks cannot be used for sequential input data that has a time dimension.
- [ ] Correct
- [x] Wrong
- [ ] indecisive
---
### 5 The vanishing gradient problem is a difficulty in training artificial neural networks with many layers by gradient-based learning and backpropagation.
- [x] Correct
- [ ] Wrong
- [ ] indecisive
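
A minimal numerical sketch of why depth causes this, assuming a chain of single-neuron sigmoid layers with weight 1 (the sigmoid activation, the weight value, and the function names are illustrative assumptions, not part of the exam). The backpropagated gradient is a product of per-layer derivatives, and each sigmoid derivative is at most 0.25, so the product shrinks quickly with depth.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_through_chain(x, n_layers, weight=1.0):
    """Return d(output)/d(input) for n_layers stacked one-neuron sigmoid layers."""
    grad = 1.0
    a = x
    for _ in range(n_layers):
        z = weight * a
        a = sigmoid(z)
        # Chain rule: d sigmoid(weight * a_prev) / d a_prev = weight * sigmoid'(z),
        # where sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) = a * (1 - a).
        grad *= weight * a * (1.0 - a)
    return grad

for depth in (1, 5, 10, 20):
    print(depth, gradient_through_chain(0.5, depth))
```

The printed gradients decrease roughly geometrically with the number of layers.
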
---
### 6 Given a confusion matrix, one can compute precision and recall per class.
- [x] Correct
- [ ] Wrong
- [ ] indecisive
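
As a rough illustration, per-class precision and recall can be read off a confusion matrix from its column sums and row sums. The sketch below assumes rows are actual classes and columns are predicted classes, and reuses the happy/angry counts from the contingency table that appears later in this question set. The function name is hypothetical.

```python
def per_class_precision_recall(matrix, labels):
    """matrix[i][j] = number of samples of actual class i predicted as class j."""
    results = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        predicted_k = sum(row[k] for row in matrix)  # column sum = TP + FP
        actual_k = sum(matrix[k])                    # row sum = TP + FN
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        results[label] = (precision, recall)
    return results

confusion = [[30, 20],   # actually happy
             [10, 40]]   # actually angry
scores = per_class_precision_recall(confusion, ["happy", "angry"])
for label, (p, r) in scores.items():
    print(f"{label}: precision={p:.2f}, recall={r:.2f}")
```
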
---
### 7 Visualizing the model vectors of a trained SOM using "Chernoff's Faces" is only meaningful if a semantic connection between the data dimensions and properties of the faces can be established.
- [x] Correct
- [ ] Wrong
- [ ] indecisive
---
### 8 The $R^2$ statistic (aka coefficient of determination) can be used to quantify the performance of a regression approach and always takes positive values.
- [ ] Correct
- [x] Wrong
- [ ] indecisive
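
A small sketch of a counterexample, assuming the common definition $R^2 = 1 - SS_{res}/SS_{tot}$. The data values and the function name are made up for illustration. A model that predicts worse than the mean of the targets yields a negative value.

```python
def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
good_pred = [1.1, 1.9, 3.2, 3.8]   # close to the targets
bad_pred = [4.0, 3.0, 2.0, 1.0]    # reversed trend, worse than predicting the mean

print(r_squared(y_true, good_pred))
print(r_squared(y_true, bad_pred))
```

For the reversed predictions, $SS_{res} = 20$ and $SS_{tot} = 5$, so $R^2 = 1 - 4 = -3$.
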
---
### 9 Principal components analysis (PCA) involves Eigenvector decomposition, which can be performed, for instance, using the Power method or the QR algorithm.
- [x] Correct
- [ ] Wrong
- [ ] indecisive
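
The Power method named in the question can be sketched as follows. This is a minimal version for a small symmetric matrix, returning only the dominant eigenpair. The 2x2 matrix stands in for a covariance matrix and is made up for illustration, and the function name and iteration count are assumptions.

```python
import math

def power_iteration(matrix, n_iter=200):
    """Approximate the dominant eigenvalue and eigenvector of a symmetric matrix."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n  # start vector; must not be orthogonal to the dominant eigenvector
    for _ in range(n_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]  # w = A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]                                           # renormalize
    av = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * av[i] for i in range(n))  # Rayleigh quotient v^T A v
    return eigenvalue, v

cov = [[4.0, 2.0],
       [2.0, 3.0]]
eigenvalue, eigenvector = power_iteration(cov)
print(eigenvalue, eigenvector)
```

In PCA terms, the returned vector would be the first principal direction and the eigenvalue the variance along it. Further components need deflation or a full method such as the QR algorithm.
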
---
### 10 In a 2-class classification problem, a random guesser (that randomly chooses class 1 or class 2 for prediction) will achieve, on average, an area under the ROC (AUC) of 0.707.
- [ ] Correct
- [x] Wrong
- [ ] indecisive
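
A quick simulation sketch of the random guesser, assuming labels coded as 0/1 instead of class 1/class 2 and AUC computed as the probability that a random positive is ranked above a random negative, with ties counted as 0.5. Sample size, seed, and names are arbitrary choices for illustration.

```python
import random

def auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(2000)]
guesses = [rng.randint(0, 1) for _ in range(2000)]  # guesses are independent of the labels

print(round(auc(labels, guesses), 3))
```

The result should land close to 0.5, the expected AUC of a guesser whose output carries no information about the true class.
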
---
### 11 The spread parameter in smoothed data histograms (SDH) adjusts the density of the data points in the output space.
- [ ] Correct
- [x] Wrong
- [ ] indecisive
---
### 12 One of the disadvantages of k nearest neighbor classifiers is that they are often very slow during training.
- [ ] Correct
- [x] Wrong
- [ ] indecisive
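
A minimal sketch of a brute-force k-NN classifier shows where the cost sits: "training" only stores the data, while every prediction computes distances to all stored points. The class name, the squared Euclidean distance, and the toy data are assumptions for illustration.

```python
from collections import Counter

class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Training is just memorizing the samples; no optimization happens here.
        self.X = list(X)
        self.y = list(y)
        return self

    def predict_one(self, x):
        # Prediction does the work: distance to every stored sample, then a majority vote.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, xi)), yi)
            for xi, yi in zip(self.X, self.y)
        )
        votes = Counter(label for _, label in dists[: self.k])
        return votes.most_common(1)[0][0]

X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
y = ["a", "a", "a", "b", "b", "b"]
model = KNNClassifier(k=3).fit(X, y)
print(model.predict_one((0.05, 0.1)), model.predict_one((1.0, 0.95)))
```
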
---
### 13 The EM algorithm can be used to estimate the parameters of a statistical model even if the model depends on latent variables.
- [x] Correct
- [ ] Wrong
- [ ] indecisive
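
A minimal sketch of EM with a latent variable, assuming a 1-D mixture of two Gaussians where the unobserved component membership of each point is the latent variable. The data, the initial means, and all names are made up for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_gaussians(data, init_means, n_iter=50):
    """Estimate means, variances and mixing weights of a two-component mixture."""
    means = list(init_means)
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component per point.
        resp = []
        for x in data:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate parameters from the responsibility-weighted data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid a collapsing variance
    return means, variances, weights

data = [-2.2, -2.0, -1.8, -2.1, -1.9, 2.9, 3.0, 3.1, 3.2, 2.8]
means, variances, weights = em_two_gaussians(data, init_means=[-1.0, 1.0])
print(means, variances, weights)
```

The component labels are never observed, yet the estimated means settle near the centers of the two groups in the data.
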
---
### 14 is missing
---
### 15 is missing
---
### 16 is missing
---
### 17 The loss of neural networks is in general strictly convex.
- [ ] Correct
- [x] Wrong
- [ ] indecisive
---
### 18 A model is said to "underfit" if it has a lower error on the training data than on future data.
- [ ] Correct
- [x] Wrong
- [ ] indecisive
---
### 19

* How many hidden layers does the network have?
**Answer: [1]**
* How many input neurons does the network have?
**Answer: [3]**
* What are the dimensions of the ......(not readable) between the input and the first hidden layer? (formatted like **ij**)
**Answer: [2 3]**
(2 hidden-layer units, 3 weights each)
**b)** Activation (ReLU):?
Preactivation: ?
(formatted like **i j**)
**Answer: [0.8 0.8]**
**c)** Activation (ReLU):?
Preactivation: ?
(formatted like **i j**)
**Answer: [0.0 -0.8]**
(Activation: 0.0
Pre-Activation: -0.8)
**a)** Activation (linear):?
Preactivation: ?
(formatted like **i j**)
**Answer: [0.8 0.8]**
not sure!
* prediction y=?
**Answer: [0.8]**
---
### 20
Assume you have a data set of 15,000 individuals suffering from a particular disease. For each patient, you have 50 measured features, such as body weight, blood pressure, blood parameters, etc. Based on knowledge from medical experts, the hypothesis is that there are actually three patient groups that correspond to different disease subtypes. Which of the machine learning approaches discussed in the lecture would you choose to identify these subtypes? Choose from the options below!
- [x] I would use a clustering algorithm with patients as input objects described by their features. For k-means, I would choose three cluster centers that could correspond to three patient groups.
- [ ] I would choose a reinforcement learning approach, in which a neural network selects a patient, and if this patient is assigned to the correct group the network gets a reward.
- [ ] I would choose a supervised machine learning approach in which a support vector machine classifies each patient into one of the three groups. The similarity measure between patients would be a linear kernel between feature vectors.
- [ ] I would choose a linear model with the 50 measured features as input and patients as output (target) variables. In this way, one can obtain weights on each feature to identify important features that can discriminate between the three groups.
- [ ] I would choose a decision tree approach in which a machine learning method learns to differentiate between the three or more groups and then investigate the decision criteria. In this way, a low-dimensional representation of the diseases is learned.
- [ ] I would choose an unsupervised approach using PCA and visualization of the first principal components. The three clusters should be visible in a PCA plot as three groups of points that are close together.
---
### 21

> 100 % correct
---
### 22 Consider the following classification problem:
Given a user's voice recording, you want to predict whether he or she is in a happy or angry mood. **Suppose you are only interested in angry users!** (E.g., in a conversational interface, you want to take countermeasures if the user gets upset)
We already performed a classification experiment on a collection of audio streams containing voice, which yielded the contingency table/confusion matrix depicted below. Compute the requested performance measures and write them down in the correct places.
| | Classified as happy | Classified as angry |
| -------- | -------- | -------- |
| is actually happy | 30 | 20 |
| is actually angry | 10 | 40 |
What is **accuracy** in this example (round to 2 digits after the decimal point)?
Answer:
$$
\begin{align}
Accuracy &= \frac{(TP+TN)}{TP+TN+FP+FN} \\ \\
&= \frac{(40+30)}{(40+30+20+10)} = 0.7 = 70\%
\end{align}
$$
---
### 23 Consider the following classification problem:
Given a user's voice recording, you want to predict whether he or she is in a happy or angry mood. **Suppose you are only interested in angry users!** (E.g., in a conversational interface, you want to take countermeasures if the user gets upset)
We already performed a classification experiment on a collection of audio streams containing voice, which yielded the contingency table/confusion matrix depicted below. Compute the requested performance measures and write them down in the correct places.
| | Classified as happy | Classified as angry |
| -------- | -------- | -------- |
| is actually happy | 4 | 16 |
| is actually angry | 16 | 64 |
What is **F-measure** in this example (round to 2 digits after the decimal point)?
Answer (with "angry" as the positive class, so TP = 64, FN = 16, FP = 16):
$$
F\text{-}measure = \frac{2 \cdot \left (Precision \cdot Recall \right )}{Precision + Recall}
$$
$$
\begin{align}
&Recall = \frac{TP}{(TP + FN)} = \frac{64}{(64+16)} = 0.8 \\ \\
&Precision = \frac{TP}{(TP + FP)} = \frac{64}{(64+16)} = 0.8 \\ \\
&F\text{-}measure = \frac{2 \cdot (0.8 \cdot 0.8)}{(0.8+0.8)} = 0.8 = 80\%
\end{align}
$$
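
Because the F-measure depends on which class is treated as positive, a short sketch can make that choice explicit. It uses the counts from the table above, and the function and variable names are illustrative.

```python
def f_measure(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Rows are actual classes, columns are predicted classes (happy, angry).
happy_as_happy, happy_as_angry = 4, 16
angry_as_happy, angry_as_angry = 16, 64

# Positive class = angry: TP = 64, FP = 16 (happy classified as angry), FN = 16.
f_angry = f_measure(tp=angry_as_angry, fp=happy_as_angry, fn=angry_as_happy)
# Positive class = happy: TP = 4, FP = 16 (angry classified as happy), FN = 16.
f_happy = f_measure(tp=happy_as_happy, fp=angry_as_happy, fn=happy_as_angry)

print(round(f_angry, 2), round(f_happy, 2))
```

With "angry" as the class of interest the F-measure is 0.8. Treating "happy" as positive would give 0.2 on the same table.
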
---
### 24 is missing
---
### 25 Consider the following classification problem:
Given a user's voice recording, you want to predict whether he or she is in a happy or angry mood. **Suppose you are only interested in angry users!** (E.g., in a conversational interface, you want to take countermeasures if the user gets upset)
We already performed a classification experiment on a collection of audio streams containing voice, which yielded the contingency table/confusion matrix depicted below. Compute the requested performance measures and write them down in the correct places.
| | Classified as happy | Classified as angry |
| -------- | -------- | -------- |
| is actually happy | 30 | 20 |
| is actually angry | 10 | 40 |
What is **precision** in this example (round to 2 digits after the decimal point)?
Answer:
$$
Precision = \frac{TP}{(TP+FP)} = \frac{40}{(40+20)} \approx 0.67 = 67\%
$$
---
### 26 Consider the following classification problem:
Given a user's voice recording, you want to predict whether he or she is in a happy or angry mood. **Suppose you are only interested in angry users!** (E.g., in a conversational interface, you want to take countermeasures if the user gets upset)
We already performed a classification experiment on a collection of audio streams containing voice, which yielded the contingency table/confusion matrix depicted below. Compute the requested performance measures and write them down in the correct places.
| | Classified as happy | Classified as angry |
| -------- | -------- | -------- |
| is actually happy | 30 | 20 |
| is actually angry | 10 | 40 |
What is **recall** in this example (round to 2 digits after the decimal point)?
Answer:
$$
Recall = \frac{TP}{(TP+FN)} = \frac{40}{(40+10)} = 0.8 = 80\%
$$
---
### 27 Which of the following methods can be used to evaluate the quality of clustering produced by algorithms such as k-means or self-organizing maps (SOM)?
- [x] Topographic error
- [ ] R-square statistics
- [ ] Root mean squared error
- [ ] RBF kernel
- [ ] Area under the curve (AUC)
- [x] Shepard plot
- [ ] Confusion matrix
- [x] Inter-cluster distance
- [ ] Interrater agreement
---