# Quiz Module 5
## Linear Regression
### Overview
1. What is the purpose of regression?
- [ ] Sort the data into order
- [ ] Show the relationship between two variables
- [ ] Estimate the values of Y from the values of X
- [ ] s
2. In the regression equation Y = 21 - 3X, the slope is
- [ ] 21
- [ ] -21
- [ ] 3
- [x] -3
3. In the regression equation Y = 75.65 + 0.50X, the intercept is
- [ ] 0.50
- [x] 75.65
- [ ] 1.00
- [ ] indeterminable
4. In the syntax of a linear model, data refers to a ______
- [ ] Matrix
- [x] Vector
- [ ] Array
- [ ] List
5. The difference between the actual Y value and the predicted Y value found using a regression equation is called the
- [ ] slope
- [x] residual
- [ ] outlier
- [ ] scatter plot
6. Which method is used to find the best fit line for linear regression?
- [x] Ordinary Least Squares
- [ ] Mean Square Error
- [ ] Maximum Likelihood
7. Which of the following is FALSE regarding regression?
- [ ] It relates inputs to outputs
- [ ] It is used for prediction
- [ ] It may be used for interpretation
- [x] It discovers causal relationships
8. Which of these lines seems to fit the data best?

- [ ] A
- [ ] B
- [ ] C
- [ ] None
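The slope, intercept, residual, and least-squares questions above can be checked by hand with a small sketch. It assumes a single input variable and uses hypothetical points chosen to lie exactly on Y = 21 - 3X; `fit_line` is an illustrative helper, not part of any library.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points that lie exactly on Y = 21 - 3X.
xs = [0, 1, 2, 3, 4]
ys = [21, 18, 15, 12, 9]

slope, intercept = fit_line(xs, ys)

# Residual = actual Y minus predicted Y.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(slope, intercept)  # -3.0 21.0
print(residuals)         # all zero, because the points sit on the line
```

With real, noisy data the residuals would not all be zero; least squares picks the line that makes the sum of their squares as small as possible.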
### An example
1. The relation between the selling price of a car (in $1,000) and its age (in years) is estimated from a random sample of cars of a specific model. The relation is given by the following formula:
SellingPrice = 24.2 − (1.182)Age
What is 'Age' called?
- [x] Independent variable
- [ ] Dependent variable
- [ ] Bias
- [ ] Coefficient
2. A family would like to build a linear regression equation to predict the amount of grain harvested per acre of land on their farm. They subdivide their land into several smaller plots of land for testing and would like to select an independent variable they can control. Which of the following is an appropriate independent variable that the family could use to create a linear regression equation?
- [ ] The total amount of rainfall recorded at their farm
- [ ] The type of crop planted in the plot the previous year
- [ ] The average daily temperature at their farm
- [x] The amount of fertilizer applied to each plot of land
### Build a Model
Calculate MAE, MSE
Interpret
1. Which of these commands will give you the value of slope for the fitted model?
- [ ] lr.fit
- [x] lr.coef_
- [ ] lr.intercept_
2. What is testing in Machine Learning?
- [x] Comparing the predicted values with the actual ones
- [ ] Finding broken code
- [ ] A stage of all projects
- [ ] None of the above
3. Why do we need two sets: a train set and a test set?
- [ ] to train the model faster
- [x] to validate the model on unseen data
- [ ] to improve the accuracy of the model
4. If the model worked correctly, what should y_new represent in the code below?
```python=
from sklearn.linear_model import LinearRegression

regr = LinearRegression()
regr.fit(X, y)           # X and y are assumed to be defined already
y_new = regr.predict(X)
```
- [ ] The slope and intercept values of the line of best fit.
- [x] The y-values that X would produce on the line of best fit.
- [ ] The predicted y-values from a new set of x-values.
- [ ] The same as y.
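The difference between fitted values and predictions on held-out data can be sketched without any library. This is a minimal illustration, not scikit-learn's implementation: `fit_line` and `predict` are hypothetical helpers, the data are made up, and the 7/3 split is an arbitrary choice.

```python
import random


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(xs, slope, intercept):
    """Return the y-values on the fitted line for the given x-values."""
    return [intercept + slope * x for x in xs]


# Hypothetical data: roughly y = 2x + 1 with small fixed deviations.
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.3, 0.4, -0.1, 0.2]
X = list(range(10))
y = [2 * x + 1 + e for x, e in zip(X, noise)]

# Shuffle the row indices reproducibly, then hold out 3 of the 10 rows.
indices = list(range(len(X)))
random.Random(0).shuffle(indices)
train_idx, test_idx = indices[:7], indices[7:]

X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]

slope, intercept = fit_line(X_train, y_train)

# Fitted values: points on the line for the rows the model has seen.
y_fitted = predict(X_train, slope, intercept)
# Predictions: points on the line for rows the model has not seen.
y_pred = predict(X_test, slope, intercept)

test_mae = sum(abs(a - p) for a, p in zip(y_test, y_pred)) / len(y_test)
print(slope, intercept, test_mae)
```

The fitted values lie on the line and so differ from `y_train` wherever the data are noisy, which is why predicting on the training inputs does not simply return the original targets.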
### Evaluate a Model
1. Two linear regression models are fitted to the same data. The first gives an RMSE of 3.78 and the second an RMSE of 6.33. Which is the better model?
- [ ] Undetermined
- [ ] Both are the same
- [x] The first one
- [ ] The second one
2. Which metric should we avoid when we want large errors caused by outliers to be penalized heavily?
- [x] Mean Absolute Error (MAE)
- [ ] Mean Squared Error (MSE)
- [ ] Root Mean Squared Error (RMSE)
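The way these three metrics react to a single large error can be seen in a short sketch. The numbers below are hypothetical and the helper functions are written out by hand for illustration.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Squared Error: average of the squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: MSE brought back to the units of y."""
    return math.sqrt(mse(actual, predicted))


actual = [10, 12, 14, 16]
close = [11, 11, 15, 15]        # every prediction is off by 1
one_outlier = [11, 11, 15, 6]   # the last prediction is off by 10

print(mae(actual, close), mse(actual, close))              # 1.0 1.0
print(mae(actual, one_outlier), mse(actual, one_outlier))  # 3.25 25.75
print(rmse(actual, one_outlier))
```

One bad prediction roughly triples the MAE but multiplies the MSE by more than 25, because squaring magnifies large errors.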
### Interpret a Model
1. Which of the following will help you in effectively comparing models (built on the same dataset) with different numbers of features?
- [x] R-squared adjusted
- [ ] R-squared
2. Suppose you add another variable to a model and it increases the R-squared value. Which is true?
- [x] The adjusted R-squared value may increase or decrease.
- [ ] The adjusted R-squared value will either increase or remain the same.
3. The R-squared value will always increase (or at least remain the same) when adding more variables
- [x] True
- [ ] False
4. Which metric is used to determine the significance of the overall model fit?
- [ ] R-squared value
- [ ] R-squared adjusted
- [x] F-statistic
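The behaviour of adjusted R-squared described above follows from its usual formula, 1 - (1 - R²)(n - 1)/(n - p - 1), for n samples and p features in a model with an intercept. The sketch below uses hypothetical R-squared values; `adjusted_r2` is an illustrative helper.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R-squared for a model with an intercept."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical model with 2 features fitted on 20 samples.
base = adjusted_r2(0.800, n_samples=20, n_features=2)

# A third feature that barely raises R-squared: the adjusted value drops.
weak = adjusted_r2(0.801, n_samples=20, n_features=3)

# A third feature that raises R-squared a lot: the adjusted value rises.
strong = adjusted_r2(0.900, n_samples=20, n_features=3)

print(base, weak, strong)
```

The penalty term grows with every added feature, so the adjusted value only goes up when the gain in R-squared outweighs that penalty.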
## Logistic Regression
### Overview
1. Which of the following describes the output of Logistic Regression?
- [ ] A Boolean value of either True or False for whether the outcome will happen.
- [ ] A value between 0 and 1, which indicates the probability of a particular outcome.
- [ ] Either 0, indicating the event likely won’t happen, or 1, indicating that an outcome will likely occur.
2. Logistic Regression differs from Linear Regression because the output of a Logistic Regression model ranges from -∞ to +∞.
- [ ] True
- [x] False
3. Logistic regression is mainly used for regression.
- [ ] True
- [x] False
4. Which of the following plots shows a Sigmoid function?

- [ ] A
- [ ] B
- [x] C
- [ ] D
5. You are predicting whether an email is spam or not. Based on the features, you obtained an estimated probability of 0.75. What is the meaning of this estimated probability? (Select two.)
- [ ] There is a 25% chance that the email is spam
- [ ] There is a 75% chance that the email is spam
- [ ] There is a 75% chance that the email is not spam
- [ ] There is a 25% chance that the email is not spam
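The sigmoid function and the probability in the spam question can be tied together in a short sketch. It assumes the positive class is "spam" and uses a 0.5 decision threshold, which is a common default rather than something the quiz specifies; the input `z` is a hypothetical linear score picked so that the output is 0.75.

```python
import math


def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical linear score; log(3) is chosen so that sigmoid(z) = 0.75.
z = math.log(3)

p_spam = sigmoid(z)          # estimated probability of the positive class
p_not_spam = 1.0 - p_spam    # the two probabilities sum to 1

# Turn the probability into a class label with a 0.5 threshold.
label = 1 if p_spam >= 0.5 else 0

print(p_spam, p_not_spam, label)
```

The model itself outputs the probability; the 0 or 1 label only appears once a threshold is applied.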
### Loss Function
1. What is the purpose of the hypothesis function in logistic regression?
- [x] to limit the output between 0 and 1
- [ ] to limit the output between -1 and 1
- [ ] to limit the output between -infinity and +infinity
- [ ] to limit the output between 0 and +infinity
2. It is desirable to ... the error of predicted values.
- [x] minimize
- [ ] maximize
### Gradient Descent
1. What is the purpose of performing gradient descent?
- [ ] To decrease the slope of the loss curve
- [x] To move parameters in the direction that minimizes loss.
- [ ] To maximize loss.
- [ ] To decrease parameter values.
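The update rule behind gradient descent can be sketched on a toy problem. The loss below, (w - 3)², is a hypothetical stand-in for a real model's loss, and the starting point, learning rate, and step count are arbitrary choices.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


w = 0.0               # arbitrary starting value
learning_rate = 0.1   # step size
losses = [loss(w)]

for _ in range(100):
    # Step against the gradient, i.e. in the direction that lowers the loss.
    w -= learning_rate * gradient(w)
    losses.append(loss(w))

print(w, losses[0], losses[-1])
```

Here the parameter increases from 0 toward 3 while the loss falls, which shows that gradient descent is about reducing loss, not about making parameters smaller.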
### Normalization
Not sure?
1. Normalized data are centered at ___ and have units equal to standard deviations of the original data.
- [ ] 0
- [ ] 5
- [ ] 1
- [ ] 10
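One common reading of "normalized" in this question is z-score standardization: subtract the mean and divide by the standard deviation. The sketch below assumes that reading and uses hypothetical data; `standardize` is an illustrative helper built on the standard library.

```python
import statistics


def standardize(values):
    """Rescale values to z-scores using the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


raw = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, standard deviation 2
z = standardize(raw)

print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(statistics.mean(z), statistics.pstdev(z))
```

Each z-score says how many standard deviations the original value lies above or below the mean.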
### Train Test Split
1. Consider 4 classifiers, whose classification performance is given by the following table. Which of the following 4 is most likely overfit?

- [ ] Classifier 1
- [ ] Classifier 2
- [ ] Classifier 3
- [ ] Classifier 4
### Example with Sklearn
### Confusion Matrix
### Precision and Recall
## Unsupervised Learning
### Overview
### Kmeans
### Hierarchical Clustering
### Color Compression with Kmeans