# Machine Learning Assignment 1

### Mustafin Timur, B&S

## Description of the results

### Logistic Regression

![](https://i.imgur.com/3gjKPfl.png)

**Score:** 96.13165931455717

### SVM

![](https://i.imgur.com/1DmpJsx.png)

**Score:** 95.79233118425518

### SVM with SGD

![](https://i.imgur.com/RyS9dl3.png)

**Score:** 95.35120461486257

### Results

The data is almost linearly separable, but there is a small overlap between the `Sitting` and `Standing` classes: all three models made roughly the same errors there.

## Model comparison: effectiveness

The best accuracy score was achieved by `Logistic Regression`. This likely happened because the SVM (and the SVM with SGD) tried to fit dependencies that do not exist in the almost linearly separable data.

## Model comparison: running time

I measured the running time of `evaluate_model`; the results are:

1. **SVM with SGD**: 53.765860080718994 seconds
2. **SVM**: 62.17709183692932 seconds
3. **Logistic Regression**: 129.93274903297424 seconds

(A minimal timing sketch is given at the end of this report.)

## The feasibility of precision and recall

We can't just call `sklearn.metrics.recall_score` with its default settings in this case, because these classifiers were classically binary, and recall was calculated as

$$\text{recall} = \frac{TP}{TP + FN}$$

but this works only for binary cases (again, as it was designed). For our multiclass case we have to specify how the per-class scores are combined. There are a few ways (from the sklearn docs):

* `'micro'`: Calculate metrics globally by counting the total true positives, false negatives and false positives.
* `'macro'`: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
* `'weighted'`: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters `'macro'` to account for label imbalance; it can result in an F-score that is not between precision and recall.
* `'samples'`: Calculate metrics for each instance, and find their average (only meaningful for multilabel classification where this differs from `accuracy_score`).
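
To make this concrete, here is a minimal runnable sketch comparing the multiclass averaging modes. The label arrays are stand-ins (the real `y_test`/`y_pred` from the experiments above would be used instead), so treat it as an illustration rather than the assignment's actual evaluation code. `'samples'` is omitted because it requires multilabel input.

```python
from sklearn.metrics import precision_score, recall_score

# Stand-in multiclass labels; in the assignment these would be the
# test labels and a model's predictions (e.g. y_test, y_pred).
y_true = ["Sitting", "Standing", "Walking", "Sitting", "Standing", "Walking"]
y_pred = ["Sitting", "Sitting", "Walking", "Sitting", "Standing", "Walking"]

# recall_score with its default average='binary' would raise an error
# on these labels, so an explicit multiclass averaging mode is required.
for avg in ("micro", "macro", "weighted"):
    p = precision_score(y_true, y_pred, average=avg, zero_division=0)
    r = recall_score(y_true, y_pred, average=avg, zero_division=0)
    print(f"{avg:>8}: precision={p:.3f} recall={r:.3f}")
```

For this report, `average='macro'` or `'weighted'` would be the natural choices, since the task is multiclass (one activity label per sample) rather than multilabel.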
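
For completeness, a sketch of how the three compared models are typically defined in scikit-learn; the hyperparameters here are assumptions, not the assignment's actual configuration. In particular, an `SGDClassifier` with hinge loss is a linear SVM trained with stochastic gradient descent, which is what "SVM with SGD" refers to above.

```python
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import SVC

# Assumed model zoo for the comparison above; defaults may differ
# from the settings actually used in the assignment.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    # Hinge loss makes SGDClassifier a linear SVM trained with SGD.
    "SVM with SGD": SGDClassifier(loss="hinge"),
}
```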
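
Finally, the timing sketch referenced in the running-time comparison. The exact signature of `evaluate_model` is not shown in this report, so the wrapper below is deliberately generic: it only demonstrates the `time.perf_counter` bracketing used to obtain elapsed seconds.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical usage; evaluate_model, model, X_test, y_test come from the assignment:
# score, seconds = timed(evaluate_model, model, X_test, y_test)
# print(f"{type(model).__name__}: score={score}, time={seconds:.2f}s")
```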