
Decision Tree and Random Forest


These are my personal notes for the Machine Learning course by Stanford. Feel free to check the assignments.

Note that Decision Trees and Random Forests were not tackled in the course but are considered must-know concepts for everyone starting in the field.

I wouldn't have been able to understand Decision Trees and Random Forests without the StatQuest videos. Feel free to check them out.

Also, if you want to read my other notes, feel free to check them at my blog.

I) Decision Tree

Decision Trees are versatile machine learning algorithms that can perform both classification and regression tasks, and even multi-output tasks. They are very powerful algorithms, capable of fitting complex datasets.

There are 2 types of decision trees:

  1. Classification Trees: when the decision tree has a categorical target variable.
  2. Regression Trees: when the decision tree has a continuous target variable.

Here is an example of a classification tree:

(image: example of a classification tree)

Here is an example of a regression tree:

(image: example of a regression tree)

In order to grow those trees, we will use the CART algorithm, which produces only binary trees (non-leaf nodes always have exactly two children). Other algorithms such as ID3 can produce decision trees whose nodes have more than two children.

1) Classification tree

How do we grow a classification tree? Let's first go through an example and then deduce the general formula from it.

Here is our dataset. We want to grow a classification tree from it.

(image: the first 10 rows of the iris dataset)

The picture above shows the first 10 rows of the iris dataset. The first 4 columns are the features that we will use to predict the target, the iris species, represented by the last column with numerical values: 0 for setosa, 1 for versicolor, 2 for virginica.

In total, we have 150 observations (150 rows), 50 for each iris species: the dataset is balanced.

Here is the decision tree that we will get:

(image: the classification tree grown on the iris dataset)

How do we get such a tree? The tree is built iteratively, from the root to the leaves, using the CART algorithm.

The goal of a Decision Tree is to split the training set into homogeneous areas where only one iris species is present, according to the given features: here the petal and sepal widths.

Node 0: Root node

(image: the training set plotted against petal width and sepal width, next to the root node of the tree)

The graph above shows the distribution of iris species according to the two selected features: petal width on the x-axis and sepal width on the y-axis.

The color of the dots represents the iris species: red for setosa, yellow for versicolor, blue for virginica.

The root node, on the right of the picture above, gives us several pieces of information:

  • There are 68 irises ('samples=68'), the dots that we can count on the plot on the left.
  • 'value=[23,20,25]' describes the repartition of these irises among the three possible classes of iris species, i.e. 23 setosa, 20 versicolor, and 25 virginica.
  • 'class=virginica' is the iris species predicted by the Decision Tree at the root node. This decision is taken because virginica is the most numerous species at the root node (25 virginica compared to 20 versicolor and 23 setosa). This is also why the background color of the left graph is blue, the color chosen for the virginica species.
(image: the training set with the first decision boundary at petal width = 0.8 cm)

The plot on the left is the same as the previous one, but with the first decision boundary of the tree: petal width = 0.8 cm.

How was this decision boundary decided?

A decision boundary is decided by testing all the possible decision boundaries that split the dataset and choosing the one that minimizes the Gini impurity of the two resulting splits.


Gini impurity explanation:

The Gini Impurity is a measure of homogeneity among a set. It can be described as:

$$G_i = 1 - \sum_{k=1}^{n} p_{i,k}^2$$

with:

  • $n$, the number of classes.
  • $p_{i,k}$, the ratio of class $k$ instances among the training instances in the $i^{th}$ node.

In our case, the boundary is petal width = 0.8 cm because the 2 splits created by this decision boundary have the lowest possible Gini impurity.

The question to ask to determine a decision boundary is: how do we split the iris species so that we create more homogeneous groups?

Intuitively, what we can observe on the graph above is that it's possible to create a homogeneous group containing only the setosa species just by splitting the dataset along the petal width axis.

But the algorithm has no intuition. So how does it find the best split?

  • It will try all the possible boundaries along all the features, i.e. along both the petal width and sepal width axes.
  • For each split, the algorithm will compute the Gini impurity of the two groups created.
  • Finally, it will choose the decision boundary that gives the lowest Gini impurity for the two groups (comparing the weighted sums of the Gini impurities with each other).

In our case, the algorithm has found that among all the possible splits the split with petal width = 0.8 cm gives the lowest Gini impurity.

The Gini impurity for the left leaf is:

$$G_{left} = 1 - \left(\frac{23}{23}\right)^2 = 0$$

The Gini impurity for the right leaf is:

$$G_{right} = 1 - \left(\frac{20}{45}\right)^2 - \left(\frac{25}{45}\right)^2 = 0.494$$
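As a quick sanity check, these values can be recomputed from the class counts with a few lines of Python (a minimal sketch):

```python
def gini(counts):
    """Gini impurity of a node, given the class counts it contains."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(gini([23]))                 # left leaf: 0.0 (pure)
print(round(gini([20, 25]), 3))   # right node: 0.494
```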

Node 1:

The process described will continue iteratively until the tree has separated all the data points (or has tried to), or until a restrictive condition is applied to the algorithm, such as a limit on the depth of the tree.

(image: node 1 and the corresponding split of the training set)

Splitting to get a homogeneous group is not always the best option, as we will see with node 2.

Node 2:

(image: node 2 and the corresponding split of the training set)

For this node, the algorithm chose to split at petal width = 1.55 cm, creating two heterogeneous groups. Intuitively, we would have split at petal width = 1.3 cm or sepal width = 3.1 cm so that we get a group with only versicolor irises.

Let's verify which decision boundary is the best by computing the Gini impurity for both splits.

Gini impurity with the split at petal width = 1.55 cm (Left/Right/Weighted sum):

$$G_{left} = 1 - \left(\frac{17}{18}\right)^2 - \left(\frac{1}{18}\right)^2 = 0.105$$
$$G_{right} = 1 - \left(\frac{3}{5}\right)^2 - \left(\frac{2}{5}\right)^2 = 0.48$$
$$G_{w1} = \frac{18}{23} \times 0.105 + \frac{5}{23} \times 0.48 = 0.187$$

Gini impurity with the split at petal width = 1.3 cm (Left/Right/Weighted sum):

$$G_{left} = 1 - \left(\frac{8}{8}\right)^2 = 0$$
$$G_{right} = 1 - \left(\frac{12}{15}\right)^2 - \left(\frac{3}{15}\right)^2 = 0.32$$
$$G_{w2} = \frac{8}{23} \times 0 + \frac{15}{23} \times 0.32 = 0.209$$

By comparing the Gini impurity weighted sums of our two splits, we can say that the algorithm is right and our intuition was wrong ($G_{w1} < G_{w2}$).
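The same comparison can be done in a few lines of Python (a minimal sketch; the group sizes 18/5 and 8/15 are taken from the node above):

```python
def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Split at petal width = 1.55 cm: 18 samples go left ([17, 1]), 5 go right ([3, 2])
g_w1 = 18/23 * gini([17, 1]) + 5/23 * gini([3, 2])
# Split at petal width = 1.3 cm: 8 samples go left ([8]), 15 go right ([12, 3])
g_w2 = 8/23 * gini([8]) + 15/23 * gini([12, 3])
print(g_w1 < g_w2)   # True: the 1.55 cm split has the lower weighted impurity
```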

Node 3:

(image: node 3 and the corresponding split of the training set)

Applying the same principle again and again, the algorithm will try to isolate every point until it only has homogeneous groups. This can lead to overfitting if, for example, we don't limit the size of the tree.

How does the built tree make a decision?

When the Decision Tree has to predict a target, it travels down the tree from the root node until it reaches a leaf, deciding to go to the left or to the right child node by testing the iris's feature value against each node's condition.

With this example, we can deduce the CART cost function:

$$J(k, t_k) = \frac{m_{left}}{m} G_{left} + \frac{m_{right}}{m} G_{right}$$

with:

  • $k$, a feature, and $t_k$, a threshold (e.g. sepal width in cm),
  • $G_{left/right}$, the Gini impurity of the left/right subset,
  • $m_{left/right}$, the number of instances in the left/right subset (and $m$ the total).
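In practice, this CART algorithm is what scikit-learn's DecisionTreeClassifier implements. Here is a minimal sketch on the iris dataset (the tree is fit on all 150 samples and the depth limit is an assumption, so the exact splits may differ from the figures above):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data[:, [3, 1]]   # petal width and sepal width, the two features used above
y = iris.target

clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(X, y)
print(export_text(clf, feature_names=["petal width (cm)", "sepal width (cm)"]))
```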

2) Regression tree

The CART algorithm works mostly the same way as earlier, except that instead of trying to split the training set in a way that minimizes the Gini impurity, it now tries to split the training set in a way that minimizes the MSE (Mean Square Error).


The smallest MSE will enable us to find $k$ and $t_k$ to form our decision boundary.
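By analogy with the classification cost function above, the quantity minimized at each split can be written as (one common formulation; the exact normalization of the MSE term varies):

$$J(k, t_k) = \frac{m_{left}}{m}\, MSE_{left} + \frac{m_{right}}{m}\, MSE_{right} \quad \text{with} \quad MSE_{node} = \frac{1}{m_{node}} \sum_{i \in node} \left(\hat{y}_{node} - y^{(i)}\right)^2$$

where $\hat{y}_{node}$ is the mean target value of the instances in the node.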

Just like for classification tasks, Decision Trees are prone to overfitting when dealing with regression tasks. Without any regularization (i.e., using the default hyperparameters), you get the predictions on the left: the model is obviously overfitting the training set very badly. Just setting min_samples_leaf=10 results in a much more reasonable model, represented on the right. This hyperparameter requires every leaf to contain at least min_samples_leaf training instances.

(image: regression tree predictions without regularization (left) and with min_samples_leaf=10 (right))
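A minimal sketch of this regularization effect with scikit-learn (the noisy quadratic toy data is an assumption, chosen to mimic the figure):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X = rng.rand(200, 1)
y = 4 * (X[:, 0] - 0.5) ** 2 + rng.randn(200) * 0.1   # noisy quadratic target

overfit = DecisionTreeRegressor(random_state=42).fit(X, y)                    # no regularization
regularized = DecisionTreeRegressor(min_samples_leaf=10, random_state=42).fit(X, y)

# The unregularized tree grows far more leaves, one per tiny group of points
print(overfit.get_n_leaves(), regularized.get_n_leaves())
```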

II) Random Forest

Suppose you ask a complex question to thousands of random people, then aggregate their answers. In many cases you will find that this aggregated answer is better than an expert's answer. This is called the wisdom of the crowd. Similarly, if you aggregate the predictions of a group of predictors (such as classifiers or regressors), you will often get better predictions than with the best individual predictor. A group of predictors is called an ensemble. Thus, this technique is called Ensemble Learning, and an Ensemble Learning algorithm is called an Ensemble method.

Random Forest is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. It combines the simplicity of decision trees with flexibility, resulting in a vast improvement in accuracy.

How do we build a Random Forest?

  1. Create a "bootstrapped" dataset. To do so, use one of the following techniques:
    • Bagging: Sampling performed with replacement.
    • Pasting: Sampling performed without replacement.
  2. Create a decision tree using the bootstrapped dataset, but only using a random subset of features at each step.
  3. Repeat steps 1 and 2 to build many trees.

Step 1:

Using the bagging technique:

(image: bootstrapped dataset created by sampling with replacement)

Step 2:

(image: decision tree built from the bootstrapped dataset, using a random subset of features at each step)

Step 3:

After repeating, we get a variety of decision trees.

(image: the resulting variety of decision trees)

Now, to make a prediction, we just take a new example, run it through each decision tree in the random forest, and keep the label with the most votes.
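scikit-learn wraps these steps and the majority vote in RandomForestClassifier. A minimal sketch (the hyperparameter values are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# n_estimators = number of bootstrapped trees;
# max_features="sqrt" = random subset of features tried at each split
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
rf.fit(X, y)
print(rf.predict(X[:3]))   # majority vote over the 100 trees
```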

How do we measure the accuracy of our Random Forest?

When creating our "bootstrapped" dataset with bagging, some instances may be sampled several times for any given predictor, while others may not be sampled at all. The latter form the Out-of-Bag dataset.

We can use them to evaluate our model by running the Out-of-Bag samples through our Random Forest and comparing the most-voted label with each sample's true label.

(image: Out-of-Bag samples run through the Random Forest)
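In scikit-learn this estimate comes almost for free with oob_score=True (a minimal sketch; the dataset is an assumption):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# oob_score=True evaluates each tree on the samples left out of its bootstrap sample
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
rf.fit(X, y)
print(rf.oob_score_)   # Out-of-Bag accuracy estimate
```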

Here, we used hard voting, where the label with the most votes wins. We could use soft voting, where we average the predicted probabilities of all the learners.

Note: Soft voting often achieves higher performance than hard voting because it gives more weight to highly confident votes.

Remember that we build our decision trees with a random subset of features? How do we concretely find the size of this subset?

We start by selecting 2 random features to build the Random Forest's decision trees and measure its accuracy. We then select 3 random features, build another Random Forest, and measure its accuracy. We compare the accuracy of the 2 models, keep the best one, and repeat the process.

Typically, we start with the square root of the number of variables and then try a few settings above and below that value.

III) Ensemble Methods

The main causes of error in learning are noise, bias, and variance. Ensemble methods help to minimize these factors. These methods are designed to improve the stability and the accuracy of machine learning algorithms.

Combining multiple classifiers decreases variance, especially in the case of unstable classifiers, and may produce a more reliable classification than a single classifier. To use Bagging or Boosting, you must select a base learner algorithm. For example, if we choose a classification tree, Bagging and Boosting would consist of a pool of trees as big as we want.

1) Bagging

Bagging (Bootstrap Aggregation) builds N decision trees (learners) from N datasets created by random sampling with replacement. The result is then obtained by averaging the responses of the N learners (soft voting) or by majority vote (hard voting).

(image: bagging)
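scikit-learn exposes this directly as BaggingClassifier; a minimal sketch (dataset and hyperparameters are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# bootstrap=True -> bagging (sampling with replacement); bootstrap=False -> pasting
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        bootstrap=True, random_state=42)
bag.fit(X, y)
print(bag.predict(X[:3]))
```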

2) Boosting

Boosting (originally called hypothesis boosting) refers to any Ensemble method that can combine several weak learners into a strong learner. A weak learner is simply a classifier that performs poorly, but performs better than random guessing.

The general idea of most boosting methods is to train predictors sequentially, each trying to correct its predecessor. There are many boosting methods available, but by far the most popular are AdaBoost (short for Adaptive Boosting) and Gradient Boosting. Let's start with AdaBoost.

a) AdaBoost

Here are the main ideas behind AdaBoost:

  1. AdaBoost combines a lot of "weak learners" (predictors) to make classifications. The "weak learners" are almost always stumps (trees with a single split).
  2. Some stumps get more say in the classification than others.
  3. Each stump is made by taking the previous stump's mistakes into account.

Here are the differences between a Random Forest and a Forest of Stumps (using AdaBoost):

(image: a Random Forest compared with a Forest of Stumps)

How do we create a forest of stumps using AdaBoost?

  1. Assign a sample_weight to each sample in the dataset and initialize it to $\frac{1}{\#samples}$.
(image: dataset with initial sample weights)

  2. Choose the 1st stump of the forest.
  • Check how well each feature classifies the samples.
  • Compute the Gini impurity of each candidate stump.
  • The stump with the lowest Gini impurity becomes the 1st stump.
(image: choosing the stump with the lowest Gini impurity)

  3. Compute its amount of say: $amount\_of\_say = \frac{1}{2}\log\left(\frac{1 - total\_error}{total\_error}\right)$, where $total\_error$ is the sum of the weights of the misclassified samples.
(image: computing the amount of say of the stump)

  4. Increase/decrease the sample weights of the incorrectly/correctly classified samples using the amount_of_say previously computed:

(Increase) $new\_sample\_weight = sample\_weight \times e^{amount\_of\_say}$
(Decrease) $new\_sample\_weight = sample\_weight \times e^{-amount\_of\_say}$

(image: updated sample weights)

  5. Normalize the new_sample_weight values so that they sum to 1, then replace sample_weight by new_sample_weight.
(image: normalized sample weights)

  6. Create a new dataset of the same size by repeatedly picking a random number in $[0, 1]$ and using the new_sample_weight values as a probability distribution: each draw selects the sample whose cumulative-weight interval contains the random number.
(image: building the new dataset by weighted sampling)

  7. After creating the new dataset, reset all sample_weights to $\frac{1}{\#samples}$.
(image: new dataset with reset sample weights)

  8. Repeat by going back to step 2. The algorithm stops when the desired number of predictors (models) is reached, or when a perfect predictor (model) is found.
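scikit-learn packages this procedure as AdaBoostClassifier; a minimal sketch (the dataset and hyperparameters are assumptions, and scikit-learn's exact weight-update details may differ slightly from the recipe above):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_depth=1 makes each weak learner a stump
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=50, learning_rate=1.0, random_state=42)
ada.fit(X, y)
print(ada.estimator_weights_[:5])   # the "amount of say" of the first 5 stumps
```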

How does a forest of stumps created by AdaBoost make a classification?

(image: making a classification with the forest of stumps)


How to ensure that increasing the weights of misclassified points in AdaBoost does not adversely affect the learning progress?

The new classifier in each round might indeed classify the old points incorrectly. However, the previous 'versions' of the classifier (from previous iterations) are not thrown away. The end result is an ensemble/average of all the classifiers of each step, where the contribution of each classifier is weighted by how well that particular classifier did at that round.

For example if we have an outlier that is hard to classify correctly, the outlier will accumulate a lot of weight. The classifier will be forced to give priority to that point and classify it correctly. This might mean that all the other points are misclassified.

However, this classifier's 'opinion' will not be so important in the end because only one point was classified correctly (the outlier). This is also a good way to detect outliers. Just find the points with very large weight.

(image: AdaBoost iterations on a 2D toy dataset)

Remark:

  • In the 1st iteration, the classifier based on all data points classifies all points correctly except those with $x < 0.2$ and $y > 0.8$, and the point around $(0.4, 0.55)$ (see the circles in the second picture).
  • In the second iteration, exactly those points gain a higher weight so that the classifier based on that weighted sample classifies them correctly (2nd iteration, added dashed line). The combined classifiers (i.e. "the combination of the dashed lines") result in the classifier represented by the green line. Now the second classifier produces other misclassifications ($x \in [0.5, 0.6]$, $y \in [0.3, 0.4]$), which gain more focus in the third iteration, and so on.
  • At every step, the combined classifier gets closer and closer to the best shape (although not monotonically). The final classifier (i.e. the combination of all single classifiers) in the 100th iteration classifies all points correctly.

b) Gradient Boosting

Another very popular Boosting algorithm is Gradient Boosting. Just like AdaBoost, Gradient Boosting works by sequentially adding predictors to an ensemble, each one correcting its predecessor. However, instead of tweaking the instance weights at every iteration like AdaBoost does, this method tries to fit the new predictor to the residual errors made by the previous predictor.

Like AdaBoost, Gradient Boosting builds fixed-sized trees based on the previous trees' errors.

Unlike AdaBoost, the trees in Gradient Boosting can be larger than a stump.

Regression:

Here is the Gradient Boosting algorithm for regression:

(image: the Gradient Boosting algorithm for regression)

Here is our dataset:

(image: toy dataset with Height, Favorite color, Gender and Weight columns)

For the sake of our example, we will use stumps instead of larger trees because our dataset is small.


Input:

  • $x_1$ corresponds to "Height/Favorite color/Gender" of the first row; $y_1$ corresponds to "Weight".
  • $x_2$ corresponds to "Height/Favorite color/Gender" of the second row; $y_2$ corresponds to "Weight".
  • $x_3$ corresponds to "Height/Favorite color/Gender" of the third row; $y_3$ corresponds to "Weight".
  • $L(y_i, F(x))$ corresponds to $\frac{1}{2}(observed - predicted)^2$.

Step 1:

$$F_0(x) = \underset{\gamma}{\operatorname{argmin}} \sum_{i=1}^{n} L(y_i, \gamma)$$

We want the predicted value $\gamma$ that will minimize the sum of the loss function $L$. By setting the derivative of the loss function $L$ to $0$, the predicted value $\gamma$ is equal to the average of the observed weights. This result corresponds to the initial predicted value $F_0(x)$, which is just a leaf.

(image: the initial leaf $F_0(x)$, the average of the observed weights)

Step 2:

We just entered the loop, so $m = 1$.

  1. $r_{i,m} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x)=F_{m-1}(x)}$

    • $-\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]$ corresponds to $(observed - predicted)$.
    • Since $m = 1$, the condition $F(x) = F_{m-1}(x)$ becomes $F(x) = F_0(x)$, which means the residual is $(observed - F_0(x))$.
    • $r_{i,m}$ is the residual for sample $i$ of tree $m$.
    • In our case, we compute the residuals for tree 1.

    (image: residuals computed for tree 1)

  2. We need to build a regression tree to predict the residuals instead of the weights. Each leaf is assigned a region $R_{j,m}$, where $j$ is the index of the leaf in tree $m$.

    (image: regression tree fitted to the residuals, with one region per leaf)

  3. $\gamma_{j,m} = \underset{\gamma}{\operatorname{argmin}} \sum_{x_i \in R_{i,j}} L(y_i, F_{m-1}(x_i) + \gamma)$

    • The output value $\gamma_{j,m}$ of each leaf is the value of $\gamma$ for which the summation is minimized.
    • $x_i \in R_{i,j}$ means that only the samples belonging to that region are used in the summation.
    • By setting the derivative of $L$ with respect to $\gamma$ to 0, we find that $\gamma_{j,m}$ is equal to the average of the residuals in each region $R_{j,m}$.

    (image: output value computed for each leaf)

  4. $F_m(x) = F_{m-1}(x) + \nu \sum_{j=1}^{J_m} \gamma_{j,m} I(x \in R_{j,m})$

    • Since $m = 1$, $F_m(x) = F_1(x)$.
    • Since $m = 1$, $F_{m-1}(x) = F_0(x)$.
    • $\nu$ is the learning rate. It is usually set to $0.1$.
    • $\gamma_{j,m} I(x \in R_{j,m})$ is just the tree we finished building in 3.

Now we set $m = 2$ and repeat. We then get:

(image: the model after the second tree is added)

Step 3: Output $F_M(x)$

Let's assume that $M = 2$ (note that in practice, $M \approx 100$) and make a prediction:

(image: a prediction made with $F_2(x)$)
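scikit-learn implements this procedure as GradientBoostingRegressor; a minimal sketch on made-up numbers standing in for the Height/Favorite color/Gender table (the data values and hyperparameters are assumptions):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Columns: height (m), favorite color encoded 0/1, gender encoded 0/1
X = np.array([[1.6, 0, 1], [1.6, 1, 0], [1.5, 0, 0],
              [1.8, 1, 1], [1.5, 1, 0], [1.4, 0, 0]])
y = np.array([88, 76, 56, 73, 77, 57])   # weight (kg)

# loss="squared_error" matches the 1/2 (observed - predicted)^2 loss above;
# learning_rate is the nu of step 4, n_estimators is M, max_depth=1 gives stumps
gbr = GradientBoostingRegressor(loss="squared_error", learning_rate=0.1,
                                n_estimators=100, max_depth=1, random_state=42)
gbr.fit(X, y)
print(gbr.predict(X[:2]))
```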

Classification:

Here is the Gradient Boosting algorithm for classification:

(image: the Gradient Boosting algorithm for classification)

Here is our dataset:

(image: toy classification dataset)

For the sake of our example, we will use stumps instead of larger trees because our dataset is small.


Input:

  • $L(y_i, F(x))$ corresponds to $-\,observed \times \log\left(\frac{p}{1-p}\right) + \log\left(1 + e^{\log\left(\frac{p}{1-p}\right)}\right)$.

Step 1:

  • $F_0(x) = \gamma = \log(odds) = \log\left(\frac{p}{1-p}\right)$.

(image: the initial leaf, the log(odds) of the training labels)

Step 2:

We just entered the loop, so $m = 1$.

  1. $r_{i,m} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x)=F_{m-1}(x)}$

    • $-\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]$ corresponds to $(observed - predicted)$.
    • Since $m = 1$, the condition $F(x) = F_{m-1}(x)$ becomes $F(x) = F_0(x)$, which means the residual is $(observed - F_0(x))$.
    • $r_{i,m}$ is the residual for sample $i$ of tree $m$.
    • In our case, we compute the residuals for tree 1.

(image: residuals computed for tree 1)

  2. We need to build a regression tree to predict the residuals instead of the labels. Each leaf is assigned a region $R_{j,m}$, where $j$ is the index of the leaf in tree $m$.

(image: regression tree fitted to the residuals, with one region per leaf)

  3. Using a Taylor expansion to order 2, we find that:

$$\gamma_{j,m} = \frac{\sum r_{i,m}}{\sum p(1-p)} \quad \text{with} \quad p = \frac{e^{\log(odds)}}{1 + e^{\log(odds)}}$$

(images: output value computed for each leaf)

  4. $F_m(x) = F_{m-1}(x) + \nu \sum_{j=1}^{J_m} \gamma_{j,m} I(x \in R_{j,m})$

    • Since $m = 1$, $F_m(x) = F_1(x)$.
    • Since $m = 1$, $F_{m-1}(x) = F_0(x)$.
    • $\nu$ is the learning rate. It is usually set to $0.1$.
    • $\gamma_{j,m} I(x \in R_{j,m})$ is just the tree we finished building in Step 2, item 2.

Now we set $m = 2$ and repeat. We then get:

(image: the model after the second tree is added)

Step 3: Output $F_M(x)$

Let's assume that $M = 2$ (note that in practice, $M \approx 100$) and make a prediction:

(images: making a prediction with $F_2(x)$)
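The classification variant is available as scikit-learn's GradientBoostingClassifier; a minimal sketch (the dataset and hyperparameters are assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)   # any binary classification dataset works

# The default loss is the log loss (negative log-likelihood) used above;
# max_depth=1 gives stumps, learning_rate is nu, n_estimators is M
gbc = GradientBoostingClassifier(learning_rate=0.1, n_estimators=100,
                                 max_depth=1, random_state=42)
gbc.fit(X, y)
print(gbc.predict_proba(X[:3]))   # probabilities obtained from the final log(odds)
```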

IV) Hybrid Methods

Contrary to ensemble methods, hybrid methods take a set of different learners and combine them using new learning techniques.

1) Stacking

Stacking (or blending, or stacked generalization) is a combining mechanism in which the outputs (predictions) of several classifiers (Level 0 classifiers) are used as training data for another classifier (the Level 1 classifier), called a blender or meta-learner, which produces the final prediction.

(image: stacking architecture with Level 0 classifiers feeding a Level 1 blender)

Method:

(image: the stacking training method)
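scikit-learn provides this mechanism as StackingClassifier; a minimal sketch (the choice of Level 0 learners and of the blender is an assumption):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Level 0 classifiers
level0 = [("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
          ("svc", SVC(probability=True, random_state=42))]

# Level 1 blender / meta-learner trained on the Level 0 predictions
stack = StackingClassifier(estimators=level0, final_estimator=LogisticRegression())
stack.fit(X, y)
print(stack.predict(X[:3]))
```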