Introduction
Stacking is an ensemble machine learning technique that combines the predictions of several models into a single model, which then makes the final prediction on the given dataset.
Depending on the type of dataset we are dealing with, the combined models may even be of the same kind. Stacking, also known as stacked generalisation, is a method of determining how to combine the outputs of different models in order to predict the result for the same dataset.
The dataset is divided into folds, and each model deals with a portion of the complete problem rather than the whole of it. The subparts of the problem can therefore be handled individually by the combined models. Stacking can also be viewed as layers of prediction: each model solves a certain part of the problem, and the output of every layer serves as the input to the final layer or to an intermediate layer. These final and intermediate layers are themselves based on a prediction model, chosen according to whether we are performing classification or regression.
The main benefit of stacking is improved predictions. Broadly speaking, ensemble methods are meta-algorithms that combine several machine learning algorithms into one prediction model in order to decrease variance (bagging), decrease bias (boosting), or improve predictions (stacking).
Ensemble methods are mainly categorised along two dimensions, by how the base learners are generated and by which learning algorithms they use:
- Sequential ensemble: the base learners are generated sequentially and depend on each other.
- Parallel ensemble: the base learners are generated in parallel and are independent of each other.
- Homogeneous ensemble: a single learning algorithm is used to build the complete learning model and to obtain the base learners.
- Heterogeneous ensemble: different learning algorithms are used to build the prediction model and to obtain the base learners.
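The sequential and parallel categories above can be illustrated with scikit-learn. This is a minimal sketch, assuming a small synthetic dataset: AdaBoost is a sequential, homogeneous ensemble (each learner depends on the previous one's errors), while bagging is a parallel, homogeneous ensemble (the learners are fitted independently).

```python
# Sketch of sequential vs. parallel ensembles (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Sequential + homogeneous: AdaBoost builds decision stumps one after
# another, each one reweighting the errors of the previous learner.
sequential = AdaBoostClassifier(n_estimators=10, random_state=0).fit(X, y)

# Parallel + homogeneous: bagging fits independent trees on bootstrap
# samples; the base learners never see each other's output.
parallel = BaggingClassifier(DecisionTreeClassifier(),
                             n_estimators=10, random_state=0).fit(X, y)

print(sequential.score(X, y), parallel.score(X, y))
```

Stacking, discussed next, is the typical heterogeneous case: different algorithms combined by a meta-model.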
Stacked Generalisation: Now that we know what stacking is in machine learning, the next questions are: which models are useful for a particular problem, how do we choose among them, and finally, why do we need stacking at all?
Before answering these questions, we need to understand the architecture of stacking. Stacking involves two or more base models, referred to as level-0 models, and a level-1 meta-model that takes the outputs of the level-0 models as its input to produce the final output, for regression or classification depending on the dataset we are dealing with.
- Level-0: also called the base models, these fit (train) on a certain portion of the dataset, and their output is treated as the input to the next layer.
- Level-1: this layer is the meta-model, which takes the outputs of the base layer and combines them to produce the final prediction.
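The two levels map directly onto scikit-learn's `StackingClassifier`. The following is a minimal sketch on a synthetic dataset; the specific base models chosen here (a decision tree and an SVM) are illustrative, not prescribed.

```python
# Sketch of the level-0 / level-1 architecture (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Level-0 (base models): each is fitted on the training data.
level0 = [("tree", DecisionTreeClassifier(random_state=0)),
          ("svm", SVC(probability=True, random_state=0))]

# Level-1 (meta-model): learns to combine the base models' outputs.
clf = StackingClassifier(estimators=level0,
                         final_estimator=LogisticRegression())
clf.fit(X, y)
print(clf.predict(X[:5]))
```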
The outputs we obtain from the base models, which serve as inputs to the meta-model, may be real values in the case of regression, or probabilities, probability-density values, classes, or labels in the case of classification.
The choice of prediction model in the base layer is open: the base models can be of different types, and in stacking they may all be fitted on the same dataset. A single meta-model then learns how best to combine the predictions of each base model to produce the final output, which may be the desired real value or a label attached to the data.
The figure above shows the exact architecture of stacking. Its details are as follows:
- The dataset has m rows and n columns, i.e. m data points with n features per data point.
- There are M different models, with different functions, trained on the training set X using K-fold cross-validation.
- Each model produces predictions, which are then passed on as second-level training data. This data has dimension m x M: m rows (the number of data points) and M columns (one feature per base model's output).
- The level-2 model is trained on the data produced by the base models in order to produce the final result. This level-2 model is also referred to as the meta-model.
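The steps above can be sketched by building the m x M level-2 training matrix by hand. This is a minimal sketch assuming scikit-learn: `cross_val_predict` supplies the K-fold out-of-fold predictions for each base model, and here the base models' hard label predictions are used as the meta-features.

```python
# Sketch: constructing the m x M meta-features by hand (scikit-learn assumed).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# m = 200 data points, n = 10 features.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# M = 3 base models.
base_models = [DecisionTreeClassifier(random_state=0),
               KNeighborsClassifier(),
               GaussianNB()]

# K-fold (cv=5) out-of-fold predictions: one column per base model.
meta_X = np.column_stack(
    [cross_val_predict(model, X, y, cv=5) for model in base_models]
)
print(meta_X.shape)  # (m, M) = (200, 3)

# The level-2 (meta) model is trained on these predictions.
meta_model = LogisticRegression().fit(meta_X, y)
```

Using out-of-fold rather than in-fold predictions matters: it prevents the meta-model from simply learning to trust base models that have memorised the training data.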
Stacking in Classification
Classification is the technique of predicting the label or class attached to the dataset on which training is performed. Many classification algorithms can be used to obtain the desired result, such as logistic regression, support vector classification, decision trees, random forests, k-nearest neighbours, and naive Bayes. All of these play a major role in classifying data, and each may produce a different score.
Stacking offers a generalised way to produce a single model that combines all of these: each model deals with a portion of the dataset, and the output of each prediction classifier is fed as input to the meta-model in order to obtain the final result.
This stacking technique is useful in several ways. First, it improves the score, and hence the efficiency, of the entire prediction model. Second, it combines multiple classification algorithms, which helps keep the variance low.
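The claimed improvement can be checked empirically. This sketch, assuming scikit-learn and a synthetic dataset, cross-validates three of the classifiers named above individually and then as a stack; exact scores will vary with the data, so the stack is not guaranteed to win on every dataset.

```python
# Sketch: comparing individual classifiers with a stacked model
# (scikit-learn assumed; scores depend on the dataset).
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

estimators = [("lr", LogisticRegression(max_iter=1000)),
              ("knn", KNeighborsClassifier()),
              ("nb", GaussianNB())]

# Score each base classifier on its own.
for name, model in estimators:
    print(name, cross_val_score(model, X, y, cv=5).mean())

# Stack the same classifiers behind a logistic-regression meta-model.
stack = StackingClassifier(estimators=estimators,
                           final_estimator=LogisticRegression())
print("stack", cross_val_score(stack, X, y, cv=5).mean())
```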