Code360 powered by Coding Ninjas X Naukri.com
Last Updated: Mar 27, 2024

AdaBoost Algorithm

Introduction

Have you ever heard of ensemble methods in machine learning? Ensemble methods train several models and combine their predictions to produce better results than any single model would.

Boosting is an ensemble method in which the models are trained one after another, with each model focusing on the errors made by the previous one. The boosting method builds a strong learner out of several weak learners by combining their predictions, with each learner weighted according to how well it performs.

In this article, we will learn more about the AdaBoost (Adaptive Boost) algorithm, which is a boosting method that aims to build a strong classifier from several weak classifiers.

Let’s start by discussing the basics of ensemble learning.

Ensemble Learning

Ensemble learning is a method of combining several base models into a single, better predictive model. A standard decision tree for classification, for example, takes several factors, turns them into rule-based questions, and at each factor either makes a decision or moves on to examine another factor.

When there are many decision rules, for instance when the threshold for a decision is unclear or new sub-factors have to be evaluated, the result of a single decision tree can become unreliable. Ensemble methods can be handy in this situation. Rather than relying on one decision tree to get the correct result, ensemble methods take several distinct trees and aggregate them into one final, strong predictor.

Types of Ensemble Learning

Ensemble methods are further divided into two categories:

  • Sequential Learners are models that are generated in sequential order and learn from the mistakes of preceding models. This exploits the dependency between models by giving misclassified samples larger weights (e.g. AdaBoost).
  • Parallel Learners are based on the generation of base models in parallel. This makes use of the models' independence by averaging out the errors (e.g. Random Forest).
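
To see why averaging independent models can help, the short sketch below computes the probability that a simple majority vote is correct. It rests on an idealised assumption that is not from this article: every base model is right with the same probability p on a binary task, independently of the others. The function name majority_vote_accuracy is ours and purely illustrative.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that more than half of n_models independent voters are right.

    Assumption (hypothetical): each model is correct with probability p,
    independently of the others, and n_models is odd so ties cannot occur.
    """
    majority = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(majority, n_models + 1)
    )


# Accuracy of the vote for a growing number of 60%-accurate models
for n in (1, 5, 25):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Real models are rarely fully independent, so the gain in practice is usually smaller than this idealised calculation suggests.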

Boosting Ensemble Methods

The boosting method seeks to develop a powerful learner (predictive model) from the mistakes of numerous weaker models, similar to how individuals learn from their mistakes and try not to repeat them in the future. To begin, you must first create a model using the training data. Then you develop a second model based on the first, attempting to reduce the previous model's flaws. Models are added in sequential order, each correcting the one before it until the training data is properly predicted or the maximum number of models is reached.

In this article, we'll go through the basics of AdaBoost, which is one of the most widely used boosting approaches.

Need for AdaBoost Algorithm

AdaBoost is best used to improve decision tree performance on binary classification problems.

  • AdaBoost is a machine learning algorithm that can be used to improve the performance of other machine learning techniques. It works well with weak learners. These are models that reach an accuracy just above random chance on a classification task.
  • Decision trees with one level are the most suitable and commonly used algorithm with AdaBoost. These trees are known as decision stumps because they are so short and make only a single split on one feature.
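
A decision stump is simple enough to write by hand. The following is a minimal sketch, not scikit-learn's implementation: it handles a single numeric feature, labels in {-1, +1}, and per-sample weights, and it tries every observed value as a threshold. The name fit_stump and the toy data are assumptions made for illustration.

```python
def fit_stump(xs, ys, weights):
    """Find the threshold and polarity with the lowest weighted error.

    xs: feature values, ys: labels in {-1, +1}, weights: sample weights.
    Returns (threshold, polarity, weighted_error).
    """
    best = (None, 1, float("inf"))
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            # polarity=+1 predicts +1 when x > threshold; polarity=-1 flips it
            preds = [polarity if x > threshold else -polarity for x in xs]
            error = sum(w for p, y, w in zip(preds, ys, weights) if p != y)
            if error < best[2]:
                best = (threshold, polarity, error)
    return best


# Toy data (hypothetical): small values are class -1, large values class +1
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [-1, -1, -1, 1, 1, 1]
weights = [1 / len(xs)] * len(xs)

threshold, polarity, error = fit_stump(xs, ys, weights)
print(threshold, polarity, error)  # prints: 3.0 1 0
```

Because the stump accepts sample weights, it can be reused unchanged in each boosting round: only the weights change from one round to the next.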

Working of AdaBoost Algorithm

Let's have a look at how the AdaBoost algorithm works. During training, AdaBoost builds a fixed number of decision trees one after another. Once the first tree is built, the records it misclassified are given higher priority (a larger weight), so they are more likely to be selected as input for the second model. The procedure continues until the chosen number of base learners has been built. Note that when the training set for each round is drawn by weighted sampling, the same record can be selected more than once.

Algorithm

The following algorithm can be used to describe how AdaBoost works:

  1. Set up the dataset and give each data point the same weight.
  2. Train a model on the weighted data and find the data points that it classified incorrectly.
  3. Increase the weight of the data points that were incorrectly classified.
  4. If you got the required results (or reached the maximum number of models):
    → Proceed to step 6.
  5. Otherwise, go to step 2.
  6. End
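
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the scikit-learn implementation: it uses the classic binary formulation with labels in {-1, +1}, decision stumps on one numeric feature, and the learner weight alpha = 0.5 * ln((1 - error) / error). All function names and the toy data are hypothetical.

```python
import math


def fit_stump(xs, ys, weights):
    """Return (threshold, polarity, weighted_error) of the best stump."""
    best = (None, 1, float("inf"))
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x > threshold else -polarity for x in xs]
            error = sum(w for p, y, w in zip(preds, ys, weights) if p != y)
            if error < best[2]:
                best = (threshold, polarity, error)
    return best


def stump_predict(stump, x):
    threshold, polarity = stump
    return polarity if x > threshold else -polarity


def adaboost_predict(ensemble, x):
    """Weighted vote of all stumps; ties go to +1."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def adaboost_fit(xs, ys, n_rounds=10):
    n = len(xs)
    weights = [1.0 / n] * n  # step 1: equal weights
    ensemble = []
    for _ in range(n_rounds):
        # step 2: train a weak learner on the weighted data
        threshold, polarity, error = fit_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        stump = (threshold, polarity)
        ensemble.append((alpha, stump))
        # step 3: raise the weight of misclassified points, lower the rest
        weights = [
            w * math.exp(-alpha * y * stump_predict(stump, x))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
        # step 4: stop once the ensemble fits the training data
        if all(adaboost_predict(ensemble, x) == y for x, y in zip(xs, ys)):
            break
    return ensemble


# Toy data (hypothetical) that no single stump can separate
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost_fit(xs, ys)
print("stumps used:", len(ensemble))
print("predictions:", [adaboost_predict(ensemble, x) for x in xs])
```

On this toy data the best single stump misclassifies three of the ten points, while the weighted vote of the stumps classifies all ten correctly.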

Example

(Diagram explaining the AdaBoost algorithm)

  • The above diagram shows that AdaBoost begins by randomly selecting a training subset. It then trains the model iteratively, choosing each new training set based on how accurately the previous round predicted each observation.
  • It gives incorrectly classified observations a larger weight so that they have a higher chance of being selected, and classified correctly, in the next iteration. It also assigns a weight to the trained classifier in each iteration based on the classifier's accuracy. The more accurate the classifier, the more weight it is given.
  • This process is repeated until the training data is fitted perfectly or the maximum number of estimators is reached.
  • To classify a new observation, perform a weighted "vote" across all of the classifiers you built.
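
To make the two kinds of weight concrete, here is a small worked calculation for a single round. It assumes the classic binary AdaBoost formulas, which this article does not spell out (scikit-learn's multi-class variant differs in detail), and a hypothetical round in which 3 of 10 equally weighted samples are misclassified.

```python
import math

n_samples = 10
n_misclassified = 3  # hypothetical result of the first weak classifier

# Weighted error of the classifier when every sample has weight 1/n
error = n_misclassified / n_samples

# Classifier weight: the lower the error, the larger its say in the vote
alpha = 0.5 * math.log((1 - error) / error)

# Sample weights before normalisation: up for mistakes, down for hits
wrong = (1 / n_samples) * math.exp(alpha)
right = (1 / n_samples) * math.exp(-alpha)

# Normalise so that all sample weights sum to 1 again
total = n_misclassified * wrong + (n_samples - n_misclassified) * right
new_wrong = wrong / total
new_right = right / total

print("classifier weight alpha:", round(alpha, 4))
print("new weight of a misclassified sample:", round(new_wrong, 4))
print("new weight of a correctly classified sample:", round(new_right, 4))
```

After the update, the three misclassified samples together carry half of the total weight, which is what pushes the next classifier to concentrate on them.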

Implementation in Python

The scikit-learn library provides the AdaBoostClassifier and AdaBoostRegressor classes in Python. In our example, we use AdaBoostClassifier, since our example is a classification task. Our dataset is split into training and test sets using the train_test_split function. We also import the datasets module, from which we load the Iris dataset used in the program.

Step 1 (Import Required Libraries and Load Iris Dataset)

You can use the Iris dataset, a well-known multi-class classification problem, to develop the model. There are four features in this dataset: sepal length, sepal width, petal length, and petal width, as well as a target (the type of flower). “Setosa”, “Versicolor”, and “Virginica” are the three flower species represented in this data. You can find the dataset in the scikit-learn library or get it from the UCI Machine Learning Repository.

#program for adaboost classifier to classify iris dataset using sklearn
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.ensemble import AdaBoostClassifier

#load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
print(iris.data)

Output

(The full 150 × 4 array of feature values, one row per flower)

Step 2 (Split the Dataset)

We divide the data into two sets, a training set and a testing set, because evaluating a classifier on the same data it was trained on gives an overly optimistic estimate of its performance. To separate the data, we utilise the "train_test_split" function.

#split the dataset into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
#shape of train and test objects
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

Output

(120, 4)
(30, 4)
(120,)
(30,)

Step 3 (Create and Train the Classifier)

Let's use scikit-learn to build the AdaBoost model. By default, AdaBoostClassifier uses a decision tree of depth 1 (a decision stump) as its weak learner. We fit the model on the training set and then use its predict() method to predict the class of each sample in the test set.

#create the adaboost classifier
clf = AdaBoostClassifier(n_estimators=50, random_state=1)
#train the classifier
clf.fit(X_train, y_train)
#predict the test set
y_pred = clf.predict(X_test)
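
After fitting, it can be instructive to look inside the ensemble. The sketch below repeats the steps above so that it runs on its own, then prints the fitted weak learners and their weights. The attributes estimators_, estimator_weights_ and estimator_errors_ are, to the best of our knowledge, the fitted attributes scikit-learn exposes for this purpose; the exact values depend on the installed scikit-learn version, so none are shown here.

```python
from sklearn import datasets
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Same data, split and settings as in the steps above
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=4
)
clf = AdaBoostClassifier(n_estimators=50, random_state=1)
clf.fit(X_train, y_train)

# Boosting may stop early, so the list can hold fewer than n_estimators
print("weak learners fitted:", len(clf.estimators_))
# Each weak learner is a depth-1 decision tree (a decision stump) by default
print("first weak learner:", clf.estimators_[0])
# Weight and weighted error of the first few weak learners
print("first five learner weights:", clf.estimator_weights_[:5])
print("first five learner errors:", clf.estimator_errors_[:5])
```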

Step 4 (Evaluate the model)

Let's calculate how well the classifier can predict the species of each flower. Accuracy is calculated by comparing the actual test set labels with the predicted labels.

#print the accuracy
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

Output

(The accuracy score on the test set, a value between 0 and 1)

Frequently Asked Questions

Is AdaBoost intended solely for classification?

No. The AdaBoost algorithm can solve both classification and regression problems. In both cases it works by giving more weight to cases that are difficult to predict and less to those already handled well.
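
As a brief illustration of the regression case, here is a minimal sketch using scikit-learn's AdaBoostRegressor. The noisy sine-wave data is synthetic and made up for this example; the settings are arbitrary rather than recommended values.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

# Synthetic data (hypothetical): a sine curve with a little Gaussian noise
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Fit a boosted ensemble of shallow regression trees
reg = AdaBoostRegressor(n_estimators=50, random_state=0)
reg.fit(X, y)

# Predict near the peak (x = 1.5) and near the trough (x = 4.5) of the curve
pred = reg.predict(np.array([[1.5], [4.5]]))
print(pred)
```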

 

In what ways does AdaBoost improve classifier accuracy?

AdaBoost is an iterative ensemble method: it improves accuracy by combining many classifiers. Each individual classifier may perform poorly on its own, but their weighted combination forms a single, more accurate classifier.

 

What distinguishes the iris dataset?

The Iris dataset contains 150 samples: 50 from each of three Iris species (Iris setosa, Iris virginica and Iris versicolor), each described by four characteristics (the length and width of the sepals and petals). Ronald Fisher introduced the dataset in 1936 and used these measurements to develop a linear discriminant model to classify the species.
 

Why is AdaBoost considered adaptive?

AdaBoost is adaptive in that subsequent weak learners are tweaked in favour of instances misclassified by earlier classifiers. In some situations it may be less prone to overfitting than other learning algorithms.

Conclusion

This article extensively discussed the AdaBoost algorithm and its implementation in the Python programming language.

The key points covered in this article on the AdaBoost Algorithm are as follows:

  • Ensemble learning
  • Types of Ensemble Learning
  • Boosting Ensemble Methods
  • Need for AdaBoost Algorithm
  • Working of AdaBoost Algorithm
  • Implementation

We hope that this blog has helped you enhance your knowledge regarding the AdaBoost algorithm. If you want to learn more, check out our articles on "Data Preprocessing," "Python in Data Mining," "Orange in Data Mining," "Applications of Data Mining" and "Outliers in Data Analysis." Do upvote our blog to help other ninjas grow.

Happy Reading!
