Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1. Introduction
2. Multiclass Classification
2.1. Implementation of multiclass classification using scikit-learn
3. Multilabel Classification
3.1. Implementation of multilabel classification using scikit-learn
4. Multioutput classification using scikit-learn
5. Frequently Asked Questions
5.1. How would you define a random forest?
5.2. What is bias in Machine learning?
5.3. What are weak learners?
5.4. What is deep learning?
5.5. What is a confusion matrix, and why is it used?
6. Conclusion
Last Updated: Mar 27, 2024
Medium

Multiclass and Multioutput Classification using scikit-learn

Introduction

The demand for machine learning is growing steadily, so a good working knowledge of machine learning algorithms is valuable. This blog discusses multiclass and multioutput classification using scikit-learn.


In this article, we cover multiclass, multilabel, and multioutput classification, and then implement each of them using scikit-learn. Let us start with multiclass classification.

Multiclass Classification

Classification means assigning each data point to one of a set of predefined categories based on its features (the independent variables). A multiclass classification problem involves more than two classes; for example, classifying photographs of cars by brand, where each car can be a Mercedes, a BMW, or a Toyota.

Multiclass classification assumes that each sample is assigned to exactly one class: a car can be either a Mercedes or a BMW, but not both at the same time. We train a classifier on labeled training data and then use it to classify new examples.
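As a minimal sketch of the one-label-per-sample assumption, the snippet below fits a decision tree to a tiny, hypothetical car dataset. The two features (length and height in metres) and all their values are made up for illustration; the point is only that `predict` returns exactly one brand per car.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features per car photo: [length_m, height_m] (made-up values)
X_train = np.array([[4.9, 1.45], [5.0, 1.47],   # Mercedes
                    [4.7, 1.42], [4.6, 1.40],   # BMW
                    [4.3, 1.60], [4.4, 1.62]])  # Toyota
y_train = np.array(["Mercedes", "Mercedes", "BMW", "BMW", "Toyota", "Toyota"])

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(clf.classes_.tolist())  # ['BMW', 'Mercedes', 'Toyota']

# each new car gets exactly one brand, never two
pred = clf.predict(np.array([[4.95, 1.46], [4.35, 1.61]]))
print(pred.tolist())  # ['Mercedes', 'Toyota']
```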

Implementation of multiclass classification using scikit-learn

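To illustrate, we will use the KNN (k-nearest neighbors) classifier on the Iris dataset, a classic multiclass problem in which each flower belongs to exactly one of three species. KNN is one of the simplest classification algorithms in machine learning. When a new data point arrives, the algorithm finds the k training points closest to it and assigns the class that is most common among those neighbors. The distance between two data points is typically the Euclidean distance between their feature vectors, which is the default metric of scikit-learn's KNeighborsClassifier.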
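The neighbor-voting rule can be sketched in a few lines of NumPy. `knn_predict` is a hypothetical helper, not scikit-learn's implementation; it assumes Euclidean distance and a simple majority vote, and the toy points are made up. The last lines cross-check it against KNeighborsClassifier on the same data.

```python
from collections import Counter

import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, x_new, k=3):
    """Hypothetical helper: classify x_new by majority vote of its k nearest neighbors."""
    # Euclidean distance from x_new to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # most common label among those neighbors
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]


# two well-separated toy clusters (made-up values)
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_new = np.array([[1.1, 1.0], [4.9, 5.0]])

ours = [knn_predict(X_train, y_train, x, k=3) for x in X_new]
sk = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_new)
print(ours)         # [0, 1]
print(sk.tolist())  # [0, 1]
```

On clearly separated data both agree; on real data, ties in the vote or in the distances may be broken differently.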

# import the scikit-learn modules needed to build and evaluate the model
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# load the Iris dataset (150 flowers, 3 species)
iris_dataset = datasets.load_iris()

# X holds the features (4 measurements per flower)
# y holds the labels (species encoded as 0, 1, 2)
X = iris_dataset.data
y = iris_dataset.target

# split X and y into training and test data (75% / 25% by default)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# train a KNN classifier that votes among the 7 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=7).fit(X_train, y_train)

# accuracy (fraction of correct predictions) on the test data
final_accuracy = knn.score(X_test, y_test)
print("final_accuracy is:", final_accuracy)

# confusion matrix: rows are actual classes, columns are predicted classes
# (stored as cm so the imported confusion_matrix function is not overwritten)
knn_predict = knn.predict(X_test)
cm = confusion_matrix(y_test, knn_predict)
print("confusion matrix is:\n", cm)
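The Iris example fixes `n_neighbors = 7` without explaining the choice. One common way to pick k, sketched here as an assumption rather than the article's method, is to compare 5-fold cross-validated accuracy for a range of k values on the training split only, so the test set stays untouched. The range 1 to 15 is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# mean 5-fold cross-validated accuracy on the training data for each candidate k
mean_scores = {}
for k in range(1, 16):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X_train, y_train, cv=5)
    mean_scores[k] = scores.mean()

# keep the k with the highest mean accuracy
best_k = max(mean_scores, key=mean_scores.get)
print("best k:", best_k, "mean CV accuracy:", round(mean_scores[best_k], 3))
```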

Multilabel Classification

Consider a picture of a house. If someone asks whether this single image contains only a house, the answer is no: a typical picture of a house also shows other things, such as clouds or trees.

Now consider a different question: which of the following labels are relevant to the picture?

Label      House  Animals  Cloud  Tree  River  Hill
Relevant?  Yes    No       Yes    Yes   No     No

 

Problems like this, where each sample has a whole set of binary target variables (one per label), are known as multilabel classification problems. Unlike multiclass classification, a sample can belong to any number of the classes at once: none of them, some of them, or all of them.
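In scikit-learn, multilabel targets are usually stored as a binary indicator matrix with one column per label. A minimal sketch using MultiLabelBinarizer: the first row encodes the house picture from the table above, and the second picture's labels are hypothetical, added only for contrast.

```python
from sklearn.preprocessing import MultiLabelBinarizer

labels = ["House", "Animals", "Cloud", "Tree", "River", "Hill"]

# each image is described by the set of labels that apply to it
image_labels = [
    {"House", "Cloud", "Tree"},      # the house picture from the table
    {"Animals", "River", "Hill"},    # a hypothetical second picture
]

# classes=labels fixes the column order of the indicator matrix
mlb = MultiLabelBinarizer(classes=labels)
Y = mlb.fit_transform(image_labels)
print(Y.tolist())  # [[1, 0, 1, 1, 0, 0], [0, 1, 0, 0, 1, 1]]
```

Each row can contain any number of 1s, which is exactly what distinguishes multilabel targets from multiclass ones.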

Implementation of multilabel classification using scikit-learn

The first approach uses scikit-learn's ClassifierChain, which arranges one binary classifier per label in a chain. The models make their predictions in the order specified by the chain, and each model uses the input features plus the labels predicted by the models before it.

The first model in the chain has no information about the other labels, whereas the inputs of the last model include features indicating the presence of every other label. Because the best ordering of the models in the chain is not known in advance, a common remedy is to fit several randomly ordered chains and average their predictions.
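A minimal sketch of averaging randomly ordered chains. It assumes a larger synthetic dataset (200 samples) than the examples in this article, five chains, and a 0.5 threshold; all three are arbitrary choices for illustration, and the chains are combined by hand here.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=200, n_classes=3, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# five chains, each with a different random label order
chains = [
    ClassifierChain(LogisticRegression(max_iter=1000), order="random", random_state=i)
    for i in range(5)
]
for chain in chains:
    chain.fit(X_train, Y_train)

# predict_proba returns columns in the original label order, so chains can be averaged
Y_proba = np.mean([chain.predict_proba(X_test) for chain in chains], axis=0)
# turn the averaged probabilities into 0/1 label predictions
Y_pred = (Y_proba >= 0.5).astype(int)
print(Y_pred.shape)  # (50, 3)
```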

First model: ClassifierChain

This example fits a single chain whose label order is chosen at random.

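from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

# small synthetic dataset: 12 samples, 3 binary labels per sample
X, Y = make_multilabel_classification(n_samples=12, n_classes=3, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# logistic regression is the base model for every link in the chain
base_lr = LogisticRegression(solver='lbfgs', random_state=0)
# order='random' shuffles the label order (reproducibly, via random_state)
chain = ClassifierChain(base_lr, order='random', random_state=0)

# fit the chain and predict a 0/1 value for each of the 3 labels
print(chain.fit(X_train, Y_train).predict(X_test))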

 

# probability that each of the 3 labels is present, for every test sample
print(chain.predict_proba(X_test))

 

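Second model: MultiOutputClassifier

This example wraps LogisticRegression in MultiOutputClassifier, which fits one independent classifier per label; unlike a chain, none of the models uses the other labels as inputs.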

from sklearn.datasets import make_multilabel_classification
from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import LogisticRegression

# synthetic dataset with 3 binary labels per sample (100 samples by default)
X, y = make_multilabel_classification(n_classes=3, random_state=0)
# fit one logistic regression per label
clf = MultiOutputClassifier(LogisticRegression()).fit(X, y)
# predict the labels of the last two samples
print(clf.predict(X[-2:]))

Multioutput classification using scikit-learn

Multioutput classification (called multiclass-multioutput classification in scikit-learn) predicts several target variables at once for each sample, and each target can have more than two classes. It generalizes the two previous cases: multiclass classification predicts a single target with more than two classes, and multilabel classification predicts several targets that are each binary (label present or absent).

For example, consider a model that predicts both the brand and the name of a car. The brand can be Mahindra, Maruti, or Hyundai, and the name can be THAR, Vitara, or Creta. A multioutput classifier handles both targets at once and returns two predictions for each car.
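A minimal sketch of the brand-and-name example, assuming made-up numeric features (real car photos would need image features). The target matrix has one column per output, and MultiOutputClassifier fits one decision tree per column, so each prediction is a (brand, name) pair.

```python
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

# hypothetical numeric features per car photo (made-up values)
X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]])
# two targets per car: column 0 is the brand, column 1 is the name
Y_train = np.array([
    ["Mahindra", "THAR"], ["Mahindra", "THAR"],
    ["Maruti", "Vitara"], ["Maruti", "Vitara"],
    ["Hyundai", "Creta"], ["Hyundai", "Creta"],
])

# one decision tree is fitted per target column
clf = MultiOutputClassifier(DecisionTreeClassifier(random_state=0)).fit(X_train, Y_train)
pred = clf.predict(np.array([[0, 0.5], [2, 0.5]]))
print(pred.tolist())  # [['Mahindra', 'THAR'], ['Hyundai', 'Creta']]
```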
 

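import numpy as np
from sklearn.utils import shuffle
from sklearn.multioutput import MultiOutputClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# features X and a first target y1 with 4 classes
X, y1 = make_classification(n_samples=10, n_features=50,
                            n_informative=10, n_classes=4,
                            random_state=1)
# two more targets: shuffled copies of y1 (4 classes each)
y2 = shuffle(y1, random_state=1)
y3 = shuffle(y1, random_state=2)
# stack the three targets column-wise: Y has shape (10, 3)
Y = np.vstack((y1, y2, y3)).T
n_samples, n_features = X.shape  # (10, 50)
n_outputs = Y.shape[1]           # 3 outputs
n_classes = 4                    # classes per output

# one random forest is fitted per output; n_jobs=2 fits them in parallel
forest = RandomForestClassifier(random_state=2)
multi_target_forest = MultiOutputClassifier(forest, n_jobs=2)
# predict on the training data itself: 3 predictions per sample
print(multi_target_forest.fit(X, Y).predict(X))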

Frequently Asked Questions

How would you define a random forest?

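Random forest is an ensemble learning method, used mainly for classification and regression, that trains many decision trees on bootstrap samples of the data (considering a random subset of features at each split) and combines their predictions.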
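To make the "ensemble of trees" idea concrete, this sketch (assuming the Iris dataset and 10 trees, both arbitrary choices) checks that a fitted RandomForestClassifier stores individual decision trees in `estimators_`, and that its `predict_proba` matches the average of the trees' probabilities. scikit-learn combines trees by averaging their probabilistic predictions rather than by a hard majority vote.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# the ensemble is made of individual decision trees
print(len(forest.estimators_), type(forest.estimators_[0]).__name__)  # 10 DecisionTreeClassifier

# the forest's class probabilities are the mean of the trees' probabilities
tree_mean = np.mean([tree.predict_proba(X) for tree in forest.estimators_], axis=0)
print(np.allclose(tree_mean, forest.predict_proba(X)))  # True
```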

What is bias in Machine learning?

Bias is the systematic error a model makes because of overly simple assumptions: the predictions of a high-bias model are, on average, far from the actual values, whichever training set it is fitted on. A low-bias model's average predictions are close to the actual values.
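Stated more precisely under the usual textbook setup (an assumption, not spelled out in this article): the data follow y = f(x) + ε, where the noise ε has mean 0 and variance σ², and expectations are taken over the training sets the model \hat{f} could be fitted on and over the noise. Bias is the gap between the average prediction and the true value, and for squared error it is one of three terms in the expected prediction error.

```latex
\operatorname{Bias}\big[\hat{f}(x)\big] = \mathbb{E}\big[\hat{f}(x)\big] - f(x)

\mathbb{E}\Big[\big(y - \hat{f}(x)\big)^2\Big]
  = \operatorname{Bias}\big[\hat{f}(x)\big]^2
  + \operatorname{Var}\big[\hat{f}(x)\big]
  + \sigma^2
```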

What are weak learners?

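In ensemble learning, weak learners are simple models that perform only slightly better than random guessing on their own, such as a decision stump (a decision tree with a single split). Ensemble methods such as boosting combine many of them into a stronger, more complex model.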
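A minimal sketch of the idea, assuming the Iris dataset and AdaBoost as the boosting method (the article does not name one). A single decision stump has only two leaves, so it can predict at most two of the three species; AdaBoost's default base model is also a depth-1 tree, and it combines 50 of them. Both accuracies are measured on the training data, so they illustrate model capacity rather than generalization.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# a weak learner: a decision stump with a single split
stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
stump_acc = stump.score(X, y)

# boosting: AdaBoost combines 50 stumps (its default base model)
boosted = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
boosted_acc = boosted.score(X, y)

print("single stump training accuracy:", round(stump_acc, 3))
print("50 boosted stumps training accuracy:", round(boosted_acc, 3))
```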

What is deep learning?

Deep learning is a branch of machine learning that uses artificial neural networks, loosely inspired by the human brain, to learn patterns directly from data. The word "deep" refers to neural networks containing several layers.

What is a confusion matrix, and why is it used?

A confusion matrix is a table that summarizes a classification model's performance. In scikit-learn's convention, each row corresponds to an actual class and each column to a predicted class, so the diagonal counts the correct predictions and the off-diagonal cells show which classes the model confuses with one another.
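A small worked example with made-up labels for six samples of three classes: reading the matrix row by row shows, for each actual class, where its samples were predicted to go, and the diagonal sum divided by the total gives the accuracy.

```python
from sklearn.metrics import confusion_matrix

# made-up labels for six samples of three classes (0, 1, 2)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred)
# rows = actual class, columns = predicted class
print(cm.tolist())  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
# correct predictions sit on the diagonal: 4 of 6 samples
print(cm.trace() / cm.sum())  # 0.6666666666666666
```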

Conclusion

In this article, we discussed multiclass, multilabel, and multioutput classification and implemented each of them using scikit-learn.

We hope that this article has helped you. If you liked it, do upvote it and help other ninjas grow. You can refer to our Guided Path on Coding Ninjas Studio to upskill yourself in Data Structures and Algorithms, Competitive Programming, System Design, and many more!

Head over to our practice platform Coding Ninjas Studio to practice top problems, attempt mock tests, read interview experiences and interview bundles, follow guided paths for placement preparations, and much more!!
