Table of contents
1. Introduction
2. What is Binary Classification?
3. Building Binary Classifier:
4. Data Set Description
  4.1. Load the Dataset
  4.2. Extract Feature With CountVectorizer
  4.3. Fit the Data
  4.4. Naive Bayes Classifier
    4.4.1. Sample Output
    4.4.2. Visualizing the Output and Data
    4.4.3. Sample Output
5. Frequently Asked Questions:
6. Key Takeaways
Last Updated: Mar 27, 2024

Binary Classification


Introduction

Have you ever used a detection system that sorts its inputs into one of two categories? Consider classifying emails as “Spam” or “Not Spam”, labeled 1 and 0 respectively. Other common examples of binary classification are credit card fraud detection and sentiment analysis. Let’s build a simple model to understand binary classification better.

 


What is Binary Classification?

Binary classification is the task of classifying the elements of a set into two groups on the basis of a classification rule. 

In the graph below, there are two classes, ‘0’ and ‘1’: class 0 is shown in blue and class 1 in orange. The points are plotted by class so that the two groups can be told apart.
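
As a minimal sketch of what a "classification rule" can look like, the snippet below labels a message as spam (1) when it contains any word from a hand-picked list. The keyword list and the function name are assumptions made for illustration; the model built later in this article learns its rule from data instead.

```python
# Hypothetical hand-written classification rule, for illustration only.
SPAM_KEYWORDS = {"free", "winner", "prize"}  # assumed keyword list


def classify(message: str) -> int:
    """Return 1 (spam) if any keyword appears in the message, else 0 (not spam)."""
    words = set(message.lower().split())
    return 1 if words & SPAM_KEYWORDS else 0


messages = ["Claim your free prize now", "Are we still meeting at noon"]
labels = [classify(m) for m in messages]
print(labels)
```

Every input ends up in exactly one of the two groups, which is all that "binary" means here.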

Building Binary Classifier:

The first step in building any model is to import the libraries it needs, such as pandas and scikit-learn (sklearn). Keras and TensorFlow are also widely used for building models, particularly neural networks, but this example needs only pandas and scikit-learn, plus Flask to serve the predictions.

 

from flask import Flask, render_template, request
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split

Data Set Description

Let’s take a simple example to implement a binary classifier: a spam classifier. In this dataset, 87% of the messages are “Not Spam” and 13% are “Spam”, so the two classes are imbalanced.
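
With an 87/13 split, accuracy on its own can mislead: a model that always predicts “Not Spam” would already be right about 87% of the time. The sketch below checks the class balance and that majority-class baseline on a small stand-in label column. The 87 and 13 counts are made up to mirror the percentages above and are not taken from the actual file.

```python
import pandas as pd

# Stand-in labels mirroring the 87% / 13% split (not the real dataset).
labels = pd.Series(["ham"] * 87 + ["spam"] * 13, name="class")

# Fraction of messages in each class.
class_share = labels.value_counts(normalize=True)
print(class_share)

# Accuracy of a baseline that always predicts the most frequent class.
majority_class = class_share.idxmax()
baseline_accuracy = (labels == majority_class).mean()
print(majority_class, baseline_accuracy)
```

Any trained model should be compared against this baseline rather than against 50%.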

 

Download your dataset here.

Load the Dataset

The code below loads the dataset and encodes the labels as “Spam” → 1 and “Not Spam / Ham” → 0.

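# Load the dataset; the file is read with latin-1 encoding
df = pd.read_csv("spam.csv", encoding="latin-1")
# Drop the unused extra columns
df.drop(['Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4'], axis=1, inplace=True)
# Features and Labels (assumes the columns are named 'class' and 'message')
df['label'] = df['class'].map({'ham': 0, 'spam': 1})
X = df['message']
y = df['label']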

Extract Feature With CountVectorizer

Raw text has to be cleaned and converted into numeric features before a model can use it. Here we use CountVectorizer, which builds a vocabulary from the messages and represents each message as a vector of word counts.

# Extract Feature With CountVectorizer
cv = CountVectorizer()

Fit the Data

To fit and evaluate a model, we split the dataset into training and test sets. A common split is 80:20; the code below holds out 33% of the data for testing (test_size=0.33).

X = cv.fit_transform(X)  # Fit the Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
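
The code above fits the vectorizer on every message before splitting, so the vocabulary also reflects the test messages. One possible variation, sketched below on made-up messages, splits first and wraps the vectorizer and classifier in a scikit-learn pipeline so that the vocabulary is learned from the training messages only. The messages, the 80:20 ratio and the stratify option are choices made for this illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up messages: 6 ham (0) and 4 spam (1).
messages = [
    "see you at lunch", "call me later", "meeting moved to monday",
    "thanks for the notes", "are you home yet", "happy birthday",
    "win a free prize now", "free cash offer call now",
    "claim your free reward", "urgent prize waiting claim now",
]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

# Split the raw text first; stratify keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.2, random_state=42, stratify=labels
)

# The pipeline fits CountVectorizer on the training messages only.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

print(len(X_train), len(X_test))
print(model.predict(["free prize call now"]))
```

Because the pipeline accepts raw strings, the same object can later be used to classify new messages without a separate transform step.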

Naive Bayes Classifier

Naive Bayes classifiers are a family of classification algorithms based on Bayes' theorem. They share a common principle: every feature is assumed to be independent of every other feature, given the class.
 

Naive Bayes applies Bayes' theorem to estimate the probability of each class from the attributes of an input and predicts the most probable class. It is widely used in text classification and also handles problems with more than two classes.
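
Written out, the idea is as follows. For a class y and features x_1, ..., x_n (symbols introduced here for illustration), Bayes' theorem combined with the independence assumption gives:

```latex
P(y \mid x_1, \dots, x_n)
  = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
  \propto P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The denominator is the same for every class, so it can be dropped when comparing classes.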

 

To train the spam classifier we use multinomial Naive Bayes, a simple and fast model that works well with word-count features.

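# Naive Bayes Classifier
clf = MultinomialNB()
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # mean accuracy on the test set

# The Flask app and the '/predict' route are assumed here; the original
# snippet omits the enclosing view function.
app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    message = request.form['message']
    data = [message]
    vect = cv.transform(data).toarray()  # reuse the fitted vectorizer
    my_prediction = clf.predict(vect)
    return render_template('index.html', prediction=my_prediction)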

This snippet trains the classifier, scores it on the test set, and passes the prediction for the submitted text to the index.html template.
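
To show what the classifier computes, the following sketch recomputes the class probabilities by hand for a tiny made-up word-count matrix and compares them with MultinomialNB. The three-word vocabulary, the counts and the helper name manual_posterior are assumptions for illustration; the smoothing value alpha=1.0 matches scikit-learn's default.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy word-count matrix: rows are messages, columns are counts of the
# hypothetical words ["call", "free", "prize"].
X_toy = np.array([
    [2, 0, 0],  # ham
    [1, 0, 0],  # ham
    [1, 1, 0],  # ham
    [0, 2, 1],  # spam
    [0, 1, 2],  # spam
])
y_toy = np.array([0, 0, 0, 1, 1])


def manual_posterior(X_train, y_train, x_new, alpha=1.0):
    """Multinomial Naive Bayes posterior with additive (Laplace) smoothing."""
    log_joint = []
    for c in np.unique(y_train):
        rows = X_train[y_train == c]
        log_prior = np.log(len(rows) / len(X_train))
        smoothed = rows.sum(axis=0) + alpha          # per-word counts + alpha
        log_word_prob = np.log(smoothed / smoothed.sum())
        log_joint.append(log_prior + x_new @ log_word_prob)
    log_joint = np.array(log_joint)
    probs = np.exp(log_joint - log_joint.max())      # stable normalization
    return probs / probs.sum()


x_new = np.array([0, 1, 1])  # a message containing "free" and "prize" once each
manual = manual_posterior(X_toy, y_toy, x_new)
from_sklearn = MultinomialNB(alpha=1.0).fit(X_toy, y_toy).predict_proba([x_new])[0]

print(manual)
print(from_sklearn)
```

Both computations should agree, and the higher probability falls on class 1 (spam) because "free" and "prize" appear only in the spam rows of the toy data.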

Sample Output

 

Visualizing the Output and Data

Visualizing the data and the model's output is an important part of developing any machine learning model.

 

Sample Output

 

import matplotlib.pyplot as plt
import seaborn as sns
# In a Jupyter notebook, also run: %matplotlib inline
sns.set()

# NOTE: `hist` is assumed to be a Keras History object returned by model.fit();
# the MultinomialNB classifier above does not produce one.
acc = hist.history['accuracy']
val = hist.history['val_accuracy']
epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, '-', label='Training accuracy')
plt.plot(epochs, val, ':', label='Validation accuracy')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.show()
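
The accuracy curves above rely on a per-epoch training history, which a Naive Bayes model does not produce. For a classifier like the one in this article, a confusion matrix is one common way to look at the results, especially when the classes are imbalanced. The sketch below uses made-up true and predicted labels rather than real model output.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels for illustration: 0 = not spam, 1 = spam.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

precision = precision_score(y_true, y_pred)  # tp / (tp + fp)
recall = recall_score(y_true, y_pred)        # tp / (tp + fn)

print(cm)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

To draw the matrix instead of printing it, scikit-learn's ConfusionMatrixDisplay can plot the same cm with matplotlib.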


Frequently Asked Questions:

  1. What do you mean by Binary Classification?
    Binary classification is the task of assigning each input to one of two classes, such as spam or not spam.
     
  2. What is Binary text Classification?
    It is a supervised learning task in which we predict whether a piece of text falls into a given category.
     
  3. List some Binary Classifiers?
    Naive Bayes, K-Nearest Neighbours, and Support Vector Machines are some commonly used binary classifiers.
     
  4. What is a Binary Classification Data set?
    It is a dataset in which the value to predict takes one of just two possible values. The goal of a binary classification problem is to build a machine learning model that predicts that value.
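
The classifiers listed in question 3 share the same fit/predict interface in scikit-learn, so they can be swapped easily. Here is a minimal sketch on made-up two-dimensional points. GaussianNB is used as the Naive Bayes variant because the toy features are continuous, and the hyperparameters are arbitrary choices for this illustration.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Two well-separated, made-up clusters: class 0 near (0, 0), class 1 near (5, 5).
X_points = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y_points = [0, 0, 0, 0, 1, 1, 1, 1]

classifiers = {
    "Naive Bayes": GaussianNB(),
    "K-Nearest Neighbours": KNeighborsClassifier(n_neighbors=3),
    "Support Vector Machine": SVC(kernel="linear"),
}

new_points = [[0.5, 0.5], [5.5, 5.5]]
predictions = {}
for name, clf_demo in classifiers.items():
    clf_demo.fit(X_points, y_points)
    predictions[name] = clf_demo.predict(new_points).tolist()
    print(name, predictions[name])
```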

Key Takeaways

In this blog, we discussed 

  • Binary Classification
  • Developing a Binary Classification Model
  • Understanding the Binary Classification Model

