Do you think IIT Guwahati certified course can help you in your career?
No
Introduction
We can use the Support Vector Machine algorithm for both regression and classification-based tasks. This blog will take one classification problem and build one SVM model to classify the features accurately.
The Breast Cancer dataset is accessible from Kaggle (from this link). This breast cancer dataset was obtained from the University of Wisconsin Hospitals, Madison, from Dr. William H. Wolberg.
For each cell nucleus, we have five features:
1. Mean Radius: (Float-type) It is the mean of distances from the center to the points on the perimeter.
2. Mean Texture: (Float-type) It is the value of the standard deviation of gray-scale values.
3. Mean Perimeter: (Float-type) It is the circumference of the nucleus cell.
4. Mean Area: (Float-type) It is the total area occupied by the cell.
5. Mean Smoothness: (Float-type) It is the local variation in radius lengths.
Our task is to analyze these features for different records and predict whether a person has breast cancer or not. The target feature is diagnosis (int-type), with two values, 0 or 1, corresponding to benign or malignant cancer cells.
Step I
The first step in any ML task is to import all the necessary libraries in the notebook.
# Basic libraries to manipulate & explore the dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# Models from sklearn
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.svm import SVC
You can also try this code with Online Python Compiler
Checking the dataset is balanced or not by plotting target values.
graph = sns.countplot(x='diagnosis', data=data);
for p in graph.patches:
graph.annotate('{:.1f}'.format(p.get_height()), (p.get_x()+0.25, p.get_height()+0.01))
You can also try this code with Online Python Compiler
Splitting the dataset into training and testing - We’ll use the training dataset to train our SVM model for the predictions and the testing dataset to check the accuracy of the predictions.
# x contains all the independent features
# y is the target variable
x = data.drop(['diagnosis'], axis=1).values
y = data['diagnosis'].values
#splitting in training and testing
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=101)
print("The training data has", x_train.shape[0], "records.")
print("The testing data has", x_test.shape[0], "records.")
You can also try this code with Online Python Compiler
Results: The accuracy of our SVM model is 0.89, which is pretty good.
From the Confusion matrix, we can see the value of True Positives is 41, which means that 41 females have been correctly classified as ‘not having cancer’ and True Negatives is 86, which means that 86 females have correctly been classified as ‘having cancer.’
Our model has predicted that 55 (41 + 14) females don’t have breast cancer; hence the value of False Negative is 14.
The model also classified two females as ‘having cancer,’ but they didn’t have cancer in reality, so False Positive’s value is 2.
Frequently Asked Questions
1). What are the applications of SVMs? Support Vector Machine has many use-cases in real-world projects like:
Face Detection,
Gene Classification,
Email Classification,
Handwriting Recognition
2). What is the use-case of SVM?
SVM is a supervised machine learning algorithm that can be used for classification or regression problems.
3). What is the standard Kernel Function equation?
The standard kernel function equation is:
Key Takeaways
Congratulations on making it this far. In this blog, we implemented a Support Vector Machine Algorithm using a Breast Cancer Dataset. Click here, If you want to learn the theoretical aspects of SVM. The kernel is essentially a function to perform calculations even in the higher dimensions in SVM. Learn more about Kernels from this blog.
You can also consider our Online Coding Courses such as the Machine Learning Course to give your career an edge over others.
If you are preparing for the upcoming Campus Placements, then don’t worry. Coding Ninjas has your back. Visit this link for a carefully crafted and designed course on-campus placements and interview preparation.
Live masterclass
System Design Questions Asked at Microsoft, Oracle, PayPal
by Pranav Malik
23 Apr, 2025
01:30 PM
Master DSA to Ace MAANG SDE Interviews
by Saurav Prateek
21 Apr, 2025
01:30 PM
Google Data Analyst roadmap: Essential SQL concepts
by Maaheen Jaiswal
22 Apr, 2025
01:30 PM
Amazon Data Analyst: Advanced Excel & AI Interview Tips
by Megna Roy
24 Apr, 2025
01:30 PM
System Design Questions Asked at Microsoft, Oracle, PayPal