Code360 powered by Coding Ninjas X Code360 powered by Coding Ninjas X
Table of contents
Decision Trees
Information Gain
How do we predict a new data point by a decision tree?
Last Updated: Mar 27, 2024

C4.5 Algorithm in Data Mining

Leveraging ChatGPT - GenAI as a Microsoft Data Expert
Prerita Agarwal
Data Specialist @
23 Jul, 2024 @ 01:30 PM


In Data Mining, the C4.5 algorithm is utilized as a Decision Tree Classifier, which can be used to decide based on a sample of data (univariate or multivariate predictors).
So, before we jump right into C4.5, let's talk about Decision Trees and how they may be used to classify data. Let's first brush up on our concepts of decision trees, and then we can go into the algorithm.

Decision Trees

We have all used decision trees many times, and it's nothing but a tree of decisions where the current decision depends on the previous decisions, and there are always a plethora of choices at each level. 

The figure below explains it in a nutshell. We have some decisions to make at each node, and our past choices influence the current decision. 


Information Gain

Suppose we have accumulated information through time that allows us to anticipate whether or not something will occur reliably. In that case, the information you have provided about the event you have predicted is not new. However, if the situation deteriorates and an unexpected event occurs, the information is helpful and required.

Let's pretend we're watching a coin toss. Any side of a fair coin has a 50% chance of being expected. If the coin is unfair, and the likelihood of getting a HEAD or TAIL is 1.00, we claim the Entropy is minimal since we can predict the coin toss outcome without any trials.


This is the formula of Entropy and is maximum when we don't know anything about the system, now, more Entropy means less knowledge about the system, so we like to have the Entropy of the system decrease. 

Hence when we want to split a node, we would like that decrease of Entropy to be maximum. 

Hence, we do this when we split a node in the tree. 

We examine each node for the likelihood of splitting. The Information Gain Ratio (m/N = p) and (n/N = q) are the ratios of observations to total observations (m/N = p) and (n/N = q), respectively, where m+n=Nm+n=N and p+q=1p+q=1. If the following node's Entropy is less than the Entropy before splitting, and if this value is the smallest among all conceivable test cases for splitting, the node is divided into its purest elements.

Here is the pseudo-code of the algorithm to build a decision tree. 

  1. Check for the above base cases.
  2. For each attribute a, find the normalized information gain ratio from splitting on a.
  3. Let a_best be the attribute with the highest normalized information gain.
  4. Create a decision node that splits on a_best.
  5. Recurse on the sublists obtained by splitting on a_best and adding those nodes as children of the node.

The actual algorithm also uses some more concepts like pruning handles continuous and discrete data as well, which are just minor adjustments that are made to the above process of creating a decision tree. 

We can implement these algorithms using predefined libraries like Sklearn, which uses similar algorithms to create a decision tree and categorize the input. 

Recommended topic: Machine Learning

Get the tech career you deserve, faster!
Connect with our expert counsellors to understand how to hack your way to success
User rating 4.7/5
1:1 doubt support
95% placement record
Akash Pal
Senior Software Engineer
326% Hike After Job Bootcamp
Himanshu Gusain
Programmer Analyst
32 LPA After Job Bootcamp
After Job

How do we predict a new data point by a decision tree?

We start from the root and keep checking the variable of the datapoint along which the split happened, and keep making the decisions according to the category to which the data point belongs. 

Finally, when we reach a point where there are no children, we stop and give the output and tell the category to which it belongs. 


1. How does the C4 5 algorithm work?

In Data Mining, the C4. 5 technique is utilized as a Decision Tree Classifier, which can be used to make a decision based on a sample of data (univariate or multivariate predictors).

2. Is C4 5 algorithm supervised or unsupervised?

This is supervised learning since the training dataset is labelled with classes. Using the patient example, C4. 5 doesn't learn on its own that a patient will get cancer or won't get cancer.

3. What are the new features of C4 5?

The J48 implementation of the C4. 5 algorithm has many additional features, including accounting for missing values, decision tree pruning, continuous attribute value ranges, derivation of rules, etc. In the WEKA data mining tool, J48 is an open-source Java implementation of the C4.

4. What is the difference between ID3 and C4 5?

ID3 only work with Discrete or nominal data, but C4. 5 work with both Discrete and Continuous Data. Random Forest is entirely different from ID3 and C4. 5; it builds several trees from a single data set and selects the best decision among the forest of trees it generates.


So, in a nutshell, C4.5 works by creating decision trees similar to the way ID3 works, but it improves in many aspects like pruning and the ability to handle discrete and continuous data.

Hey Ninjas! Don't stop here; check out Coding Ninjas for Machine Learning, more unique courses and guided paths. Also, try Coding Ninjas Studio for more exciting articles, interview experiences, and fantastic Data Structures and Algorithms problems. 

Happy Learning!

Live masterclass