Introduction
Decision Trees: A Decision Tree is a supervised machine learning algorithm that can be used for both classification and regression problems. A classification problem is one in which we identify the category or group to which an observation belongs. A Decision Tree uses a tree-like model of decisions: each internal node represents a "test" on an attribute, each branch represents an outcome of that test, and each leaf node represents a class label (the decision reached after evaluating the attributes along the path).
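To make the terminology concrete, here is a minimal sketch (in Python, with illustrative names that are not taken from any particular library) of how such a tree can be represented and traversed:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class TreeNode:
    attribute: Optional[str] = None                                 # attribute tested at an internal node
    children: Dict[Any, "TreeNode"] = field(default_factory=dict)   # one branch per test outcome
    label: Optional[Any] = None                                     # class label if this node is a leaf

def predict(node: TreeNode, example: Dict[str, Any]) -> Any:
    # Follow the branch that matches the example's value for each tested attribute
    # until a leaf is reached; the leaf's label is the decision.
    while node.label is None:
        node = node.children[example[node.attribute]]
    return node.label
```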
Common Examples:
Let's draw a Decision Tree that computes the XNOR function. Its truth table is as follows:
| INPUT-1 (X1) | INPUT-2 (X2) | OUTPUT (Y) |
| --- | --- | --- |
| 1 | 1 | 1 |
| 1 | 0 | 0 |
| 0 | 1 | 0 |
| 0 | 0 | 1 |
Here we have two input features, INPUT-1 and INPUT-2, and the target, OUTPUT.
Let's make the feature INPUT-1 the root node and build a Decision Tree.
[Figure: the resulting Decision Tree, with INPUT-1 as the root node. Source: alliance.seas.uppnn.edu]
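The resulting tree can be written directly as nested conditions. Here is a plain-Python sketch (no library), assuming INPUT-1 is tested at the root and INPUT-2 at the next level, as in the figure:

```python
def xnor_tree(x1: int, x2: int) -> int:
    # Root node: test INPUT-1.
    if x1 == 1:
        # Left subtree: test INPUT-2.
        return 1 if x2 == 1 else 0
    # Right subtree: test INPUT-2.
    return 1 if x2 == 0 else 0

# Reproduce the truth table above.
for x1 in (1, 0):
    for x2 in (1, 0):
        print(x1, x2, "->", xnor_tree(x1, x2))
```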
Here we have split on a feature at each level, and the leaves at the bottom give the final decisions. In actual practice, the data is rarely this simple, and the tree grows more complex as the data comes closer to practical, real-world examples. Let's have a look at a similar kind of example:
Suppose we have to predict whether a student has been invited for an interview or not. We have three features, college type, is-intern, and has-project, and based on these, we have to make the Decision Tree.
| TIER-1/TIER-2 | TIER-3 | IS-INTERN | HAS-PROJECT | GOT A CALL |
| --- | --- | --- | --- | --- |
|  | YES | YES | YES | YES |
|  | YES | YES | NO | YES |
|  | YES | NO | YES | YES |
|  | YES | NO | NO | NO |
|  | YES | YES | NO | NO |
|  | YES | NO | YES | NO |
|  | YES | YES | YES | YES |
In a Machine Learning algorithm, we try to find a nearly accurate result, not the exact one. The task would become very complex if we trained the algorithm to get a precise result every time. Similarly, the best possible Decision Tree is very expensive and computationally infeasible to build, so instead we try to build a good Decision Tree that gives decent accuracy and is much easier to construct. According to the above Decision Tree, if a student is from a tier-3 college and has a fantastic internship (say, at Google) but doesn't have any project, they will not get an interview call, which is not practically reasonable. Mistakes like this appear when we hit exceptions that don't match the data on which the Decision Tree was built.
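As a quick illustration, the sketch below fits scikit-learn's DecisionTreeClassifier on the rows of the table above; the 0/1 encoding of NO/YES and the column order are my own illustrative choices, not part of the original example:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Rows of the table above, encoded as [TIER-3, IS-INTERN, HAS-PROJECT] with 1 = YES, 0 = NO.
X = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
]
y = ["YES", "YES", "YES", "NO", "NO", "NO", "YES"]  # GOT A CALL

clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(clf, feature_names=["TIER-3", "IS-INTERN", "HAS-PROJECT"]))
```

Because some rows in this data contradict each other, the fitted tree cannot reproduce every label exactly, which is the kind of approximation discussed above.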
Mathematics behind Decision Tree:
The following are the steps we need to follow to build a good Decision Tree:
1) We need to select the target attribute/feature from the given attributes. In the above example, the target attribute is “GOT A CALL,” which has two classes of values, either yes or no.
2) Calculate the information gain of the selected target attribute
Information Gain = -(P/(P+N)) log2(P/(P+N)) - (N/(P+N)) log2(N/(P+N))
Where P = number of the elements in class 1 of the target attribute.
N = number of the elements in class 2 of the target attribute.
In our case, the target attribute has two classes, “YES” and “NO”. So, P=5, and N=5.
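This quantity is simply the entropy of the target attribute's class distribution. With P = 5 and N = 5, a quick check in Python gives 1 bit:

```python
from math import log2

P, N = 5, 5  # class counts of the target attribute "GOT A CALL"
info_gain_target = -(P / (P + N)) * log2(P / (P + N)) - (N / (P + N)) * log2(N / (P + N))
print(info_gain_target)  # 1.0 -- a 50/50 class split has maximum entropy
```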
3) Now, for each of the remaining features/attributes:
- We calculate the information gain for each categorical value of the attribute.
- We take the weighted average (the average information entropy) over those values to get the entropy of the current attribute.
The entropy of a given attribute is given by:
Entropy = Σ (i = 1 to V) [ Information Gain of the i-th categorical value × Probability of the i-th categorical value ]
Here V is the number of categorical values of that attribute.
For example, if the attribute is "IS-INTERN," then it has two categorical values, "YES" and "NO."
I.G. of “YES” = -(4/7) log2(4/7), say E1
I.G. of “NO” = -(3/7) log2(3/7), say E2
The entropy of the attribute “IS-INTERN” = E1+E2
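Evaluating these expressions numerically, exactly as written above:

```python
from math import log2

E1 = -(4 / 7) * log2(4 / 7)   # term for the "YES" value of IS-INTERN
E2 = -(3 / 7) * log2(3 / 7)   # term for the "NO" value of IS-INTERN
entropy_is_intern = E1 + E2   # entropy of the attribute "IS-INTERN", as defined above
print(E1, E2, entropy_is_intern)  # ≈ 0.461, 0.524, 0.985
```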
4) We will now find the Gain of each attribute (except the target attribute).
Gain of Individual attribute = Information Gain of the target attribute - Entropy of that attribute.
5) The attribute having maximum Gain value will be our Root node. We will divide the root node into subsets that contain possible values for the best feature.
6) Recursively build new decision trees on the subsets of the dataset created in step 5, until there are no features left to split on. A sketch of the full procedure is given below.
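Putting steps 1–6 together, here is a compact sketch of an ID3-style tree builder in plain Python. The function and variable names are illustrative, but the logic follows the recipe above: compute the entropy of the target, compute the weighted entropy of each attribute, split on the attribute with the maximum gain, and recurse on the subsets.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels (step 2, applied to any subset)."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def attribute_entropy(rows, labels, attr):
    """Weighted average entropy of the target after splitting on `attr` (step 3)."""
    total = len(rows)
    result = 0.0
    for value in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        result += (len(subset) / total) * entropy(subset)
    return result

def build_tree(rows, labels, attributes):
    """Steps 4-6: pick the attribute with the maximum gain, split, and recurse."""
    if len(set(labels)) == 1:          # pure subset -> leaf with that class label
        return labels[0]
    if not attributes:                 # no features left to split on -> majority label
        return Counter(labels).most_common(1)[0][0]
    base = entropy(labels)
    best = max(attributes, key=lambda a: base - attribute_entropy(rows, labels, a))
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx], remaining)
    return tree

# Toy usage on the XNOR table from earlier in the article.
rows = [{"INPUT-1": 1, "INPUT-2": 1}, {"INPUT-1": 1, "INPUT-2": 0},
        {"INPUT-1": 0, "INPUT-2": 1}, {"INPUT-1": 0, "INPUT-2": 0}]
labels = [1, 0, 0, 1]
print(build_tree(rows, labels, ["INPUT-1", "INPUT-2"]))
```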