Introduction
An Optimal Binary Search Tree (OBST) is a specialized binary search tree designed for efficient searching when the access frequencies of different keys vary. The goal of an OBST is to minimize the cost of searches across the tree, making it a valuable tool in scenarios like database querying where some records are accessed more frequently than others.
In this article, we will learn about OBSTs: their characteristics, advantages, disadvantages, and how they are constructed.
What is Optimal Binary Search Tree (OBST)?
An Optimal Binary Search Tree (OBST) is a specialized type of binary search tree designed to minimize the expected search cost. In a regular binary search tree, the search cost refers to the number of comparisons needed to find a specific key (data value).
OBST takes things a step further by considering the access frequency of each key. It aims to arrange the keys in the tree strategically to minimize the overall search cost based on how often each key is likely to be searched for.
Key Characteristics of OBST
Probability-Based Node Placement: In an OBST, nodes (keys) are placed based on their access probabilities. Keys that are accessed more frequently are placed closer to the root of the tree, reducing the average time it takes to access these keys.
Cost Minimization: The structure of the OBST aims to minimize the total cost of searches. This cost is calculated based on the probability of each key being accessed and the depth at which each key is located in the tree.
Dynamic Construction: Constructing an OBST is not straightforward like a regular binary search tree. It requires a dynamic approach, typically employing dynamic programming techniques to find the optimal structure.
How Does an OBST Work?
The construction of an OBST involves the following steps:
Identify Key Frequencies: Determine the frequency or probability of access for each key in the dataset.
Calculate Cost for Different Arrangements: Using dynamic programming, calculate the cost of different tree configurations. This involves considering each key as a potential root and then calculating the cost based on the sum of the probabilities of each key, weighted by the depth at which the key would be placed in the tree.
Select Optimal Configuration: The configuration that yields the lowest overall cost is chosen as the OBST.
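The steps above can be sketched as a tiny brute-force search in Python (an illustration only; the function name obst_cost is our own, and this naive recursion is far slower than the dynamic-programming version shown later in the article):

```python
def obst_cost(p, i, j):
    """Minimum expected search cost of a BST over the keys with probabilities p[i..j]."""
    if i > j:
        return 0.0
    total = sum(p[i:j + 1])          # every search in this range passes through the root
    return total + min(
        obst_cost(p, i, r - 1) + obst_cost(p, r + 1, j)
        for r in range(i, j + 1)     # step 2: try each key as a potential root
    )

# Probabilities of keys 10..50 from the example later in this article
probabilities = [0.1, 0.2, 0.4, 0.2, 0.1]
print(round(obst_cost(probabilities, 0, 4), 2))  # prints 1.8
```

Step 3 is implicit in the min: of all configurations tried, the cheapest one is kept.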
Time and Space Complexity
Time Complexity: Constructing an OBST is computationally intensive. The standard dynamic-programming construction runs in O(n³) time, where 'n' is the number of keys, because it calculates the cost of every possible subtree with every possible root. (Knuth's optimization can reduce this to O(n²).)
Space Complexity: The space complexity of OBST construction is O(n²), due to the need to store the costs of the different subtrees during the dynamic programming process.
Advantages of OBST
Efficient Searching: For datasets with non-uniform access patterns, OBSTs provide faster search times compared to regular binary search trees.
Reduced Average Search Time: OBSTs are tailored to the specific frequency distribution of the dataset, minimizing the average time required to access an element.
Customization: They can be customized for specific usage patterns, making them ideal for applications like database indexing where certain records are queried more frequently.
Disadvantages of OBST
Increased complexity in insertion and deletion operations, since modifications generally require rebuilding the tree to restore optimality.
If access frequencies change after construction, the fixed structure is no longer optimal and search times degrade.
High memory overhead during construction, due to the O(n²) dynamic-programming tables.
More difficult to implement than simpler binary search trees.
Performance degrades when the tree structure is modified frequently.
Example of How OBST Works
Let's walk through an example to better understand how an Optimal Binary Search Tree (OBST) is constructed and functions. We will use a small set of keys with associated probabilities to illustrate the process.
Given Data
Suppose we have the following keys and their respective access probabilities:
Key 10 with a probability of 0.1
Key 20 with a probability of 0.2
Key 30 with a probability of 0.4
Key 40 with a probability of 0.2
Key 50 with a probability of 0.1
Step-by-Step Construction
Calculation of Cumulative Costs: We start by calculating the cumulative costs and weights for all possible subtrees. This is done using dynamic programming.
Constructing Cost Matrix:
The cost matrix will have dimensions 5x5 for our 5 keys.
Each cell (i, j) in this matrix represents the cost of the OBST containing keys from i to j.
Initially, the diagonal elements, representing trees with a single key, are set to the key's probability (as this is the cost of a tree with a single node).
Filling the Matrix:
We then fill this matrix for subtrees of increasing sizes.
For each size, we consider all possible roots and calculate the cost of the tree with that root. This cost includes the sum of the probabilities of all keys in the subtree (as every search operation goes through the root) and the cost of left and right subtrees.
Choosing the Optimal Root: For each subset of keys, the root that gives the minimum cost is chosen. This involves comparing the cost of different configurations and selecting the one with the lowest cost.
Example Calculation
Let's consider a subset of keys 10, 20, and 30. We calculate the cost of trees with each of these keys as the root:
Tree with 10 as Root: The cost will include the cost of searching 10 (its probability), plus the cost of the subtree containing 20 and 30.
Tree with 20 as Root: The cost includes the cost of searching 20, plus the cost of the subtree containing 10 on the left and 30 on the right.
Tree with 30 as Root: Similar calculation as above.
We sum these costs and choose the root that gives the minimum sum.
Result
The result of this process is an OBST where keys are arranged not just based on their values but also based on their access probabilities, ensuring that the overall cost of searches in the tree is minimized.
In our example, key 30 might be the root due to its highest probability, but the exact structure depends on the detailed cost calculations.
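The hand calculation for the three-key subset can be checked with a short brute-force script (the helper subtree_cost is hypothetical, written just for this example):

```python
keys_subset = [10, 20, 30]
p = [0.1, 0.2, 0.4]              # access probabilities of keys 10, 20, 30

def subtree_cost(i, j):
    """Minimum expected search cost of an optimal BST over p[i..j]."""
    if i > j:
        return 0.0
    return sum(p[i:j + 1]) + min(
        subtree_cost(i, r - 1) + subtree_cost(r + 1, j)
        for r in range(i, j + 1)
    )

costs = {}
for r, key in enumerate(keys_subset):
    # Cost with this key as root: every search passes through the root
    # (hence sum(p)), plus the optimal costs of the left and right subtrees.
    costs[key] = sum(p) + subtree_cost(0, r - 1) + subtree_cost(r + 1, len(p) - 1)
    print(f"root {key}: cost {costs[key]:.1f}")
```

This prints costs 1.5, 1.2, and 1.1 for roots 10, 20, and 30 respectively, confirming that key 30, with the highest probability, is indeed the cheapest root for this subset.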
Optimal Substructure in Optimal Binary Search Trees (OBSTs)
Optimal substructure is a fundamental property in dynamic programming, and it plays a crucial role in solving problems like constructing Optimal Binary Search Trees (OBSTs). It means that an optimal solution to a larger problem can be constructed from optimal solutions to its smaller subproblems.
In the context of OBSTs, optimal substructure refers to the fact that if you have an optimal tree for a given set of keys and their probabilities, then you can construct optimal trees for subsets of those keys while preserving their optimality.
Explanation
Consider a set of keys K[i,j] where i ≤ j. If you have already constructed an optimal BST T[i,j] for this set of keys, you can reuse this optimal subtree when extending it to include additional keys.
Suppose r is the root of the optimal BST T[i,j]. Then T[i,j] consists of three parts:
The root node r.
The optimal BST T[i,r−1] for keys K[i,r−1].
The optimal BST T[r+1,j] for keys K[r+1,j].
This property allows us to break down the problem of constructing an optimal BST for a larger set of keys into subproblems of constructing optimal BSTs for smaller sets of keys.
Java
public class OptimalBST {

    // Minimum expected search cost of an optimal BST over keys[i..j]
    static double minCost(double[] p, int i, int j) {
        if (i > j) {
            return 0;
        }
        double total = 0;
        for (int k = i; k <= j; k++) {
            total += p[k];           // every search in this range passes through the root
        }
        double best = Double.MAX_VALUE;
        for (int r = i; r <= j; r++) {
            double c = minCost(p, i, r - 1) + minCost(p, r + 1, j);
            if (c < best) {
                best = c;
            }
        }
        return total + best;
    }

    // Function to construct an optimal BST for a given set of keys and probabilities
    static Node constructOptimalBST(int[] keys, double[] p, int i, int j) {
        if (i > j) {
            return null;
        }

        // Find the root with minimum expected cost
        int minRoot = i;
        double best = Double.MAX_VALUE;
        for (int r = i; r <= j; r++) {
            double c = minCost(p, i, r - 1) + minCost(p, r + 1, j);
            if (c < best) {
                best = c;
                minRoot = r;
            }
        }

        // Build the root and recursively construct the optimal subtrees
        Node root = new Node(keys[minRoot]);
        root.left = constructOptimalBST(keys, p, i, minRoot - 1);
        root.right = constructOptimalBST(keys, p, minRoot + 1, j);
        return root;
    }

    // In-order traversal to print the constructed tree
    static void printBST(Node node) {
        if (node == null) {
            return;
        }
        printBST(node.left);
        System.out.print(node.data + " ");
        printBST(node.right);
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30};
        double[] p = {0.1, 0.2, 0.3};
        int n = keys.length;

        Node root = constructOptimalBST(keys, p, 0, n - 1);
        printBST(root);
    }
}

class Node {
    int data;
    Node left, right;

    public Node(int data) {
        this.data = data;
        left = right = null;
    }
}
Dynamic Programming Approach for Constructing OBST
The construction of an Optimal Binary Search Tree (OBST) using dynamic programming is a systematic approach that involves breaking down the problem into smaller subproblems, solving each of those subproblems just once, and storing their solutions.
The Dynamic Programming Solution
Initialization:
Create a cost matrix cost[][] where cost[i][j] will store the minimum cost of constructing an OBST from keys i to j. Initially, the cost is 0 for all entries.
Create a frequency sum matrix sumFreq[][] to hold the sum of frequencies from i to j.
Filling the Cost Matrix:
Fill the diagonal of the cost matrix with the frequencies of individual keys, as a tree with a single key has a search cost equal to its frequency.
Now, for each chain length from 2 to n (where n is the number of keys), calculate the minimum cost of all possible binary search trees with that chain length.
Calculating Cost for Subtrees:
For each subset of keys, compute the cost for different roots and choose the root that minimizes the cost.
The cost for a given root r is the sum of the cost of the left and right subtrees plus the sum of frequencies from i to j (as each search will have to pass through the root).
Choosing the Optimal Root:
For each pair (i, j), where i is the start and j is the end of a subset, consider each key as a potential root.
Compute the cost when that key is the root and update cost[i][j] with the minimum cost found.
Final Output:
After filling the matrix, cost[0][n-1] will hold the cost of the OBST for all keys.
Code Implementation
Java
public class OptimalBST {
    // Function to calculate the minimum cost of constructing an OBST
    static int optimalSearchTree(int keys[], int freq[], int n) {
        int cost[][] = new int[n][n];
        int sumFreq[][] = new int[n][n];

        // Fill the diagonal and sumFreq matrix
        for (int i = 0; i < n; i++) {
            cost[i][i] = freq[i];
            sumFreq[i][i] = freq[i];
        }

        // Calculate cost for chains of length L
        for (int L = 2; L <= n; L++) {
            for (int i = 0; i <= n - L; i++) {
                int j = i + L - 1;
                cost[i][j] = Integer.MAX_VALUE;
                sumFreq[i][j] = sumFreq[i][j - 1] + freq[j];

                for (int r = i; r <= j; r++) {
                    int c = ((r > i) ? cost[i][r - 1] : 0)
                          + ((r < j) ? cost[r + 1][j] : 0)
                          + sumFreq[i][j];
                    if (c < cost[i][j]) cost[i][j] = c;
                }
            }
        }
        return cost[0][n - 1];
    }

    public static void main(String[] args) {
        int keys[] = {10, 20, 30, 40, 50};
        int freq[] = {4, 2, 6, 3, 5};
        int n = keys.length;
        System.out.println("Cost of Optimal BST is: " + optimalSearchTree(keys, freq, n));
    }
}
C++
#include <iostream>
#include <vector>
#include <climits>   // for INT_MAX
using namespace std;

int optimalSearchTree(vector<int>& keys, vector<int>& freq, int n) {
    vector<vector<int>> cost(n, vector<int>(n, 0));
    vector<vector<int>> sumFreq(n, vector<int>(n, 0));

    // Fill the diagonal and sumFreq matrix
    for (int i = 0; i < n; i++) {
        cost[i][i] = freq[i];
        sumFreq[i][i] = freq[i];
    }

    // Calculate cost for chains of length L
    for (int L = 2; L <= n; L++) {
        for (int i = 0; i <= n - L; i++) {
            int j = i + L - 1;
            cost[i][j] = INT_MAX;
            sumFreq[i][j] = sumFreq[i][j - 1] + freq[j];

            for (int r = i; r <= j; r++) {
                int c = ((r > i) ? cost[i][r - 1] : 0)
                      + ((r < j) ? cost[r + 1][j] : 0)
                      + sumFreq[i][j];
                if (c < cost[i][j]) cost[i][j] = c;
            }
        }
    }
    return cost[0][n - 1];
}

int main() {
    vector<int> keys = {10, 20, 30, 40, 50};
    vector<int> freq = {4, 2, 6, 3, 5};
    int n = keys.size();
    cout << "Cost of Optimal BST is: " << optimalSearchTree(keys, freq, n) << endl;
    return 0;
}
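For completeness, here is one possible Python version of the same bottom-up dynamic-programming solution (a direct port of the Java and C++ listings; the snake_case names are our own):

```python
import sys

def optimal_search_tree(keys, freq):
    """Minimum search cost of an OBST, tabulated bottom-up in O(n^3) time."""
    n = len(freq)
    cost = [[0] * n for _ in range(n)]
    sum_freq = [[0] * n for _ in range(n)]

    # A single-key tree costs exactly that key's frequency
    for i in range(n):
        cost[i][i] = freq[i]
        sum_freq[i][i] = freq[i]

    # Consider chains of increasing length
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = sys.maxsize
            sum_freq[i][j] = sum_freq[i][j - 1] + freq[j]

            # Try every key in keys[i..j] as the root
            for r in range(i, j + 1):
                c = ((cost[i][r - 1] if r > i else 0)
                     + (cost[r + 1][j] if r < j else 0)
                     + sum_freq[i][j])
                cost[i][j] = min(cost[i][j], c)

    return cost[0][n - 1]

keys = [10, 20, 30, 40, 50]
freq = [4, 2, 6, 3, 5]
print("Cost of Optimal BST is:", optimal_search_tree(keys, freq))
```

With the frequencies above, this prints the same cost as the Java and C++ versions.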
The recursive memoized solution for constructing an Optimal Binary Search Tree (OBST) involves breaking down the problem into smaller subproblems, solving them recursively, and storing their results to avoid redundant calculations. This approach is particularly efficient in reducing the time complexity by avoiding the computation of the same subproblems repeatedly.
Recursive Memoized Approach Explained
Recursive Function: Define a recursive function that calculates the minimum cost of an OBST for a given range of keys. This function will call itself to compute the costs of the left and right subtrees.
Memoization: Use a matrix to store the results of subproblems. Before computing the cost for a range of keys, check if the cost has already been computed and stored in the matrix. If so, use the stored value instead of recalculating.
Computing Cost: The cost of an OBST for a range of keys is calculated by considering each key in the range as the root and summing the cost of the left and right subtrees, along with the sum of frequencies of all keys in the range.
Java
public class OptimalBSTMemoized {
    static int[][] memo;

    // Function to calculate the sum of frequencies from i to j
    static int sum(int[] freq, int i, int j) {
        int s = 0;
        for (int k = i; k <= j; k++) {
            s += freq[k];
        }
        return s;
    }

    // Recursive function with memoization to find the cost of the OBST
    static int optimalCost(int[] freq, int i, int j) {
        // Base case: there are no elements in this range
        if (j < i) {
            return 0;
        }

        // If already computed, return the stored value
        if (memo[i][j] != Integer.MAX_VALUE) {
            return memo[i][j];
        }

        // Try making each key in the interval the root and pick the minimum
        for (int r = i; r <= j; r++) {
            int cost = optimalCost(freq, i, r - 1)
                     + optimalCost(freq, r + 1, j)
                     + sum(freq, i, j);
            memo[i][j] = Math.min(memo[i][j], cost);
        }

        return memo[i][j];
    }

    static int optimalSearchTree(int keys[], int freq[], int n) {
        memo = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                memo[i][j] = Integer.MAX_VALUE;
            }
        }
        return optimalCost(freq, 0, n - 1);
    }

    public static void main(String[] args) {
        int keys[] = {10, 20, 30, 40, 50};
        int freq[] = {4, 2, 6, 4, 5};
        int n = keys.length;
        System.out.println("Cost of Optimal BST is: " + optimalSearchTree(keys, freq, n));
    }
}
C++
#include <iostream>
#include <vector>
#include <climits>
using namespace std;

vector<vector<int>> memo;

// Sum of frequencies from i to j
int sum(const vector<int>& freq, int i, int j) {
    int s = 0;
    for (int k = i; k <= j; k++) {
        s += freq[k];
    }
    return s;
}

int optimalCost(const vector<int>& freq, int i, int j) {
    // Base case: there are no elements in this range
    if (j < i) {
        return 0;
    }

    // If already computed, return the stored value
    if (memo[i][j] != INT_MAX) {
        return memo[i][j];
    }

    // Try making each key in the interval the root and keep the minimum
    for (int r = i; r <= j; r++) {
        int cost = optimalCost(freq, i, r - 1)
                 + optimalCost(freq, r + 1, j)
                 + sum(freq, i, j);
        memo[i][j] = min(memo[i][j], cost);
    }

    return memo[i][j];
}

int optimalSearchTree(vector<int>& keys, vector<int>& freq, int n) {
    memo.assign(n, vector<int>(n, INT_MAX));
    return optimalCost(freq, 0, n - 1);
}

int main() {
    vector<int> keys = {10, 20, 30, 40, 50};
    vector<int> freq = {4, 2, 6, 4, 5};
    int n = keys.size();
    cout << "Cost of Optimal BST is: " << optimalSearchTree(keys, freq, n) << endl;
    return 0;
}
Python
def sumFreq(freq, i, j):
    return sum(freq[k] for k in range(i, j + 1))

def optimalCost(freq, i, j, memo):
    # Base case: there are no elements in this range
    if j < i:
        return 0

    # If already computed, return the stored value
    if memo[i][j] != float('inf'):
        return memo[i][j]

    # Try making each key in the interval the root and keep the minimum
    for r in range(i, j + 1):
        cost = (optimalCost(freq, i, r - 1, memo)
                + optimalCost(freq, r + 1, j, memo)
                + sumFreq(freq, i, j))
        memo[i][j] = min(memo[i][j], cost)

    return memo[i][j]

def optimalSearchTree(keys, freq, n):
    memo = [[float('inf') for _ in range(n)] for _ in range(n)]
    return optimalCost(freq, 0, n - 1, memo)

keys = [10, 20, 30, 40, 50]
freq = [4, 2, 6, 4, 5]
print("Cost of Optimal BST is:", optimalSearchTree(keys, freq, len(keys)))
Frequently Asked Questions
What are the practical applications of Optimal Binary Search Trees?
Optimal Binary Search Trees find applications in compiler design (e.g., keyword lookup), database indexing, and other settings where access frequencies are known in advance; the underlying idea is closely related to frequency-based techniques such as Huffman coding in data compression.
How does the choice of keys and probabilities affect the construction of an OBST?
The choice of keys and their probabilities determines the placement of keys in the tree, optimizing search performance by placing frequently accessed keys closer to the root.
Are there variations of OBSTs for non-uniform probabilities?
Yes, Weighted Optimal Binary Search Trees are designed to handle non-uniform probabilities by incorporating weights that reflect access frequencies into the construction process.
What is the difference between optimal binary search tree and AVL tree?
While both aim for efficient searching, AVL trees guarantee a balanced structure regardless of access patterns, whereas optimal binary search trees minimize expected search cost based on predefined access frequencies, which are not always known in real-world scenarios.
What is the difference between BST and OBST?
A plain BST (Binary Search Tree) maintains key order but ignores access frequencies, so its search cost depends only on its shape. An OBST (Optimal Binary Search Tree) additionally uses access frequencies to arrange keys so that the expected search cost is minimized, at the price of a more complex construction.
Conclusion
In summary, Optimal Binary Search Trees are versatile structures with real-world applications. Careful key and probability selection and variations like weighted OBSTs enable efficient search optimization, making them essential in various fields.