Table of contents
1.
Introduction
1.1.
Problem Statement
1.2.
Sample Examples
2.
Brute-Force Approach
3.
Better Approach(AVL Tree)
3.1.
Steps of algorithm
3.2.
Implementation in C++
3.2.1.
Complexity Analysis
4.
Optimal Approach(Hashmap)
4.1.
Steps of algorithm
4.2.
Implementation in C++
4.2.1.
Complexity Analysis
5.
Frequently asked questions
6.
Key takeaways
Last Updated: Mar 27, 2024

Sort the big sized array where there are many repetitions of elements

Career growth poll
Do you think IIT Guwahati certified course can help you in your career?

Introduction

Problem Statement

The problem states that we are given a big-sized array in which there are many repetitions of elements, i.e, one key is occurring many times.  We need to sort the array efficiently.   

Sample Examples

Example 1: 

Input: Arr[] = {2,2,3,6,3,3,3,3,3,6,6,6,6,2,2,2,2}
Output: {2,2,2,2,2,2,3,3,3,3,3,3,6,6,6,6,6}

Explanation: We have simply sorted the array in ascending order. 

 

Example 2:

Input: Arr[] = {7,5,7,5,7,5,7,5,7,5,7,5,7,5,7,5,7,5,7,5,7}
Output: {5,5,5,5,5,5,5,5,5,5,5,7,7,7,7,7,7,7,7,7,7,7,7}

Explanation: We have simply sorted the array in ascending order. 

Brute-Force Approach

The idea is to use any sorting algorithm like quicksort, mergesort, heapSort. In the average case, any sorting algorithm takes O(NlogN). But in this case, we have not considered the fact that there are many repetitions of elements. It is a general approach that can be used to sort any array. 

Better Approach(AVL Tree)

To sort the array with repetitions of elements, The idea is to create a self-balancing binary search tree like red-black trees or AVL trees with one extra field in the structure, i.e., count. 

Now Structure will look like this

Class AvlTree{
  Public: 
int count, key;
AvlTree *left, *right;
}; 
You can also try this code with Online C++ Compiler
Run Code

 

We will traverse through the array and increase the count in Avl Tree if the key already exists; otherwise, we will add this key with count 1. Finally, we will travel this tree in an inorder manner and put the key count times into the arr[]. This will give us arr[] in a sorted way. 

 

Let us recap what AVL Tree is:

AVL is a self-balancing binary search tree, and self-balancing means that difference between the heights of the left and right subtree will not be greater than 1. We are using AVL Tree because, in AVL Tree, every operation like insert and search takes only log(N) time. To study more about this, you can refer to this link.  

Steps of algorithm

  1. Create an empty AVL Tree with count as an extra field in the structure. 
  2. Traverse the input arr[] and check for each arr[i], if it is already present in the AVL Tree or not. If it is already present in the AVL Tree, then increase the count of this key; otherwise, if it is not present, insert this key with count = 1. 
  3. Finally, do the inorder Traversal of the binary search tree, and put each key count number of times in the arr[]. 

Implementation in C++

// C++ function to sort the big array with repetitions of elements 
#include<bits/stdc++.h>
using namespace std;
// structure of the AVL Tree
class AvlTree{
    public:
        int count, key, height;
        AvlTree *left, *right;
        // constructor for Avl Tree
        AvlTree(int key){
            this->key = key;
            this->count = 1;
            this->height = 1;
            this->left = NULL;
            this->right = NULL;
        }
};

// A utility function to get the height of the tree
int height(AvlTree *root){
    if(root==NULL){
        return 0;
    }
    return root->height;
}

// A utility function to get the balance factor
int getBalanceFactor(AvlTree *root){
    if(root == NULL){
        return 0;
    }
    return height(root->left) - height(root->right);
}

// Inorder Traversal of the AVL Tree
void inorder(AvlTree *root, vector<int>&arr, int *index){
    if(root!=NULL){
        // recur for left child
        inorder(root->left, arr, index);
        
        for(int i=0;i<root->count;i++){
            arr[*index] = root->key;
            (*index)++;
        }
        
        // recur for right child
        inorder(root->right, arr, index);
    }
    return;
}

AvlTree *rightRotate(AvlTree *root){
    AvlTree *Left = root->left;
    AvlTree *Right = root->right;

    // Perform rotation
    Left->right = root;
    root->left = Right;

    // Update Heights
    root->height = max(height(root->left), height(root->right))+1;
    Left->height = max(height(Left->left), height(Left->right))+1;

    // return root
    return Left;
}

AvlTree* leftRotate(AvlTree *root){
    AvlTree *Right = root->right;
    AvlTree *Left = root->left;

    // perform rotation
    Right->left = root;
    root->right = Left;

    // Updating the height
    root->height = max(height(root->left), height(root->right))+1;
    Right->height = max(height(Right->left), height(Right->right))+1;

    // Return Root
    return Right;


}
// Utility function that will insert the key
// and keep the count of repetitions of elements
AvlTree* InsertIntoAvl(AvlTree *root, int key){
    if(root == NULL){
        AvlTree* newNode = new AvlTree(key);
        newNode->left = newNode->right = NULL;
        return newNode;
    }
    
    if(key == root->key){
        (root->count)++;
        return root;
    }
    
    if(key < root->key){
        root->left = InsertIntoAvl(root->left, key);
    } else{
        root->right = InsertIntoAvl(root->right, key);
    }

    // Now update the height of this current Node
    root->height = max(height(root->left), height(root->right))+1;

    // get balance factor of this node
    int balance = getBalanceFactor(root);

    // If node is unbalanced, then there are 4 cases

    // left right case
    if(balance>1 && key>root->left->key){
        root->left = leftRotate(root->left);
        return rightRotate(root);
    }
    
    // right left case
    if(balance < -1 && key<root->right->key){
        root->right = rightRotate(root->right);
        return leftRotate(root);
    }
    
    // left left case
    if(balance>1 && key<root->left->key){
        return rightRotate(root);
    }
    
    // right right case
    if(balance<-1 && key>root->right->key){
        return leftRotate(root);
    }
    
    // return the root pointer
    return root;
}

// utility function that will convert the array into sorted one
void SortedArr(vector<int>&arr){
    // Initializing the root of the AVL Tree
    AvlTree *root = NULL;
    
    // Inserting each arr[] element into the Avl Tree
    // if it already present then we will just increase the count
    for(int i=0;i<arr.size();i++){
        root = InsertIntoAvl(root, arr[i]);
    }
    
    // initialing the index variable
    // to put sorted elements back into arr[]
    int index = 0;
    
    // do inorder Traversal to put back elements
    // in the sorted order
    inorder(root,arr, &index);
}

int main(){
    // sample array with many repetitions of elements
    vector<int>arr = {7,5,7,5,2,3,5,4,6,7,5,7,5,7,5,7};
    cout << "Input Array :";
    
    for(int i=0;i<arr.size();i++){
        cout << arr[i] << " ";
    }
    cout << endl;
    
    SortedArr(arr);
    cout << "Sorted Array :";
    for(int i=0;i<arr.size();i++){
        cout << arr[i] << " ";
    }
}
You can also try this code with Online C++ Compiler
Run Code

 

Output

Input Array :7 5 7 5 2 3 5 4 6 7 5 7 5 7 5 7
Sorted Array :2 3 4 5 5 5 5 5 5 6 7 7 7 7 7 7 

 

Complexity Analysis

Time Complexity: O(NLogM)

Explanation: Searching and inserting into AVL Tree takes O(NlogM) where M is the number of distinct elements and inorder traversal to sort the array take O(N) time. Hence,an total time Complexity is O(NlogM + N) ~ O(NlogM). 

Space ComplexityO(M)

Explanation: M distinct elements are stored in the AVL Tree. 

Optimal Approach(Hashmap)

To sort the array with repetitions of elements, The idea is simply to store the key and their frequency in the hashmap, traverse the sorted hashmap and put the key count times in the arr[]. 

Steps of algorithm

  1. Create an empty hash table 
  2. Input value will be stored in the form of key count pairs. 
  3. Consider the keys of hashtable and sort them. 
  4. Traverse the sorted hashmap and put the count times key in arr[]. 
  5. Finally, we will get the sorted arr[]. 

Implementation in C++

// C++ function to sort the big array with repetitions of elements 
#include<bits/stdc++.h>
using namespace std;
void SortedArr(vector<int>&arr){
    map<int,int>mp;
    
    // inserting the values in the map
    // and increasing the count
    // if already exists
    for(int i=0;i<arr.size();i++){\
        mp[arr[i]]++;
    }
    
    // initializing index to put value in arr[]
    // in sorted order
    int index = 0;
    for(auto it=mp.begin();it!=mp.end();it++){
        // putting key count number of times
        // in arr[]
        for(int i=0;i<it->second;i++){
            arr[index++] = it->first;
        }
    }
    return;
}
int main(){
    vector<int>arr = {7,5,7,5,2,3,5,4,6,7,5,7,5,7,5,7};
    
    cout << "Input Array :";
    for(int i=0;i<arr.size();i++){
        cout << arr[i] << " ";
    }
    cout << endl;
    
    SortedArr(arr);
    cout << "Sorted Array :";
    for(int i=0;i<arr.size();i++){
        cout << arr[i] << " ";
    }

}
You can also try this code with Online C++ Compiler
Run Code

 

Output

Input Array :7 5 7 5 2 3 5 4 6 7 5 7 5 7 5 7
Sorted Array :2 3 4 5 5 5 5 5 5 6 7 7 7 7 7 7 

 

Complexity Analysis

Time Complexity: O(N + MlogM)

Explanation: Assuming the Searching and inserting into hashmap takes O(1), it takes a total of O(N) time to get keys inserted into the hashmaps, and then we will sort hash map based on a key, it takes O(MlogM), so it takes a total of O(N + MlogM). 

Space ComplexityO(M)

Explanation: M distinct elements are stored in the hashmap. 

Frequently asked questions

Q1. Why is AVL Tree better than the Binary Search Tree? 

Ans: In the AVL tree, all the operations such as insertion, deletion, and searching take O(logN) time, while in the case of the Binary Search Tree, in the worst case, these operations can take up to O(N). 

 

Q2. What is the difference between a binary tree and a binary search tree?

Ans: A tree that has a maximum of 2 children is called a binary tree, whereas a binary search tree is a particular binary tree having the following properties. 

Keys in the left subtree are always smaller than the root’s node, whereas keys in the right are greater than the root’s node. 

 

Q3. How to get the sorted output from the binary search tree? 

Ans: Traversing the binary search tree in the inorder traversal gives us the sorted output.

Key takeaways

In this article, we have discussed the problem of sorting the big array with many repetitions of elements. We discussed the Multiple approaches with their complexities. We hope you understand the problem and solution properly. Now you can do more similar questions. 

Recommended Problem - Merge K Sorted Arrays

If you are a beginner, interested in coding, and want to learn DSA, you can look for our guided path for DSA, which is free! 

Thank you for reading. 

Until then, Keep Learning and Keep Coding.

Live masterclass