Code360 powered by Coding Ninjas X Naukri.com. Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1.
Introduction
2.
Approach(Naive)
2.1.
PseudoCode
2.2.
CODE IN C++
2.3.
Complexity Analysis
3.
Approach(Efficient)
3.1.
Intuition behind Sorting
3.2.
CODE IN C++(Efficient)
3.3.
Complexity Analysis
4.
Frequently Asked Questions
5.
Key Takeaways
Last Updated: Mar 27, 2024

Non-Overlapping Intervals

Author aniket verma
0 upvote

Introduction

In this blog, we will discuss how to find the number of overlapping intervals to be removed to maximize the Non-Overlapping Intervals. It is an important problem to get a strong grip over greedy algorithms and has also been asked in many interviews.

A common prerequisite is having a basic knowledge of sorting, dp, and how comparators are written. Without knowing this, it is recommended to learn sorting, dp, and comparators. Nevertheless, we’ll try to cover each point in-depth that is required to solve the problem, Non-Overlapping Intervals.


What do we mean by the Overlapping Intervals?


Well, it’s a trivial question, but still, for the sake of clarity, we’ll define that let

Two intervals be [a, b], and  [c, d] where a<=b and c <= d, are said to be overlapping iff their intersection is not empty. In the mathematical form, we can write as:

Two intervals [a, b] and [c, d] are said to be overlapping if,

                      → a <= c <= b <= d  

      Or, 

→  [a, b] ⋂ [c, d] ≠ 𝟇

Let’s look at the problem statement:

You are given a set of intervals [ai, bi ] of length N, where i 𝞊 { 1, 2, ….. N }. Return the minimum number of intervals you need to remove to make the rest of the intervals non-overlapping.

Let’s understand the problem using an example.

Input:   N and a  set of intervals of length N, 

    Intervals = {[1,2], [2,3], [3,4], [1,3]}

Output:  1

Explanation: The minimum number of intervals that can be removed is 1 i.e, [1, 3].

Approach(Naive)

First, let’s look at the naive approach that would probably strike anybody’s mind before jumping to an optimal approach.

So, the basic approach considers all subsets of the given set of intervals and then finds the maximum length of the subset of non-overlapping intervals using the definition mentioned earlier.

Hence the minimum number of intervals removed = total number of intervals - the maximum size of the subset of non-overlapping intervals 

So the naive/basic approach could be formulated as:

  1. For each subset in the given set of intervals, check if the subset contains only non-overlapping intervals.
  2. If the subset contains only non-overlapping intervals then maximize the size.
  3. Finally, return the answer as the total number of intervals - the maximum size of the subset of non-overlapping intervals.

Now let’s look at the PseudoCode.

PseudoCode

Algorithm

___________________________________________________________________

procedure removeMinimumOverlappingIntervals(intervals):

___________________________________________________________________

 1.    subsets ← computeSubsets(intervals) #compute all subsets of intervals

2.    maxLength ← 0                       # initially set to 0 in case the intervals size is 0

3.    for subset in subsets do      

4.    if containsNonOverlappingIntervals(subset) == True do      

5.     maxLength ← max(maxLength, subset.size())

6.     end if

7.    return intervals.size()  - maxLength 

8.    end procedure

___________________________________________________________________

CODE IN C++

//C++ program to find the minimum number of overlapping intervals to be removed
#include <bits/stdc++.h>
using namespace std;
vector<vector<vector<int>>> computeSubsets(vector<vector<int>> & intervals, int N){
    vector<vector<vector<int>>> subsets;
    // we know that subsets can be represented using an integer
    // and there are at most 1<<N subsets
    // hence iterating over all integers from 0 to 1<<N
    for(int i=0; i<(1<<N); ++i){
        vector<vector<int>> v; // current subset
        int j = 0; // variable to check if the bit at a position is set in i
        while((1<<j)<=i){    
            if(((1<<j)&i)>0){ // if bit at current position is set in both i and j
                v.push_back(intervals[j]); // push it in the current subset
            }
            j++;      // shift to the next bit
        }
        if(v.size()>0)      // if the subset is not empty push it
            subsets.push_back(v); // in the subsets vector
    }
    // return the subsets
    return subsets;
}
// boolean function that finds whether the set 
// of intervals contain the set of non-overlapping intervals 
bool containsNonOverlappingIntervals(vector<vector<int>> subset){
    sort(subset.begin(), subset.end());
    for(int i=0;i<subset.size()-1;++i){
        // if intersection is not null
        if(subset[i][1]>=subset[i+1][0] or subset[i][1]>=subset[i+1][0]){
            return false; // this subset doesn't contain non-overlapping intervals
        }
    }
    return true;
}
// finds the minimum number of overlapping intervals to be removed
int removeMinimumOverlappingIntervals(vector<vector<int>> & intervals, int N){
    int maxLength = 0; // maximum length of the non-overlapping intervals set    
    // subsets
    vector<vector<vector<int>>> subsets = computeSubsets(intervals, N);
    // iterate over all subsets 
    for(vector<vector<int>> subset : subsets){
        // if it contains non-overlapping Intervals
        if(containsNonOverlappingIntervals(subset)){
            //maximise the length
            maxLength = max(maxLength, (int)subset.size());
        }
    }
    // return maxi_length
    return N - maxLength;
}
int main() {
    // number of intervals
    int N = 4;
    vector<vector<int>> v{{1, 2}, {2, 3}, {3, 4}, {1, 3}};
    cout<<"The minimum number of intervals to be removed: ";
    cout<<removeMinimumOverlappingIntervals(v, N);
    return 0;
}

Output

The minimum number of intervals to be removed: 1

Complexity Analysis

Time Complexity: O((N*logN) *2N)

This is because we are computing all the subsets, and for each subset, we are checking if it contains all non-overlapping intervals. So to compute the number of subsets, it takes O(2N) time. The maximum length can be N. Sorting the N length subset takes O(NlogN) time. Hence in total, it takes O((NlogN) *2N) time.

Space complexity: O(2N) at the worst case as we are computing all the subsets.

The above algorithm has some loopholes that we need to tackle.

The foremost problem is the exponential time complexity the algorithm is taking, and then, the space complexity is also worse, i.e., exponential.

This gives us the motivation to improve our algorithm.

So let’s think about an efficient approach to solve the problem.

Approach(Efficient)

If we want to build up an efficient solution, we will need to find out if there is any redundancy or any repetition that we are doing.

So the redundancy in our above approach is that we are finding all possible subsets and then checking each subset if it’s a disjoint interval subset or not and updating our answer. So we aim to avoid computing all subsets. Hence, we need to make some observations using the definition of overlapping intervals or from the repetitive checks in the previous algorithm and infer some important points to simplify our approach. 

The foremost intuitive idea is to sort the interval set by the endpoints(Why?). The first reason is that sorting would help us reduce the number of comparisons to check non-overlapping intervals in the given interval set. But the most important reason is that it will help us maximize the number of non-overlapping intervals by linearly traversing over the sorted interval set.
 

Intuition behind Sorting

The intuition behind sorting is that when we sort the intervals by the endpoints, we can separate out overlapping intervals from the intervals set. Say for example, [[1, 3], [2, 4], [4, 5]] is the sorted interval set. Then we can easily separate out the interval [2, 4] using the definition of the overlapping intervals. We don’t need to check it individually with each interval. We also maximize the number of non-overlapping intervals. It will maximize because if we find 2 intervals overlapping we can ignore the interval with a higher endpoint as it is very likely that the next interval with a higher endpoint is non-overlapping.

Let’s look at an informal proof of the greedy algorithm.

 

Claim: We claim that from N intervals choosing the intervals with early finishing times and removing conflicting intervals from the set results in the optimal solution, i.e, gives us the maximum number of non-overlapping intervals.

Proof: We will use contradiction to prove our result. 

 Let U = (I1, I2, …. IN) be a set of N intervals, 𝞂 = (I1, I2, …. IL), be the interval set chosen by our algorithm and L be its length. Let 𝞂* = (I1, I2, …. IL*) be the optimal interval set and L* be its length.

Let’s assume that our claim is incorrect. → 𝞂 ≠ 𝞂*.

Assuming 𝞂 differs from 𝞂* in order but L = L*. Since 𝞂 ≠ 𝞂*, even one of the elements in 𝞂 is not same as that in 𝞂*.

But we already claimed that 𝞂 has the maximum number of non-overlapping intervals. Even if one of the elements in 𝞂 is not there in 𝞂*, then they can’t be non-overlapping which is a contradiction.

Now assuming 𝞂 differs from 𝞂* in order and LL*(since L* must be the maximum). Again if L < L* is true then there must be at least one element in 𝞂* ∌ 𝞂. But 𝞂 contains all elements which are non-overlapping and replacing any of them by any interval ϵ U \ 𝞂 would make 𝞂* an interval set containing overlapping intervals which is again a contradiction.

Hence in either case we find a contradiction. Hence, our assumption is incorrect. 

This proves the correctness of our claim.

 

Now we can formulate our approach:

  1. Sort the intervals by the endpoints.
  2. Iterate over each interval, check if it overlaps with the previous interval.
  3. If it overlaps, drop the interval with the higher endpoint.
  4. Finally, return the answer as N - maximum number of non-overlapping intervals.     

Let’s look at the Code.

CODE IN C++(Efficient)

//C++ program to find the minimum number of overlapping intervals to be removed
#include <bits/stdc++.h>
using namespace std;
// comparator function to sort by endpoints
bool comp(vector<int> &a, vector<int> &b) {
    if (a[1] == b[1]) return a[0] < b[0];
    return a[1] < b[1];
}
// finds the minimum number of overlapping intervals to be removed
int removeMinimumOverlappingIntervals(vector<vector<int>> &intervals, int N) {
    int maxLength = 1; // maximum length of the non-overlapping intervals set
    // sort the intervals
    sort(intervals.begin(), intervals.end(), comp);
    // reference vector using which we compare the next intervals
    // if it has to be included in the interval set 
    vector<int> referenceInterval = intervals[0];
    // iterate over all intervals
    for (int i = 1; i < intervals.size(); ++i) {
        // if intersection is not empty
        if (referenceInterval[1] > intervals[i][0]) {
            continue;
                   
        }
        else {
            referenceInterval = intervals[i]; // update the reference Interval      
        }
    }
    //return the answer
    return N - maxLength;
}
int main() {
    // number of intervals
    int N = 4;
    // vector of intervals
    vector<vector<int>> v{{1, 2}, {2, 3}, {3, 4}, {1, 3}};
    // compute the result
    cout << "The minimum number of intervals to be removed: ";
    cout << removeMinimumOverlappingIntervals(v, N);
    return 0;
}

Output

The minimum number of intervals to be removed: 1

Complexity Analysis

Time Complexity: O(N*LogN), Since we are sorting the set of intervals of length N, which takes most of the time, it takes O(N*LogN) time.                       

Space complexity: O(N) at the worst case, as the auxiliary space used by the sorting algorithm is O(N).                                                                              

Note: The problem could be solved using DP also, but it would still take O(NLogN) time because sorting is still required to solve the problem.(It is left as an exercise for you).

Frequently Asked Questions

Q1. Does a greedy algorithm always work?

Answer)  No, a greedy algorithm does not always work. To solve a problem via a greedy algorithm, you need to have intuitive proof in your mind, at least to lead to the correct solution. To show when it doesn’t work, you can think of a counter-example that will help rule out the greedy algorithm.

 

Q2. Why is sorting important?

Answer) Sorting is a very useful technique because we can reduce a significant number of comparisons. With reference to the question discussed in this blog, we moved from an exponential solution to an O(N*LogN) solution. This is sufficient to explain the power of sorting.

 

Q3. What is a  comparator?

Answer) Comparator is an important concept. It is a function object that helps to achieve custom sorting depending on the problem requirements. 

Key Takeaways

This article taught us how to find the Non-Overlapping Intervals by approaching the problem using a naive approach followed by an efficient solution. We discussed an iterative implementation using examples, pseudocode, proof of our solution, and proper code in detail.

We hope you could take away critical techniques like analyzing problems by walking over the execution of the examples and finding out how we can apply sorting to simplify our task and make fewer comparisons, and how you should approach a greedy solution before writing the code. Testing a greedy algorithm as early as possible is a good practice as there is no guarantee that it will always work. Hence, if it does not work, you could rule it out and not follow a similar approach to solve the problem.

Now, we recommend you practice problem sets based on the concepts of Non-Overlapping Intervals to master your fundamentals. You can get a wide range of questions similar to the problem Non-Overlapping Intervals on Coding Ninjas Studio

Live masterclass