Last Updated: 5 Apr, 2021

Data Stream As Disjoint Intervals

Moderate
Asked in company
eBay

Problem statement

You are given a stream of 'n' non-negative integers as the input, and you have to group the stream of integers in the form of disjoint intervals.


Your task is to Implement the ‘DisjointIntervals’ class having the two functions:


1) The first function is ‘addInteger(int val)’, which takes an integer ‘val’ as an argument and adds it to the stream.

2) The second function is ‘getDisjointIntervals()’, which returns a summary of the integers in the stream currently as a list of disjoint intervals.


Example:

Input: 'n' = 5 , stream =  [
                  [1, 1],
                  [1, 3],
                  [2],
                  [1, 2],
                  [2],
                ]

Output: [
          [ [1, 1],  [3, 3] ],
          [ [1,3] ] 
        ]

Explanation: First of all, 1 is added to the stream, and the disjoint interval will be {1, 1}. When 3 will be added to the stream, then the disjoint intervals will be {1, 1}, {3, 3}. But when 2 is added to the stream then the disjoint interval will be {1, 3} as 2 lies between these two sets of disjoint intervals, and both the intervals {1, 1} and {3, 3} merge.
Input format:
The first input line contains an integer ‘n’, representing the total number of queries.

Then the next ‘n’ lines contain ‘n’ queries. A query can be of two types:
1 val → adds the integer ‘val’ to the stream.
2 → returns a list of disjoint intervals.
Output format:
For each test case, print all the disjoint intervals for each query of type 2, and output the answer to the query in a single line.

Each query must be answered in a separate line.
Note:
You do not need to print anything. It has already been taken care of. Just implement the given functions.

Approaches

01 Approach

In this approach, for any new number ‘Val’, we will find the position of the first interval whose starting value is greater than the added ‘Val’ to the stream using linear search.

  • It has three cases:
    • It will be inserted at the beginning of the list or it gets combined with the first interval of the list.
    • It will be inserted at the end of the list or combine with the last interval in the list.
    • It will be inserted between two intervals and then we will try to merge it with the previous and next interval to it.

Algorithm:

 

  • Create a 2D array ‘Intervals’ to store the intervals.
  • Whenever the function ‘addInteger’ is called with an integer ‘Val’ we will first check if the array is empty initially then insert {Val, Val} into the list.
  • Create a variable ‘optimalIndex’ initialized to the size of the array ‘Intervals’. Now do a linear search on the array and find the first position where the start of the interval is greater than the ‘Val’ and save this position in the ‘optimalIndex’ variable. Also while doing a linear search if we find any interval which already contains the ‘Val’ we will return from the function from here only.
  • Now if the value of ‘optimalIndex’ = size of the array ‘Interval’ then:
    • If it can be merged with the last interval i.e Val <= (End of last interval + 1) then make ‘end of the last interval’ = ‘Val’.
    • Otherwise insert {Val, Val} at the last of the array.
    • And return from the function from here only.
  • Now if the value of ‘optimalIndex’ = 0 then:
    • If it can be merged with the first interval i.e (Val + 1) >= Start of the first interval then make ‘start of the first interval’ = ‘Val’.
    • Otherwise insert {Val, Val} at the beginning of the array.
    • And return from the function from here only.
  • Now insert {Val, Val} at the ‘optimalIndex’ value in the array.
  • Now check it can be merged with the interval at (optimalIndex + 1)th position:
    • If ‘start of the interval at (optimalIndex+1)th position’ - ‘end of the inserted interval’ = 1 then update ‘end of the inserted interval’ = ‘end of the interval at (optimalIndex+1)th position’ and erase the interval at the (optimalIndex+1)th position.
  • Now check it can be merged with the interval at (optimalIndex - 1)th position:
    • If ‘start of the interval at the optimalIndex position’ - ‘end of the (optimalIndex-1)th interval’ = 1 then update ‘end of the (optimalIndex-1)th interval’ = ‘end of the interval at optimalIndex position’ and erase the interval at the optimalIndex position.
  • And whenever ‘getDisjointIntervals’ functions is called simply return the array ‘Intervals’.

02 Approach

A disjoint-set data structure is a data structure that keeps track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. A union-find algorithm is an algorithm that performs two useful operations on such a data structure:

  • Find: Determine which subset a particular element is in. This can be used for determining if two elements are in the same subset.
  • Union: Join two subsets into a single subset.
  • The idea is that whenever an integer ‘Val’ has to be added, it will be considered as a separate node of DSU. So if (Val + 1) node is present in DSU then we will unite {Val, Val + 1}. If (Val - 1) node is present in DSU then we will unite {Val, Val - 1}.
  • You can also refer to this doc https://cp-algorithms.com/data_structures/disjoint_set_union.html for more information.

  

Algorithm:

 

  • First, create a class ‘DisjointSetUnion’ which is used to construct a node for each ‘Val’ whenever the ‘addInteger’ function will be called. Each Node of DSU will have the following parameters:
    • ‘Start = Val’, ‘End = Val’, ‘Visited = 0’, ‘Parent = Val’
  • Now create an array of DisjointSetUnion Class' objects, ‘Intervals’ of size 10001 as the maximum value of ‘Val’ can be 10^4.
  • Now, whenever the ‘addInteger’ function is called do the following:
    • If ‘Val’ is already visited then return from the function here itself.
    • If (Val - 1) is a node in DisjointSetUnion then perform union operation on (Val, Val - 1).
    • If (Val + 1) is a node in DisjointSetUnion then perform union operation on (Val, Val + 1).
    • Then mark the ‘Val’ as visited.
  • Now if the ‘getDisjointIntervals’ function is called, create a 2d array ‘Result’ and then iterate over an array DSU and insert the ‘Start’ and ‘End’ of each of the components in the array ‘Result’.
  • Finally, return the Result.