Code360 powered by Coding Ninjas X Naukri.com. Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1.
Problem
1.1.
Example-
2.
Approach
3.
Algorithm
4.
C++ Code
4.1.
Output:
5.
Algorithm Complexity
5.1.
Time Complexity
5.2.
Space Complexity
6.
FAQs
7.
Key Takeaways
Last Updated: Mar 27, 2024

Queries for Number of Distinct Elements in the Subarray

Problem

Given an array ‘arr’ consisting ‘N’ integers (arr[i] <= ‘N’) and another array ‘queries’ consisting of ‘Q’ queries. Each query is of the following type:

  • l, r - Find the number of distinct elements in the subarray [l, r] for each query.

Example-

Input:

N = 5, arr = [ 1, 2, 1, 3, 2 ]

Q = 3, queries = [ {1, 3}, {4, 5}, {2, 5} ]

Output:

2 2 3

Explanation:

  1. For the first query, ‘l’ = 1 and ‘r’ = 3. Therefore the required subarray is [ 1, 2, 1 ]. The number of distinct elements in the subarray is 2.
  2. The required subarray is [ 3, 2 ] for the second query. The number of distinct elements in the subarray is 2.
  3. For the third query, the required subarray is [2, 1, 3, 2]. The number of distinct elements in the subarray is 3.

Also read, Euclid GCD Algorithm

Approach

Let us assume that the queries are sorted in ascending order of ‘r’. Since we have to count every element only once, we will consider its rightmost occurrence. We will iterate over the given array and keep updating the rightmost occurrence of the current element. Say there is some query whose ‘r’ is equal to the current index of the array. Now, if we have to find the number of distinct elements in the subarray from ‘l’ to ‘r’, we have to consider the rightmost occurrence of the number. If the rightmost occurrence lies in the range ‘l’ to ‘r’, this number will be counted as a part of our final answer, otherwise not.

But how to count the number of distinct elements in the subarray in range ‘l’ to ‘r’. We can use Binary Indexed Tree for this purpose. BIT[i] = 1 will mean that the rightmost occurrence of the element arr[i] was at index ‘i’ till now. Let query(‘i’) denote the function that returns the sum of BIT from 1 till ‘i’, so our final answer will be query(‘r’) - query(‘l’-1).

When we encounter an element that has been encountered before, say at index ‘j’, we will have to update BIT[j] = 0. To find this index ‘j’, we can maintain an array ‘prev’ that will store the index where the current element was previously seen. ‘Prev[i]’ will be -1 initially.

Algorithm

  1. Initialize an array ‘prev’ of size 10^6 with -1 and ‘BIT’ of size ‘N’+1 with 0.
  2. Sort the queries in increasing order of their right end ‘r’.
  3. Traverse the array and check if ‘prev[arr[i]]’ is -1 or not.
  4. If it is -1, update the value of BIT at index ‘i’ with 1.
  5. Else, update the value of BIT at index ‘prev[arr[i]]’ with 0 and at index ‘i’ with 1.
  6. Answer the queries whose right end ‘r’ equals, by performing two queries on BIT - query(‘r’) - query(‘l’-1).

C++ Code

#include<bits/stdc++.h>
using namespace std;
 
const int MAX = 10001;
 
// Comparator to sort queries
bool cmp(vector<int> x, vector<int> y){
    return x[1] < y[1];
}
 
// Update function of BIT
void update(int i, int val, int bit[], int n){
    for (; i <= n; i += i&-i){
        bit[i] += val;
    }
}
 
// Query function of BIT
int query(int i, int bit[], int n){
    int sum = 0;
    for (; i>0; i-=i&-i)
        sum += bit[i];
    return sum;
}
 
void solve(int arr[], int n, vector<int> queries[], int q)
{
    sort(queries, queries+q, cmp);

    // Initialising BIT
    int bit[n+1];
    memset(bit, 0, sizeof(bit));
 
    // Initializing prev
    int prev[MAX];
    memset(prev, -1, sizeof(prev));
 
    // Array to store answer for all queries
    int ans[q];
    memset(ans, -1, sizeof(ans));
    int qind = 0;

    // Iterating over input array
    for (int i=0; i<n; i++){

        // If prev is not -1, updating BIT
        if (prev[arr[i]] !=-1){
            update (prev[arr[i]] + 1, -1, bit, n);
        }

        // Setting prev[arr[i]] as i, updating BIT
        prev[arr[i]] = i;
        update(i + 1, 1, bit, n);
 
        // Checking if r of any query is equal to i
        while (qind < q && queries[qind][1] == i+1){
            int l = queries[qind][0], r = queries[qind][1];
            int idx = queries[qind][2];

            // Calculating the number of distinct elements in the subarray
            ans[idx] = query(r, bit, n)- query(l-1, bit, n);
            qind++;
        }
    }
 
    // Printing the number of distinct elements in the subarray
    for (int i=0; i<q; i++){
        cout << ans[i] << ' ';
    }
    cout<<endl;
}
 
// Driver code
int main()
{
    int n = 5;
    int a[] = {1, 2, 1, 3, 2};
    int q = 3;

    // queries of the form - {l, r, idx}
    vector<int>queries[] = { {1, 3, 0}, {4, 5, 1}, {2, 5, 2} };
   
    solve(a, n, queries, q);
    return 0;
}

Output:

2 2 3 

Algorithm Complexity

Time Complexity

We have firstly sorted the queries which will take O(QlogQ) time.

Then, for every element, we are performing an update on BIT which will take O(NlogN) time. Also, for each query element, we call the query() function of BIT two times to find the number of distinct elements in the subarray. This will take O(QlogN) time.

Therefore, total time complexity will be O(QlogQ + (Q + N)logN)

Space Complexity

We are creating a BIT array of size N+1. Also, we are creating a ‘prev’ array to store the previous index of an element. This will also have N+1 size as arr[i]<=N.

Therefore, the total space complexity of the above approach is O(N).

Check out this problem - Subarray With 0 Sum

FAQs

  1. What is a Fenwick Tree?
    A Binary Indexed Tree or Fenwick tree is a data structure that can efficiently update elements and calculate prefix sums of a list of numbers. It allows both the operations to be performed in time complexity of O(logN).
     
  2. What is an offline query?
    An offline query means that we can first store all the queries and then solve them in any order to calculate the answer more efficiently. This can be used only when the answer for one query is independent of the other.
     
  3. Why do we decide to use offline queries in this problem?
    In this problem, we had to find the number of distinct elements in the subarray between L and R for multiple queries. Since we have to count distinct elements, we have to consider each element only once. Now suppose for a single query ‘l’ and ‘r’, we need to consider only the elements at index smaller than ‘r’. Also, for each such element, we just have to check if its rightmost occurrence is at an index greater than ‘l’ or not. Therefore, we decided to use the offline queries approach.

Key Takeaways

This article discussed finding the number of distinct elements in the subarray from ‘l’ to ‘r’ for multiple queries. We also discussed the time and space complexity of our solution. If you want to solve more such problems, you can visit Coding Ninjas Studio.

If you think that this blog helped you share it with your friends!. To be more confident in data structures and algorithms, try out our DS and Algo Course.

Until then, all the best for your future endeavors, and Keep Coding.

Live masterclass