Problem of the day
Design a data structure that stores a mapping of a key to a given value and supports the following operations in constant time.
1. INSERT(key, value): Inserts an integer value to the data structure against a string type key if not already present. If already present, it updates the value of the key with the new one. This function will not return anything.
2. DELETE(key): Removes the key from the data structure if present. It doesn't return anything.
3. SEARCH(key): It searches for the key in the data structure. In case it is present, return true. Otherwise, return false.
4. GET(key): It returns the integer value stored against the given key. If the key is not present, return -1.
5. GET_SIZE(): It returns an integer value denoting the size of the data structure.
6. IS_EMPTY(): It returns a boolean value, denoting whether the data structure is empty or not.
Note :
1. Key is always a string value.
2. Value can never be -1.
Operations Performed :
First(Denoted by integer value 1): Insertion to the Data Structure. It is done in a pair of (key, value).
Second(Denoted by integer value 2): Deletion of a key from the Data Structure.
Third(Denoted by integer value 3): Search a given key in the Data Structure.
Fourth(Denoted by integer value 4): Retrieve the value for a given key from the Data Structure.
Fifth(Denoted by integer value 5): Retrieve the size of the Data Structure.
Sixth(Denoted by integer value 6): Retrieve whether the Data Structure is empty or not.
The first line contains an integer 'N' which denotes the number of operations to be performed. Then the operations follow.
Every 'N' lines represent an operation that needs to be performed.
For the 'INSERT' operation, the input line will have three input values separated by a single space, representing the type of the operation in integer, the key inserted as a string, and the value against the key as an integer respectively.
For the rest of the operations except fifth and sixth, the input line will have two values separated by a single space, representing the type of the operation in integer and the key to be inserted as a string respectively.
For the last two operations(fifth and sixth), the input will contain a single integer, denoting only the type of operation in the integer.
Output Format :
For type 1 operation, you do not need to return anything.
For type 2 operation, remove the element from the data structure and don't return anything.
For type 3 operation, return true if the key is present in the data structure. Else, return false.
For type 4 operation, return the value stored against the key. If the key is not present, return -1.
For type 5 operation, return the size.
For type 6 operation, return the boolean denoting whether the data structure is empty or not.
1 <= N <= 10 ^ 5
1 <= T <= 3
1 <= V <= 10 ^ 5
Where 'T' is the type of operation and 'V' is the value of the operand.
Time Limit: 3 sec
3
1 qwerty 35
1 qwerty 50
5
1
1 operation: We need to insert 'qwerty' with a value of 35.
2 operation: We need to insert 'qwerty' with a value of 50.
3 operation: We need to return the size of HashMap. So, We will return 1.
6
1 code 9
3 code
2 code
3 code
5
6
true
false
0
true
Can you think of a data structure that does this already? How is it implemented?
Approach:
HashMap is the data structure used to store key-value pairs, where the average retrieval time for get() and insert() operations is constant i.e. O(1).
Hashmap uses the array of Singly Linked List internally.
The array is called the buckets and at every bucket(i-th index of the array) holds a pointer to head of the Singly Linked List.
Every Linked List Node has fields called the ‘KEY’, ‘VALUE’ and certainly the pointer to the next node.
For every entry of a key, we try to generate a ‘unique value’ pointing to the bucket where it can be stored. It can also be said that we try to find an index against every key that needs to be inserted or updated in the array/buckets.
If we know which bucket to look at, then we can make the operations, ‘INSERT’ , and ‘GET’ in the constant time since fetching a value or updating a value at a known index in an array is a constant time operation.
Note:
There might be a scenario where the unique value we want to make against each key has a collision. That means, two or more unique keys may have the same unique value/index that we are trying to calculate. We address this later in the approach.
So the basic intuition behind the working would be to calculate the index in the array/bucket where the Node can be placed and it is placed there. Now while getting the element from the HashMap, it again calculates the index of the element to retrieve and goes to the array index/bucket and returns the value of the element/Node(if exists).
So this is the basic functionality of the Hashmap.
But it is not as simple, because there as mentioned earlier, there can be cases of collision as well. The collision is when two or more elements have the same index or needs to be put in the same bucket. Is it possible?!
Let’s try to insert a key with the name ‘FIRST_KEY_VALUE’ and suppose the index calculated for it is 2. This means we are required to insert it in the second bucket. Similarly, we encounter another key, say, ‘SECOND_KEY_VALUE’ which again needs to be inserted in the second bucket. Such a scenario is said to be a collision.
To make hashmap work even in cases of collision we keep a singly linked list in every bucket. The two keys mentioned above will be put in the second bucket where every new key is inserted at the head of the list. Hence, in the above example, we can visualize the list at second bucket as,
head - ‘SECOND_KEY_VALUE’ -> ‘FIRST_KEY_VALUE’
So while updating the key, we first calculate the index, then iterate over the LinkedList at that index to get the item.
To calculate the index against every key that needs to be inserted we use hashing algorithms. The values thus computed are called Hash Values. This hash value serves as the location or index of the bucket.
Functions to be implemented:
1. INSERT(VALUE) :
As discussed, we will first compute the hash value or location in the buckets where the key needs to be inserted using any of the hashing algorithms available. In case the location doesn’t have a list already, we make it as the head or if the location already contains a list(case of collision) we insert the new key at the head.
To make sure that every insert operation happens in constant time, we keep a threshold of 0.7(less than 1) over the load factor. If the load factor exceeds the threshold value, we run a rehash function.
loadFactor = (size * 1.0) / buckets.size()
The load factor is defined to be the total load that is being put on the buckets. We try to keep it less than or equal to 0.7 to maintain the operations performed are constant.
RE_HASH(): This function basically increases the total bucket size by either a factor of multiple of 2 or by the power of 2 and every KEY present is inserted into the updated buckets(you may read more about it on the internet).
2. DELETE(VALUE):
It works in a similar way we discussed above for the insert function.
3. SEARCH(KEY):
Generate the hash value against the key to be searched and get the bucket corresponding to the key by taking modulo with the total bucket size.
O(1).
All the functions take constant time. Hence, the overall Time Complexity is O(1).
O(N), where ‘N’ is the number of elements in the HashMap.
The Space Complexity O(N) is used to store all the elements in the HashMap. Hence, the overall Space Complexity is O(N).