Introduction
In this article we look at the Partitioner in Hadoop MapReduce. The partitioning phase occurs after the Map phase and before the Reduce phase. The MapReduce Partitioner distributes the Map output across the reducers: each intermediate (key, value) pair produced by the Map phase is assigned to a partition based on its key.

The MapReduce Partitioner is essential when processing large volumes of data in parallel. The sections below discuss the MapReduce Partitioner in depth.
What is MapReduce Partitioner?
During MapReduce job execution, the Partitioner controls the partitioning of the keys of the intermediate Map output. The partition is derived by applying a hash function to the key (or a subset of the key). The total number of partitions equals the number of Reduce tasks. Within each mapper, the framework partitions the output by key, so records with the same key are placed in the same partition. Each partition is then sent to a reducer. The Partitioner class determines which partition a (key, value) pair is assigned to. In the MapReduce data flow, the partitioning phase occurs after the Map phase and before the Reduce phase.
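The default hash-based assignment can be sketched as follows. The formula mirrors the one used by Hadoop's default HashPartitioner; the class below is a standalone illustration, not the Hadoop class itself, and the keys and reducer count are chosen for demonstration.

```java
// Sketch of how a key is mapped to a partition by hashing.
// Mirrors the logic of Hadoop's HashPartitioner.getPartition().
public class HashPartitionDemo {

    // Masking with Integer.MAX_VALUE keeps the hash non-negative,
    // so the modulo always yields a valid partition index.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 3; // number of Reduce tasks = number of partitions
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> partition " + getPartition(key, reducers));
        }
    }
}
```

Because the partition depends only on the key, every record with the same key lands in the same partition and is therefore processed by the same reducer.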
Need for MapReduce Partitioner
During the execution of a MapReduce job, the input dataset is converted into a list of key-value pairs by the Map phase. The input data is split, a Map task handles each split, and each Map produces a list of key-value pairs. The framework then delivers the Map output to the Reduce phase, where it is processed by the user-defined reduce function. Before the Reduce phase, the Map output is partitioned by key.
Partitioning groups the values for each key together and ensures that all values for a key are sent to the same reducer. This allows the Map output to be distributed evenly across the reducers. The Partitioner in a MapReduce job routes the mapper output to the reducers by determining which reducer handles each key.
Poor Partitioning in MapReduce
Suppose one key appears far more often than any other key in the input to a MapReduce job. With default hash partitioning, data is then sent to the partitions as follows:

- The key that appears most often is sent to a single partition.
- All other keys are routed to partitions based on their hashCode().

If hashCode() does not distribute the remaining keys uniformly over the partition range, the data will not be spread evenly across the reducers.
Poor data partitioning means certain reducers receive more input data than others and therefore do more work. As a result, the entire job must wait for one reducer to finish its extra-large share of the load.
We can construct a custom Partitioner to overcome poor partitioning in MapReduce. This enables the workload to be spread evenly across the reducers.
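One common skew-mitigation rule can be sketched as follows: a known hot key gets a dedicated partition, and all other keys are hashed over the remaining partitions. The hot key "the", the class name, and the reducer count here are illustrative assumptions; in a real job this logic would live in a subclass of org.apache.hadoop.mapreduce.Partitioner registered with job.setPartitionerClass(...).

```java
// Sketch of a skew-aware partitioning rule: one dedicated partition
// for the hot key, hashing for everything else.
public class SkewAwarePartitioner {

    static final String HOT_KEY = "the"; // assumed hot key, for illustration

    static int getPartition(String key, int numReduceTasks) {
        if (numReduceTasks == 1) {
            return 0; // only one reducer: everything goes to partition 0
        }
        if (key.equals(HOT_KEY)) {
            return 0; // hot key gets partition 0 all to itself
        }
        // Remaining keys are hashed over partitions 1 .. numReduceTasks-1.
        return 1 + (key.hashCode() & Integer.MAX_VALUE) % (numReduceTasks - 1);
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String key : new String[] {"the", "fox", "jumps"}) {
            System.out.println(key + " -> partition " + getPartition(key, reducers));
        }
    }
}
```

The design choice here is to trade one reducer's capacity for isolation of the hot key, so the skewed load no longer delays the reducers handling ordinary keys.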
Number of Partitioners
The total number of partitions equals the number of reducers, which is specified with the JobConf.setNumReduceTasks() method. Thus, the data from a single partition is processed by a single reducer. It's important to note that the framework creates a Partitioner only when there is more than one reducer.
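The one-partition-per-reducer relationship can be sketched in plain Java. The class name and sample keys below are illustrative; the grouping step stands in for what the framework does before the shuffle.

```java
import java.util.*;

// Sketch: with R reduce tasks the framework creates at most R partitions,
// and every record assigned to partition i is consumed by reducer i.
public class PartitionCountDemo {

    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3; // as set via JobConf.setNumReduceTasks(3)
        List<String> mapOutputKeys = Arrays.asList("a", "b", "c", "d", "e", "a");

        // Group map output keys by partition, as the framework does
        // before shipping each partition to its reducer.
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (String key : mapOutputKeys) {
            partitions.computeIfAbsent(getPartition(key, numReduceTasks),
                                       p -> new ArrayList<>()).add(key);
        }

        System.out.println("partitions used: " + partitions.keySet());
        partitions.forEach((p, keys) ->
            System.out.println("reducer " + p + " <- " + keys));
    }
}
```

Note that repeated keys (here, "a") always end up in the same partition, so the reducer for that partition sees all of their values together.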
HashPartitioner is the default Partitioner. It computes the hash value of the key and assigns the partition based on this result.