Introduction
MapReduce follows a master-slave architecture. A task assigned to the slave nodes runs in two processing phases: map and reduce. The map phase generates intermediate results, which are fed into the reduce phase as input.
The intermediate results are sorted by key before being merged into a single final output. Between the map and reduce operations there is a synchronization step.
The synchronization phase is the data-communication step between the mapper and reducer nodes that allows the reduce phase to start.
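As an illustration of these two phases, the sketch below uses the classic word-count example with the Hadoop Java MapReduce API. It is only a sketch: the class names (WordCount, TokenizerMapper, IntSumReducer) are illustrative, and the job-setup code is omitted.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

  // Map phase: emit an intermediate (word, 1) pair for every word in the input line.
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE);   // intermediate key-value pair
        }
      }
    }
  }

  // Reduce phase: the framework groups intermediate pairs by key, so the
  // reducer sees each word once, together with all of its counts.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);     // final output pair
    }
  }
}
```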
The Execution Framework
As we know, MapReduce can run on multiple machines in a network and produce the same results as if all of the work were done on a single machine. It can also pull data from a variety of internal and external sources. MapReduce keeps track of its work by generating a unique key that ensures all processing is related to the same problem.
This key is also used to bring all the output from the distributed tasks together at the end.
When the map and reduce functions are combined, they operate as a single job within the cluster. The MapReduce engine's execution framework handles all of the dividing and conquering, and all of the work is distributed to one or more nodes in the network. To better understand why things work the way they do, you must first understand some characteristics of the execution framework.
This can aid in developing better applications and optimizing their execution for performance and efficiency. MapReduce's foundational behaviors are as follows:
Scheduling
Each MapReduce job is broken down into smaller chunks known as tasks. A map task, for example, might be in charge of processing a specific block of input key-value pairs (known as an input split in Hadoop), while a reduce task might be in charge of a portion of the intermediate keyspace. It's not uncommon for MapReduce jobs to have tens of thousands of individual tasks that must be assigned to cluster nodes.
For large jobs, the total number of tasks may exceed the number that can run on the cluster at the same time. This necessitates a task queue and progress tracking for running tasks, so that waiting tasks can be assigned to nodes as they become available.
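As a rough illustration of where the task count comes from, the hypothetical snippet below caps the input-split size (one map task is launched per split) and sets the number of reduce tasks explicitly. The path and sizes are made up; the API calls are from the Hadoop 2.x mapreduce library.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class TaskGranularity {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "task granularity example");

    // One map task is created per input split; capping the split size at
    // 128 MB bounds how much data each map task processes.
    FileInputFormat.addInputPath(job, new Path("/data/input"));  // illustrative path
    FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);

    // The number of reduce tasks (i.e., partitions of the intermediate
    // keyspace) is set explicitly by the job.
    job.setNumReduceTasks(8);
  }
}
```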
Coordinating tasks that belong to different jobs (e.g., jobs submitted by other users) is another aspect of scheduling. Speculative execution (also known as "backup tasks") is an optimization implemented by both Hadoop and Google's MapReduce implementation. Because of the barrier between the map and reduce phases, the map phase of a job is only as fast as its slowest map task. Similarly, the time it takes to complete a job is limited by the time it takes to complete the slowest reduce task.
As a result, stragglers, or tasks that take an unusually long time to complete, can slow down an entire MapReduce job. Flaky hardware is one source of stragglers: for example, a machine suffering from recoverable errors may become significantly slower. With speculative execution, an identical copy of the same task is run on a different machine, and the framework uses the result of whichever attempt finishes first.
Zaharia et al. discussed various execution strategies in a recent paper, and Google claims that speculative execution can reduce job running times by 44 percent.
Although both map and reduce tasks in Hadoop can be executed speculatively, the consensus is that the technique is more beneficial for map tasks than for reduce tasks, because each speculative copy of a reduce task must re-pull intermediate data over the network. Speculative execution cannot, however, adequately address another common source of stragglers: skew in the distribution of values associated with intermediate keys.
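The snippet below is a minimal, illustrative sketch of how a job might enable speculative execution for map tasks while disabling it for reduce tasks. The property names are the ones used by Hadoop 2.x; verify them against your particular version.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpeculationSettings {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Enable backup attempts for straggling map tasks...
    conf.setBoolean("mapreduce.map.speculative", true);
    // ...but leave reduce speculation off, since each speculative reduce
    // attempt must re-pull intermediate data over the network.
    conf.setBoolean("mapreduce.reduce.speculative", false);

    Job job = Job.getInstance(conf, "speculation example");
    // ... configure mapper, reducer, input and output paths as usual ...
  }
}
```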
Zipfian distributions are common in text processing, which means that the task or tasks responsible for processing the few most frequent elements will take much longer than the typical task. One solution to this problem is better local aggregation.
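One common form of local aggregation in Hadoop is a combiner, which performs map-side partial aggregation before the shuffle so that far fewer pairs for very frequent keys reach a single straggling reducer. The sketch below is illustrative and reuses the word-count classes sketched earlier; the class and job names are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class CombinerSetup {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count with combiner");
    job.setMapperClass(WordCount.TokenizerMapper.class);
    // The reducer doubles as the combiner because summation is
    // associative and commutative: partial sums computed on the map
    // side can safely be summed again on the reduce side.
    job.setCombinerClass(WordCount.IntSumReducer.class);
    job.setReducerClass(WordCount.IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // ... input and output paths as usual ...
  }
}
```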

Synchronization
Synchronization, in general, refers to the mechanisms that allow multiple concurrently running processes to "join up," for example, to share intermediate results or exchange state information. In MapReduce, synchronization is achieved by a barrier between the map and reduce processing phases.
Intermediate key-value pairs must be grouped by key, which is done using a large distributed sort involving all nodes that perform map tasks and all of the nodes that will perform reduce tasks. Because this necessitates copying intermediate data over the network, the process is commonly referred to as "shuffle and sort."
Because each mapper may have intermediate output going to each reducer, a MapReduce job with m mappers and r reducers can involve up to m × r distinct copy operations.
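The routing of intermediate keys to reducers is what produces these m × r copy operations. The sketch below mirrors the logic of Hadoop's default hash partitioner; the class name is illustrative.

```java
import org.apache.hadoop.mapreduce.Partitioner;

// With r reduce tasks, every mapper may produce output destined for any of
// the r partitions, so each mapper can feed each reducer during the shuffle.
public class SimpleHashPartitioner<K, V> extends Partitioner<K, V> {
  @Override
  public int getPartition(K key, V value, int numReduceTasks) {
    // Mask off the sign bit so the partition index is non-negative.
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
```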
The reduce computation cannot begin until all mappers have finished emitting key-value pairs and all intermediate key-value pairs have been shuffled and sorted, because only then can the execution framework guarantee that all values associated with the same key have been gathered.
The aggregation function g in a fold operation is a function of the intermediate value and the next item in the list, which means values can be generated lazily and aggregation can start as soon as values become available. The reducer in MapReduce, by contrast, receives all values associated with the same key at once. However, intermediate key-value pairs can begin to be copied over the network to the nodes running the reducers as soon as each mapper completes; this is a standard optimization that Hadoop implements.
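One way this optimization surfaces in Hadoop is the "slowstart" setting, which lets the shuffle begin once a fraction of the map tasks have completed. The snippet below is a hedged sketch; the property name is the Hadoop 2.x one, and the 0.5 threshold is arbitrary.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SlowstartSetup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Begin copying intermediate data to reducers after 50% of map tasks
    // finish; the reduce function itself still waits for the full shuffle
    // and sort to complete.
    conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.5f);
    Job job = Job.getInstance(conf, "slowstart example");
    // ... configure mapper, reducer, input and output paths as usual ...
  }
}
```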

Code/data colocation
The term "data distribution" is misleading because one of the main goals of MapReduce is to move code rather than data. However, the more significant point remains: for computation to take place, data must be fed into the code somehow. This problem is inextricably linked to scheduling in MapReduce, and it is heavily reliant on the design of the underlying distributed file system. The scheduler achieves data locality by starting tasks on the node that has a specific block of data (i.e., on its local drive) that the job requires. This results in the code being moved to the data.
If this isn't possible (for example, if a node is already overburdened), new tasks will be started elsewhere, and the required data will be streamed over the network. Because inter-rack bandwidth is significantly less than intra-rack bandwidth, it is essential to prefer nodes on the same rack in the data center as the node holding the relevant data block.
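The locality information the scheduler relies on comes from the distributed file system, which can report the hosts holding each block of a file. The sketch below simply prints that information using the HDFS client API; the file path is made up.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocality {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus status = fs.getFileStatus(new Path("/data/input/part-00000"));

    // For each block of the file, HDFS reports which datanodes hold a
    // replica; the scheduler prefers to launch a map task on (or near)
    // one of those hosts.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.printf("offset %d, hosts %s%n",
          block.getOffset(), String.join(",", block.getHosts()));
    }
  }
}
```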
Fault/error handling
The MapReduce execution framework must accomplish all of the above in an environment where errors and faults are the norm, not the exception. Because MapReduce was designed for clusters of low-cost commodity servers, the runtime must be highly resilient. Disk failures are common in large clusters, and RAM experiences more errors than one might expect. Datacenters suffer both planned outages (e.g., system maintenance and hardware upgrades) and unplanned outages (e.g., power failures and connectivity loss).
And that's just the hardware. On the software side, exceptions must be caught, logged, and recovered from, because large-data problems have a habit of revealing obscure corner cases in otherwise bug-free code.
Furthermore, any sufficiently large dataset will contain corrupted data or records beyond a programmer's imagination, resulting in errors one would never think to check for or trap. The MapReduce execution framework must thrive in this hostile environment.
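From the job's point of view, much of this fault handling is automatic: a failed task attempt is simply rescheduled on another node. The hedged sketch below shows the retry limits a job can configure; the property names are those used by Hadoop 2.x, and the values are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class FaultToleranceSettings {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Each failed task attempt is retried on another node, up to this
    // limit, before the whole job is declared failed.
    conf.setInt("mapreduce.map.maxattempts", 4);
    conf.setInt("mapreduce.reduce.maxattempts", 4);

    Job job = Job.getInstance(conf, "fault tolerance example");
    // ... configure mapper, reducer, input and output paths as usual ...
  }
}
```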