Introduction
YARN is a resource management framework in Hadoop that supports various workloads. MapReduce is a framework for processing large datasets in parallel across the nodes of a Hadoop cluster. In this article, we look at both frameworks in detail and explain the difference between YARN and MapReduce.
YARN
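YARN (Yet Another Resource Negotiator) is a resource management framework introduced in Hadoop 2.0. It took over the cluster resource management and job scheduling duties that the JobTracker handled in the original MapReduce. YARN does not store data itself; storage is handled by HDFS, while YARN lets processing frameworks run over large amounts of data on the cluster.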
YARN is a platform for managing and allocating resources such as CPU and memory across multiple applications on a Hadoop cluster.
It shares resources across multiple users and applications, and it supports various workloads, such as batch and interactive processing. Its main components are the ResourceManager, the NodeManagers, a per-application ApplicationMaster, and the containers in which a distributed application's tasks run.
Architecture of YARN
- YARN, which stands for Yet Another Resource Negotiator, is the resource management and scheduling framework in Hadoop.
- It has a ResourceManager, a NodeManager on each node, and an ApplicationMaster for each application. The ResourceManager acts as the central coordinator that manages and allocates resources across the cluster.
- NodeManagers manage the resources on individual nodes in the cluster and monitor their usage, such as CPU and memory.
- The ApplicationMaster coordinates the execution of tasks within an application.
- YARN supports various types of workloads, making it a key component in large-scale distributed data processing systems.
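The division of labour between these components can be illustrated with a small toy model. The sketch below is not the YARN API: the class names mirror the components described above, but the fields, the `allocate` method, and the first-fit placement rule are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeManager:
    """Toy model of one worker node and its free resources (hypothetical fields)."""
    name: str
    free_memory_mb: int
    free_vcores: int

    def can_fit(self, memory_mb: int, vcores: int) -> bool:
        return self.free_memory_mb >= memory_mb and self.free_vcores >= vcores

    def reserve(self, memory_mb: int, vcores: int) -> None:
        self.free_memory_mb -= memory_mb
        self.free_vcores -= vcores


@dataclass
class ResourceManager:
    """Toy central coordinator that picks a node for each container request."""
    nodes: List[NodeManager] = field(default_factory=list)

    def allocate(self, memory_mb: int, vcores: int) -> Optional[str]:
        # First-fit placement is an assumption; real YARN schedulers are more elaborate.
        for node in self.nodes:
            if node.can_fit(memory_mb, vcores):
                node.reserve(memory_mb, vcores)
                return node.name
        return None  # no node currently has enough free capacity


rm = ResourceManager([
    NodeManager("node-1", free_memory_mb=4096, free_vcores=2),
    NodeManager("node-2", free_memory_mb=8192, free_vcores=4),
])

# An ApplicationMaster would issue requests like these on behalf of its tasks.
placements = [rm.allocate(3072, 2), rm.allocate(3072, 2), rm.allocate(8192, 4)]
print(placements)
```

In this model the third request cannot be placed because neither node has enough free memory left, which is the kind of decision the central coordinator has to make for every request.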
Application Running Process in YARN
- A client submits the application to YARN.
- YARN allocates resources from the Hadoop cluster, such as CPU and memory, to the application.
- YARN launches the application in containers on the selected cluster nodes. The application runs within these containers.
- YARN ensures that the application has the required resources and monitors its progress.
- After the application finishes, YARN cleans up the resources it used.
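The steps above can be read as a simple linear state machine. The following sketch models them that way; the state names are simplified labels chosen for this illustration and are not the application states that YARN itself reports.

```python
from enum import Enum


class AppState(Enum):
    # Simplified, hypothetical labels for the steps listed above.
    SUBMITTED = "submitted"
    ALLOCATED = "resources allocated"
    RUNNING = "running in containers"
    FINISHED = "finished"
    CLEANED_UP = "resources cleaned up"


# Each state leads to exactly one next state; the last state has no successor.
NEXT_STATE = {
    AppState.SUBMITTED: AppState.ALLOCATED,
    AppState.ALLOCATED: AppState.RUNNING,
    AppState.RUNNING: AppState.FINISHED,
    AppState.FINISHED: AppState.CLEANED_UP,
}


def run_lifecycle(start: AppState = AppState.SUBMITTED) -> list:
    """Walk the lifecycle from `start` until no further transition exists."""
    history = [start]
    state = start
    while state in NEXT_STATE:
        state = NEXT_STATE[state]
        history.append(state)
    return history


history = run_lifecycle()
print(" -> ".join(state.value for state in history))
```

A real application can also fail or be killed along the way; those branches are left out here to keep the sketch aligned with the steps in the list.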
Advantages of YARN
- YARN provides a highly scalable framework for managing resources in the Hadoop cluster. It efficiently handles large-scale data workloads.
- It supports various data processing frameworks, so users can choose the one that best suits their application requirements.
- Its ResourceManager schedules cluster resources so that they are shared fairly among different workloads.
- It provides a platform on which new technologies can be adopted without affecting the stability of the cluster.
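To make the idea of fair sharing concrete, here is a minimal sketch of max-min fairness, a general technique for dividing a limited resource among competing demands. It is not the algorithm of any particular YARN scheduler, and the workload names and memory figures are made up for the example.

```python
def max_min_fair_share(capacity: float, demands: dict) -> dict:
    """Split `capacity` across `demands` using max-min fairness.

    Workloads that ask for less than an equal share get their full demand;
    whatever they leave unused is divided among the remaining workloads.
    """
    allocation = {}
    remaining = float(capacity)
    # Serving the smallest demands first lets leftover capacity flow to larger ones.
    pending = sorted(demands.items(), key=lambda item: item[1])
    while pending:
        equal_share = remaining / len(pending)
        name, demand = pending.pop(0)
        granted = float(min(demand, equal_share))
        allocation[name] = granted
        remaining -= granted
    return allocation


# Hypothetical cluster with 100 GB of memory and three competing workloads.
shares = max_min_fair_share(100, {"batch": 70, "interactive": 20, "streaming": 40})
print(shares)
```

In this example the small interactive workload receives everything it asked for, and the capacity it does not need is split between the two larger workloads instead of sitting idle.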
Limitations of YARN
- YARN's resource management framework runs ResourceManager and NodeManager daemons, which can consume a significant amount of memory.
- It adds complexity compared to the original MapReduce framework, which can be challenging for first-time users.
- The additional components of YARN require more computational resources, such as CPU, which can increase hardware and infrastructure costs.