Table of contents
1. Introduction
2. Definition
3. Why are Map and Reduce put together?
4. The Map Task
5. The Reduce Task
6. How does the MapReduce Task work?
7. Frequently Asked Questions
7.1. What is MapReduce?
7.2. Why do we need MapReduce?
7.3. What are the advantages of MapReduce?
7.4. What are the disadvantages of MapReduce?
8. Conclusion
Last Updated: Mar 27, 2024

Putting Map and Reduce Together

Author Saumya Gupta

Introduction

When I was pursuing my engineering degree, there was a subject in my fourth year called Big Data. Back then, I was keen to learn about it because my teacher told me that Amazon, Google, Flipkart, and many other big tech giants use big data techniques to handle and secure their extensive data sets.

During my research, I learned about MapReduce, and the concept amazed me. So today, I will share one of the most powerful and valuable tools for handling large amounts of data.

Please bear with me for a few minutes and read this article till the end.

Definition

Whenever we have a large data set, it is divided into pieces, and those pieces are processed in parallel.

MapReduce is one of the essential components of the Hadoop ecosystem. MapReduce is designed to process a large amount of data in parallel by dividing the work into smaller, independent tasks. It takes a list as input and produces a list as output.
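The "list in, list out" idea can be seen in miniature with Python's own built-in `map` and `functools.reduce`, which this sketch uses purely as an analogy (Hadoop itself is not involved here):

```python
from functools import reduce

# Map: transform each input record independently (list in, list out).
words = ["big", "data", "is", "everywhere"]
lengths = list(map(len, words))  # one output per input word

# Reduce: fold the mapped list down to a single summary value.
total = reduce(lambda acc, n: acc + n, lengths, 0)

print(lengths)  # [3, 4, 2, 10]
print(total)    # 19
```

Hadoop's MapReduce applies the same two-step shape, but with each step distributed across many machines.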

Why are Map and Reduce put together?

When I researched MapReduce, I always wondered why we use Map and Reduce together.

What are the tasks of mapping and reducing individually?

Finally, I found the answers to all these questions. So, first of all, I will discuss the job of the map function.

The Map Task

The map function takes a key-value pair as input. The input data can be in any form, i.e., structured (database tables, where the data is represented in rows and columns) or unstructured (text files, videos, images). The framework splits each record into two parts: a key and a value. The key can be treated as a reference to the data, and the value is the data itself. The map task is applied to every input record.
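A minimal mapper sketch in plain Python, assuming a hypothetical record format of `date,city,amount` (the field layout and function name are illustrative, not a real Hadoop API):

```python
def sales_mapper(line):
    """Map one raw record 'date,city,amount' to a (city, amount) key-value pair."""
    date, city, amount = line.strip().split(",")
    return city, int(amount)

# Each raw input line becomes one key-value pair.
records = [
    "20/04/2022,Mumbai,12000",
    "21/04/2022,Delhi,8000",
]
pairs = [sales_mapper(r) for r in records]
print(pairs)  # [('Mumbai', 12000), ('Delhi', 8000)]
```

Notice the mapper only looks at one record at a time, which is exactly what lets many mappers run in parallel on different chunks of the data.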

The Reduce Task

The reducer takes the key-value pairs that the mapper creates as its input. In other words, the mapper's output acts as the input to the reducer, and the reducer produces the final result. Between the two phases, the framework sorts and groups the key-value pairs by key. In the reducer, we perform aggregation or summation jobs, like counting, or maximum and minimum calculations.
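A matching reducer sketch, again in plain Python with illustrative names: it receives one key together with all the values the framework has grouped under that key, and folds them into a single result.

```python
def sales_reducer(city, amounts):
    """Reduce all the amounts grouped under one city key to a single total."""
    return city, sum(amounts)

# The framework hands the reducer one key and its grouped values.
print(sales_reducer("Mumbai", [12000, 5000, 3000]))  # ('Mumbai', 20000)
```

A reducer for counting, maximum, or minimum would look the same, just with `len`, `max`, or `min` in place of `sum`.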

How does the MapReduce Task work?

MapReduce is a massively parallel processing technique distributed across a cluster of commodity machines.

Let us solve some real-world problems.

Suppose Amazon wants to calculate its total sales, city-wise, for 2022 in India.

S.No. | Date       | City Name | Amount
1     | 20/04/2022 | Mumbai    | 12000

Now, in a traditional computing environment, you would solve this problem using a hash table, where:

Key: city name

Value: sales amount
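The single-machine hash-table approach described above can be sketched as a plain in-memory dictionary (sample data is made up for illustration):

```python
# Single-machine approach: key = city name, value = running total of sales.
sales = [
    ("Mumbai", 12000),
    ("Delhi", 8000),
    ("Mumbai", 5000),
]

totals = {}
for city, amount in sales:
    totals[city] = totals.get(city, 0) + amount

print(totals)  # {'Mumbai': 17000, 'Delhi': 8000}
```

This works fine for a small file, but it forces one machine to read every record and hold every running total, which is exactly the bottleneck MapReduce removes.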

But if you run this on one terabyte of data, as Amazon has, it will take a long time to read and process, and it may even run out of memory. Because of this enormous amount of data, Amazon wants to use the MapReduce technique to solve the problem.

There are two phases in a MapReduce job: the Map phase and the Reduce phase, as we read earlier in this blog.

So now, rather than giving the whole task to one person, Amazon splits the data into chunks based on months. Each mapper gets the data for one month, so we have 12 mappers, each working in parallel on a small fraction of the data.

Now, what will the mapper do? 

Each mapper reads a record, takes the city's name and the sale amount, and writes them on an index card. It then takes the next record, say for Jaipur, writes that sale on another index card, and so on. Cards for the same city go on the same pile, and the piles grow as the mapper works. By the end, each mapper has a stack of cards per city, and the mapper's job is over. After that, the reducers get these piles of cards, and each reducer is assigned the cities it is responsible for.

For example, we can tell reducer 1 that it is accountable for northern cities, reducer 2 for southern cities, and so on. Each reducer then retrieves the piles of cards for its cities, collecting all the small piles from the respective mappers.

Now each reducer adds up the amounts on all the cards in a pile and gets the total sales per city. To organize the output correctly, the cities can be sorted in alphabetical order. I hope this example helps you easily understand the working of MapReduce.
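The index-card story above can be sketched end to end in a few lines of Python. This is a toy simulation, not a real framework: the function names are illustrative, two tiny chunks stand in for the 12 monthly splits, and the `shuffle` step plays the role of the framework's grouping between map and reduce.

```python
from collections import defaultdict

def mapper(chunk):
    # Emit a (city, amount) pair for every record in this month's chunk.
    return [(city, amount) for _date, city, amount in chunk]

def shuffle(all_pairs):
    # Group values by key -- like piling index cards per city.
    piles = defaultdict(list)
    for city, amount in all_pairs:
        piles[city].append(amount)
    return piles

def reducer(city, amounts):
    # Sum one city's pile to get its total sales.
    return city, sum(amounts)

# Two tiny "monthly" chunks stand in for the 12 month splits.
chunks = [
    [("20/04/2022", "Mumbai", 12000), ("25/04/2022", "Jaipur", 7000)],
    [("03/05/2022", "Mumbai", 5000)],
]

pairs = [pair for chunk in chunks for pair in mapper(chunk)]
totals = dict(reducer(c, a) for c, a in sorted(shuffle(pairs).items()))
print(totals)  # {'Jaipur': 7000, 'Mumbai': 17000}
```

The `sorted` call mirrors the alphabetical ordering of cities mentioned above; in a real cluster, each of these steps runs on many machines at once.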

Frequently Asked Questions

What is MapReduce?

MapReduce is a programming technique for dividing work across distributed systems.

Why do we need MapReduce?

We need MapReduce for the following reasons:

  1. It distributes the load.
  2. It reduces big data and extracts meaningful data from it.
  3. It is scalable.
  4. It is fault-tolerant.

What are the advantages of MapReduce?

The advantages of MapReduce are:

  1. Fast processing
  2. Cost-effectiveness
  3. Security and authentication
  4. Parallel processing

What are the disadvantages of MapReduce?

The disadvantages of MapReduce are:

  1. It is not very easy to implement.
  2. It has no cache memory.
  3. It is not suitable for real-time processing, like YouTube live streaming.

Conclusion 

In this article, we have extensively discussed what the map and reduce functions are and why we need MapReduce.

We hope that this blog has helped you enhance your knowledge of MapReduce. If you would like to learn more, check out our other articles on MapReduce.

A ninja never stops learning, so to feed your quest to learn and become more advanced and skilled, head over to our practice platform Coding Ninjas Studio to practice advanced-level problems. Attempt 100 SQL problems, read interview experiences, and much more!

Happy Coding!
