Introduction
It is not always a good idea to keep every piece of information in a single table. When the data is less, it may not be a big issue however, when there is a lot of information, and every information is stored in a single table, it will drastically impact our query time.
Normalization is a technique to remove or reduce redundancy from a table. It is a significant step of a Database Management System. More formally, normalization can be defined as a process of analyzing a given relational schema based on their functional dependencies and primary keys to achieve desired properties of minimizing redundancy and minimizing the insertion, deletion, and update anomalies. Through normalization, we try to reduce the redundancy as much as possible in a table and also avoid anomalies.
Before we discuss the various normal forms, it is important to understand the meaning of redundancy and anomalies in a database table.
Also read anomalies in database
Redundancy and anomalies in a table
Redundancy
When some columns in a table have repeated data, it is said to be redundant. Redundancy impacts the overall database design. It uses unnecessary space to store the same entry again and again. Consider the below students table.
It can be observed that the information regarding the branch, HOD_ID, and HOD_name is getting repeated every time a new student gets admission in CSE or IT. Consider a situation where 100 students take admission in the CSE branch. In that case, we have to store the above information regarding the branch, HOD_ID, HOD_name hundred times for hundred different students. But there is only one branch named CSE. It has only one HOD. So there is no point in storing the same information again and again. This condition when some columns have repeated information is known as redundancy. Through normalization, we try to reduce the redundancy as much as possible.
Anomalies
There are three types of anomalies in a table.
-
Insertion anomalies
In the above students table, if we try to insert a new tuple where we want to add the information about a new branch (say ECE), it's HOD_ID and the HOD_name, it is not possible to do without adding a stud_id value. Since stud_id is a primary key, we cannot keep it null. Here a situation occurs, where we are unable to store the information about the new branch. It may be possible that no student has at present taken admission in this new branch. These are known as insertion anomalies. -
Update anomalies
Suppose in the students table, and there are 100 records for hundred students of CSE branch. Each of them has repeated information about the branch, HOD_ID, HOD_name corresponding to their entry in the table. Suppose the CSE branch has got a new HOD now and we want to change both the HOD_ID and HOD_name. We have to do it a hundred times. This is highly inefficient. This problem is known as an update anomaly. -
Deletion anomalies
Suppose we delete all the information of the students corresponding to the IT branch. We will not have any entry of any student who is from the IT branch in the table. While deleting the student information, we also deleted the information about the IT branch, i.e., branch, HOD_ID, HOD_name corresponding to the IT branch. This results in loss of information. Such a problem is known as deletion anomalies.