Table of contents
1.
Introduction
2.
Redundancy and anomalies in a table
2.1.
Redundancy
2.2.
Anomalies
3.
Types of Normalization
3.1.
First Normal Form (1NF)
3.2.
Second Normal Form (2NF)
3.3.
Third Normal Form (3NF)
3.4.
Boyce Codd Normal Form (BCNF)
3.5.
Fourth Normal Form (4NF)
4.
Advantages of Normalization
5.
FAQs
6.
Key takeaways
Last Updated: Mar 27, 2024

Need for Normalization

Author ANKIT KUMAR
1 upvote
Career growth poll
Do you think IIT Guwahati certified course can help you in your career?

Introduction

It is not always a good idea to keep every piece of information in a single table. When the data is less, it may not be a big issue however, when there is a lot of information, and every information is stored in a single table, it will drastically impact our query time.

Normalization is a technique to remove or reduce redundancy from a table. It is a significant step of a Database Management System. More formally, normalization can be defined as a process of analyzing a given relational schema based on their functional dependencies and primary keys to achieve desired properties of minimizing redundancy and minimizing the insertion, deletion, and update anomalies. Through normalization, we try to reduce the redundancy as much as possible in a table and also avoid anomalies.

Before we discuss the various normal forms, it is important to understand the meaning of redundancy and anomalies in a database table.

Also read anomalies in database

Redundancy and anomalies in a table

Redundancy

When some columns in a table have repeated data, it is said to be redundant. Redundancy impacts the overall database design. It uses unnecessary space to store the same entry again and again. Consider the below students table.

It can be observed that the information regarding the branch, HOD_ID, and HOD_name is getting repeated every time a new student gets admission in CSE or IT. Consider a situation where 100 students take admission in the CSE branch. In that case, we have to store the above information regarding the branch, HOD_ID, HOD_name hundred times for hundred different students. But there is only one branch named CSE. It has only one HOD. So there is no point in storing the same information again and again. This condition when some columns have repeated information is known as redundancy. Through normalization, we try to reduce the redundancy as much as possible.

Anomalies

There are three types of anomalies in a table. 

  • Insertion anomalies
    In the above students table, if we try to insert a new tuple where we want to add the information about a new branch (say ECE), it's HOD_ID and the HOD_name, it is not possible to do without adding a stud_id value. Since stud_id is a primary key, we cannot keep it null. Here a situation occurs, where we are unable to store the information about the new branch. It may be possible that no student has at present taken admission in this new branch. These are known as insertion anomalies.
  • Update anomalies
    Suppose in the students table, and there are 100 records for hundred students of CSE branch. Each of them has repeated information about the branch, HOD_ID, HOD_name corresponding to their entry in the table. Suppose the CSE branch has got a new HOD now and we want to change both the HOD_ID and HOD_name. We have to do it a hundred times. This is highly inefficient. This problem is known as an update anomaly.
  • Deletion anomalies
    Suppose we delete all the information of the students corresponding to the IT branch. We will not have any entry of any student who is from the IT branch in the table. While deleting the student information, we also deleted the information about the IT branch, i.e., branch, HOD_ID, HOD_name corresponding to the IT branch. This results in loss of information. Such a problem is known as deletion anomalies.

Types of Normalization

After discussing redundancy and anomalies, now we shall see the various normal forms through which we reduce the redundancy and avoid the anomalies in a database. There are a set of rules through which we achieve our goal. This set of rules is the normal form. As we move towards higher normal forms, our table is more "normalized" or free from redundancy and anomalies.

The various normal forms are:

First Normal Form (1NF)

A relation or a table is said to be in the first normal form if all the columns in the table have atomic values. No column should have more than one value. For example, a column mobile number should contain only one mobile number and not more than one.

The above table is in the first normal form as no attribute has a composite value, i.e., no column has more than one value. This is the first step towards normalization.

Second Normal Form (2NF)

The following conditions are necessary for a table to be in the second normal form.

  • The relation or the table must be in the first normal form.
  • There should not be any partial dependencies in the table.
  • A relational schema R is in 2NF, if every non-prime attribute “A” in R is not partially dependent on any key of R.

If there is a partial dependency, then there will be redundancy in the table. Then we should decompose the table so that the partial dependency is removed.

Third Normal Form (3NF)

The following conditions are necessary for a table to be in the third normal form.

  • The table must be in the second normal form.
  • A non-prime attribute should not determine a non-prime attribute.

Consider a function dependency  X->Y, of a relation R. It is important that there is no transitive dependency. Either of the two conditions should be met.

  • X should be a super key.
  • Y must be a prime attribute.

If all the FDs of the relation strictly follow the above condition, then we can say that the relation is in 3NF. Otherwise, again we have to decompose the table until all the transitive dependencies are removed.

Boyce Codd Normal Form (BCNF)

The Boyce Codd normal form (BCNF) is an extended version of the third normal form (3NF). For a table to be in BCNF following conditions must be fulfilled:

  • The table must be in the third normal form, i.e., there should not be any transitive dependency.
  • For every FD X->Y in the relation R, X must be a super key.

In the third normal form for an FD X->Y, it was not necessary that X should be a super key. Even if Y was a prime attribute, we could say the table is in 3NF. However, in BCNF, X must be a super key.

Fourth Normal Form (4NF)

For a table to be in the fourth normal form, the following conditions must be fulfilled:

  • The table should be in Boyce Codd Normal Form (BCNF).
  • There should not be any multivalued dependency in the table, i.e., for a dependency X->Y, for every value of X, there should be a single value of Y.

Apart from the above four normal forms, we also have the fifth normal form, sixth normal form, and the domain key normal form.

Advantages of Normalization

  • Through normalization, we reduce the redundancy in the table to a great extent.
  • Insertion, update, and deletion anomalies can be removed.
  • When the tables are decomposed into smaller tables, data gets more organized and table specific.
  • It reduces memory wastage.
  • Loss of data can be reduced to a large extent.

FAQs

  1. What is normalization in DBMS?
    Normalization is the process of organizing the data in the database by reducing redundancy and avoiding anomalies.
     
  2. What are the various types of normalization?
    First normal form, second normal form, third normal form, BCNF, fourth normal form, fifth normal form, sixth normal form.
     
  3. What is the difference between BCNF and 3NF?
    For a table to be in BCNF, it is necessary that for each FD X->Y, X is a super key, whereas it is not mandatory in case of 3NF.
     
  4. What is the main objective of normalization?
    The main objective is to reduce redundancy and avoid anomalies.
     
  5. When can we say that a table is in the second normal form?
    There should be no partial dependency in the table.

Take this awesome course from coding ninjas.

Key takeaways

  • Normalization is a technique by which we reduce redundancy and avoid anomalies in a database.
  • When some columns in a table have repeated data, it is said to be redundant.
  • Three types of anomalies are insertion anomalies, deletion anomalies, and update anomalies.
  • Various types of normal forms are First normal form, second normal form, third normal form, BCNF, fourth normal form, fifth normal form, sixth normal form.

Never stop learning. Explore the top 100 SQL problems here.

Happy learning!

Live masterclass