Table of contents
1.
Introduction 
2.
What is Normalization in DBMS?
3.
Why do we need normalization in databases?
3.1.
Insertion Anomaly
3.2.
Deletion Anomaly 
3.3.
Updation / Modified Anomaly
4.
Types of DBMS Normal Form
4.1.
First Normal Form (1NF)
4.2.
Second Normal Form (2NF)
4.3.
Third Normal Form (3NF)
4.4.
Boyce-Codd Normal Form(BCNF)
4.5.
Fourth Normal Form (4NF)
5.
Importance of Normalization in Database Design
6.
Advantages of Normalization
7.
Disadvantages of Normalization
8.
Frequently Asked Questions
8.1.
What is 1NF 2NF and 3NF?
8.2.
What are the four 4 types of database normalization?
8.3.
What are the 3 rules for normalizing a database?
9.
Conclusion
Last Updated: Mar 23, 2025
Easy

Normalization in DBMS

Author Harshita Rajput
2 upvotes
Career growth poll
Do you think IIT Guwahati certified course can help you in your career?

Introduction 

Normalization in DBMS is an essential and widely used concept, as it helps us reduce the redundancy in data present in relations. Through this article, we will understand how redundancy impacts databases and how we can minimize it using the concept of normalization in databases.

Normalization in DBMS

Let’s first start with the basics of normalization in dbms,i.e., functional dependency.

What is Normalization in DBMS?

  • Normalization is the process of organizing database data by minimizing redundancy and eliminating anomalies such as update, insertion, and deletion issues.
     
  • It involves breaking a single table into smaller tables and linking them through relationships, following different normal forms to reduce redundancy.
     
  • Normalization continues until the database follows the Single Responsibility Principle (SRP), ensuring that each table has a specific role for better efficiency and consistency.

Why do we need normalization in databases?

Redundancy in data occurs when the same piece of information exists in a single database. Database redundancy can lead to many drawbacks and introduces three anomalies (or abnormalities). These anomalies are-

Insertion Anomaly

This anomaly occurs when specific data cannot be inserted into the table or database due to the absence of some other data where both of these are independent of each other.

Deletion Anomaly 

While deleting some data, when some critical information is lost that was necessary to maintain the integrity of data, it is known as a deletion anomaly. 

Updation / Modified Anomaly

  • An update anomaly occurs when modifying a single piece of data requires changes in multiple rows, leading to potential data inconsistency.
     
  • It increases storage costs and database complexity due to redundant data, making maintenance more challenging.
     
  • Normalization helps eliminate redundancy, optimizing the database and ensuring data consistency.

Types of DBMS Normal Form

Normalization in a database is done through a series of normal forms. 

series of normal forms
Normal FormDescription
1NFIf a table has no repeated groups, it is in 1NF.
2NFIf a table is in 1NF and every non-key attribute is fully dependent on the primary key, then it is in 2NF.
3NFIf a table is in 2NF and has no transitive dependencies, it is in 3NF.
BCNFIf a table is in 3NF and every non-prime attribute fully dependent on the candidate keys, then it is in BCNF.
4NFIf a table is in BCNF and has no multi-valued dependencies, it is in 4NF.

First Normal Form (1NF)

In 1NF, every database cell or relation contains an atomic value that can’t be further divided, i.e., the relation shouldn’t have multivalued attributes.

Example:

The following table contains two phone number values for a single attribute.

Emp_IDStudent NamePhone Number
1John12345767890
2Claire9242314321, 7689025341

So to convert it into 1NF, we decompose the table as the following - 

Emp_IDStudent NamePhone Number
1John12345767890
2Claire9242314321
2Claire7689025341

Here, we can notice data repetition, but 1NF doesn’t care about it.

Second Normal Form (2NF)

In 2NF, the relation present should be 1NF, and no partial dependency should exist. Partial dependency is when the non-prime attributes depend entirely on the candidate or primary key, even if the primary key is composite.

 

Example 1: (depicting partial dependency issues) 

If given with a relation R(A, B, C, D) where we have {A, B} as the primary key where A and B can’t be NULL simultaneously, but both can be NULL independently and C, D are non-prime attributes. If B is NULL, and we are given the functional dependency, say, B  C. So can this ever hold?

As B contains NULL, it can never determine the value of C. So, as B → C is a partial dependency, it creates a problem. Therefore, the non-prime attributes cannot be determined by a part of the primary key. We can remove the partial dependency present by creating two relations ( the 2NF conversion)- 

Relation 1 = R1(ABD), where {A, B} is the primary key. AB determines D.

Relation 2 = R1(BC), where B is the primary key. And from this, B determines C.

 

Example 2:

Consider the following table. Its primary key is {StudentId, ProjectId}.

The Functional dependencies given are - 

StudentId → StudentName

ProjectId  ProjectName

StudentIdProjectIdStudent NameProject Name
1P2JohnIOT
2P1ClaireCloud
3P7ClaraIOT
4P3AbhkCloud

As it represents partial dependency, we decompose the table as follows - 

StudentIdProjectIdStudent Name
1P2John
2P1Claire
3P7Clara
4P3Abhk
ProjectIdProject Name
P2IOT
P1Cloud
P7IOT
P3Cloud

Here projectId is mentioned in both tables to set up a relationship between them.

Third Normal Form (3NF)

In 3NF, the given relation should be 2NF, and no transitivity dependency should exist, i.e., non-prime attributes should not determine non-prime attributes.

Example:

Consider the following scenario where the functional dependencies are -  

B and B → C, where A is the primary key. 

As here, a non-prime attribute can be determined by a prime attribute, which implies transitivity dependency exists. To remove this, we decompose this and convert it into 3NF. So, we create two relations -

R1(A, B), where A is the primary key and R2(B, C), where B is the primary key.

Boyce-Codd Normal Form(BCNF)

In BCNF, the relation should be in 3NF.If given a relation, say A → B, A should be a super key in this. This implies that no prime attribute should be determined or derived from any other prime or non-prime attribute.

Example:

Given the following table. Its candidate keys are {Student, Teacher} and {Student, Subject}.

The Functional dependencies given are - 

{Student, Teacher} → Subject

{Student, Subject}  Teacher

Teacher → Subject

Student NameSubjectTeacher
JohnPhysicsOlivia
ClaireEnglishEmma
ClaraPhysicsOlivia
AbhkEnglishSophia

As this table is not in BCNF form, so we decompose it into the following tables:

Student NameTeacher
JohnOlivia
ClaraEmma
RobinOlivia
KaleySophia
TeacherSubject
OliviaPhysics
EmmaEnglish
OliviaPhysics
SophiaEnglish

Here Teacher is mentioned in both tables to set up a relationship between them.

Fourth Normal Form (4NF)

For any relation to be in 4NF, it should have no multi-valued dependencies and is in Boyce Codd Normal Form. It simplifies the database by eliminating the non-trivial multi-valued dependencies besides those including the candidate key.

Example:

Consider the following table:

StudentSubjectStudent Phone Number
JohnPhysics12345767890
ClaraEnglish9242314321
RobinMathematics7689025341
KaleyDatabase9878632656

In the above table, the subject and student phone numbers are two independent entities, showing no relation between the subject and phone number. So to convert it into 4NF, we decompose the table as - 

StudentStudent Phone Number
John12345767890
Clara9242314321
Robin7689025341
Kaley9878632656
StudentSubject
JohnPhysics
ClaraEnglish
RobinMathematics
KaleyDatabase

Here the Student is mentioned in both tables to set up a relationship between them.

Importance of Normalization in Database Design

  • Reduces Data Redundancy: Normalization minimizes the duplication of data by organizing it into related tables, ensuring that each piece of information is stored only once.
  • Improves Data Integrity: By enforcing rules and constraints, normalization ensures the accuracy and consistency of data across the database.
  • Simplifies Database Maintenance: A normalized database is easier to update, modify, and maintain, as changes need to be made in only one place.
  • Enhances Query Performance: Properly normalized databases can improve query efficiency by reducing the amount of data scanned during operations.
  • Prevents Update Anomalies: Normalization eliminates issues like insertion, deletion, and update anomalies, ensuring that data remains consistent.
  • Facilitates Scalability: A well-structured, normalized database can handle growth and changes in data requirements more effectively.
  • Supports Better Design Practices: Normalization encourages a logical and organized approach to database design, making it easier to understand and manage.

Advantages of Normalization

The following are the advantages of normalization in a database:

  1. The redundancy in data is minimized, leading to a smaller size of the database.
     
  2. It removes the data inconsistency.
     
  3. The database becomes easy to maintain when we organize it using normal forms.
     
  4. It becomes comparatively easier to write queries as the size of the database decreases.
     
  5. Decreased database size further reduces the complexity of sorting and finding any value in the database.

Disadvantages of Normalization

The following are the disadvantages of normalization in database:

  1. Decomposing the table in Normalization can lead to a poorer database design and severe problems.
     
  2. The process of normalization in the database is very time-consuming as we decompose the table repeatedly into different normal forms until we reach the SRP situation.
     
  3. It becomes tough to normalize relations that are of a higher degree.

Frequently Asked Questions

What is 1NF 2NF and 3NF?

First Normal Form, or 1NF, removes repeated groups from a table to guarantee atomicity. The Second Normal Form, or 2NF, lessens redundancy by eliminating partial dependencies. In a relational database, the Third Normal Form, or 3NF, reduces data duplication by removing transitive dependencies.

What are the four 4 types of database normalization?

First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), and Boyce-Codd Normal Form (BCNF) are the four methods of database normalization. They enhance data integrity in relational databases by gradually removing redundant data.

What are the 3 rules for normalizing a database?

Normalization rules in database design include: 1) Eliminate data redundancy by organizing data into separate tables, 2) Ensure each table has a primary key for unique identification, and 3) Establish relationships between tables using foreign keys for data integrity.

Conclusion

Normalization in the database is a crucial concept that helps minimize data redundancy and further helps organize the database system. In this article, we learned how by decomposing the table into smaller ones using different normal forms, we can remove the anomalies. Further, we learned to remove data inconsistency and improve data integrity.

To learn more about normalization in dbms, we recommend reading the following articles:

Live masterclass