Do you think IIT Guwahati certified course can help you in your career?
No
Introduction
Normalization in DBMS is an essential and widely used concept, as it helps us reduce the redundancy in data present in relations. Through this article, we will understand how redundancy impacts databases and how we can minimize it using the concept of normalization in databases.
Let’s first start with the basics of normalization in dbms,i.e., functional dependency.
What is Normalization in DBMS?
Normalization is the process of organizing database data by minimizing redundancy and eliminating anomalies such as update, insertion, and deletion issues.
It involves breaking a single table into smaller tables and linking them through relationships, following different normal forms to reduce redundancy.
Normalization continues until the database follows the Single Responsibility Principle (SRP), ensuring that each table has a specific role for better efficiency and consistency.
Why do we need normalization in databases?
Redundancy in data occurs when the same piece of information exists in a single database. Database redundancy can lead to many drawbacks and introduces three anomalies (or abnormalities). These anomalies are-
Insertion Anomaly
This anomaly occurs when specific data cannot be inserted into the table or database due to the absence of some other data where both of these are independent of each other.
Deletion Anomaly
While deleting some data, when some critical information is lost that was necessary to maintain the integrity of data, it is known as a deletion anomaly.
Updation / Modified Anomaly
An update anomaly occurs when modifying a single piece of data requires changes in multiple rows, leading to potential data inconsistency.
It increases storage costs and database complexity due to redundant data, making maintenance more challenging.
Normalization helps eliminate redundancy, optimizing the database and ensuring data consistency.
Types of DBMS Normal Form
Normalization in a database is done through a series of normal forms.
Normal Form
Description
1NF
If a table has no repeated groups, it is in 1NF.
2NF
If a table is in 1NF and every non-key attribute is fully dependent on the primary key, then it is in 2NF.
3NF
If a table is in 2NF and has no transitive dependencies, it is in 3NF.
BCNF
If a table is in 3NF and every non-prime attribute fully dependent on the candidate keys, then it is in BCNF.
4NF
If a table is in BCNF and has no multi-valued dependencies, it is in 4NF.
First Normal Form (1NF)
In 1NF, every database cell or relation contains an atomic value that can’t be further divided, i.e., the relation shouldn’t have multivalued attributes.
Example:
The following table contains two phone number values for a single attribute.
Emp_ID
Student Name
Phone Number
1
John
12345767890
2
Claire
9242314321, 7689025341
So to convert it into 1NF, we decompose the table as the following -
Emp_ID
Student Name
Phone Number
1
John
12345767890
2
Claire
9242314321
2
Claire
7689025341
Here, we can notice data repetition, but 1NF doesn’t care about it.
Second Normal Form (2NF)
In 2NF, the relation present should be 1NF, and no partial dependency should exist. Partial dependency is when the non-prime attributes depend entirely on the candidate or primary key, even if the primary key is composite.
Example 1: (depicting partial dependency issues)
If given with a relation R(A, B, C, D) where we have {A, B} as the primary key where A and B can’t be NULL simultaneously, but both can be NULL independently and C, D are non-prime attributes. If B is NULL, and we are given the functional dependency, say, B → C. So can this ever hold?
As B contains NULL, it can never determine the value of C. So, as B → C is a partial dependency, it creates a problem. Therefore, the non-prime attributes cannot be determined by a part of the primary key. We can remove the partial dependency present by creating two relations ( the 2NF conversion)-
Relation 1 = R1(ABD), where {A, B} is the primary key. AB determines D.
Relation 2 = R1(BC), where B is the primary key. And from this, B determines C.
Example 2:
Consider the following table. Its primary key is {StudentId, ProjectId}.
The Functional dependencies given are -
StudentId → StudentName
ProjectId → ProjectName
StudentId
ProjectId
Student Name
Project Name
1
P2
John
IOT
2
P1
Claire
Cloud
3
P7
Clara
IOT
4
P3
Abhk
Cloud
As it represents partial dependency, we decompose the table as follows -
StudentId
ProjectId
Student Name
1
P2
John
2
P1
Claire
3
P7
Clara
4
P3
Abhk
ProjectId
Project Name
P2
IOT
P1
Cloud
P7
IOT
P3
Cloud
Here projectId is mentioned in both tables to set up a relationship between them.
Third Normal Form (3NF)
In 3NF, the given relation should be 2NF, and no transitivity dependency should exist, i.e., non-prime attributes should not determine non-prime attributes.
Example:
Consider the following scenario where the functional dependencies are -
A → B and B → C, where A is the primary key.
As here, a non-prime attribute can be determined by a prime attribute, which implies transitivity dependency exists. To remove this, we decompose this and convert it into 3NF. So, we create two relations -
R1(A, B), where A is the primary key and R2(B, C), where B is the primary key.
Boyce-Codd Normal Form(BCNF)
In BCNF, the relation should be in 3NF.If given a relation, say A → B, A should be a super key in this. This implies that no prime attribute should be determined or derived from any other prime or non-prime attribute.
Example:
Given the following table. Its candidate keys are {Student, Teacher} and {Student, Subject}.
The Functional dependencies given are -
{Student, Teacher} → Subject
{Student, Subject} → Teacher
Teacher → Subject
Student Name
Subject
Teacher
John
Physics
Olivia
Claire
English
Emma
Clara
Physics
Olivia
Abhk
English
Sophia
As this table is not in BCNF form, so we decompose it into the following tables:
Student Name
Teacher
John
Olivia
Clara
Emma
Robin
Olivia
Kaley
Sophia
Teacher
Subject
Olivia
Physics
Emma
English
Olivia
Physics
Sophia
English
Here Teacher is mentioned in both tables to set up a relationship between them.
Fourth Normal Form (4NF)
For any relation to be in 4NF, it should have no multi-valued dependencies and is in Boyce Codd Normal Form. It simplifies the database by eliminating the non-trivial multi-valued dependencies besides those including the candidate key.
Example:
Consider the following table:
Student
Subject
Student Phone Number
John
Physics
12345767890
Clara
English
9242314321
Robin
Mathematics
7689025341
Kaley
Database
9878632656
In the above table, the subject and student phone numbers are two independent entities, showing no relation between the subject and phone number. So to convert it into 4NF, we decompose the table as -
Student
Student Phone Number
John
12345767890
Clara
9242314321
Robin
7689025341
Kaley
9878632656
Student
Subject
John
Physics
Clara
English
Robin
Mathematics
Kaley
Database
Here the Student is mentioned in both tables to set up a relationship between them.
Importance of Normalization in Database Design
Reduces Data Redundancy: Normalization minimizes the duplication of data by organizing it into related tables, ensuring that each piece of information is stored only once.
Improves Data Integrity: By enforcing rules and constraints, normalization ensures the accuracy and consistency of data across the database.
Simplifies Database Maintenance: A normalized database is easier to update, modify, and maintain, as changes need to be made in only one place.
Enhances Query Performance: Properly normalized databases can improve query efficiency by reducing the amount of data scanned during operations.
Prevents Update Anomalies: Normalization eliminates issues like insertion, deletion, and update anomalies, ensuring that data remains consistent.
Facilitates Scalability: A well-structured, normalized database can handle growth and changes in data requirements more effectively.
Supports Better Design Practices: Normalization encourages a logical and organized approach to database design, making it easier to understand and manage.
Advantages of Normalization
The following are the advantages of normalization in a database:
The redundancy in data is minimized, leading to a smaller size of the database.
It removes the data inconsistency.
The database becomes easy to maintain when we organize it using normal forms.
It becomes comparatively easier to write queries as the size of the database decreases.
Decreased database size further reduces the complexity of sorting and finding any value in the database.
Disadvantages of Normalization
The following are the disadvantages of normalization in database:
Decomposing the table in Normalization can lead to a poorer database design and severe problems.
The process of normalization in the database is very time-consuming as we decompose the table repeatedly into different normal forms until we reach the SRP situation.
It becomes tough to normalize relations that are of a higher degree.
Frequently Asked Questions
What is 1NF 2NF and 3NF?
First Normal Form, or 1NF, removes repeated groups from a table to guarantee atomicity. The Second Normal Form, or 2NF, lessens redundancy by eliminating partial dependencies. In a relational database, the Third Normal Form, or 3NF, reduces data duplication by removing transitive dependencies.
What are the four 4 types of database normalization?
First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), and Boyce-Codd Normal Form (BCNF) are the four methods of database normalization. They enhance data integrity in relational databases by gradually removing redundant data.
What are the 3 rules for normalizing a database?
Normalization rules in database design include: 1) Eliminate data redundancy by organizing data into separate tables, 2) Ensure each table has a primary key for unique identification, and 3) Establish relationships between tables using foreign keys for data integrity.
Conclusion
Normalization in the database is a crucial concept that helps minimize data redundancy and further helps organize the database system. In this article, we learned how by decomposing the table into smaller ones using different normal forms, we can remove the anomalies. Further, we learned to remove data inconsistency and improve data integrity.
To learn more about normalization in dbms, we recommend reading the following articles: