Do you think IIT Guwahati certified course can help you in your career?
No
Introduction
Redundancy is a big issue in DBMS. It causes a lot of problems; one of those is Data anomalies. Data redundancy is a phenomenon that occurs inside a database or data storage system when the identical piece of data is stored in two different locations. This can refer to two distinct fields inside a single database or two distinct locations across numerous software environments or platforms. For example, When a name and address are both contained in distinct columns inside a database, If the relationship between these data points is specified in each and every new database entry, it will result in unnecessary duplication. And hence Data Anomalies. To counter this big issue, Normalisation was created to reduce the negative effects of creating tables that will introduce anomalies into the database.
Normalisation is the process of structuring data in a database in order to eliminate data redundancy, insertion anomalies, update anomalies, and deletion anomalies. Before moving forward, what do anomalies mean. Theoretically, an anomaly is something unexpected, an abnormality. Due to anomalies, the overall integrity of the data recorded in the database will deteriorate over time. These can happen naturally in DBMS and lead to unwanted results. Let's explore different types of anomalies and then go on to normal forms using examples.
Types of Anomalies
We are now moving to the types of anomalies. There are three types of anomalies Update, Insertion, Deletion. Let's understand different types of anomalies with the following Table.
Student_ID
Student_Name
Student_Address
Student_Grades
Student_Courses
001
Aditya
Hajipur
A
Biology
002
Shahid
Kushinagar
A+
Biology
003
Abhay
Hathwa
B
Biology
004
Ankit
Hajipur
A+
C.Sc
004
Ankit
Chandigarh
A+
C. Sc
005
Uttkarsh
Dartbhanga
C.Sc
Update Anomalies - Assume we have five columns in a table, two of which are student Name and student Address. Now, whenever one of the students changes locations, we must update the Table. For example, Ankit moves from Hajipur to Chandigarh, and the entry wasn't made. However, if the database is not normalised, one student may have many entries, and one of them may be missed while updating all of those entries.
Insertion Anomalies - Assume we have a table with five columns. Student_ID, Student_Name, Student_Address, Student_Grades and Student_Courses are all required. When a new student enrols in school, the first three properties and last column can be filled in, but the fourth attribute will have a NULL value because he has no marks yet since Uttkarsh has joined the school but is yet to give a test, so his grades are bound to be empty. But Student_Grades does not allow NULL values.
Deletion Anomalies - This anomaly suggests that crucial information was removed from the Table that was not necessary. Assume we have the following information about students and the courses they have taken: (ID, Name, Course, address). If a student departs the school, the entry for that student will be removed. However, that deletion will also erase the course information, despite the fact that the course is determined by the institution, not the student.
So, care to think about how Normalisation works here to solve the anomalies. Well, it tries to break tables into numerous tables and uses keys to describe relationships between them. In the next section, we will discuss normalisation techniques.
Normalisation techniques
Following are the techniques for Normalisation.
1NF
The first normal form indicates that each cell in the Table must contain just one value. As a result, every intersection of rows and columns must include atomic values.
Example of 1NF:
Suppose a school hostel wants to store the names and contact details of its students. It creates a table that is following:
Student_Roll
Student_Name
Student_Address
Student_Mobile
001
Aditya
Hajipur
9431444444
002
Shahid
Kushinagar
9944442222
9977332222
003
Abhay
Hathwa
8823234411
004
Ankit
Hajipur
8945456767
9123234545
005
Uttkarsh
Darbhanga
8969690000
We can see that two students have two mobile numbers, so the hostel staff saved that detail. But now we have a problem as 1NF says, "each attribute of a table must have atomic (single) values". So, here Student_Mobile violates that rule. Now, the Table that follows 1NF will be like the following Table.
Student_ID
Student_Name
Student_Address
Student_Mobile
001
Aditya
Hajipur
9431444444
002
Shahid
Kushinagar
9944442222
002
Shahid
Kushinagar
9977332222
003
Abhay
Hathwa
8823234411
004
Ankit
Hajipur
8945456767
004
Ankit
Hajipur
9123234545
005
Uttkarsh
Darbhanga
8969690000
2NF
We discussed candidate key before, and now this is where it comes into play. The 2NF rule indicates that none of the Table's non-prime characteristics is reliant on any of the candidate keys. In other words, it must follow two conditions. i.e., Table is in 1NF, and No non-prime attribute is dependent on the proper subset of any table candidate key. An attribute that is not part of any candidate key is known as a non-prime attribute.
Example for 2NF:
In the school hostel, teachers' data. So after the Table is following 1NF
Teachers_ID
Teahcers_Subjects
Teachers_Age
1001
English
38
1002
Biolog
39
1003
Hindi
45
1003
Sanskrit
45
1004
Maths
25
1005
Civics
35
1005
History
35
Because each attribute has atomic values, the Table is in 1 NF. However, it is not in 2NF since the non-primary attribute Teachers_Age is reliant on Teachers_ID, which is a valid subset of the candidate key. Together Teachers_ID and Teachers_Subjects are forming a candidate key for this table. Which in turn can be the primary key for the table.
As the rule states, no non-prime attribute is dependent on the proper subset of any candidate key of the Table. This is a violation of the 2NF rule. So the Table that will follow 2NF will look like this.
To comply with 2NF, we can divide the table into two tables as follows:
Teachers_ID
Teachers_Age
1001
38
1002
39
1003
45
1004
25
1005
35
Subject Table:
Teachers_ID
Teahcers_Subjects
1001
English
1002
Biolog
1003
Hindi
1003
Sanskrit
1004
Maths
1005
Civics
1005
History
Now we move to the next type, 3NF
3NF
This rule specifies that tables must be in the 2NF format and that each table should only include columns that are not transitively dependent on their own table's primary key. The attribute unrelated to the candidate key is a non-prime attribute and vice-versa.
A table is in 3NF if it is in 2NF, and for each functional dependency, X-> Y, at least one of the following conditions hold:
X is the super key of the Table
Y is a prime attribute of the Table
Example of 3NF:
The hostel details of students contain a ZIP code along with the address. The Table will have to follow the 2NF rule as the rule for 3NF states.
Student_ID
Student_Name
Student_Address
Student_Vill
Student_ZIP
001
Aditya
Hajipur
Heerapur
820002
002
Shahid
Kushinagar
Tamkuhiraj
274000
003
Abhay
Hathwa
Hathwa
841435
004
Ankit
Hajipur
Babupur
823202
005
Uttkarsh
Darbhanga
Silvarat
847222
Super keys: {Student_ID}, {Student_ID, Student_Name}, {Student_ID, Student_Name, Student_ZIP}…so on
Candidate Keys: {Student_ID}
Non-prime attributes: all attributes except Student_ID are non-prime as they are not part of any candidate keys.
Here, Student_Address and Student_Vill are dependent on Student_ZIP. And, Student_ZIP is dependent on Student_ID that makes non-prime attributes (Student_Address and Student_Vil) transitively dependent on super key (Student_ID). This violates the rule of 3NF.
So, now to make this Table in 3NF, we remove the transitive dependency.
Student table:
Student_ID
Student_Name
Student_ZIP
001
Aditya
820002
002
Shahid
274000
003
Abhay
841435
004
Ankit
823202
005
Uttkarsh
847222
Stud_ZIP table:
Student_ZIP
Student_Address
Student_Vill
820002
Hajipur
Heerapur
274000
Kushinagar
Tamkuhiraj
841435
Hathwa
Hathwa
823202
Hajipur
Babupur
847222
Darbhanga
Silvarat
Now we will see the last one, BCNF.
BCNF
Since it is a more advanced variant of 3NF, it is sometimes known as 3.5NF. BCNF is more no-nonsense than 3NF. A table conforms with BCNF if it is in 3NF, and X is the Table's super key for every functional dependence X->Y.
Example of BCNF:
In the hostel, students play more than one sport, so now in the Table:
Student_ID
Student_Game
Sports_Type
Student_Address
001
Football
O1
Hajipur
002
Football
O1
Kushinagar
002
Basketball
O2
Kushinagar
003
Basketball
O2
Hathwa
003
Badminton
I1
Hathwa
004
Badminton
I1
Hajipur
005
Football
O1
Darbhanga
005
Table Tennis
I2
Darbhanga
Functional dependencies in the table above: Student_ID -> Student_Address
Student_Game -> Sports_Type
Candidate key: {Student_Name, Student_Game}
The Table is not in BCNF as neither Student_ID nor Student_Game alone are keys.
To make the Table comply with BCNF, we can break the Table into three tables like this:
Stud_Add table:
Student_ID
Student_Address
001
Hajipur
002
Kushinagar
003
Hathwa
004
Hajipur
005
Darbhanga
Stud_Sport Table:
Student_ID
Student_Game
001
Football
002
Football
002
Basketball
003
Basketball
003
Badminton
004
Badminton
005
Football
005
Table Tennis
Sports table:
Student_Game
Sports_Type
Football
O1
Basketball
O2
Badminton
I1
Table Tennis
I2
Functional dependencies:
Student_ID -> Student_Address
Student_Game -> Sports_Type
Candidate keys:
For the first Table: Student_ID
For the second table: {Student_ID, Student_Sports}
For the third Table: Sports_Type
This is now in BCNF as in both the functional dependencies left side part is a key.
So, with this, we move to FAQs.
FAQs
What is Normalisation? Normalisation is the process of structuring data in a database in order to eliminate data redundancy.
What is Primary Key? Each entry in a table is uniquely identified by the Primary key. This value cannot be duplicated within a table and cannot include null values.
How many types of Anomalies are there, and what is their name? There are three types of Anomalies. The three anomalies are as follows: Update Anomalies Insertion Anomalies Deletion Anomalies
What is BCNF also called? BCNF is also called 3.5NF. Since it is a more advanced variant of 3NF, it is sometimes known as 3.5NF. A table conforms with BCNF if it is in 3NF, and X is the Table's super key for every functional dependence X->Y.
What is Data redundancy? Data redundancy is a phenomenon that occurs inside a database or data storage system when the identical piece of data is stored in two different locations. This can refer to two distinct fields inside a single database or two distinct locations across numerous software environments or platforms.
Key Takeaways
In this article, we have extensively discussed Normalisation conversions. We started with the need for Normalisation, and then we learned about the types of anomalies. Well, after knowing about anomalies, we learnt about techniques of Normalisation. There are three types of Normalisation 1NF, 2NF, 3NF and BCNF, respectively. We also saw examples of them and how the tables are separated. In the last section, we learnt about the FAQs.
We hope that this blog has helped you enhance your knowledge regarding Normalisation conversions and if you would like to learn more, check out our articles on Top 100 SQL Problems and if you would like to learn more, check out our articles on Coding Ninjas Studio. Do upvote our blog to help other ninjas grow. Happy Coding!
Live masterclass
Become a YouTube Analyst: Use Python to analyze viewers data
by Coding Ninjas
04 Feb, 2025
02:30 PM
Get hired as an Amazon SDE : Resume building tips
by Coding Ninjas
03 Feb, 2025
02:30 PM
Expert tips: Ace Leadership roles in Fortune 500 companies
by Coding Ninjas
03 Feb, 2025
12:30 PM
Become a YouTube Analyst: Use Python to analyze viewers data