Introduction
Tables in a database sometimes contain duplicate records. To get rid of them, we first need to identify the duplicates and then remove them from the table. It is also worth preventing duplicate records from being inserted in the first place. We will use different methods to deal with duplicates.
To find duplicates in a table, we use the following two clauses.
The GROUP BY clause groups together all rows that share the same value in the target column. The target column is the column that we want to check for duplicate values.
The COUNT function in the HAVING clause checks whether any group has more than one row. If it does, the value for that group is duplicated.
Syntax
SELECT column1, column2, …, COUNT(*)
FROM table_name
GROUP BY column1, column2, …
HAVING COUNT(*) > 1;
Example
In this example, a few duplicate values are present in the OrderID column. Ideally, each row should have a unique OrderID, since every order is assigned its own value, but that rule is not enforced here. We can use the following query to find the duplicates.
SELECT OrderID, COUNT(OrderID)
FROM Orders
GROUP BY OrderID
HAVING COUNT(OrderID)>1;
Output
The query above returns every OrderID that appears more than once in the table, together with the number of times it appears.
Number of Records: 2
OrderID    COUNT(OrderID)
10251      2
10276      2
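Once the duplicates are identified, one common way to remove them is to keep a single row per OrderID and delete the rest. The following is a minimal sketch, assuming the Orders table has a hypothetical column RowID (not part of the original example) that is unique for every row, including the duplicated ones:

```sql
-- Assumption: RowID is a hypothetical column that uniquely identifies
-- each row, even when OrderID is duplicated.
-- Keep the row with the smallest RowID for each OrderID and delete the others.
DELETE FROM Orders
WHERE RowID NOT IN (
    SELECT MIN(RowID)
    FROM Orders
    GROUP BY OrderID
);
```

Running the duplicate-finding query again afterwards should return no rows. Note that MySQL does not allow a subquery to read from the same table that is being deleted from, so there the inner query would have to be wrapped in a derived table.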
Prevent Duplicates
The UNIQUE constraint ensures that a column does not contain duplicate values.
Both the PRIMARY KEY and UNIQUE constraints guarantee the uniqueness of a column or set of columns.
A PRIMARY KEY constraint automatically includes a UNIQUE constraint.
However, a table can have many UNIQUE constraints but only one PRIMARY KEY constraint.
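As a rough illustration of these constraints, the sketch below defines a hypothetical Customers table (the table and its columns are assumptions made for this example, not part of the original data) with one PRIMARY KEY and two UNIQUE constraints:

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,       -- only one PRIMARY KEY per table
    Email      VARCHAR(255) UNIQUE,   -- first UNIQUE constraint
    Phone      VARCHAR(20)  UNIQUE    -- second UNIQUE constraint
);

-- This insert succeeds.
INSERT INTO Customers (CustomerID, Email, Phone)
VALUES (1, 'a@example.com', '555-0100');

-- This insert is rejected: the Email value already exists,
-- so it violates the UNIQUE constraint on Email.
INSERT INTO Customers (CustomerID, Email, Phone)
VALUES (2, 'a@example.com', '555-0101');
```

With the constraints in place, the database itself rejects the duplicate, so no cleanup query is needed later.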
The DISTINCT clause eliminates duplicate records from the result of a query. It does not change the rows stored in the table.
With the DISTINCT keyword, a SELECT statement returns only unique records.
Syntax
SELECT DISTINCT expressions
FROM table
[WHERE conditions];
Example
In this example, we use a table called suppliers.
We find all the unique states in the suppliers table using the SQL command below.
SELECT DISTINCT state
FROM suppliers
ORDER BY state;
Output
This SQL command will return all unique state values from the suppliers table and eliminate duplicates. As we can see, the state of California only appears once in the result set instead of four times.
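When more than one column is selected, DISTINCT applies to the combination of the selected columns rather than to each column separately. The sketch below assumes the suppliers table also has a city column, which is an assumption made for illustration:

```sql
-- Assumption: suppliers has a city column in addition to state.
-- Each (city, state) pair appears once in the result, even if
-- several suppliers share the same city and state.
SELECT DISTINCT city, state
FROM suppliers
ORDER BY state, city;
```

A state can still appear on several rows of this result, once for each different city within it.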
The DISTINCT keyword is used to fetch distinct records from a database table.
It is used to remove duplicates from the output of the SELECT statement.
Q2. What is the difference between UNIQUE and DISTINCT in SQL?
The DISTINCT keyword is used with a SELECT statement to remove duplicates from the result of that statement. The UNIQUE keyword, on the other hand, defines a database constraint. A UNIQUE constraint on a column or set of columns prevents duplicate values from being stored there.
Q3. Can we use the DISTINCT keyword with all aggregate functions in SQL?
With SUM(), AVG(), and COUNT(expression), DISTINCT eliminates duplicate values before the sum, average, or count is calculated. DISTINCT is also allowed with MAX() and MIN(), but it has no effect there, because the maximum or minimum is the same with or without duplicates.
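To see the effect of DISTINCT inside an aggregate, the following sketch compares two counts on the suppliers table from the earlier example (the column aliases are names chosen here for illustration):

```sql
SELECT COUNT(state)          AS rows_with_state,  -- every non-NULL state, duplicates included
       COUNT(DISTINCT state) AS unique_states     -- each state value counted once
FROM suppliers;
```

If the two numbers differ, at least one state value occurs more than once in the table.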
Key Takeaways
Duplicates in tables cause redundancy in databases and take up extra storage. We first identify these duplicates and then remove them. We used the GROUP BY clause with the COUNT function to identify duplicates and the DISTINCT clause to remove duplicates from query results. It is also better to prevent duplicate records from being inserted into the database at all, which is what the UNIQUE constraint does.