It is a common pattern to prefer non-semantic primary keys — auto-incremented integers or UUIDs — over semantic ones. It’s advisable to define unique indexes on the semantic identity fields, but sometimes you avoid it for performance reasons or you just forget.
When things screw up, you end up with multiple records relating to the same entity or relation. How do you clean things up?
Let’s take the simple case of a many-to-many relation, implemented as a linking table:
CREATE TABLE First ( First_ID INT IDENTITY(1,1), -- more fields PRIMARY KEY (First_ID) ); CREATE TABLE Second ( Second_ID INT IDENTITY(1,1), -- more fields PRIMARY KEY (Second_ID) ); CREATE TABLE First_Second( First_Second_ID INT IDENTITY(1,1), First_ID INT FOREIGN KEY REFERENCES First, Second_ID INT FOREIGN KEY REFERENCES Second );
If we end up with duplicate First_Second records, we can get rid of them using this code:
WITH records(First_Second_ID, RecNo) AS
SELECT
First_Second_ID,
ROW_NUMBER() OVER( PARTITION BY First, Second ORDER BY First, Second) AS RecNo
FROM First_Second
DELETE FROM records WHERE RecNo > 1;
Doing the same without CTE is possible, but less clean and straightforward. It involves temporary tables, cursors or nested subqueries.
I learned this trick with Microsoft SQL Server. Postgresql, DB2, Oracle and a bunch of other databases support CTE as well.