Schema and sample data
| id | name | |
|---|---|---|
| 1 | Alice Wong | alice@example.com |
| 2 | Bob Diaz | bob@example.com |
| 3 | Alice Wong | alice@example.com |
| 4 | Cara Lee | cara@example.com |
| 5 | Bob Diaz | bob@example.com |
| 6 | Alice Wong | alice@example.com |
| 7 | Cara Lee | cara@example.com |
Show setup SQL (copy to run)
CREATE TABLE Contacts (
id INT PRIMARY KEY,
name VARCHAR(50) NOT NULL,
email VARCHAR(100) NOT NULL
);
INSERT INTO Contacts (id, name, email) VALUES
(1, 'Alice Wong', 'alice@example.com'),
(2, 'Bob Diaz', 'bob@example.com'),
(3, 'Alice Wong', 'alice@example.com'),
(4, 'Cara Lee', 'cara@example.com'),
(5, 'Bob Diaz', 'bob@example.com'),
(6, 'Alice Wong', 'alice@example.com'),
(7, 'Cara Lee', 'cara@example.com');
The Contacts table has accumulated duplicate rows: the same name and email pair was inserted several times, each time getting a new id. Write a statement that deletes the duplicates so that exactly one row survives for each distinct (name, email) pair. Keep the copy with the smallest id and remove the rest.
The result shown below is the contents of the table after the delete, ordered by id.
Expected result
| id | name | |
|---|---|---|
| 1 | Alice Wong | alice@example.com |
| 2 | Bob Diaz | bob@example.com |
| 4 | Cara Lee | cara@example.com |
Show hint
You cannot tell two identical rows apart with a plain WHERE, so number the copies instead. Assign a running number within each (name, email) group with ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id). The row you want to keep gets number 1; everything with a number above 1 is a duplicate you can delete.
Try to write the query yourself before you open this.
Show solution and explanation
BEGIN TRANSACTION;
WITH ranked AS (
SELECT
id,
ROW_NUMBER() OVER (
PARTITION BY name, email
ORDER BY id
) AS rn
FROM Contacts
)
DELETE FROM ranked
WHERE rn > 1;
COMMIT TRANSACTION;
-- Inspect the survivors
SELECT id, name, email
FROM Contacts
ORDER BY id;
The core idea is to give every row a position within its own duplicate group. PARTITION BY name, email starts the numbering over for each distinct pair, and ORDER BY id decides which copy is number 1. Because ids are ascending, the smallest id in each group always receives rn = 1 and every later copy gets 2, 3, and so on. See ROW_NUMBER() for how the ranking is built.
In SQL Server you can DELETE directly from the common table expression: DELETE FROM ranked WHERE rn > 1 removes exactly the rows the CTE tagged as extras, and the change flows back to the base Contacts table. With the sample data, ids 3 and 6 (Alice), 5 (Bob), and 7 (Cara) are deleted, leaving ids 1, 2, and 4.
Choosing which copy to keep is a real decision, not just a detail. Here we keep the smallest id (the oldest record); flip to ORDER BY id DESC to keep the newest, or order by any other column if that is your tie breaker. Whatever you pick, wrap the operation in a transaction as shown so you can run the SELECT first, confirm the row count, and ROLLBACK instead of COMMIT if the numbers look wrong. A delete without a defined tie breaker or a safety net is the classic way to lose the wrong rows.
Before deleting, it is worth confirming the duplicates exist the way you expect - see the scenario walkthrough for the matching "find the duplicates" query. Once the table is clean, a UNIQUE constraint on (name, email) stops the problem from coming back.