SQL Practice Problem

Find duplicate email addresses

Medium Aggregation

Schema and sample data

Users
user_idnameemail
1Alice Chenalice@example.com
2Bob Ncubebob@example.com
3Carol Diazalice@example.com
4Dan Petrovdan@example.com
5Eve Novakbob@example.com
6Frank Oseialice@example.com
7Grace Kimgrace@example.com
Show setup SQL (copy to run)
CREATE TABLE Users (
    user_id INT PRIMARY KEY,
    name    VARCHAR(50) NOT NULL,
    email   VARCHAR(100) NOT NULL
);

INSERT INTO Users (user_id, name, email) VALUES
(1, 'Alice Chen', 'alice@example.com'),
(2, 'Bob Ncube',  'bob@example.com'),
(3, 'Carol Diaz', 'alice@example.com'),
(4, 'Dan Petrov', 'dan@example.com'),
(5, 'Eve Novak',  'bob@example.com'),
(6, 'Frank Osei', 'alice@example.com'),
(7, 'Grace Kim',  'grace@example.com');
Your task

The Users table stores one row per account, but a data entry problem has let some people register more than once with the same email. From Users(user_id, name, email), return every email that appears more than once, together with the number of times it appears as num_accounts. Sort by num_accounts from highest to lowest, then by email alphabetically to break ties.

Expected result

Your query should return
emailnum_accounts
alice@example.com3
bob@example.com2
Show hint

Collapse the rows into one group per email with GROUP BY, then count how many rows landed in each group. To keep only the groups with more than one row, filter the groups themselves with HAVING rather than WHERE.

Try to write the query yourself before you open this.

Show solution and explanation
SELECT email, COUNT(*) AS num_accounts
FROM Users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY num_accounts DESC, email;

GROUP BY email folds every row that shares an email into a single group, and COUNT(*) reports how many rows fell into each one. Here alice@example.com collects three rows and bob@example.com collects two, while dan@example.com and grace@example.com each have a single row.

The key idea is where the filter goes. WHERE runs before the grouping, so it cannot see COUNT(*); you must use HAVING COUNT(*) > 1, which is applied after the groups are formed, to drop the emails that appear only once. That is why the two single row emails never reach the output. This GROUP BY plus HAVING pattern is the standard answer to how to find duplicate rows.

Finding the duplicates is only the first half of the job. To actually remove the extra rows and keep one account per email, you delete the higher user_id copies, usually with ROW_NUMBER() partitioned by email inside a CTE. Work through that in the harder delete duplicate rows exercise.

Keep practising

Work through more Medium exercises, or test yourself with the SQL interview questions.