Schema and sample data
| user_id | name | |
|---|---|---|
| 1 | Alice Chen | alice@example.com |
| 2 | Bob Ncube | bob@example.com |
| 3 | Carol Diaz | alice@example.com |
| 4 | Dan Petrov | dan@example.com |
| 5 | Eve Novak | bob@example.com |
| 6 | Frank Osei | alice@example.com |
| 7 | Grace Kim | grace@example.com |
Show setup SQL (copy to run)
CREATE TABLE Users (
user_id INT PRIMARY KEY,
name VARCHAR(50) NOT NULL,
email VARCHAR(100) NOT NULL
);
INSERT INTO Users (user_id, name, email) VALUES
(1, 'Alice Chen', 'alice@example.com'),
(2, 'Bob Ncube', 'bob@example.com'),
(3, 'Carol Diaz', 'alice@example.com'),
(4, 'Dan Petrov', 'dan@example.com'),
(5, 'Eve Novak', 'bob@example.com'),
(6, 'Frank Osei', 'alice@example.com'),
(7, 'Grace Kim', 'grace@example.com');
The Users table stores one row per account, but a data entry problem has let some people register more than once with the same email. From Users(user_id, name, email), return every email that appears more than once, together with the number of times it appears as num_accounts. Sort by num_accounts from highest to lowest, then by email alphabetically to break ties.
Expected result
| num_accounts | |
|---|---|
| alice@example.com | 3 |
| bob@example.com | 2 |
Show hint
Collapse the rows into one group per email with GROUP BY, then count how many rows landed in each group. To keep only the groups with more than one row, filter the groups themselves with HAVING rather than WHERE.
Try to write the query yourself before you open this.
Show solution and explanation
SELECT email, COUNT(*) AS num_accounts
FROM Users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY num_accounts DESC, email;
GROUP BY email folds every row that shares an email into a single group, and COUNT(*) reports how many rows fell into each one. Here alice@example.com collects three rows and bob@example.com collects two, while dan@example.com and grace@example.com each have a single row.
The key idea is where the filter goes. WHERE runs before the grouping, so it cannot see COUNT(*); you must use HAVING COUNT(*) > 1, which is applied after the groups are formed, to drop the emails that appear only once. That is why the two single row emails never reach the output. This GROUP BY plus HAVING pattern is the standard answer to how to find duplicate rows.
Finding the duplicates is only the first half of the job. To actually remove the extra rows and keep one account per email, you delete the higher user_id copies, usually with ROW_NUMBER() partitioned by email inside a CTE. Work through that in the harder delete duplicate rows exercise.