Find duplicate email addresses

Question

The Users table stores one row per account, but a data entry problem has let some people register more than once with the same email. From Users(user_id, name, email), return every email that appears more than once, together with the number of times it appears as num_accounts. Sort by num_accounts from highest to lowest, then by email alphabetically to break ties.

Accepted Answer

SELECT email, COUNT(*) AS num_accounts
FROM Users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY num_accounts DESC, email;

GROUP BY email folds every row that shares an email into a single group, and COUNT(*) reports how many rows fell into each one. Here alice@example.com collects three rows and bob@example.com collects two, while dan@example.com and grace@example.com each have a single row. The key idea is where the filter goes. WHERE runs before the grouping, so it cannot see COUNT(*); you must use HAVING COUNT(*) > 1, which is applied after the groups are formed, to drop the emails that appear only once. That is why the two single row emails never reach the output. This GROUP BY plus HAVING pattern is the standard answer to how to find duplicate rows. Finding the duplicates is only the first half of the job. To actually remove the extra rows and keep one account per email, you delete the higher user_id copies, usually with ROW_NUMBER() partitioned by email inside a CTE. Work through that in the harder delete duplicate rows exercise.

Find duplicate email addresses

Schema and sample data

Expected result

Keep practising

user_id	name	email
1	Alice Chen	alice@example.com
2	Bob Ncube	bob@example.com
3	Carol Diaz	alice@example.com
4	Dan Petrov	dan@example.com
5	Eve Novak	bob@example.com
6	Frank Osei	alice@example.com
7	Grace Kim	grace@example.com