Schema and sample data
| user_id | login_date |
|---|---|
| 101 | 2024-06-01 |
| 101 | 2024-06-02 |
| 101 | 2024-06-03 |
| 101 | 2024-06-04 |
| 102 | 2024-06-01 |
| 102 | 2024-06-01 |
| 102 | 2024-06-02 |
| 102 | 2024-06-05 |
Show setup SQL (copy to run)
CREATE TABLE Logins (
user_id INT NOT NULL,
login_date DATE NOT NULL
);
INSERT INTO Logins (user_id, login_date) VALUES
(101, '2024-06-01'),
(101, '2024-06-02'),
(101, '2024-06-03'),
(101, '2024-06-04'),
(102, '2024-06-01'),
(102, '2024-06-01'),
(102, '2024-06-02'),
(102, '2024-06-05');
The Logins table records one row every time a user logs in, with the login_date they logged in on. A single user can log in more than once on the same day, so the same date may appear twice.
For each user_id, return the length of their longest run of consecutive calendar days with at least one login. A run is a set of dates where each day is exactly one day after the previous one; a single isolated day counts as a streak of length 1. Return two columns, user_id and longest_streak, one row per user, sorted by user_id ascending.
Expected result
| user_id | longest_streak |
|---|---|
| 101 | 4 |
| 102 | 2 |
Show hint
First collapse the duplicate logins so each user has one row per date. Then number those dates per user with ROW_NUMBER() ordered by date. On any run of consecutive days, the date climbs by one and the row number climbs by one too, so date minus row_number stays constant. That constant is a group key you can GROUP BY and COUNT.
Try to write the query yourself before you open this.
Show solution and explanation
WITH distinct_logins AS (
SELECT DISTINCT user_id, login_date
FROM Logins
),
streak_groups AS (
SELECT
user_id,
login_date,
DATEADD(
day,
-ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date),
login_date
) AS streak_key
FROM distinct_logins
),
streak_lengths AS (
SELECT
user_id,
streak_key,
COUNT(*) AS streak_length
FROM streak_groups
GROUP BY user_id, streak_key
)
SELECT
user_id,
MAX(streak_length) AS longest_streak
FROM streak_lengths
GROUP BY user_id
ORDER BY user_id;
This is the classic "gaps and islands" pattern. The trick is that within a run of consecutive dates, both the date and a per row sequence number increase by exactly 1 at each step. So if you subtract the sequence number (in days) from the date, every date in the same run maps to the exact same anchor value, while the first day of the next run jumps to a different anchor. That shared anchor becomes a group key for the streak.
The first CTE uses SELECT DISTINCT user_id, login_date so that a user who logged in twice on the same day is counted as one calendar day. Without this, user 102 would have two rows for 2024-06-01, which would throw off both the row numbering and the count. The second CTE builds the key with ROW_NUMBER(): DATEADD(day, -ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date), login_date).
Walk through user 102 (distinct dates 06-01, 06-02, 06-05). Row numbers are 1, 2, 3. The keys are 06-01 minus 1 = 05-31, 06-02 minus 2 = 05-31, and 06-05 minus 3 = 06-02. So 06-01 and 06-02 share the key 05-31 (a streak of 2), while 06-05 sits alone under 06-02 (a streak of 1). Grouping by (user_id, streak_key) and counting gives the length of each run, and the outer MAX per user picks the longest: 4 for user 101 (the full 06-01 to 06-04 run) and 2 for user 102. For a fuller walkthrough of the same idea, see find consecutive login days.
Common mistake: forgetting the DISTINCT step. If duplicate same day logins are left in, the row numbers no longer advance in lockstep with the calendar and the streak keys break, quietly inflating some counts. Always reduce to one row per user per day before applying the row number trick.