Schema and sample data
| value |
|---|
| 1 |
| 2 |
| 3 |
| 5 |
| 8 |
| 9 |
| 10 |
| 11 |
| 15 |
| 16 |
Show setup SQL (copy to run)
CREATE TABLE Readings (
value INT PRIMARY KEY
);
INSERT INTO Readings (value) VALUES
(1), (2), (3), (5), (8), (9), (10), (11), (15), (16);
The Readings table holds a set of integer values in the value column. Some values run consecutively (for example 8, 9, 10, 11) and some are separated by gaps. Collapse every run of consecutive integers into a single island and return two columns, island_start and island_end, giving the first and last value of each run. Sort the result by island_start ascending.
Expected result
| island_start | island_end |
|---|---|
| 1 | 3 |
| 5 | 5 |
| 8 | 11 |
| 15 | 16 |
Show hint
Number the rows in value order with ROW_NUMBER(). Inside a single consecutive run the value climbs by 1 each step and so does the row number, so the difference value - row_number stays constant. That constant is a grouping key: GROUP BY it and take MIN and MAX.
Try to write the query yourself before you open this.
Show solution and explanation
SELECT
MIN(value) AS island_start,
MAX(value) AS island_end
FROM (
SELECT
value,
value - ROW_NUMBER() OVER (ORDER BY value) AS grp
FROM Readings
) AS numbered
GROUP BY grp
ORDER BY island_start;
This is the classic "gaps and islands" problem, and the neat trick behind it is the ROW_NUMBER() difference. Order the rows by value and hand each one a sequential number 1, 2, 3, and so on. Now look at what happens inside a consecutive run: the value goes up by exactly 1 at every step, and the row number also goes up by exactly 1. Because both climb in lockstep, their difference value - row_number never changes for the length of that run.
The moment a gap appears, the value jumps ahead faster than the row number, so the difference increases and a new constant begins. That means every island gets its own distinct value of grp. Working through the sample data, ordered by value: 1,2,3 give grp 0; 5 gives grp 1; 8,9,10,11 all give grp 3; and 15,16 both give grp 6. Each run shares one key.
Once each run has a shared key, the rest is ordinary aggregation: GROUP BY grp collapses each run into a single row, and MIN(value) and MAX(value) report where the run starts and ends. Note that grp itself is meaningless as an output value - it is only a device to bucket consecutive rows together, so we never select it.
A common mistake is to reach for LAG() to compare each value with the previous one and flag gaps. That works too, but it needs an extra pass to turn the flags into group numbers (a running SUM over the flags). The row-number difference does the same job in one window function with no self join. If the values could repeat you would switch to DENSE_RANK() so duplicates share a number. For more scenario style variations, see the gaps and islands interview walkthrough.