Advanced questions probe depth: recursive CTEs, window frame clauses, isolation levels, indexing internals and query rewriting. These are the questions that distinguish senior candidates. Every answer pairs a diagram with concrete SQL.
A recursive CTE is a common table expression that refers to itself inside its own definition. It lets a single query loop over rows that point at other rows in the same table, which is something a plain JOIN cannot do when the depth is unknown ahead of time.
Every recursive CTE has the same two part structure joined by UNION ALL:
The classic use is walking a hierarchy stored as a self referencing table: an org chart where each employee row holds a manager_id, or a bill of materials where each part is assembled from sub parts. Because the number of levels is not known in advance, you cannot write a fixed set of joins; the recursion follows the chain to whatever depth it reaches. Recursive CTEs are also handy to generate sequences, such as a run of integers or a continuous date range for a calendar or report with no gaps.
In SQL Server the recursion depth is capped at 100 passes by default. If a legitimate hierarchy is deeper, or you build a longer series, raise the limit with the OPTION (MAXRECURSION n) query hint, where n is 0 to 32767. Setting MAXRECURSION 0 removes the cap entirely, so use it only when you are certain the query terminates. See the Microsoft Learn docs on CTEs for the full syntax.
WITH OrgChart AS (
-- Anchor member: employees with no manager (the top)
SELECT emp_id, name, manager_id, 0 AS level
FROM Employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive member: attach each report to its manager
SELECT e.emp_id, e.name, e.manager_id, oc.level + 1
FROM Employees AS e
INNER JOIN OrgChart AS oc
ON e.manager_id = oc.emp_id
)
SELECT emp_id, name, manager_id, level
FROM OrgChart
ORDER BY level, name;
The anchor picks the top manager; the recursive member follows manager_id down one level per pass until no rows report to the last set.
WITH Dates AS (
-- Anchor member: the first date
SELECT CAST('2026-01-01' AS date) AS d
UNION ALL
-- Recursive member: add one day, stop at the end date
SELECT DATEADD(DAY, 1, d)
FROM Dates
WHERE d < '2026-12-31'
)
SELECT d
FROM Dates
ORDER BY d
OPTION (MAXRECURSION 366);
The WHERE clause is the termination condition; MAXRECURSION 366 lifts the default 100 pass cap so a full year of dates fits.
Related: What is a common table expression (CTE)? · How do you model hierarchical data such as an org chart? · What is a self join and when would you use one?
A window frame is the slice of rows within the current partition that a window function actually reads to produce its value for the current row. You write it inside the OVER clause after ORDER BY, for example ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The frame slides as the window moves from row to row, which is what turns SUM into a running total instead of a single grand total.
A frame has two parts: a frame unit (ROWS or RANGE) and a set of bounds such as UNBOUNDED PRECEDING, CURRENT ROW, or N PRECEDING. The unit decides how those bounds are measured.
CURRENT ROW means exactly this one row, so the frame stops at the row you are on regardless of its value.CURRENT ROW under RANGE extends the frame to include all peers. That means tied rows are always pulled in together.The common surprise is the default frame. When you supply ORDER BY in the OVER clause but omit an explicit frame, the engine applies RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. If your ORDER BY column has duplicate values, a running total then jumps in blocks because every tied row receives the same cumulative sum. Switching to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW makes the total advance one row at a time.
Rule of thumb: use ROWS when you want a precise, row-by-row running total or moving window, and reach for RANGE only when you deliberately want tied rows treated as a single group. See the OVER clause documentation for the full frame grammar.
SELECT order_date, amount,
SUM(amount) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM Sales
ORDER BY order_date;
Two orders on the same date get different running totals, because ROWS counts each physical row separately.
SELECT order_date, amount,
SUM(amount) OVER (
ORDER BY order_date
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM Sales
ORDER BY order_date;
Omitting the frame gives this same RANGE behaviour: every row sharing a date is a peer, so tied dates all show the identical running total.
Related: How do you calculate a running total with a window function? · What is a window function and how is it different from GROUP BY? · How do the LAG and LEAD window functions work?
A running total (or cumulative sum) adds each row to the sum of all rows before it. The modern way is a windowed SUM() aggregate: SUM(amount) OVER (ORDER BY order_date ...). The OVER clause turns the aggregate into a window function, so it returns a value for every row instead of collapsing the result into one grand total.
Two parts of the OVER clause control the result. The ORDER BY sets the sequence in which rows accumulate, and the frame (ROWS BETWEEN ...) defines the window of rows that feed into each running value. For a classic running total the frame is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: everything from the first row up to and including the current one.
Always specify ROWS explicitly. If you write ORDER BY order_date with no frame, the SQL standard default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. RANGE groups by value, so any rows that tie on order_date all receive the same total (the sum through the last tied row) instead of a true row by row running total. Writing ROWS avoids that surprise and is usually faster.
To keep a separate running total per group - say per customer - add PARTITION BY customer_id. The sum then resets at each new partition, and the ORDER BY orders rows within it. This is the pattern interviewers are looking for when they ask for a per-customer or per-account cumulative balance. See the SQL window functions guide for a deeper walkthrough of frames and partitions.
Before window functions existed, people computed running totals with a correlated subquery such as (SELECT SUM(x) FROM T t2 WHERE t2.d <= t1.d). That re-scans the table once per row (O(n^2) work) and is awkward to read. The windowed SUM does it in a single ordered pass, so it is both clearer and dramatically faster on large tables.
SELECT
order_date,
amount,
SUM(amount) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM Orders
ORDER BY order_date;
Each row shows the cumulative sum of amount from the first row through the current one. The explicit ROWS frame avoids the RANGE tie default.
SELECT
customer_id,
order_date,
amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS customer_running_total
FROM Orders
ORDER BY customer_id, order_date;
PARTITION BY customer_id restarts the total for each customer, so you get an independent cumulative balance per account.
Related: What is a window frame, and what is the difference between ROWS and RANGE? · Write a query to produce a running total of sales. · What is a window function and how is it different from GROUP BY?
The PIVOT operator rotates the distinct values of one column into a set of output columns, applying an aggregate to a value column for each new column. You take a tall, narrow result (one row per category) and turn it into a wide, short one (one column per category). It is the classic way to build a cross tab or report grid, for example turning a list of monthly sales into one column per month.
Every PIVOT has three moving parts: the aggregate and the value column it runs on (such as SUM(amount)), the spreading column whose distinct values become new column names (listed in the FOR ... IN (...) clause), and the grouping columns, which are simply every other column you keep in the source query. SQL Server groups by those remaining columns implicitly, so you control the grouping by choosing exactly which columns feed the PIVOT.
UNPIVOT does the reverse: it takes several columns and folds them into two columns, one holding the former column name and one holding its value. It is useful for normalizing a wide spreadsheet style table back into rows so you can filter, join, or aggregate over it. Note that UNPIVOT drops NULLs, so it is not a perfect inverse of PIVOT if the pivoted grid had empty cells.
The portable alternative to PIVOT is conditional aggregation: SUM(CASE WHEN col = value THEN metric END) as one expression per output column, wrapped in a normal GROUP BY. This works on every database, reads clearly, and lets you mix several aggregates in one pass. Many engineers prefer it to the PIVOT syntax for exactly those reasons. See the CASE expression and SUM for the building blocks.
The big limitation is that both PIVOT and CASE require you to list the target column values in advance. When those values are not known until run time (for example an unknown set of months or product codes), you need dynamic PIVOT: query the distinct values, build the column list into a string, and execute it with dynamic SQL via sp_executesql.
SELECT product, [Q1], [Q2], [Q3], [Q4]
FROM (
SELECT product, quarter, amount
FROM Sales
) AS src
PIVOT (
SUM(amount)
FOR quarter IN ([Q1], [Q2], [Q3], [Q4])
) AS pvt
ORDER BY product;
The distinct quarter values become columns; SUM(amount) fills each cell. product is the implicit grouping column because it is the only other column in src.
SELECT product,
SUM(CASE WHEN quarter = 'Q1' THEN amount END) AS [Q1],
SUM(CASE WHEN quarter = 'Q2' THEN amount END) AS [Q2],
SUM(CASE WHEN quarter = 'Q3' THEN amount END) AS [Q3],
SUM(CASE WHEN quarter = 'Q4' THEN amount END) AS [Q4]
FROM Sales
GROUP BY product
ORDER BY product;
Same result as the PIVOT, but runs on any database. Each CASE selects the rows for one quarter and SUM ignores the NULLs the CASE returns for other quarters.
Related: How do you turn rows into columns (pivot monthly sales)? · What is the CASE expression and how do you use it? · What is dynamic SQL and when should you use it?
An isolation level controls how much one transaction is affected by the concurrent work of other transactions. It is the I in ACID. The SQL standard defines four levels, and each one is defined by which of three read anomalies it permits.
The three anomalies, from least to most subtle, are:
The four standard levels build on each other:
NOLOCK hint is the same thing applied to one table.SQL Server also offers row versioning so readers do not block writers. SNAPSHOT isolation gives each transaction a transaction-consistent view from the version store in tempdb, avoiding all three anomalies without holding read locks. READ_COMMITTED_SNAPSHOT is a database option that makes the ordinary READ COMMITTED level use row versioning at the statement level, which removes most reader/writer blocking with almost no code change.
The core trade-off is consistency versus concurrency: stricter levels give you more correctness guarantees but hold locks longer, which lowers throughput and raises the chance of blocking and deadlocks (error 1205). See transactions, locks and deadlocks for how the locking behaves in practice. Microsoft documents the details in the SET TRANSACTION ISOLATION LEVEL reference.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
SELECT balance FROM Accounts WHERE account_id = 1;
-- other work here; the row read above cannot change
SELECT balance FROM Accounts WHERE account_id = 1;
COMMIT TRANSACTION;
REPEATABLE READ holds a shared lock on the read row until commit, so the second SELECT returns the same value.
ALTER DATABASE Sales SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE Sales SET READ_COMMITTED_SNAPSHOT ON;
-- opt a single transaction into snapshot isolation
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT * FROM Orders WHERE status = 'open';
COMMIT TRANSACTION;
With versioning enabled readers get a consistent view from tempdb instead of blocking writers.
Related: What is a transaction and what does ACID mean? · What causes a deadlock and how do you prevent one? · What is the difference between optimistic and pessimistic concurrency?
A deadlock occurs when two (or more) transactions each hold a lock that the other one needs, so they end up waiting on each other in a cycle that can never resolve on its own. Session A holds a lock on row 1 and wants row 2; session B holds a lock on row 2 and wants row 1. Neither will release what it has until it gets the thing the other is holding, so both would wait forever.
SQL Server runs a background deadlock monitor (the lock monitor thread) that periodically looks for these cycles in the wait-for graph. When it finds one it breaks the tie by choosing a deadlock victim, rolls that transaction back, releases its locks so the other can continue, and returns error 1205 to the losing session. By default the engine picks the victim with the least expensive transaction to roll back, but you can influence the choice with SET DEADLOCK_PRIORITY.
The key thing to say in an interview is that a deadlock is different from ordinary blocking. Blocking is one session simply waiting for another to release a lock, and it resolves as soon as the first transaction commits. A deadlock is a circular wait that cannot resolve itself, which is why the engine has to intervene and kill one side. For the wider picture of how locks and blocking lead up to this, see transactions, locks and deadlocks.
You prevent deadlocks by reducing either the chance of a cycle or the size of the lock footprint:
READ_COMMITTED_SNAPSHOT (RCSI) lets readers use row versions instead of shared locks, so reads no longer block writers and vice versa, removing a large class of deadlocks.To diagnose one you need the deadlock graph, which shows both processes, the resources they held and requested, and which one was chosen as the victim. The easiest way to capture it is Extended Events: the system_health session already records the xml_deadlock_report event, so recent deadlock graphs are usually there without any setup. You can also add a Trace Flag (1222) to log the graph to the error log, or build a dedicated Extended Events session.
-- Session A
BEGIN TRAN;
UPDATE Accounts SET balance = balance - 100 WHERE id = 1; -- locks row 1
-- (small delay, then it asks for row 2)
UPDATE Accounts SET balance = balance + 100 WHERE id = 2; -- waits for row 2
COMMIT;
-- Session B (running at the same time)
BEGIN TRAN;
UPDATE Accounts SET balance = balance - 50 WHERE id = 2; -- locks row 2
-- (small delay, then it asks for row 1)
UPDATE Accounts SET balance = balance + 50 WHERE id = 1; -- waits for row 1
COMMIT;
Each session holds one row and asks for the other, forming a cycle. SQL Server rolls one back with error 1205. Having both update id = 1 before id = 2 would avoid it.
DECLARE @attempt int = 0, @maxRetries int = 3;
WHILE @attempt < @maxRetries
BEGIN
BEGIN TRY
BEGIN TRAN;
UPDATE Accounts SET balance = balance - 100 WHERE id = 1;
UPDATE Accounts SET balance = balance + 100 WHERE id = 2;
COMMIT TRAN;
BREAK; -- success, leave the loop
END TRY
BEGIN CATCH
IF XACT_STATE() <> 0 ROLLBACK TRAN;
IF ERROR_NUMBER() = 1205 AND @attempt < @maxRetries - 1
BEGIN
SET @attempt = @attempt + 1;
WAITFOR DELAY '00:00:00.200'; -- brief backoff, then retry
END
ELSE
THROW; -- not a deadlock, or out of retries: re-raise
END CATCH
END
Because the victim is rolled back cleanly, error 1205 is safe to retry. The loop backs off briefly and re-runs, and re-raises anything that is not a deadlock.
Related: What are the transaction isolation levels and what anomalies do they prevent? · How do you reduce locking and blocking under load? · What is a transaction and what does ACID mean?
A covering index is a nonclustered index that holds all of the columns a particular query references - those used for filtering, joining, ordering and in the SELECT list. Because every needed value is already in the index, the query is satisfied by reading the index alone and never has to go back to the base table. In an execution plan this shows up as an Index Seek with no accompanying Key Lookup (or RID Lookup on a heap).
Without covering, a nonclustered index stores only its key columns plus a pointer (the clustered key or a row id). If the query asks for extra columns, the engine must follow that pointer for each qualifying row to fetch the missing values. That extra step is the key lookup, and when many rows match it can dominate the cost or push the optimizer to ignore the index and scan the whole table instead.
The INCLUDE clause is how you cover a query cheaply. Included columns are stored only at the leaf level of the index, not in the intermediate (b-tree) levels, and they are not part of the key. That means they do not affect the sort order of the index and are not used for seeking - they are simply carried along so a lookup is unnecessary. This lets you keep a narrow, selective key while still covering the columns the query returns.
The trade-off is width. Every column you add makes the index larger on disk and in memory, and every INSERT, UPDATE or DELETE that touches those columns must also maintain the index. So cover the queries that matter for performance tuning, not every column of the table.
CREATE NONCLUSTERED INDEX IX_Orders_Customer_Date
ON Orders (customer_id, order_date)
INCLUDE (total, status);
customer_id and order_date are the seekable key; total and status ride along at the leaf so the query does not need a key lookup.
SELECT order_date, total, status
FROM Orders
WHERE customer_id = 42
ORDER BY order_date;
Every referenced column lives in the index, so the plan is a single Index Seek with no Key Lookup back to the table.
Related: What is the difference between a clustered and a non-clustered index? · What is a covering index? · Why does the column order in a composite index matter?
Sargable is a contraction of Search ARGument ABLE. A predicate is sargable when the query engine can use it as a search argument against an index, meaning it can perform an index seek that jumps straight to the qualifying rows instead of reading the whole table. Whether a WHERE (or JOIN or ON) predicate is sargable is one of the biggest levers you have over query cost, which is why it comes up in performance tuning interviews.
The rule of thumb: the indexed column must appear bare on one side of the comparison. The moment you wrap the column in a function or force a conversion, the engine can no longer use the index order, because the index stores the raw column values, not the computed result. Common ways to accidentally break sargability:
YEAR(order_date) = 2024 or LEFT(name, 3) = 'abc'. The index is on order_date, not on YEAR(order_date).LIKE, such as name LIKE '%son'. A trailing wildcard like 'son%' is still sargable because the prefix is fixed, but a leading % forces a scan.NVARCHAR column to a numeric literal, or an NVARCHAR parameter against a VARCHAR column. SQL Server converts the column to match the literal and the seek is lost. See error 245 for the conversion failures this same mismatch can throw.salary * 12 > 100000 instead of salary > 100000 / 12.The fix is almost always to rewrite the predicate so the column stands alone and the transformation moves to the literal side. The classic example is turning a function over a date column into a half-open range: instead of YEAR(order_date) = 2024 you write order_date >= '2024-01-01' AND order_date < '2025-01-01'. This returns the same rows but lets the optimizer seek the range in the index.
Why it matters: a non-sargable predicate typically produces a clustered or table scan, so cost grows with total table size and stays high even when only a handful of rows match. A sargable rewrite turns that into a seek whose cost tracks the number of matching rows. On a large table that is the difference between milliseconds and seconds. Note that sargability only helps if a suitable index exists in the first place; making a predicate sargable does not create the index for you.
-- YEAR() wraps the indexed column, so the index on order_date cannot be seeked
SELECT order_id, customer_id, total
FROM Orders
WHERE YEAR(order_date) = 2024;
The engine must compute YEAR() for every row, producing a full scan regardless of how many rows fall in 2024.
-- The column is bare on the left, so an index on order_date can seek the range
SELECT order_id, customer_id, total
FROM Orders
WHERE order_date >= '2024-01-01'
AND order_date < '2025-01-01';
Same result set as the YEAR() version, but the optimizer seeks directly to the 2024 rows. The < upper bound avoids missing time components on Dec 31.
Related: What is the difference between an index seek and an index scan? · How do you troubleshoot a slow query? · What is a sargable query and how do you make one?
Both a temporary table and a table variable store an intermediate result set, and both physically live in tempdb. The difference that matters in an interview is how the optimizer treats them and how they behave inside a transaction, because that is what drives the plans you get for larger workloads.
A temp table is created with CREATE TABLE #t (or SELECT ... INTO #t). SQL Server maintains column statistics on it, so the optimizer can estimate row counts accurately. You can add clustered and nonclustered indexes, and it fully participates in transactions, which means a ROLLBACK undoes changes to it. That makes temp tables the right choice when the set is large enough that a bad row estimate would produce a bad plan.
A table variable is declared with DECLARE @t TABLE (...). It has no column statistics, so historically the optimizer estimated it as 1 row regardless of contents, which is fine for a handful of rows but disastrous for thousands. On SQL Server 2019 and later, table variable deferred compilation lets the optimizer see the real row count when the statement first compiles, which greatly improves estimates, though there are still no per-column histograms.
ROLLBACK; a table variable is not, so its rows survive a rolled-back transaction.Rule of thumb: reach for a table variable for small sets (roughly a few hundred rows), inside functions, or when you want to avoid recompiles; reach for a temp table for larger sets, when you need indexes or statistics for a good plan, or when you want the intermediate data to honor transaction rollback. See also temp tables in stored procedures and performance tuning.
CREATE TABLE #BigOrders (
order_id INT PRIMARY KEY,
customer_id INT,
total DECIMAL(10,2)
);
INSERT INTO #BigOrders (order_id, customer_id, total)
SELECT order_id, customer_id, total
FROM Orders
WHERE total > 1000;
-- Add an index after load; the optimizer keeps statistics on it.
CREATE NONCLUSTERED INDEX IX_BigOrders_Customer
ON #BigOrders (customer_id);
SELECT customer_id, COUNT(*) AS big_orders
FROM #BigOrders
GROUP BY customer_id;
DROP TABLE #BigOrders;
The temp table has statistics and a nonclustered index, so a large GROUP BY gets an accurate row estimate and a good plan.
DECLARE @TopCustomers TABLE (
customer_id INT PRIMARY KEY,
total_spent DECIMAL(12,2)
);
INSERT INTO @TopCustomers (customer_id, total_spent)
SELECT TOP (20) customer_id, SUM(total)
FROM Orders
GROUP BY customer_id
ORDER BY SUM(total) DESC;
SELECT * FROM @TopCustomers ORDER BY total_spent DESC;
A handful of rows, no need for statistics or extra indexes; @t is scoped to this batch and is not touched by a ROLLBACK.
Related: How do you use temp tables inside a stored procedure? · What is a common table expression (CTE)? · What are statistics and how does cardinality estimation affect plans?
An upsert means "update the row if it already exists, otherwise insert it." The MERGE statement does this in a single pass: it joins a source (a table, view, subquery or table variable) to a target table on a key you specify, classifies every row as matched or not matched, and then fires one action per row. This lets you sync a table from a feed of changes without writing three separate statements.
The behavior is driven by up to three WHEN clauses:
UPDATE the target row (or DELETE it).INSERT it. This is the "new row" case.DELETE it (or update it, for example to flag it inactive). Use this one carefully: with an unfiltered source it can wipe rows you did not intend to touch.MERGE looks elegant, but it carries real baggage. Over the years it has shipped with a long list of correctness and optimizer bugs, and Microsoft engineers like Aaron Bertrand have publicly cautioned against using it for anything important. Common traps include: it does not protect you from concurrent inserts by default, so two sessions can both decide a row is "not matched" and race to insert it (a primary key violation or duplicate); triggers, foreign keys and filtered indexes have historically interacted with it in surprising ways; and a single mistyped ON clause can update or delete the wrong rows. Because of this, many teams prefer an explicit UPDATE followed by an INSERT, which is easier to read, reason about and tune. See wrapping DML in a transaction for how to make the multi-statement version atomic.
For concurrency, the fix is to hold a range lock on the key while you decide. Adding the HOLDLOCK (equivalent to SERIALIZABLE) hint to the target - MERGE dbo.Target WITH (HOLDLOCK) - forces the statement to lock the key range it probed, so a competing session must wait instead of inserting a duplicate. The same idea applies to the UPDATE-then-INSERT pattern: take an UPDLOCK, HOLDLOCK when you test for existence, and run the whole thing inside a transaction so it either fully commits or rolls back. See the MERGE documentation on Microsoft Learn for the full syntax.
MERGE dbo.Product AS tgt
USING dbo.ProductStaging AS src
ON tgt.product_id = src.product_id
WHEN MATCHED AND (tgt.price <> src.price OR tgt.name <> src.name) THEN
UPDATE SET tgt.name = src.name,
tgt.price = src.price
WHEN NOT MATCHED BY TARGET THEN
INSERT (product_id, name, price)
VALUES (src.product_id, src.name, src.price)
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
One statement inserts new products, updates changed ones and deletes rows no longer in staging. The AND condition on WHEN MATCHED skips no-op updates.
BEGIN TRAN;
UPDATE tgt
SET tgt.name = src.name,
tgt.price = src.price
FROM dbo.Product AS tgt WITH (UPDLOCK, HOLDLOCK)
JOIN dbo.ProductStaging AS src
ON tgt.product_id = src.product_id;
INSERT INTO dbo.Product (product_id, name, price)
SELECT src.product_id, src.name, src.price
FROM dbo.ProductStaging AS src
WHERE NOT EXISTS (
SELECT 1 FROM dbo.Product AS tgt
WHERE tgt.product_id = src.product_id
);
COMMIT;
UPDLOCK, HOLDLOCK takes a range lock so a concurrent session cannot insert the same key between the UPDATE and the INSERT. The transaction makes both steps commit or roll back together.
Related: How do you use a transaction inside a stored procedure? · What is a transaction and what does ACID mean? · How do you delete duplicate rows but keep one copy?
LAG and LEAD are window functions that let a row read a value from another row in the same result set without a self join. LAG looks backward and returns a column value from a previous row, while LEAD looks forward and returns it from a following row.
Both are controlled by an OVER clause. The ORDER BY inside it defines what "previous" and "next" mean, so the ordering is required and matters. An optional PARTITION BY splits the data into groups and resets the sequence at each group boundary, so LAG on the first row of a partition has nothing to look back to.
The full signature is LAG(expr, offset, default). The offset is how many rows to step (default 1), and the default is the value returned when the step falls outside the partition (default NULL). Supplying a default is a clean way to avoid NULLs on the first row for LAG or the last row for LEAD.
See the function references for exact syntax and edge cases: LAG and LEAD. Both are ordered analytic functions and cannot appear in a WHERE clause directly, so wrap the query in a CTE or subquery when you need to filter on their result.
SELECT
sales_date,
amount,
LAG(amount, 1, 0) OVER (ORDER BY sales_date) AS prev_day,
amount - LAG(amount, 1, 0) OVER (ORDER BY sales_date) AS day_change
FROM DailySales
ORDER BY sales_date;
LAG pulls the previous day amount; subtracting it gives the day-over-day difference. The default 0 avoids a NULL on the first day.
SELECT
customer_id,
order_date,
LEAD(order_date) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS next_order_date
FROM Orders
ORDER BY customer_id, order_date;
PARTITION BY restarts the sequence per customer, so LEAD returns each customer next order date, NULL on their most recent one.
Related: What is a window function and how is it different from GROUP BY? · Write a query for month over month growth. · How do you calculate a running total with a window function?
CROSS APPLY is a T-SQL operator that runs a table-valued expression (a subquery, a derived table, or a table-valued function) once for every row of the left input, and then joins the left row to whatever rows that expression produced. The key difference from a JOIN is correlation: the right side of APPLY can reference columns from the current left row, so it is re-evaluated per row.
A regular JOIN cannot do this. In a join, both inputs are logically produced on their own and then matched through the ON predicate; the right table does not get to see a specific left row while it is being built. That is exactly why you reach for APPLY when the right side is a function that takes the left row as an argument, or a subquery whose TOP or ORDER BY must depend on the left row.
APPLY comes in two forms, and the analogy to joins makes them easy to remember:
The two classic uses are top-N-per-group (for each parent row, grab its most recent or highest few child rows with a correlated TOP ... ORDER BY) and calling a table-valued function per row so it can be parameterized by that row. See the FROM clause documentation for the full syntax.
SELECT c.name, o.order_id, o.order_date, o.total
FROM Customers AS c
CROSS APPLY (
SELECT TOP (1) o.order_id, o.order_date, o.total
FROM Orders AS o
WHERE o.customer_id = c.customer_id
ORDER BY o.order_date DESC
) AS o;
The inner TOP (1) is correlated to c.customer_id and re-runs per customer. Customers with no orders are dropped, just like an INNER JOIN.
SELECT c.name, o.order_id, o.total
FROM Customers AS c
OUTER APPLY (
SELECT TOP (1) o.order_id, o.total
FROM Orders AS o
WHERE o.customer_id = c.customer_id
ORDER BY o.total DESC
) AS o;
Same correlated subquery, but a customer with zero orders still appears with NULLs, exactly like a LEFT JOIN.
Related: How do you get the top N rows per group (for example top 3 per category)? · What is a SQL JOIN and what are the main types? · What is the difference between a scalar and a table valued function?
An execution plan is the set of physical steps the query optimizer chose to run your statement. In SQL Server Management Studio you read it right to left and top to bottom: the operators on the far right touch the raw tables, and data flows leftward into joins, sorts and aggregates until it reaches the final SELECT. Always capture the actual plan (Include Actual Execution Plan, or SET STATISTICS PROFILE ON) rather than the estimated plan, because only the actual plan carries the real row counts.
The operators are the vocabulary you need to recognize:
INCLUDE columns.Two visual cues matter most. First, the thickness of the arrows between operators is proportional to the number of rows flowing along them, so a fat arrow feeding a sort or a loop join is where your cost lives. Second, hover any operator and compare Estimated Number of Rows with Actual Number of Rows: a large gap means the optimizer guessed wrong, usually because of stale or missing statistics, and that bad estimate cascades into poor operator and join choices. See statistics and cardinality for why that estimate drives everything.
A practical reading checklist: look for scans on big tables, repeated key lookups, expensive sorts or spools, and any estimate versus actual mismatch. Back the plan up with runtime numbers using SET STATISTICS IO ON (logical reads per table) and SET STATISTICS TIME ON (CPU and elapsed time). Fewer logical reads after a change is the clearest proof it helped. For the full workflow of turning these findings into fixes, see performance tuning.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT o.order_id, o.total
FROM Orders AS o
WHERE o.customer_id = 4821;
-- Read the Messages tab: 'logical reads' is pages touched.
-- A scan reports reads near the table's page count; a seek reports only a handful.
Logical reads and CPU time turn an abstract plan into concrete numbers you can compare before and after a change.
-- Before: no useful index, so the plan shows a Table/Clustered Index Scan.
SELECT order_id, total
FROM Orders
WHERE customer_id = 4821;
-- Add an index that covers the predicate and the output columns.
CREATE NONCLUSTERED INDEX IX_Orders_customer
ON Orders (customer_id)
INCLUDE (total);
-- After: the same query now shows an Index Seek with no Key Lookup,
-- and STATISTICS IO reports far fewer logical reads.
INCLUDE (total) makes the index covering, so the seek returns everything the SELECT needs without a Key Lookup back to the clustered index.
Related: What is the difference between an index seek and an index scan? · How do you troubleshoot a slow query? · What are statistics and how does cardinality estimation affect plans?
A composite index (also called a multi-column or compound index) is built on two or more columns in a fixed order. The B-tree is sorted by the first key column, then by the second column within each value of the first, then by the third within that, and so on. It behaves exactly like a phone book sorted by last name and then first name: the ordering is only useful if you know the leading part of the key.
Because of that layout, the optimizer can perform an efficient index seek only on a leftmost prefix of the key columns. An index on (a, b) can seek on a, or on a and b together, but it cannot seek on b alone. A predicate on b by itself has to fall back to a full scan of the index or table, because rows matching a given b value are scattered throughout the structure.
=) belong at the front so the seek narrows the range as quickly as possible.>, <, BETWEEN, LIKE 'x%') on a column, the columns after it in the key can no longer be used for seeking, so put range filters at the end.ORDER BY (and the leading columns are fixed by equality), the engine can return rows already sorted and skip a separate sort operator.This is why column order, not just column choice, drives whether an index helps. A well-ordered composite index can satisfy filtering and sorting in a single seek; a poorly ordered one forces scans and sorts. For more depth on how indexes are structured and chosen, see the guide on SQL indexes and the performance tuning notes.
CREATE NONCLUSTERED INDEX IX_Orders_Cust_Date
ON Orders (customer_id, order_date);
-- Seekable: equality on the leading column, range on the last
SELECT order_id, total
FROM Orders
WHERE customer_id = 42
AND order_date >= '2026-01-01';
customer_id (equality) leads and order_date (range) trails, so the whole predicate is one efficient index seek.
-- Uses the index: filters on the leftmost prefix (a)
SELECT order_id FROM Orders
WHERE customer_id = 42;
-- Cannot seek: order_date is the second key column, not a prefix
SELECT order_id FROM Orders
WHERE order_date = '2026-01-01';
The first query seeks on the leading column; the second must scan because order_date alone is not a leftmost prefix of (customer_id, order_date).
Related: What is a covering index and how does an included column help? · How do you order the columns in a composite index? · What is the difference between a clustered and a non-clustered index?
The two strategies answer the same question - how do I stop two sessions from clobbering each other - with opposite assumptions about how often conflicts happen.
Pessimistic concurrency assumes conflicts are likely, so it takes a lock the moment a row is read and holds it until the transaction commits. Any other session that wants the same row waits. This guarantees that nobody reads stale data or overwrites your change, but it costs concurrency: readers and writers block each other, transactions must be kept short, and holding locks across multiple rows raises the risk of deadlocks.
Optimistic concurrency assumes conflicts are rare, so it takes no locks on read. Instead it remembers a version marker for the row - typically a rowversion (also called timestamp) column or a LastModified value - and, when it finally writes, it adds that marker to the WHERE clause. If another session changed the row in the meantime the version no longer matches, the UPDATE affects zero rows, and the application detects the lost update and retries with fresh data. This gives much higher concurrency because reads never block, at the price of having to handle the retry.
Row versioning isolation levels support the optimistic pattern. Turning on snapshot isolation (or READ_COMMITTED_SNAPSHOT) lets readers see a consistent point-in-time version of a row from the version store instead of taking shared locks, so reads and writes stop blocking each other. Explicit lock hints like UPDLOCK drive the pessimistic pattern by reserving a row the instant you read it.
-- Product has a rowversion column named RowVer.
-- 1) Read the row and remember its version.
DECLARE @ver rowversion;
SELECT @ver = RowVer
FROM Products
WHERE ProductId = 42;
-- 2) ...application edits the price, no locks held...
-- 3) Write only if nobody changed the row since we read it.
UPDATE Products
SET Price = 19.99
WHERE ProductId = 42
AND RowVer = @ver;
IF @@ROWCOUNT = 0
THROW 50001, 'Row changed by another user - reload and retry.', 1;
The RowVer comparison detects a lost update: if another session wrote first, no rows match and the app retries with fresh data.
BEGIN TRANSACTION;
-- UPDLOCK takes an update lock on read, so any other session
-- trying to read-for-update this row must wait until we commit.
SELECT Balance
FROM Accounts WITH (UPDLOCK, HOLDLOCK)
WHERE AccountId = 7;
UPDATE Accounts
SET Balance = Balance - 100
WHERE AccountId = 7;
COMMIT TRANSACTION;
Holding the lock from read through write prevents a competing transaction from slipping in, at the cost of making it block.
Related: What are the transaction isolation levels and what anomalies do they prevent? · What causes a deadlock and how do you prevent one? · What is a transaction and what does ACID mean?
Explore the other interview categories, or drill the fundamentals with the functions library and the SQL Server error library.