Performance questions check whether you can make slow SQL fast. They cover indexing, execution plans, sargability and common anti-patterns. Each answer uses a diagram to show what the engine does and SQL to demonstrate the fix.
Without a useful index, the engine has to do a table scan (or clustered index scan): it reads every data page and checks each row against your predicate. On a large table that is a lot of I/O even when only one row qualifies. An index gives the optimizer a faster path to the rows it needs. See SQL indexes for the full picture.
A nonclustered index is stored as a B-tree: a shallow, balanced tree whose keys are kept in sorted order. To find a value the engine starts at the root page, follows one pointer per level down through the intermediate pages, and lands on the leaf page that holds the key. Because the tree is only a few levels deep, even a table with millions of rows is reachable in a handful of page reads. That targeted lookup is an index seek, and it is the payoff an interviewer wants you to describe.
ORDER BY and GROUP BY on the same columns without a separate sort operator.INCLUDE columns), the engine answers entirely from the index and skips the base table lookup altogether.Indexes are not free. Every INSERT, UPDATE and DELETE must also maintain each affected index, keeping its B-tree sorted and balanced, so write-heavy tables pay a real cost and each index also consumes storage. The craft is to add the indexes that turn expensive scans into cheap seeks without over-indexing a table that is mostly written to. For a deeper walkthrough see how SQL indexes improve performance.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
ON Orders (customer_id)
INCLUDE (order_date, total);
The key supports seeks and ordering on customer_id; the INCLUDE columns let common queries be answered from the index without touching the base table.
-- Before the index: this scans every page in Orders
SELECT order_id, order_date, total
FROM Orders
WHERE customer_id = 4172;
-- After IX_Orders_CustomerId exists, the same query
-- uses an index seek and reads only a few pages.
SELECT order_id, order_date, total
FROM Orders
WHERE customer_id = 4172;
The query text is identical; the index changes the plan from a full scan to a targeted seek. Check the execution plan to confirm Index Seek instead of Table Scan.
Related: What is an index and how does it speed up queries? · What is the difference between an index seek and an index scan? · When can an index hurt performance?
Both operators describe how SQL Server uses an index, but they touch very different amounts of data. An index seek walks the B-tree from the root down through the intermediate levels straight to the leaf pages that hold the rows you asked for. Because the tree is sorted, the engine can discard whole branches at every level, so a seek on a large table often reads only a handful of pages.
An index scan reads the index (or, for a heap or clustered table, the whole table) from one end to the other and checks every row. The cost of a scan grows with the size of the object, not with the number of rows that actually match, so on a big table it can read millions of pages to return a few results.
A seek is possible only when the predicate is sargable (Search ARGument ABLE) and selective. Sargable means the column is left bare on one side of the comparison so its value can be used to navigate the tree. Wrapping the column in a function, applying arithmetic to it, or using a leading wildcard defeats this and forces a scan, because the engine can no longer predict where the matching values sit in the ordered index.
A scan is not automatically a problem. On a small table, or when a query genuinely needs most of the rows (low selectivity, such as a report over the whole table), a scan is often the cheaper plan and the optimizer chooses it deliberately. The warning sign is a scan on a large table where you expected a seek: that usually means a missing or unsuitable index or a predicate written in a non-sargable way. Fixing it is a core skill covered in performance tuning.
-- Assumes an index on Orders(OrderDate)
SELECT order_id, customer_id, total
FROM Orders
WHERE YEAR(OrderDate) = 2025;
YEAR() is applied to every row, so the index cannot be navigated and the engine scans the whole index.
-- Same index on Orders(OrderDate)
SELECT order_id, customer_id, total
FROM Orders
WHERE OrderDate >= '2025-01-01'
AND OrderDate < '2026-01-01';
The column is left bare, so the B-tree is navigated directly to the 2025 range and only the matching pages are read.
Related: What makes a predicate sargable, and why does it matter? · How does an index improve query performance? · How do you read a query execution plan?
Troubleshooting a slow query is a structured process, not a guess. The goal is to find where the time actually goes and attack the biggest cost first, then re-measure. Start by reproducing the problem with realistic parameters and clearing the noise: run SET STATISTICS IO ON; and SET STATISTICS TIME ON; so you can see logical reads and CPU per table, and capture the actual execution plan (not the estimated one) so you compare estimated versus actual row counts. A large gap between the two usually points at stale statistics or bad parameter sniffing.
Read the plan from the most expensive operator outward and work down a checklist:
INCLUDE columns so it covers the query.Sort or hash operator, or a spill to tempdb (shown by a warning triangle), often means an ORDER BY or GROUP BY that an index could satisfy in order.sys.dm_db_missing_index_details and confirm the index is not duplicating one you already have. See SQL indexes.WHERE YEAR(order_date) = 2024, forces a scan. Rewrite it as a range so the index can seek. See sargable predicates.UPDATE STATISTICS so the optimizer picks a better plan.OPTION (RECOMPILE) or OPTIMIZE FOR can stabilize it.sys.dm_exec_requests wait types and see transactions, locks and deadlocks.Change one thing at a time and re-run with statistics on so you can prove the fix reduced logical reads or CPU. Deeper method and worked examples live in the performance tuning guide and the 15 SQL performance tuning tips.
The habit interviewers reward is measuring before changing: show the plan and the reads, fix the single biggest cost operator, then measure again rather than scattering indexes and hoping.
SET STATISTICS IO, TIME ON;
SELECT o.order_id, o.total
FROM Orders AS o
WHERE o.customer_id = 4821;
-- Read the Messages tab: 'logical reads' per table shows
-- how many 8KB pages were touched. A huge number for a
-- small result set means a scan you can remove.
Logical reads are the currency of query cost; drive them down and the query gets faster and more scalable.
-- Before: no useful index, so this scans all of Orders.
SELECT order_id, total
FROM Orders
WHERE customer_id = 4821;
-- Add a nonclustered index that seeks on customer_id
-- and INCLUDEs the output columns so no key lookup is needed.
CREATE NONCLUSTERED INDEX IX_Orders_customer
ON Orders (customer_id)
INCLUDE (order_id, total);
After the index the plan shows an Index Seek instead of a scan, and logical reads drop from thousands to a handful.
Related: How do you read a query execution plan? · What makes a predicate sargable, and why does it matter? · What are statistics and how does cardinality estimation affect plans?
Sargable (Search ARGument ABLE) describes a query whose WHERE predicate the optimizer can turn into an index seek: it jumps straight to the qualifying rows instead of reading the whole table. A query is sargable when the indexed column stands alone on one side of the comparison. The moment you wrap that column in something, the engine loses the ability to use the index order and falls back to a scan. This is one of the highest leverage ideas in performance tuning.
Why does wrapping the column break the seek? An index stores the raw column values in sorted order. If you compare YEAR(order_date), the index has no entry for the computed year, so the engine must compute the expression for every single row before it can test the predicate. That is a full scan by definition. The same thing happens with arithmetic, conversions, and leading wildcards.
The things that make a predicate non-sargable:
YEAR(order_date) = 2024 or UPPER(name) = 'ACME'.salary * 12 > 100000 instead of comparing salary against a precomputed constant.VARCHAR column to an NVARCHAR parameter, which forces SQL Server to convert the column and drops the seek. The same mismatch can also throw error 245.LIKE, such as name LIKE '%son'. A trailing wildcard like 'son%' stays sargable because the prefix is fixed.To make a query sargable, rewrite the predicate so the column is bare and push the transformation onto the literal side. The classic move is turning a function over a date into a half-open range: replace YEAR(order_date) = 2024 with order_date >= '2024-01-01' AND order_date < '2025-01-01', which returns the same rows but lets the index seek the range. Rewrite salary * 12 > 100000 as salary > 100000 / 12. Fix conversions by matching the parameter type to the column type.
When a filter cannot be rewritten (for example you genuinely need to search on UPPER(name) or on a value derived from several columns), add a computed column and index it, ideally PERSISTED so the value is stored rather than recomputed. The optimizer can then match the indexed expression and seek. Remember that sargability only helps if a suitable index actually exists; a sargable predicate does not create the index for you.
-- YEAR() wraps order_date and salary * 12 wraps salary,
-- so neither index can be seeked; the engine scans every row.
SELECT order_id, customer_id, total
FROM Orders
WHERE YEAR(order_date) = 2024
AND salary * 12 > 100000;
Both predicates compute an expression per row, forcing a full scan no matter how few rows qualify.
-- Column bare on the left, transformation moved to the literal side.
SELECT order_id, customer_id, total
FROM Orders
WHERE order_date >= '2024-01-01'
AND order_date < '2025-01-01'
AND salary > 100000 / 12.0;
Same result set, but an index on order_date (or salary) can now seek the range instead of scanning.
-- Make the derived value seekable by storing and indexing it.
ALTER TABLE Customers
ADD name_upper AS UPPER(name) PERSISTED;
CREATE INDEX IX_Customers_name_upper ON Customers(name_upper);
SELECT customer_id, name
FROM Customers
WHERE name_upper = 'ACME';
The optimizer matches the indexed expression, so UPPER(name) = ... becomes a seek instead of a scan.
Related: What makes a predicate sargable, and why does it matter? · What is the difference between an index seek and an index scan? · How do you troubleshoot a slow query?
Read an execution plan right to left and top to bottom, the way SQL Server actually feeds rows through it, but the first thing your eye should hunt for is the most expensive operator. Each operator shows a cost as a percentage of the whole plan, so the operator with the highest number is where your tuning effort will pay off. Chasing a cheap operator while a 90 percent operator sits next to it is the classic beginner mistake.
Always use the actual execution plan, not just the estimated one, when you can run the query. The actual plan carries the real row counts the engine measured, which is what lets you catch bad estimates. With that in hand, work through this checklist:
Clustered Index Scan or Table Scan that touches millions of rows to return a handful is a red flag; it usually means no useful index or a non-sargable predicate. A Seek on a big table is normally what you want instead.Key Lookup (or RID Lookup) means a nonclustered index found the rows but had to jump back to the base table for extra columns. Once per row on a large result set, that gets expensive; a covering index with INCLUDE columns removes it.Sort or Hash Match that runs out of its memory grant spills to tempdb, shown by a yellow warning triangle. Spills turn an in-memory operation into disk IO and crater performance.Pair the plan with SET STATISTICS IO ON to see the logical reads each table costs; a scan that reports huge read counts confirms what the plan is telling you. From there the fixes are usually indexing, rewriting a predicate to be sargable, or updating statistics. See performance tuning for the wider workflow, and Microsoft has a deeper reference on displaying actual execution plans.
SET STATISTICS XML ON;
SELECT o.order_id, o.total, c.name
FROM Orders AS o
JOIN Customers AS c
ON c.customer_id = o.customer_id
WHERE o.order_date >= '2026-01-01';
SET STATISTICS XML OFF;
In SSMS you can also press Ctrl+M first, then run the query, to get the graphical actual plan with real row counts.
SET STATISTICS IO ON;
SELECT o.order_id, o.total, c.name
FROM Orders AS o
JOIN Customers AS c
ON c.customer_id = o.customer_id
WHERE o.order_date >= '2026-01-01';
SET STATISTICS IO OFF;
A large logical reads count against a table confirms a scan; a covering index turns it into a cheap seek with far fewer reads.
Related: How do you read a query execution plan? · How do you troubleshoot a slow query? · What are statistics and how does cardinality estimation affect plans?
A composite index is an index on two or more columns, and the order of those columns decides whether SQL Server can perform an efficient seek or is forced into a scan. The proven rule of thumb is equality columns first, then the range column, then INCLUDE columns. You design the key to match how the query filters, joins and sorts, not by picking the columns alphabetically or by how selective each one looks in isolation.
The reason is the leftmost-prefix rule. A B-tree index is sorted by the first key column, then within that by the second, and so on, like a phone book sorted by last name then first name. The engine can only seek on a leading prefix of the key. An index on (status, created_at) can seek a query filtering on status alone, or on status and created_at together, but it cannot seek a query that filters only on created_at, because that column is not the leftmost key.
= (or IN) narrow the search to a contiguous run of rows and should lead the key.>, <, BETWEEN or LIKE 'abc%' goes immediately after the equality columns. Once you hit a range, columns after it in the key can no longer be seeked, only scanned within the range, so at most one range column earns a key position.SELECT list and are not used for filtering or ordering belong in the INCLUDE clause. They are stored at the leaf level to make the index covering without bloating the key or its sort order.Also let the ORDER BY guide you. If a query filters on an equality column and then sorts by another column, placing that sort column next in the key lets the index return rows already ordered and skip a sort operator. Read more on how SQL Server indexes work and on performance tuning for the full picture.
In short: match the index key to the shape of the query. Equality columns pin down the starting point, one range column defines the span to read, INCLUDE columns cover the output, and the leftmost-prefix rule is the constraint that ties it all together.
-- Query: recent orders for one status, newest first
SELECT order_id, created_at, total
FROM Orders
WHERE status = 'shipped'
AND created_at >= '2026-01-01'
ORDER BY created_at DESC;
-- Index: equality column, then range/sort column, then output columns
CREATE NONCLUSTERED INDEX IX_Orders_Status_Created
ON Orders (status, created_at)
INCLUDE (order_id, total);
status leads (equality), created_at follows to satisfy both the range filter and the ORDER BY, and order_id and total are INCLUDE columns so the index covers the query.
-- With IX on (created_at, status) this query CANNOT seek on status,
-- because status is not the leftmost prefix of the key.
SELECT order_id, total
FROM Orders
WHERE status = 'shipped';
-- With IX on (status, created_at) the same query seeks directly to
-- the 'shipped' rows, then reads only that contiguous range.
SELECT order_id, total
FROM Orders
WHERE status = 'shipped';
Leading with the equality column (status) lets the engine seek. Leading with created_at would force a scan when only status is filtered.
Related: Why does the column order in a composite index matter? · What is a covering index and how does an included column help? · What is a covering index?
A covering index is a nonclustered index that already holds every column a specific query references. Because all the needed values are in the index, the query is served by reading the index alone and never has to visit the base table. When an index covers a query the phrase you use is that it covers it, and the plan collapses to a single seek.
The reason this matters is the key lookup. A nonclustered index normally stores only its key columns plus a pointer to the row (the clustered key, or a row id on a heap). If your query asks for a column that is not in the index, the engine must follow that pointer once per qualifying row to fetch the missing value. That extra hop is the key lookup (a RID lookup on a heap), and when many rows match it can dominate the cost or make the optimizer abandon the index and scan the whole table instead.
You spot the need by reading the execution plan: an Index Seek paired with a Key Lookup (often joined by a Nested Loops) is the tell. That Key Lookup is the round trip to the table for columns the index does not carry. Cover the query and it vanishes, leaving just the seek.
The trade-off is width. Every column you add makes the index bigger on disk and in the buffer pool, and every INSERT, UPDATE or DELETE that touches those columns must maintain the index too. So cover the queries that actually hurt, not every column of the table, and keep the key narrow while INCLUDE carries the rest.
-- Index only has the key column
CREATE NONCLUSTERED INDEX IX_Orders_Customer
ON Orders (customer_id);
-- total and status are not in the index, so each
-- matching row triggers a Key Lookup to the table
SELECT customer_id, total, status
FROM Orders
WHERE customer_id = 42;
The plan is an Index Seek plus a Key Lookup, because total and status must be fetched from the base table.
-- Same key, but the returned columns ride along at the leaf
CREATE NONCLUSTERED INDEX IX_Orders_Customer
ON Orders (customer_id)
INCLUDE (total, status)
WITH (DROP_EXISTING = ON);
SELECT customer_id, total, status
FROM Orders
WHERE customer_id = 42;
Every referenced column now lives in the index, so the Key Lookup disappears and the plan is a single Index Seek.
Related: What is a covering index and how does an included column help? · How do you order the columns in a composite index? · What do you look for first in an execution plan?
SELECT * tells the server to return every column of every row the query touches. That is convenient while you are exploring, but in production code it costs you on several fronts at once, which is exactly why an interviewer raises it as a performance tuning topic.
VARCHAR, NVARCHAR(MAX), blobs). Pulling them all inflates network traffic, buffer pool usage and the memory grant for sorts and joins.* and the engine must go back to the base table for the missing columns, turning a cheap index seek into a key lookup per row or a full clustered index scan.SELECT * silently changes shape. Positional code, INSERT ... SELECT *, and views created WITH SCHEMABINDING can break or return the wrong column in the wrong slot.The fix is simple: select only the columns you need. That keeps result sets narrow, lets a well designed index cover the query, and makes the code resilient to schema changes.
It is not that * is always wrong. For ad hoc exploration in SSMS, quick SELECT TOP (10) * checks, or an EXISTS subquery where the column list is irrelevant, it is perfectly fine. The rule is about queries that ship in application code and run at scale.
-- Index on (Status) INCLUDE (CustomerId) exists
CREATE INDEX IX_Orders_Status
ON Orders (Status) INCLUDE (CustomerId);
-- SELECT * still needs Total, CreatedAt, Notes, ... so the
-- engine does a key lookup per row or scans the whole table.
SELECT *
FROM Orders
WHERE Status = 'Open';
Because * asks for columns the index does not hold, a cheap seek degrades into per-row key lookups or a clustered index scan.
-- Ask for only the columns the index already contains
SELECT OrderId, CustomerId
FROM Orders
WHERE Status = 'Open';
-- Now IX_Orders_Status can satisfy the query with a single
-- index seek and no trip back to the base table.
A narrow, explicit column list lets the covering index answer the query end to end, moving far less data.
Related: What is a covering index? · How do you troubleshoot a slow query? · What is the difference between an index seek and an index scan?
Parameter sniffing is what happens when SQL Server compiles a parameterized query (typically a stored procedure or a query using sp_executesql) for the first time. The optimizer sniffs the actual parameter values passed on that first call, uses the statistics for those specific values to estimate row counts, and builds a plan tuned for them. That plan is then cached and reused for every later call with the same query text, regardless of the values passed next.
This is usually a good thing. Compiling a plan is expensive, so reusing a cached plan is normally a performance win. The problem appears when the data is skewed. Suppose a plan is compiled for a customer with 3 orders: the optimizer picks an index seek plus key lookups, which is perfect for 3 rows. That same plan is then reused for a customer with 3 million orders, where a table scan would have been far cheaper. Now every big-customer call runs the small-customer plan and is painfully slow. The reverse also happens: a plan compiled for the huge value scans, and small-value calls that should seek pay for a scan.
The classic symptom is "the same query is fast sometimes and slow other times", or a procedure that suddenly turns slow after a stats update, an index rebuild, or a server restart clears the plan cache and a different value gets sniffed first. See https://codewithsql.com/performance-tuning for the broader picture of how plans and statistics drive query speed.
Common ways to deal with it:
OPTION (RECOMPILE) - forces the statement to compile fresh on every execution using the current parameter values. You get an ideal plan each time at the cost of extra CPU for compilation. Great for occasional or wildly-varying queries; bad for high-frequency ones.OPTIMIZE FOR (@p = value) - tells the optimizer to build the plan as if a specific typical value were passed, so you deliberately pick a good "average" plan.OPTIMIZE FOR UNKNOWN - ignores the sniffed value and uses the average density from statistics instead, producing a stable, middle-of-the-road plan.The key interview point is nuance: parameter sniffing is not inherently bad. Plan reuse is what makes parameterized queries fast in the first place. You only intervene when a specific query suffers from skewed data, and you pick the fix that trades the least CPU or flexibility for a stable plan.
CREATE OR ALTER PROCEDURE dbo.GetOrdersByCustomer
@CustomerId INT
AS
BEGIN
SELECT o.order_id, o.order_date, o.total
FROM dbo.Orders AS o
WHERE o.customer_id = @CustomerId
OPTION (RECOMPILE);
END;
Every call optimizes for the actual @CustomerId, so small and large customers each get an ideal plan. The cost is a recompile on every execution.
CREATE OR ALTER PROCEDURE dbo.GetOrdersByCustomer
@CustomerId INT
AS
BEGIN
SELECT o.order_id, o.order_date, o.total
FROM dbo.Orders AS o
WHERE o.customer_id = @CustomerId
OPTION (OPTIMIZE FOR UNKNOWN);
END;
Ignores the sniffed value and estimates from average statistics, giving a consistent middle-ground plan instead of one skewed toward the first value seen.
Related: How does plan caching and recompilation work for stored procedures? · What are statistics and how does cardinality estimation affect plans? · How do you troubleshoot a slow query?
An index is a copy of some of your data, kept sorted, so the engine can find rows without scanning the whole table. That is a great deal for reads, but it is not free. Every index is a separate structure the storage engine has to keep in sync, so the cost lands on writes and on storage. Seeing indexes as a read/write trade-off is exactly what an interviewer wants to hear. See how indexes work for the read side.
Write overhead. Every INSERT, UPDATE and DELETE must maintain every affected index, not just the table. Insert a row into a table with six nonclustered indexes and the engine does seven writes, one per index plus the base table. An UPDATE that changes an indexed column also has to move the entry to its new sorted position in each index that covers it. On a write-heavy table (an order queue, an audit log, a message table) this maintenance can cost more than the reads the index saves.
bit flag, a status with three states, gender) is rarely useful. If a value matches 40 percent of the table, a seek plus thousands of lookups is slower than a plain scan, so the optimizer ignores the index and you paid to maintain it for nothing.(CustomerId) is fully redundant if you also have one on (CustomerId, OrderDate), because the composite can serve any query the single-column one could. The duplicate just doubles the write work. See composite index column order.varchar, or stuffing many INCLUDE columns in, makes each index row large. Fewer rows fit per page, the index is bigger on disk and in memory, and every write copies more bytes.Finding the waste. SQL Server tracks how each index is used in sys.dm_db_index_usage_stats. An index with many user_updates but zero or almost zero user_seeks, user_scans and user_lookups since the last restart is being maintained on every write while contributing nothing to reads. Those are the first candidates to drop. Note the counters reset when the instance restarts, so judge them over a representative window, not right after a reboot.
The practical rule: index for the reads you actually run, then periodically review usage and remove indexes that only ever get written. Fewer, well chosen indexes usually beat a large set of speculative ones. More on this in performance tuning.
SELECT
OBJECT_NAME(i.object_id) AS table_name,
i.name AS index_name,
us.user_seeks,
us.user_scans,
us.user_lookups,
us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
ON us.object_id = i.object_id
AND us.index_id = i.index_id
AND us.database_id = DB_ID()
WHERE i.type_desc <> 'HEAP'
AND i.is_primary_key = 0
AND i.is_unique_constraint = 0
AND ISNULL(us.user_seeks, 0)
+ ISNULL(us.user_scans, 0)
+ ISNULL(us.user_lookups, 0) = 0
ORDER BY us.user_updates DESC;
Rows with high user_updates but no seeks, scans or lookups are pure write overhead. The counters reset on restart, so check them after the server has run for a while.
-- Every insert into this queue maintains the base table
-- plus all three nonclustered indexes: four writes per row.
CREATE TABLE dbo.OrderQueue (
QueueId BIGINT IDENTITY PRIMARY KEY,
OrderId INT NOT NULL,
Status TINYINT NOT NULL, -- only 3 distinct values
CreatedAt DATETIME2 NOT NULL,
Payload NVARCHAR(2000) NULL
);
CREATE INDEX IX_Queue_Status ON dbo.OrderQueue(Status); -- low selectivity
CREATE INDEX IX_Queue_Order ON dbo.OrderQueue(OrderId); -- redundant with below
CREATE INDEX IX_Queue_OrderDt ON dbo.OrderQueue(OrderId, CreatedAt);
-- High-volume insert path pays for all of them:
INSERT INTO dbo.OrderQueue (OrderId, Status, CreatedAt)
VALUES (1001, 0, SYSUTCDATETIME());
IX_Queue_Status is too low selectivity to help, and IX_Queue_Order is redundant with the composite IX_Queue_OrderDt. Both slow every insert without earning their keep.
Related: How does an index improve query performance? · How do you order the columns in a composite index? · How do you troubleshoot a slow query?
The N plus one query problem is a performance anti pattern most often seen with ORMs and lazy loading. Your code runs 1 query to fetch a list of parent rows (say 100 orders), then loops over the results and runs 1 more query per row to pull each order its related data (the customer, the line items). One list query plus N detail queries gives N + 1 total round trips to the database.
Each of those extra queries is cheap on its own, but the cost is the round trip: network latency, query parsing, planning and connection overhead, paid once per row. At 100 rows that is 101 round trips where 1 would do, and it gets dramatically worse as the list grows. The database is barely working, yet the request is slow because it spends its time waiting on a storm of tiny sequential calls.
The fix is to fetch the related data in bulk instead of one row at a time. There are three common approaches:
WHERE parent_id IN (...), then stitch them together in code. That is 2 queries total regardless of N.Include, Django select_related / prefetch_related, Rails includes) so it generates a JOIN or a batched query for you instead of lazy loading per row.To detect N plus one, watch the query log or an APM trace for many near identical parameterized statements that differ only by a single id value, for example the same SELECT ... FROM Customers WHERE customer_id = @p0 executed hundreds of times in one request. That repeated shape is the signature. If a page is slow for this reason, it is worth also checking why a query is slow more generally.
-- Step 1: one query fetches the parent list
SELECT order_id, customer_id
FROM Orders
WHERE order_date >= '2026-01-01';
-- Step 2: application loops over each order and fires
-- one more query per row, so N times in total:
SELECT name, email FROM Customers WHERE customer_id = 101;
SELECT name, email FROM Customers WHERE customer_id = 102;
SELECT name, email FROM Customers WHERE customer_id = 103;
-- ... one per order = N extra round trips
The list query plus N per row lookups is N + 1 round trips. The repeated identical shape is the tell tale sign.
SELECT o.order_id, o.order_date,
c.name, c.email
FROM Orders AS o
INNER JOIN Customers AS c
ON c.customer_id = o.customer_id
WHERE o.order_date >= '2026-01-01';
One query, one round trip, no matter how many orders. A batched WHERE customer_id IN (...) is a good alternative when a JOIN is awkward.
Related: What is a SQL JOIN and what are the main types? · How do you troubleshoot a slow query? · Why is keyset pagination faster than OFFSET for large pages?
OFFSET pagination does not actually skip work. Even though OFFSET 100000 FETCH NEXT 20 returns only 20 rows, the engine must first read and discard those 100000 preceding rows in sort order to know where the page begins. The deeper you page, the more rows it wastes, so latency grows roughly linearly with the offset value. Page 1 is instant; page 5000 crawls.
Keyset pagination (also called seek or cursor pagination) remembers the sort key of the last row you saw and asks for the next rows directly: WHERE sort_key > @last_value ORDER BY sort_key. With an index on sort_key, the engine performs an index seek straight to that position and reads just the page you want. Cost stays near constant no matter how deep you go, because the work is proportional to the page size, not the page number.
ORDER BY.ORDER BY and the comparison so no rows are skipped or repeated.The trade-off is navigation. Keyset can only move to the next (or previous) page relative to a known row, so it cannot jump to an arbitrary page number like "page 500" the way OFFSET can. That is rarely a real requirement: infinite scroll and "load more" only ever need the next page, which is exactly what keyset is built for. See performance tuning for how indexing choices drive this.
SELECT order_id, customer_id, order_date
FROM Orders
ORDER BY order_date, order_id
OFFSET 100000 ROWS FETCH NEXT 20 ROWS ONLY;
The engine must read and discard the first 100000 rows in sort order before returning 20, so deep pages get progressively slower.
-- @last_date and @last_id come from the last row of the previous page
SELECT TOP (20) order_id, customer_id, order_date
FROM Orders
WHERE order_date > @last_date
OR (order_date = @last_date AND order_id > @last_id)
ORDER BY order_date, order_id;
With an index on (order_date, order_id) this is a single seek that reads only 20 rows, regardless of how deep you page. The order_id tiebreaker keeps paging correct when dates repeat.
Related: What makes a predicate sargable, and why does it matter? · What is the difference between an index seek and an index scan? · How do you troubleshoot a slow query?
Statistics are small metadata objects the query optimizer keeps about the distribution of values in a column or index. The two parts that matter are the histogram, which buckets the column values into up to 200 steps and records how many rows fall in each, and the density vector, which measures how unique the values are. SQL Server does not read your data to compile a plan; it reads these statistics instead.
When the optimizer compiles a query it uses statistics to perform cardinality estimation: for every predicate and join it guesses how many rows will flow out of each operator. Those estimated row counts are the single most important input to plan choice. From them the optimizer decides join type (a nested loop for a few rows versus a hash join for many), join order, whether to seek or scan an index, and how much memory to grant for sorts and hashes.
SQL Server maintains statistics automatically when AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS are on. Auto update fires after a change threshold is crossed, but on large tables that threshold can be reached long after the distribution has drifted, so the sampled snapshot goes stale. You can refresh them explicitly with UPDATE STATISTICS (optionally WITH FULLSCAN for an exact histogram rather than a sample), and you can rebuild an index to get a full-scan statistic for free.
In an interview, tie this back to symptoms: a plan that looks reasonable but runs slowly, warnings about spills, or a huge estimated-versus-actual gap all point at cardinality estimation gone wrong. Learning to read that gap is a core performance tuning skill, and it is closely related to parameter sniffing, where a cached estimate is right for one parameter and wrong for the next. See the Microsoft Learn statistics guide for the full model.
-- Update every statistic on the table with an exact (full) scan
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
-- Or update a single named statistic, sampling a percentage
UPDATE STATISTICS dbo.Orders IX_Orders_CustomerId WITH SAMPLE 30 PERCENT;
FULLSCAN reads all rows for an exact histogram; SAMPLE is faster but approximate. Rebuilding an index also refreshes its statistics at full scan.
-- List the statistics objects on a table
SELECT name, auto_created, user_created, filter_definition
FROM sys.stats
WHERE object_id = OBJECT_ID('dbo.Orders');
-- View the histogram, density and last update date
DBCC SHOW_STATISTICS('dbo.Orders', 'IX_Orders_CustomerId');
DBCC SHOW_STATISTICS returns the header (rows, last updated), the density vector and the histogram steps the optimizer reads to estimate cardinality.
Related: What do you look for first in an execution plan? · What is parameter sniffing and how do you deal with it? · How do you troubleshoot a slow query?
Table partitioning takes one logically single table and stores its rows in several physical partitions, divided by the value of a chosen partitioning column (almost always a date). To the application it is still one table you query normally; underneath, SQL Server routes each row to a partition based on a partition function (which defines the boundary values, for example the start of 2022, 2023 and 2024) mapped onto a partition scheme (which places those partitions on filegroups).
The two real payoffs are specific, and naming them is what an interviewer wants to hear:
WHERE order_date >= '2024-01-01'), the optimizer knows it only has to read the 2024 partition and skips the rest. On a table with years of history that can turn a full scan into a read of one small slice.DELETE or INSERT of millions of rows.Just as important is what partitioning is not. It is a manageability and specific-query feature, not a general performance boost. Partitioning a table does not automatically make arbitrary queries faster, and if a query does not filter on the partitioning column it may actually read every partition. For most "my query is slow" problems a good index matters far more than partitioning - see https://codewithsql.com/performance-tuning for how indexing and statistics drive query speed, and https://codewithsql.com/blog-post/building-enterprise-databases-with-sql for where partitioning fits in a large schema.
Practical notes worth mentioning: the partitioning column should ideally be part of the clustered key and of unique indexes so they can be aligned with the table; SWITCH requires the source and target to have identical schema, indexes and a matching CHECK constraint; and partitioning is most valuable on genuinely large tables (tens of millions of rows and up) with a natural sliding-window pattern, such as time-series or audit data.
-- Boundary values split rows into yearly ranges (RANGE RIGHT:
-- each boundary belongs to the partition on its right).
CREATE PARTITION FUNCTION pfOrderYear (DATE)
AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01');
-- Map every partition onto a filegroup (PRIMARY here for brevity).
CREATE PARTITION SCHEME psOrderYear
AS PARTITION pfOrderYear ALL TO ([PRIMARY]);
-- Build the table ON the scheme, keyed by the partitioning column.
CREATE TABLE dbo.Orders
(
order_id BIGINT NOT NULL,
order_date DATE NOT NULL,
total DECIMAL(12,2) NOT NULL,
CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (order_date, order_id)
) ON psOrderYear (order_date);
The function defines the boundaries (pre-2023, 2023, 2024+), the scheme places them, and the table is created on the scheme so rows land in the right partition automatically.
-- Staging table must match schema, indexes and filegroup exactly.
CREATE TABLE dbo.Orders_2022_Archive
(
order_id BIGINT NOT NULL,
order_date DATE NOT NULL,
total DECIMAL(12,2) NOT NULL,
CONSTRAINT PK_Orders_2022 PRIMARY KEY CLUSTERED (order_date, order_id)
) ON [PRIMARY];
-- Move partition 1 out of Orders in an instant, no data copy.
ALTER TABLE dbo.Orders
SWITCH PARTITION 1 TO dbo.Orders_2022_Archive;
SWITCH is a catalog change, not a data move, so archiving millions of rows is near-instant and avoids the logging and locking of a bulk DELETE.
Related: When can an index hurt performance? · How do you troubleshoot a slow query? · How do you order the columns in a composite index?
Blocking happens when one session holds a lock that another session needs, so the second session waits. A little blocking is normal; sustained blocking under load is a design problem. The goal is to hold fewer locks, hold them for less time, and stop readers and writers from fighting over the same rows.
COMMIT. The longer a transaction runs, the longer its locks are held and the more sessions pile up behind it.Parent before Child, two sessions cannot each hold what the other needs, which is what prevents a deadlock.SELECT statements read the last committed version from the version store instead of taking shared locks.READ_COMMITTED_SNAPSHOT (RCSI) is often the single biggest win. Once enabled at the database level, it changes the default READ COMMITTED isolation level to use row versioning: readers see a consistent snapshot without acquiring shared locks, so a long read no longer blocks a writer and a writer no longer blocks a reader. The cost is extra writes to tempdb for the version store and a 14 byte row overhead, which is a good trade for most OLTP workloads. Contrast the lock based and version based approaches in optimistic vs pessimistic locking.
When blocking is already happening, find the head of the chain rather than guessing. Query sys.dm_exec_requests and follow the blocking_session_id column up to the session that has no blocker of its own, or run the community script sp_whoisactive which surfaces the blocking tree, the running SQL, and wait types in one result set. See transactions, locks and deadlocks for the full picture.
Finally, know the two errors that come up here. A lock request time out (error 1222) means your session waited past SET LOCK_TIMEOUT and gave up; it is a symptom of blocking, and the transaction keeps running unless you handle it. A deadlock (error 1205) is different: two sessions block each other in a cycle, SQL Server detects it and kills one as the victim. Blocking is a queue you eventually clear; a deadlock is a cycle that can never clear on its own, which is why the engine has to intervene.
-- Requires a brief moment with no other connections to the database.
ALTER DATABASE Sales SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;
-- Confirm it is on (is_read_committed_snapshot_on = 1).
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = 'Sales';
After this, the default READ COMMITTED isolation uses row versioning, so a SELECT no longer takes shared locks that block writers.
SELECT
r.session_id,
r.blocking_session_id,
r.wait_type,
r.wait_time,
r.wait_resource,
t.text AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0
ORDER BY r.wait_time DESC;
The lead blocker is the session_id that appears as a blocking_session_id but is not itself blocked; sp_whoisactive shows the same chain more readably.
Related: What causes a deadlock and how do you prevent one? · What are the transaction isolation levels and what anomalies do they prevent? · What is the difference between optimistic and pessimistic concurrency?
Explore the other interview categories, or drill the fundamentals with the functions library and the SQL Server error library.