Home › SQL Interview Questions › Advanced

SQL Interview Prep

SQL Performance Interview Questions

Q: How does an index improve query performance?

An index is a sorted B-tree that lets the engine navigate from root to the target leaf in a few page reads (an index seek) instead of scanning every page. Because the keys are ordered, it also speeds up ORDER BY, GROUP BY and joins, at the cost of extra work on every insert, update and delete.

Q: What is the difference between an index seek and an index scan?

An index seek navigates the B-tree directly to just the qualifying rows using a selective, sargable predicate, so it reads very few pages. An index scan reads the entire index or table; it is fine on small or low-selectivity data but signals a missing index or a non-sargable predicate when it appears on a large table.

Q: How do you troubleshoot a slow query?

Reproduce it, then read the actual execution plan and SET STATISTICS IO, TIME output. Hunt for scans, key lookups, expensive sorts, missing or unused indexes, non-sargable predicates, stale statistics, parameter sniffing and blocking. Fix the single most expensive operator first, then re-measure.

Q: What is a sargable query and how do you make one?

A sargable query has a WHERE predicate the engine can satisfy with an index seek because the indexed column sits bare on one side of the comparison. You make one by keeping functions, arithmetic, conversions and leading wildcards off the column and moving them to the literal side, or by adding a computed column.

Q: What do you look for first in an execution plan?

Start with the most expensive operator, then look for scans on large tables, key lookups, thick arrows (high row counts), a big gap between estimated and actual rows, sorts or hash joins spilling to tempdb, and any operator warnings such as implicit conversion or a missing index.

Q: How do you order the columns in a composite index?

Put equality columns first, then the range or inequality column, then columns that are only needed in the output as INCLUDE. This respects the leftmost-prefix rule so the query can seek, and it should mirror the WHERE, JOIN and ORDER BY of the queries you are tuning.

Q: What is a covering index?

A covering index contains every column a query touches - in filtering, joining, ordering and the SELECT list - so the engine answers the query from the index alone and never looks up the base table. Key columns drive the seek; INCLUDE columns just ride along at the leaf.

Q: Why should you avoid SELECT * in production queries?

SELECT * returns every column, so it moves more data than needed, defeats covering indexes by forcing key lookups or scans, and silently breaks when the schema changes. In production you should name only the columns you actually need.

Q: What is parameter sniffing and how do you deal with it?

Parameter sniffing is when SQL Server compiles and caches a query plan based on the first parameter value it sees, then reuses that plan for later calls. The plan can be great for one value and terrible for a skewed one. You fix it with OPTION (RECOMPILE), OPTIMIZE FOR, local variables, or by splitting the query.

Q: When can an index hurt performance?

Indexes speed up reads but every one must be updated on INSERT, UPDATE and DELETE and takes storage, so they slow down writes. Too many indexes, low-selectivity indexes, duplicate or overlapping indexes, and very wide indexes add cost while rarely helping reads, and unused ones are pure overhead.

Performance questions check whether you can make slow SQL fast. They cover indexing, execution plans, sargability and common anti-patterns. Each answer uses a diagram to show what the engine does and SQL to demonstrate the fix.

Advanced 15 questions

1 How does an index improve query performance?

Without a useful index, the engine has to do a table scan (or clustered index scan): it reads every data page and checks each row against your predicate. On a large table that is a lot of I/O even when only one row qualifies. An index gives the optimizer a faster path to the rows it needs. See SQL indexes for the full picture.

A nonclustered index is stored as a B-tree: a shallow, balanced tree whose keys are kept in sorted order. To find a value the engine starts at the root page, follows one pointer per level down through the intermediate pages, and lands on the leaf page that holds the key. Because the tree is only a few levels deep, even a table with millions of rows is reachable in a handful of page reads. That targeted lookup is an index seek, and it is the payoff an interviewer wants you to describe.

Selective lookups. A seek jumps straight to matching rows instead of scanning the whole table, so a query filtering on an indexed column touches a few pages rather than thousands.
Ordering for free. Because the keys are already sorted, an index can satisfy ORDER BY and GROUP BY on the same columns without a separate sort operator.
Faster joins. An index on the join column lets the engine use a seek-based nested loops or merge join instead of scanning and hashing the whole table.
Covering reads. If the index contains every column the query needs (key plus INCLUDE columns), the engine answers entirely from the index and skips the base table lookup altogether.

Indexes are not free. Every INSERT, UPDATE and DELETE must also maintain each affected index, keeping its B-tree sorted and balanced, so write-heavy tables pay a real cost and each index also consumes storage. The craft is to add the indexes that turn expensive scans into cheap seeks without over-indexing a table that is mostly written to. For a deeper walkthrough see how SQL indexes improve performance.

Create a nonclustered index on the filter column

CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON Orders (customer_id)
    INCLUDE (order_date, total);

The key supports seeks and ordering on customer_id; the INCLUDE columns let common queries be answered from the index without touching the base table.

Before and after: scan becomes a seek

-- Before the index: this scans every page in Orders
SELECT order_id, order_date, total
FROM Orders
WHERE customer_id = 4172;

-- After IX_Orders_CustomerId exists, the same query
-- uses an index seek and reads only a few pages.
SELECT order_id, order_date, total
FROM Orders
WHERE customer_id = 4172;

The query text is identical; the index changes the plan from a full scan to a targeted seek. Check the execution plan to confirm Index Seek instead of Table Scan.

Follow-up questions to expect

What is the difference between an index seek and an index scan?
When would adding an index actually hurt performance?
What is a covering index and how does INCLUDE help?
How does the choice of clustered index key affect nonclustered indexes?

2 What is the difference between an index seek and an index scan?

Both operators describe how SQL Server uses an index, but they touch very different amounts of data. An index seek walks the B-tree from the root down through the intermediate levels straight to the leaf pages that hold the rows you asked for. Because the tree is sorted, the engine can discard whole branches at every level, so a seek on a large table often reads only a handful of pages.

An index scan reads the index (or, for a heap or clustered table, the whole table) from one end to the other and checks every row. The cost of a scan grows with the size of the object, not with the number of rows that actually match, so on a big table it can read millions of pages to return a few results.

A seek is possible only when the predicate is sargable (Search ARGument ABLE) and selective. Sargable means the column is left bare on one side of the comparison so its value can be used to navigate the tree. Wrapping the column in a function, applying arithmetic to it, or using a leading wildcard defeats this and forces a scan, because the engine can no longer predict where the matching values sit in the ordered index.

Seek: targeted, cost proportional to rows returned, driven by a selective sargable predicate on a leading index column.
Scan: reads everything, cost proportional to the size of the object, used when no useful index exists or the predicate is non-sargable or non-selective.

A scan is not automatically a problem. On a small table, or when a query genuinely needs most of the rows (low selectivity, such as a report over the whole table), a scan is often the cheaper plan and the optimizer chooses it deliberately. The warning sign is a scan on a large table where you expected a seek: that usually means a missing or unsuitable index or a predicate written in a non-sargable way. Fixing it is a core skill covered in performance tuning.

Non-sargable: wrapping the column forces an index scan

-- Assumes an index on Orders(OrderDate)
SELECT order_id, customer_id, total
FROM Orders
WHERE YEAR(OrderDate) = 2025;

YEAR() is applied to every row, so the index cannot be navigated and the engine scans the whole index.

Sargable rewrite: a range predicate lets the engine seek

-- Same index on Orders(OrderDate)
SELECT order_id, customer_id, total
FROM Orders
WHERE OrderDate >= '2025-01-01'
  AND OrderDate <  '2026-01-01';

The column is left bare, so the B-tree is navigated directly to the 2025 range and only the matching pages are read.

Follow-up questions to expect

How can you tell from an execution plan whether an operator is a seek or a scan?
What makes a predicate non-sargable, and how would you rewrite it?
When would you actually prefer a scan over a seek?
What is a key lookup, and why can it turn a cheap seek into an expensive plan?

3 How do you troubleshoot a slow query?

Troubleshooting a slow query is a structured process, not a guess. The goal is to find where the time actually goes and attack the biggest cost first, then re-measure. Start by reproducing the problem with realistic parameters and clearing the noise: run SET STATISTICS IO ON; and SET STATISTICS TIME ON; so you can see logical reads and CPU per table, and capture the actual execution plan (not the estimated one) so you compare estimated versus actual row counts. A large gap between the two usually points at stale statistics or bad parameter sniffing.

Read the plan from the most expensive operator outward and work down a checklist:

Scans versus seeks. A clustered or table scan where you expected a seek means no useful index, or the query cannot use the one that exists. See how to read an execution plan.
Key lookups. A seek followed by thousands of lookups back into the clustered index is a signal to widen a nonclustered index with INCLUDE columns so it covers the query.
Expensive sorts and hashes. A costly Sort or hash operator, or a spill to tempdb (shown by a warning triangle), often means an ORDER BY or GROUP BY that an index could satisfy in order.
Missing and unused indexes. The plan may show a green missing index hint; treat it as a suggestion, not gospel. Check sys.dm_db_missing_index_details and confirm the index is not duplicating one you already have. See SQL indexes.
Non-sargable predicates. Wrapping a column in a function, like WHERE YEAR(order_date) = 2024, forces a scan. Rewrite it as a range so the index can seek. See sargable predicates.
Stale statistics. If estimated rows are wildly off actual rows, update statistics with UPDATE STATISTICS so the optimizer picks a better plan.
Parameter sniffing. A plan cached for an atypical parameter can be terrible for the common case. OPTION (RECOMPILE) or OPTIMIZE FOR can stabilize it.
Blocking and waits. A query that is fast alone but slow in production may be waiting on locks. Check sys.dm_exec_requests wait types and see transactions, locks and deadlocks.

Change one thing at a time and re-run with statistics on so you can prove the fix reduced logical reads or CPU. Deeper method and worked examples live in the performance tuning guide and the 15 SQL performance tuning tips.

The habit interviewers reward is measuring before changing: show the plan and the reads, fix the single biggest cost operator, then measure again rather than scattering indexes and hoping.

Measure first: logical reads and CPU per table

SET STATISTICS IO, TIME ON;

SELECT o.order_id, o.total
FROM Orders AS o
WHERE o.customer_id = 4821;

-- Read the Messages tab: 'logical reads' per table shows
-- how many 8KB pages were touched. A huge number for a
-- small result set means a scan you can remove.

Logical reads are the currency of query cost; drive them down and the query gets faster and more scalable.

Turn a scan into a seek with a covering index

-- Before: no useful index, so this scans all of Orders.
SELECT order_id, total
FROM Orders
WHERE customer_id = 4821;

-- Add a nonclustered index that seeks on customer_id
-- and INCLUDEs the output columns so no key lookup is needed.
CREATE NONCLUSTERED INDEX IX_Orders_customer
    ON Orders (customer_id)
    INCLUDE (order_id, total);

After the index the plan shows an Index Seek instead of a scan, and logical reads drop from thousands to a handful.

Follow-up questions to expect

What is the difference between the estimated and the actual execution plan?
How do you tell whether a missing index recommendation is worth creating?
What is parameter sniffing and how would you fix a query that suffers from it?
How do you decide between adding INCLUDE columns and adding key columns to an index?

4 What is a sargable query and how do you make one?

Sargable (Search ARGument ABLE) describes a query whose WHERE predicate the optimizer can turn into an index seek: it jumps straight to the qualifying rows instead of reading the whole table. A query is sargable when the indexed column stands alone on one side of the comparison. The moment you wrap that column in something, the engine loses the ability to use the index order and falls back to a scan. This is one of the highest leverage ideas in performance tuning.

Why does wrapping the column break the seek? An index stores the raw column values in sorted order. If you compare YEAR(order_date), the index has no entry for the computed year, so the engine must compute the expression for every single row before it can test the predicate. That is a full scan by definition. The same thing happens with arithmetic, conversions, and leading wildcards.

The things that make a predicate non-sargable:

A function on the column, such as YEAR(order_date) = 2024 or UPPER(name) = 'ACME'.
Arithmetic on the column, such as salary * 12 > 100000 instead of comparing salary against a precomputed constant.
Implicit conversion, for example comparing a VARCHAR column to an NVARCHAR parameter, which forces SQL Server to convert the column and drops the seek. The same mismatch can also throw error 245.
A leading wildcard in LIKE, such as name LIKE '%son'. A trailing wildcard like 'son%' stays sargable because the prefix is fixed.

To make a query sargable, rewrite the predicate so the column is bare and push the transformation onto the literal side. The classic move is turning a function over a date into a half-open range: replace YEAR(order_date) = 2024 with order_date >= '2024-01-01' AND order_date < '2025-01-01', which returns the same rows but lets the index seek the range. Rewrite salary * 12 > 100000 as salary > 100000 / 12. Fix conversions by matching the parameter type to the column type.

When a filter cannot be rewritten (for example you genuinely need to search on UPPER(name) or on a value derived from several columns), add a computed column and index it, ideally PERSISTED so the value is stored rather than recomputed. The optimizer can then match the indexed expression and seek. Remember that sargability only helps if a suitable index actually exists; a sargable predicate does not create the index for you.

Non-sargable: function and arithmetic on the columns

-- YEAR() wraps order_date and salary * 12 wraps salary,
-- so neither index can be seeked; the engine scans every row.
SELECT order_id, customer_id, total
FROM Orders
WHERE YEAR(order_date) = 2024
  AND salary * 12 > 100000;

Both predicates compute an expression per row, forcing a full scan no matter how few rows qualify.

Sargable rewrite: bare columns let the optimizer seek

-- Column bare on the left, transformation moved to the literal side.
SELECT order_id, customer_id, total
FROM Orders
WHERE order_date >= '2024-01-01'
  AND order_date <  '2025-01-01'
  AND salary > 100000 / 12.0;

Same result set, but an index on order_date (or salary) can now seek the range instead of scanning.

When you cannot rewrite: index a persisted computed column

-- Make the derived value seekable by storing and indexing it.
ALTER TABLE Customers
    ADD name_upper AS UPPER(name) PERSISTED;

CREATE INDEX IX_Customers_name_upper ON Customers(name_upper);

SELECT customer_id, name
FROM Customers
WHERE name_upper = 'ACME';

The optimizer matches the indexed expression, so UPPER(name) = ... becomes a seek instead of a scan.

Follow-up questions to expect

Why is LIKE 'son%' sargable but LIKE '%son' is not?
How does an implicit type conversion silently turn a seek into a scan?
When would a non-sargable predicate still not hurt performance?
How does a PERSISTED computed column differ from a non-persisted one for indexing?

5 What do you look for first in an execution plan?

Read an execution plan right to left and top to bottom, the way SQL Server actually feeds rows through it, but the first thing your eye should hunt for is the most expensive operator. Each operator shows a cost as a percentage of the whole plan, so the operator with the highest number is where your tuning effort will pay off. Chasing a cheap operator while a 90 percent operator sits next to it is the classic beginner mistake.

Always use the actual execution plan, not just the estimated one, when you can run the query. The actual plan carries the real row counts the engine measured, which is what lets you catch bad estimates. With that in hand, work through this checklist:

Scans on large tables. A Clustered Index Scan or Table Scan that touches millions of rows to return a handful is a red flag; it usually means no useful index or a non-sargable predicate. A Seek on a big table is normally what you want instead.
Key lookups. A Key Lookup (or RID Lookup) means a nonclustered index found the rows but had to jump back to the base table for extra columns. Once per row on a large result set, that gets expensive; a covering index with INCLUDE columns removes it.
Thick arrows. Arrow thickness is proportional to the row count flowing between operators. A fat arrow early in the plan means a lot of rows are being carried before they are filtered, which is work you often want to push down closer to the source.
Estimated vs actual rows. Hover an operator and compare the two. A large gap (for example, estimated 10, actual 2,000,000) points to stale statistics or a bad cardinality estimate, and it is often the root cause of the optimizer picking the wrong join type or a nested loop where a hash join belonged.
Expensive sorts and spills. A Sort or Hash Match that runs out of its memory grant spills to tempdb, shown by a yellow warning triangle. Spills turn an in-memory operation into disk IO and crater performance.
Warnings. The optimizer flags problems directly: implicit conversion (a data type mismatch that can void an index), a green missing index hint, or the spill warnings above. Read every warning triangle before you touch anything else.

Pair the plan with SET STATISTICS IO ON to see the logical reads each table costs; a scan that reports huge read counts confirms what the plan is telling you. From there the fixes are usually indexing, rewriting a predicate to be sargable, or updating statistics. See performance tuning for the wider workflow, and Microsoft has a deeper reference on displaying actual execution plans.

Capture the actual execution plan for a query

SET STATISTICS XML ON;

SELECT o.order_id, o.total, c.name
FROM Orders AS o
JOIN Customers AS c
    ON c.customer_id = o.customer_id
WHERE o.order_date >= '2026-01-01';

SET STATISTICS XML OFF;

In SSMS you can also press Ctrl+M first, then run the query, to get the graphical actual plan with real row counts.

See the IO cost behind the plan

SET STATISTICS IO ON;

SELECT o.order_id, o.total, c.name
FROM Orders AS o
JOIN Customers AS c
    ON c.customer_id = o.customer_id
WHERE o.order_date >= '2026-01-01';

SET STATISTICS IO OFF;

A large logical reads count against a table confirms a scan; a covering index turns it into a cheap seek with far fewer reads.

Follow-up questions to expect

How do you tell the difference between an index scan and an index seek, and why does it matter?
What causes a large gap between estimated and actual rows, and how do you fix it?
What is a key lookup and how would you eliminate one?
When would a hash join or a sort spill to tempdb, and how do you spot it in the plan?

6 How do you order the columns in a composite index?

A composite index is an index on two or more columns, and the order of those columns decides whether SQL Server can perform an efficient seek or is forced into a scan. The proven rule of thumb is equality columns first, then the range column, then INCLUDE columns. You design the key to match how the query filters, joins and sorts, not by picking the columns alphabetically or by how selective each one looks in isolation.

The reason is the leftmost-prefix rule. A B-tree index is sorted by the first key column, then within that by the second, and so on, like a phone book sorted by last name then first name. The engine can only seek on a leading prefix of the key. An index on (status, created_at) can seek a query filtering on status alone, or on status and created_at together, but it cannot seek a query that filters only on created_at, because that column is not the leftmost key.

Equality predicates first. Columns compared with = (or IN) narrow the search to a contiguous run of rows and should lead the key.
Range or inequality predicate next. A single column compared with >, <, BETWEEN or LIKE 'abc%' goes immediately after the equality columns. Once you hit a range, columns after it in the key can no longer be seeked, only scanned within the range, so at most one range column earns a key position.
Output-only columns as INCLUDE. Columns that appear only in the SELECT list and are not used for filtering or ordering belong in the INCLUDE clause. They are stored at the leaf level to make the index covering without bloating the key or its sort order.

Also let the ORDER BY guide you. If a query filters on an equality column and then sorts by another column, placing that sort column next in the key lets the index return rows already ordered and skip a sort operator. Read more on how SQL Server indexes work and on performance tuning for the full picture.

In short: match the index key to the shape of the query. Equality columns pin down the starting point, one range column defines the span to read, INCLUDE columns cover the output, and the leftmost-prefix rule is the constraint that ties it all together.

Composite index that serves the query

-- Query: recent orders for one status, newest first
SELECT order_id, created_at, total
FROM Orders
WHERE status = 'shipped'
  AND created_at >= '2026-01-01'
ORDER BY created_at DESC;

-- Index: equality column, then range/sort column, then output columns
CREATE NONCLUSTERED INDEX IX_Orders_Status_Created
    ON Orders (status, created_at)
    INCLUDE (order_id, total);

status leads (equality), created_at follows to satisfy both the range filter and the ORDER BY, and order_id and total are INCLUDE columns so the index covers the query.

Why column order matters: the wrong order cannot seek

-- With IX on (created_at, status) this query CANNOT seek on status,
-- because status is not the leftmost prefix of the key.
SELECT order_id, total
FROM Orders
WHERE status = 'shipped';

-- With IX on (status, created_at) the same query seeks directly to
-- the 'shipped' rows, then reads only that contiguous range.
SELECT order_id, total
FROM Orders
WHERE status = 'shipped';

Leading with the equality column (status) lets the engine seek. Leading with created_at would force a scan when only status is filtered.

Follow-up questions to expect

What is the leftmost-prefix rule and how does it affect index seeks?
When would you put a column in INCLUDE instead of the index key?
Does the order of columns in the WHERE clause need to match the index key order?
How does an ORDER BY influence the column order you choose?

7 What is a covering index?

A covering index is a nonclustered index that already holds every column a specific query references. Because all the needed values are in the index, the query is served by reading the index alone and never has to visit the base table. When an index covers a query the phrase you use is that it covers it, and the plan collapses to a single seek.

The reason this matters is the key lookup. A nonclustered index normally stores only its key columns plus a pointer to the row (the clustered key, or a row id on a heap). If your query asks for a column that is not in the index, the engine must follow that pointer once per qualifying row to fetch the missing value. That extra hop is the key lookup (a RID lookup on a heap), and when many rows match it can dominate the cost or make the optimizer abandon the index and scan the whole table instead.

You spot the need by reading the execution plan: an Index Seek paired with a Key Lookup (often joined by a Nested Loops) is the tell. That Key Lookup is the round trip to the table for columns the index does not carry. Cover the query and it vanishes, leaving just the seek.

Key columns are the columns you search, join or order by. Their order defines the b-tree, so a seek can only use a leading prefix of the key.
INCLUDE columns are non-key columns stored only at the leaf level. They are not sorted and cannot be seeked, but they are present so a lookup is unnecessary. Put columns you merely return here.
INCLUDE also sidesteps the index key size limit and lets you carry data types that are not allowed as key columns.

The trade-off is width. Every column you add makes the index bigger on disk and in the buffer pool, and every INSERT, UPDATE or DELETE that touches those columns must maintain the index too. So cover the queries that actually hurt, not every column of the table, and keep the key narrow while INCLUDE carries the rest.

Before: a query that forces a key lookup

-- Index only has the key column
CREATE NONCLUSTERED INDEX IX_Orders_Customer
ON Orders (customer_id);

-- total and status are not in the index, so each
-- matching row triggers a Key Lookup to the table
SELECT customer_id, total, status
FROM Orders
WHERE customer_id = 42;

The plan is an Index Seek plus a Key Lookup, because total and status must be fetched from the base table.

After: add INCLUDE to make the index covering

-- Same key, but the returned columns ride along at the leaf
CREATE NONCLUSTERED INDEX IX_Orders_Customer
ON Orders (customer_id)
INCLUDE (total, status)
WITH (DROP_EXISTING = ON);

SELECT customer_id, total, status
FROM Orders
WHERE customer_id = 42;

Every referenced column now lives in the index, so the Key Lookup disappears and the plan is a single Index Seek.

Follow-up questions to expect

How do you tell from an execution plan that a query is not covered?
What is the difference between a key column and an INCLUDE column?
When would you widen the key instead of using INCLUDE?
What is the cost of adding many INCLUDE columns to an index?

8 Why should you avoid SELECT * in production queries?

SELECT * tells the server to return every column of every row the query touches. That is convenient while you are exploring, but in production code it costs you on several fronts at once, which is exactly why an interviewer raises it as a performance tuning topic.

More data on the wire and in memory. Wide tables often carry columns you do not use (long VARCHAR, NVARCHAR(MAX), blobs). Pulling them all inflates network traffic, buffer pool usage and the memory grant for sorts and joins.
It defeats covering indexes. A covering index can satisfy a query entirely from the index when you ask for only a few columns. Ask for * and the engine must go back to the base table for the missing columns, turning a cheap index seek into a key lookup per row or a full clustered index scan.
It breaks when the schema changes. Add, drop or reorder a column and SELECT * silently changes shape. Positional code, INSERT ... SELECT *, and views created WITH SCHEMABINDING can break or return the wrong column in the wrong slot.
It hides intent. Naming columns documents exactly what the query depends on, which helps reviewers, the optimizer and the next person to add an index.

The fix is simple: select only the columns you need. That keeps result sets narrow, lets a well designed index cover the query, and makes the code resilient to schema changes.

It is not that * is always wrong. For ad hoc exploration in SSMS, quick SELECT TOP (10) * checks, or an EXISTS subquery where the column list is irrelevant, it is perfectly fine. The rule is about queries that ship in application code and run at scale.

The problem: SELECT * cannot be covered by the index

-- Index on (Status) INCLUDE (CustomerId) exists
CREATE INDEX IX_Orders_Status
    ON Orders (Status) INCLUDE (CustomerId);

-- SELECT * still needs Total, CreatedAt, Notes, ... so the
-- engine does a key lookup per row or scans the whole table.
SELECT *
FROM Orders
WHERE Status = 'Open';

Because * asks for columns the index does not hold, a cheap seek degrades into per-row key lookups or a clustered index scan.

The fix: name the columns so the index covers the query

-- Ask for only the columns the index already contains
SELECT OrderId, CustomerId
FROM Orders
WHERE Status = 'Open';

-- Now IX_Orders_Status can satisfy the query with a single
-- index seek and no trip back to the base table.

A narrow, explicit column list lets the covering index answer the query end to end, moving far less data.

Follow-up questions to expect

What is a covering index, and how do you design one for a given query?
What is the difference between a key lookup and a clustered index scan?
When is SELECT * acceptable in real code?
How would a view defined as SELECT * behave after someone adds a column to the base table?

9 What is parameter sniffing and how do you deal with it?

Parameter sniffing is what happens when SQL Server compiles a parameterized query (typically a stored procedure or a query using sp_executesql) for the first time. The optimizer sniffs the actual parameter values passed on that first call, uses the statistics for those specific values to estimate row counts, and builds a plan tuned for them. That plan is then cached and reused for every later call with the same query text, regardless of the values passed next.

This is usually a good thing. Compiling a plan is expensive, so reusing a cached plan is normally a performance win. The problem appears when the data is skewed. Suppose a plan is compiled for a customer with 3 orders: the optimizer picks an index seek plus key lookups, which is perfect for 3 rows. That same plan is then reused for a customer with 3 million orders, where a table scan would have been far cheaper. Now every big-customer call runs the small-customer plan and is painfully slow. The reverse also happens: a plan compiled for the huge value scans, and small-value calls that should seek pay for a scan.

The classic symptom is "the same query is fast sometimes and slow other times", or a procedure that suddenly turns slow after a stats update, an index rebuild, or a server restart clears the plan cache and a different value gets sniffed first. See https://codewithsql.com/performance-tuning for the broader picture of how plans and statistics drive query speed.

Common ways to deal with it:

OPTION (RECOMPILE) - forces the statement to compile fresh on every execution using the current parameter values. You get an ideal plan each time at the cost of extra CPU for compilation. Great for occasional or wildly-varying queries; bad for high-frequency ones.
OPTIMIZE FOR (@p = value) - tells the optimizer to build the plan as if a specific typical value were passed, so you deliberately pick a good "average" plan.
OPTIMIZE FOR UNKNOWN - ignores the sniffed value and uses the average density from statistics instead, producing a stable, middle-of-the-road plan.
Local variables - copying parameters into local variables inside the procedure defeats sniffing, because the optimizer cannot sniff a variable and falls back to the density estimate (same effect as OPTIMIZE FOR UNKNOWN).
Splitting the procedure - branch on the parameter (for example small range vs large range) and call separate procedures so each gets its own cached plan.
Plan guides or forced plans (Query Store) - pin a known-good plan when you cannot change the code.

The key interview point is nuance: parameter sniffing is not inherently bad. Plan reuse is what makes parameterized queries fast in the first place. You only intervene when a specific query suffers from skewed data, and you pick the fix that trades the least CPU or flexibility for a stable plan.

OPTION (RECOMPILE): compile a fresh plan for each execution

CREATE OR ALTER PROCEDURE dbo.GetOrdersByCustomer
    @CustomerId INT
AS
BEGIN
    SELECT o.order_id, o.order_date, o.total
    FROM   dbo.Orders AS o
    WHERE  o.customer_id = @CustomerId
    OPTION (RECOMPILE);
END;

Every call optimizes for the actual @CustomerId, so small and large customers each get an ideal plan. The cost is a recompile on every execution.

OPTIMIZE FOR UNKNOWN: one stable plan from average density

CREATE OR ALTER PROCEDURE dbo.GetOrdersByCustomer
    @CustomerId INT
AS
BEGIN
    SELECT o.order_id, o.order_date, o.total
    FROM   dbo.Orders AS o
    WHERE  o.customer_id = @CustomerId
    OPTION (OPTIMIZE FOR UNKNOWN);
END;

Ignores the sniffed value and estimates from average statistics, giving a consistent middle-ground plan instead of one skewed toward the first value seen.

Follow-up questions to expect

When is parameter sniffing actually helpful rather than harmful?
What is the trade-off of using OPTION (RECOMPILE) on a high-frequency query?
How does OPTIMIZE FOR UNKNOWN differ from using a local variable?
How can Query Store or a plan guide force a known-good plan without changing code?

10 When can an index hurt performance?

An index is a copy of some of your data, kept sorted, so the engine can find rows without scanning the whole table. That is a great deal for reads, but it is not free. Every index is a separate structure the storage engine has to keep in sync, so the cost lands on writes and on storage. Seeing indexes as a read/write trade-off is exactly what an interviewer wants to hear. See how indexes work for the read side.

Write overhead. Every INSERT, UPDATE and DELETE must maintain every affected index, not just the table. Insert a row into a table with six nonclustered indexes and the engine does seven writes, one per index plus the base table. An UPDATE that changes an indexed column also has to move the entry to its new sorted position in each index that covers it. On a write-heavy table (an order queue, an audit log, a message table) this maintenance can cost more than the reads the index saves.

Too many indexes. Beyond the write cost, a pile of overlapping indexes gives the query optimizer more plans to consider and more chances to pick a bad one. More indexes also mean more storage and a bigger backup and maintenance footprint.
Low selectivity. An index on a column with few distinct values (a bit flag, a status with three states, gender) is rarely useful. If a value matches 40 percent of the table, a seek plus thousands of lookups is slower than a plain scan, so the optimizer ignores the index and you paid to maintain it for nothing.
Duplicate or overlapping indexes. An index on (CustomerId) is fully redundant if you also have one on (CustomerId, OrderDate), because the composite can serve any query the single-column one could. The duplicate just doubles the write work. See composite index column order.
Wide indexes. Indexing a long varchar, or stuffing many INCLUDE columns in, makes each index row large. Fewer rows fit per page, the index is bigger on disk and in memory, and every write copies more bytes.

Finding the waste. SQL Server tracks how each index is used in sys.dm_db_index_usage_stats. An index with many user_updates but zero or almost zero user_seeks, user_scans and user_lookups since the last restart is being maintained on every write while contributing nothing to reads. Those are the first candidates to drop. Note the counters reset when the instance restarts, so judge them over a representative window, not right after a reboot.

The practical rule: index for the reads you actually run, then periodically review usage and remove indexes that only ever get written. Fewer, well chosen indexes usually beat a large set of speculative ones. More on this in performance tuning.

Find unused (write-only) indexes

SELECT
    OBJECT_NAME(i.object_id) AS table_name,
    i.name                   AS index_name,
    us.user_seeks,
    us.user_scans,
    us.user_lookups,
    us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
    ON  us.object_id = i.object_id
    AND us.index_id  = i.index_id
    AND us.database_id = DB_ID()
WHERE i.type_desc <> 'HEAP'
  AND i.is_primary_key = 0
  AND i.is_unique_constraint = 0
  AND ISNULL(us.user_seeks, 0)
    + ISNULL(us.user_scans, 0)
    + ISNULL(us.user_lookups, 0) = 0
ORDER BY us.user_updates DESC;

Rows with high user_updates but no seeks, scans or lookups are pure write overhead. The counters reset on restart, so check them after the server has run for a while.

A write-heavy table where extra indexes cost more than they help

-- Every insert into this queue maintains the base table
-- plus all three nonclustered indexes: four writes per row.
CREATE TABLE dbo.OrderQueue (
    QueueId    BIGINT IDENTITY PRIMARY KEY,
    OrderId    INT      NOT NULL,
    Status     TINYINT  NOT NULL,   -- only 3 distinct values
    CreatedAt  DATETIME2 NOT NULL,
    Payload    NVARCHAR(2000) NULL
);

CREATE INDEX IX_Queue_Status  ON dbo.OrderQueue(Status);         -- low selectivity
CREATE INDEX IX_Queue_Order    ON dbo.OrderQueue(OrderId);       -- redundant with below
CREATE INDEX IX_Queue_OrderDt  ON dbo.OrderQueue(OrderId, CreatedAt);

-- High-volume insert path pays for all of them:
INSERT INTO dbo.OrderQueue (OrderId, Status, CreatedAt)
VALUES (1001, 0, SYSUTCDATETIME());

IX_Queue_Status is too low selectivity to help, and IX_Queue_Order is redundant with the composite IX_Queue_OrderDt. Both slow every insert without earning their keep.

Follow-up questions to expect

How would you decide whether to drop an index you suspect is unused?
Why does an UPDATE to an indexed column cost more than an UPDATE to a non-indexed one?
What is index selectivity and why does the optimizer ignore low-selectivity indexes?
How do INCLUDE columns affect the size and cost of a covering index?

11 What is the N plus one query problem and how do you fix it?

The N plus one query problem is a performance anti pattern most often seen with ORMs and lazy loading. Your code runs 1 query to fetch a list of parent rows (say 100 orders), then loops over the results and runs 1 more query per row to pull each order its related data (the customer, the line items). One list query plus N detail queries gives N + 1 total round trips to the database.

Each of those extra queries is cheap on its own, but the cost is the round trip: network latency, query parsing, planning and connection overhead, paid once per row. At 100 rows that is 101 round trips where 1 would do, and it gets dramatically worse as the list grows. The database is barely working, yet the request is slow because it spends its time waiting on a storm of tiny sequential calls.

The fix is to fetch the related data in bulk instead of one row at a time. There are three common approaches:

A single JOIN. Ask for parents and children together in one query so the database returns everything in one result set. This is usually the fastest option and works well for one to one and one to many links. See SQL joins for the mechanics.
A batched IN query. Collect the parent keys from the first query and load all children at once with WHERE parent_id IN (...), then stitch them together in code. That is 2 queries total regardless of N.
Eager loading. Tell the ORM up front to include the relation (Entity Framework Include, Django select_related / prefetch_related, Rails includes) so it generates a JOIN or a batched query for you instead of lazy loading per row.

To detect N plus one, watch the query log or an APM trace for many near identical parameterized statements that differ only by a single id value, for example the same SELECT ... FROM Customers WHERE customer_id = @p0 executed hundreds of times in one request. That repeated shape is the signature. If a page is slow for this reason, it is worth also checking why a query is slow more generally.

The N plus one pattern (1 list query, then 1 query per row)

-- Step 1: one query fetches the parent list
SELECT order_id, customer_id
FROM Orders
WHERE order_date >= '2026-01-01';

-- Step 2: application loops over each order and fires
-- one more query per row, so N times in total:
SELECT name, email FROM Customers WHERE customer_id = 101;
SELECT name, email FROM Customers WHERE customer_id = 102;
SELECT name, email FROM Customers WHERE customer_id = 103;
-- ... one per order = N extra round trips

The list query plus N per row lookups is N + 1 round trips. The repeated identical shape is the tell tale sign.

The fix: one JOIN returns parents and children together

SELECT o.order_id, o.order_date,
       c.name, c.email
FROM Orders AS o
INNER JOIN Customers AS c
    ON c.customer_id = o.customer_id
WHERE o.order_date >= '2026-01-01';

One query, one round trip, no matter how many orders. A batched WHERE customer_id IN (...) is a good alternative when a JOIN is awkward.

Follow-up questions to expect

How would you spot the N plus one problem in a production application?
When is a batched IN query a better fix than a single JOIN?
What is the difference between lazy loading and eager loading in an ORM?
Can a JOIN ever make things worse than N plus one, for example by duplicating parent rows?

12 Why is keyset pagination faster than OFFSET for large pages?

OFFSET pagination does not actually skip work. Even though OFFSET 100000 FETCH NEXT 20 returns only 20 rows, the engine must first read and discard those 100000 preceding rows in sort order to know where the page begins. The deeper you page, the more rows it wastes, so latency grows roughly linearly with the offset value. Page 1 is instant; page 5000 crawls.

Keyset pagination (also called seek or cursor pagination) remembers the sort key of the last row you saw and asks for the next rows directly: WHERE sort_key > @last_value ORDER BY sort_key. With an index on sort_key, the engine performs an index seek straight to that position and reads just the page you want. Cost stays near constant no matter how deep you go, because the work is proportional to the page size, not the page number.

OFFSET cost grows with the offset: it reads N + page_size rows to return page_size rows.
Keyset cost is flat: it seeks once and reads page_size rows, regardless of how far into the result set you are.
Requirement: the predicate must be sargable and the sort key must be covered by an index whose order matches the ORDER BY.
Ties: if the sort key is not unique, add a tiebreaker column (for example the primary key) to both the ORDER BY and the comparison so no rows are skipped or repeated.

The trade-off is navigation. Keyset can only move to the next (or previous) page relative to a known row, so it cannot jump to an arbitrary page number like "page 500" the way OFFSET can. That is rarely a real requirement: infinite scroll and "load more" only ever need the next page, which is exactly what keyset is built for. See performance tuning for how indexing choices drive this.

OFFSET pagination: slows down as the page number grows

SELECT order_id, customer_id, order_date
FROM Orders
ORDER BY order_date, order_id
OFFSET 100000 ROWS FETCH NEXT 20 ROWS ONLY;

The engine must read and discard the first 100000 rows in sort order before returning 20, so deep pages get progressively slower.

Keyset pagination: seek straight to the next page

-- @last_date and @last_id come from the last row of the previous page
SELECT TOP (20) order_id, customer_id, order_date
FROM Orders
WHERE order_date > @last_date
   OR (order_date = @last_date AND order_id > @last_id)
ORDER BY order_date, order_id;

With an index on (order_date, order_id) this is a single seek that reads only 20 rows, regardless of how deep you page. The order_id tiebreaker keeps paging correct when dates repeat.

Follow-up questions to expect

Why does OFFSET get slower on deeper pages if the page size is the same?
What index would you create to make the keyset query a single seek?
How do you handle rows with duplicate sort keys in keyset pagination?
What is the main thing keyset pagination cannot do that OFFSET can?

13 What are statistics and how does cardinality estimation affect plans?

Statistics are small metadata objects the query optimizer keeps about the distribution of values in a column or index. The two parts that matter are the histogram, which buckets the column values into up to 200 steps and records how many rows fall in each, and the density vector, which measures how unique the values are. SQL Server does not read your data to compile a plan; it reads these statistics instead.

When the optimizer compiles a query it uses statistics to perform cardinality estimation: for every predicate and join it guesses how many rows will flow out of each operator. Those estimated row counts are the single most important input to plan choice. From them the optimizer decides join type (a nested loop for a few rows versus a hash join for many), join order, whether to seek or scan an index, and how much memory to grant for sorts and hashes.

A good estimate leads to a plan sized correctly for the real workload.
When statistics are stale or missing, the estimate diverges from reality. You see this as a large gap between Estimated Rows and Actual Rows in the plan.
An underestimate often picks a nested loop join or a tiny memory grant, so the query spills to tempdb or loops millions of times.
An overestimate can pick a full scan and an oversized grant, starving other queries of memory.

SQL Server maintains statistics automatically when AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS are on. Auto update fires after a change threshold is crossed, but on large tables that threshold can be reached long after the distribution has drifted, so the sampled snapshot goes stale. You can refresh them explicitly with UPDATE STATISTICS (optionally WITH FULLSCAN for an exact histogram rather than a sample), and you can rebuild an index to get a full-scan statistic for free.

In an interview, tie this back to symptoms: a plan that looks reasonable but runs slowly, warnings about spills, or a huge estimated-versus-actual gap all point at cardinality estimation gone wrong. Learning to read that gap is a core performance tuning skill, and it is closely related to parameter sniffing, where a cached estimate is right for one parameter and wrong for the next. See the Microsoft Learn statistics guide for the full model.

Refresh statistics with a full scan

-- Update every statistic on the table with an exact (full) scan
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- Or update a single named statistic, sampling a percentage
UPDATE STATISTICS dbo.Orders IX_Orders_CustomerId WITH SAMPLE 30 PERCENT;

FULLSCAN reads all rows for an exact histogram; SAMPLE is faster but approximate. Rebuilding an index also refreshes its statistics at full scan.

Inspect what the optimizer sees

-- List the statistics objects on a table
SELECT name, auto_created, user_created, filter_definition
FROM sys.stats
WHERE object_id = OBJECT_ID('dbo.Orders');

-- View the histogram, density and last update date
DBCC SHOW_STATISTICS('dbo.Orders', 'IX_Orders_CustomerId');

DBCC SHOW_STATISTICS returns the header (rows, last updated), the density vector and the histogram steps the optimizer reads to estimate cardinality.

Follow-up questions to expect

How does the auto update statistics threshold work, and why can it fire too late on large tables?
What is the difference between the histogram and the density vector?
How would you tell from an execution plan that a bad estimate caused a slow query?
When would you create filtered or multi-column statistics manually?

14 What is table partitioning and when does it help?

Table partitioning takes one logically single table and stores its rows in several physical partitions, divided by the value of a chosen partitioning column (almost always a date). To the application it is still one table you query normally; underneath, SQL Server routes each row to a partition based on a partition function (which defines the boundary values, for example the start of 2022, 2023 and 2024) mapped onto a partition scheme (which places those partitions on filegroups).

The two real payoffs are specific, and naming them is what an interviewer wants to hear:

Partition elimination - when a query filters on the partitioning column with a range predicate (for example WHERE order_date >= '2024-01-01'), the optimizer knows it only has to read the 2024 partition and skips the rest. On a table with years of history that can turn a full scan into a read of one small slice.
Partition SWITCH - you can move an entire partition in or out of the table as a metadata-only operation, with no data movement. This makes archiving old data or bulk-loading a new period almost instant, and it avoids the huge logging and locking of a DELETE or INSERT of millions of rows.

Just as important is what partitioning is not. It is a manageability and specific-query feature, not a general performance boost. Partitioning a table does not automatically make arbitrary queries faster, and if a query does not filter on the partitioning column it may actually read every partition. For most "my query is slow" problems a good index matters far more than partitioning - see https://codewithsql.com/performance-tuning for how indexing and statistics drive query speed, and https://codewithsql.com/blog-post/building-enterprise-databases-with-sql for where partitioning fits in a large schema.

Practical notes worth mentioning: the partitioning column should ideally be part of the clustered key and of unique indexes so they can be aligned with the table; SWITCH requires the source and target to have identical schema, indexes and a matching CHECK constraint; and partitioning is most valuable on genuinely large tables (tens of millions of rows and up) with a natural sliding-window pattern, such as time-series or audit data.

Create a partition function and scheme, then a partitioned table

-- Boundary values split rows into yearly ranges (RANGE RIGHT:
-- each boundary belongs to the partition on its right).
CREATE PARTITION FUNCTION pfOrderYear (DATE)
AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01');

-- Map every partition onto a filegroup (PRIMARY here for brevity).
CREATE PARTITION SCHEME psOrderYear
AS PARTITION pfOrderYear ALL TO ([PRIMARY]);

-- Build the table ON the scheme, keyed by the partitioning column.
CREATE TABLE dbo.Orders
(
    order_id    BIGINT      NOT NULL,
    order_date  DATE        NOT NULL,
    total       DECIMAL(12,2) NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (order_date, order_id)
) ON psOrderYear (order_date);

The function defines the boundaries (pre-2023, 2023, 2024+), the scheme places them, and the table is created on the scheme so rows land in the right partition automatically.

Archive an old year with a metadata-only SWITCH

-- Staging table must match schema, indexes and filegroup exactly.
CREATE TABLE dbo.Orders_2022_Archive
(
    order_id    BIGINT      NOT NULL,
    order_date  DATE        NOT NULL,
    total       DECIMAL(12,2) NOT NULL,
    CONSTRAINT PK_Orders_2022 PRIMARY KEY CLUSTERED (order_date, order_id)
) ON [PRIMARY];

-- Move partition 1 out of Orders in an instant, no data copy.
ALTER TABLE dbo.Orders
    SWITCH PARTITION 1 TO dbo.Orders_2022_Archive;

SWITCH is a catalog change, not a data move, so archiving millions of rows is near-instant and avoids the logging and locking of a bulk DELETE.

Follow-up questions to expect

Why does partitioning not speed up a query that filters on a non-partitioning column?
What must be true of two tables before you can SWITCH a partition between them?
What is an aligned index, and why does the partitioning column belong in the clustered key?
When would a well-designed index give you more benefit than partitioning the table?

15 How do you reduce locking and blocking under load?

Blocking happens when one session holds a lock that another session needs, so the second session waits. A little blocking is normal; sustained blocking under load is a design problem. The goal is to hold fewer locks, hold them for less time, and stop readers and writers from fighting over the same rows.

Keep transactions short and commit fast. Open the transaction as late as possible, do only the writes that must be atomic, then COMMIT. The longer a transaction runs, the longer its locks are held and the more sessions pile up behind it.
Access objects in a consistent order. If every code path updates Parent before Child, two sessions cannot each hold what the other needs, which is what prevents a deadlock.
Add the right indexes. A seek on a narrow index locks the handful of rows it touches; a scan takes locks across the whole table. Good indexes shrink the lock footprint of every statement. See SQL indexes.
Let readers stop blocking writers. Turn on READ_COMMITTED_SNAPSHOT so ordinary SELECT statements read the last committed version from the version store instead of taking shared locks.
Never wait on user interaction inside a transaction. A transaction that stays open while a screen waits for a click holds its locks for seconds or minutes. Fetch, let the user act, then open a short transaction to write.

READ_COMMITTED_SNAPSHOT (RCSI) is often the single biggest win. Once enabled at the database level, it changes the default READ COMMITTED isolation level to use row versioning: readers see a consistent snapshot without acquiring shared locks, so a long read no longer blocks a writer and a writer no longer blocks a reader. The cost is extra writes to tempdb for the version store and a 14 byte row overhead, which is a good trade for most OLTP workloads. Contrast the lock based and version based approaches in optimistic vs pessimistic locking.

When blocking is already happening, find the head of the chain rather than guessing. Query sys.dm_exec_requests and follow the blocking_session_id column up to the session that has no blocker of its own, or run the community script sp_whoisactive which surfaces the blocking tree, the running SQL, and wait types in one result set. See transactions, locks and deadlocks for the full picture.

Finally, know the two errors that come up here. A lock request time out (error 1222) means your session waited past SET LOCK_TIMEOUT and gave up; it is a symptom of blocking, and the transaction keeps running unless you handle it. A deadlock (error 1205) is different: two sessions block each other in a cycle, SQL Server detects it and kills one as the victim. Blocking is a queue you eventually clear; a deadlock is a cycle that can never clear on its own, which is why the engine has to intervene.

Enable READ_COMMITTED_SNAPSHOT so readers do not block writers

-- Requires a brief moment with no other connections to the database.
ALTER DATABASE Sales SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;

-- Confirm it is on (is_read_committed_snapshot_on = 1).
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = 'Sales';

After this, the default READ COMMITTED isolation uses row versioning, so a SELECT no longer takes shared locks that block writers.

Find the session at the head of a blocking chain

SELECT
    r.session_id,
    r.blocking_session_id,
    r.wait_type,
    r.wait_time,
    r.wait_resource,
    t.text AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0
ORDER BY r.wait_time DESC;

The lead blocker is the session_id that appears as a blocking_session_id but is not itself blocked; sp_whoisactive shows the same chain more readably.

Follow-up questions to expect

What is the difference between error 1222 and error 1205?
What are the downsides of turning on READ_COMMITTED_SNAPSHOT?
How does accessing tables in a consistent order prevent deadlocks?
When would you use SNAPSHOT isolation instead of READ_COMMITTED_SNAPSHOT?

Keep going

Explore the other interview categories, or drill the fundamentals with the functions library and the SQL Server error library.

Get 1-on-1 Prep All Categories