Once the basics are covered, interviewers test whether you understand the trickier distinctions. These intermediate questions cover subqueries, CTEs, window functions, set operators and the classic "what is the difference between X and Y" pairs, each with a diagram and SQL.
A subquery is a complete SELECT nested inside another statement. It comes in a few shapes: a scalar subquery returns one row and one column and can sit anywhere a single value is allowed; a multi-row subquery feeds operators like IN or EXISTS; and a derived table is a subquery in the FROM clause that you query as if it were a table.
A join works differently: it places two or more tables side by side and matches their rows on a related column, so the result set can carry columns from every table at once. That is the key contrast - a join is built to combine columns, while a subquery is often used to filter or test the outer query.
Many subqueries can be rewritten as joins and the other way around. In SQL Server the query optimizer frequently flattens an IN or EXISTS subquery into a semi join internally, so the two forms often produce the same execution plan and similar performance. The choice is then mostly about clarity.
One trap with scalar subqueries: if a subquery used where a single value is expected returns more than one row, SQL Server raises error 512 (subquery returned more than 1 value). A join would instead return the extra rows, so understanding this difference in behavior is part of choosing the right tool.
SELECT c.name
FROM Customers AS c
WHERE c.customer_id IN (
SELECT o.customer_id
FROM Orders AS o
);
The inner query returns a set of customer ids and the outer query keeps customers whose id is in that set.
SELECT DISTINCT c.name
FROM Customers AS c
INNER JOIN Orders AS o
ON o.customer_id = c.customer_id;
The join matches the tables directly; DISTINCT removes duplicates since a customer with many orders would otherwise repeat.
Related: What is the difference between EXISTS and IN? · What is a correlated subquery? · What is a SQL JOIN and what are the main types?
A common table expression, or CTE, is a named temporary result set that you define with a WITH clause at the top of a query. It exists only for the duration of the single statement that immediately follows it, then disappears. Think of it as a query-scoped alias for a subquery that you can refer to by name.
The main reason to use one is readability. Instead of burying logic in a deeply nested subquery, you name each step and read the query top to bottom. Because the CTE has a name, you can also reference it more than once in the same statement without repeating the SQL, which a plain derived table in the FROM clause cannot do.
An important and often-tested detail: a CTE is not materialized or cached by default in SQL Server. The optimizer inlines its definition into the outer query, so referencing a CTE twice can cause its underlying query to be evaluated twice. A CTE does not by itself make a query faster; it is a naming and structuring tool. If you genuinely need to compute a result once and reuse it, store it in a temp table or table variable instead.
Compared with a derived table (a subquery in the FROM clause), a CTE is functionally similar but sits above the main query and can be named and reused. Compared with a view, a CTE is disposable and local to one statement, whereas a view is persisted metadata that also serves as an access-control and abstraction layer. Microsoft documents the full syntax in the WITH common_table_expression reference.
WITH BigOrders AS (
SELECT customer_id, total
FROM Orders
WHERE total > 1000
)
SELECT customer_id, COUNT(*) AS big_order_count, SUM(total) AS big_total
FROM BigOrders
GROUP BY customer_id;
The BigOrders CTE names the filtered set once, and the main query reads cleanly against that name.
WITH OrderTotals AS (
SELECT customer_id, SUM(total) AS lifetime_value
FROM Orders
GROUP BY customer_id
),
Ranked AS (
SELECT customer_id, lifetime_value,
RANK() OVER (ORDER BY lifetime_value DESC) AS value_rank
FROM OrderTotals
)
SELECT customer_id, lifetime_value
FROM Ranked
WHERE value_rank <= 10;
Two CTEs separated by a comma: Ranked reads from OrderTotals, so the logic flows top to bottom instead of nesting.
Related: What is a recursive CTE and when do you need one? · What is the difference between a subquery and a join, and when do you use each? · What is a view and what are its benefits and limits?
A window function performs a calculation across a set of rows that are related to the current row, called the window, and returns a value for every input row. You attach it to an aggregate or ranking function with the OVER clause: OVER(PARTITION BY ... ORDER BY ...). The PARTITION BY splits the rows into groups, and the optional ORDER BY orders the rows inside each partition (which matters for ranking, running totals and offsets).
The key contrast with GROUP BY is what happens to the rows. GROUP BY collapses each group into one summary row, so the individual detail rows disappear and you can only select grouped columns and aggregates. A window function keeps every row and simply adds the computed value as an extra column, so you can show a detail row and its group total side by side without a self join or subquery.
Window functions fall into three families:
ROW_NUMBER, RANK and DENSE_RANK number or rank rows within a partition. See the ROW_NUMBER function reference.SUM, AVG, COUNT, MIN and MAX used with OVER to produce group totals or running totals alongside each row.LAG and LEAD read a value from a previous or following row in the ordered partition, which is ideal for period over period comparisons.A window function always runs after the WHERE, GROUP BY and HAVING steps, which is why you cannot filter directly on one in a WHERE clause; you wrap the query in a CTE or subquery and filter the outer query instead. For a fuller tour with more patterns, see the SQL window functions guide.
SELECT
emp_id,
department,
salary,
SUM(salary) OVER (PARTITION BY department) AS dept_total
FROM Employees;
Every employee row is returned, with the department total repeated on each row. No detail is lost.
SELECT
department,
SUM(salary) AS dept_total
FROM Employees
GROUP BY department;
One row per department only. The individual emp_id and salary values are gone because the rows are collapsed.
Related: What is the difference between ROW_NUMBER, RANK and DENSE_RANK? · How do you calculate a running total with a window function? · How do the LAG and LEAD window functions work?
ROW_NUMBER, RANK and DENSE_RANK are all window functions that number rows within an ordered group. They share the same syntax, an OVER (PARTITION BY ... ORDER BY ...) clause, so the only thing that changes is how each one treats ties (rows with equal values in the ORDER BY).
1, 2, 3, 4. Even when two rows tie, it breaks the tie arbitrarily and never repeats a number.1, and the next row gets 3, so the sequence is 1, 1, 3 with a gap.1 and the next row gets 2, producing 1, 1, 2 with no gaps.A simple way to remember it: RANK counts positions (like Olympic medals, where two silvers mean no bronze), while DENSE_RANK counts distinct values. ROW_NUMBER ignores ties entirely and just hands out a running counter.
Which one to pick depends on the goal. Use ROW_NUMBER when you need exactly one row per group, for example the single latest order per customer or a deterministic page of results. Use RANK or DENSE_RANK when tied rows should genuinely share a placing, such as leaderboards or nth-highest queries; choose DENSE_RANK when you want the nth distinct value and RANK when the gaps are meaningful.
Because ROW_NUMBER is non-deterministic across ties, always add a tie-breaker column to the ORDER BY if you need repeatable output.
SELECT
name,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn,
RANK() OVER (ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk
FROM Employees;
For salaries 5000, 5000, 4000 you get rn = 1,2,3 (unique), rnk = 1,1,3 (gap after the tie) and drnk = 1,1,2 (no gap).
WITH ranked AS (
SELECT
department_id,
name,
salary,
ROW_NUMBER() OVER (
PARTITION BY department_id
ORDER BY salary DESC, emp_id
) AS rn
FROM Employees
)
SELECT department_id, name, salary
FROM ranked
WHERE rn = 1;
ROW_NUMBER guarantees exactly one row per department. The extra ORDER BY emp_id is a tie-breaker so the result is deterministic.
Related: What is a window function and how is it different from GROUP BY? · Write a query to find the second highest salary. · How do you get the top N rows per group (for example top 3 per category)?
The set operators combine the output of two SELECT statements vertically, stacking one result set on top of the other rather than joining them side by side. To use them the two queries must return the same number of columns, in the same order, with compatible data types. The column names of the final result come from the first query.
A key detail interviewers listen for: INTERSECT and EXCEPT, like UNION, remove duplicates and compare whole rows, and they treat two NULL values as equal. This is different from a regular comparison such as =, where NULL = NULL is unknown rather than true, which makes these operators handy for comparing two data sets that may contain NULLs.
EXCEPT is a clean way to answer "which rows are missing" without writing an outer join and an IS NULL filter. It is closely related to an anti join, and for keeping duplicates you should understand UNION versus UNION ALL. See the Microsoft docs on EXCEPT and INTERSECT for the full rules.
Precedence matters when you mix operators in one statement: INTERSECT binds tighter than UNION and EXCEPT, so wrap parts in parentheses when the intended order is not obvious. An ORDER BY can appear only once, at the very end, and it sorts the combined result.
SELECT email FROM Customers
INTERSECT
SELECT email FROM Employees
ORDER BY email;
Returns only the email values present in both tables, with duplicates removed and NULLs matched as equal.
SELECT product_id FROM Products
EXCEPT
SELECT product_id FROM OrderItems;
Returns product_id values in Products that do not appear in OrderItems, a concise way to find missing rows without a LEFT JOIN and IS NULL check.
Related: What is the difference between UNION and UNION ALL? · What is an anti join and how do you find rows with no match? · What is the difference between a subquery and a join, and when do you use each?
A view is a named, virtual table defined by a stored SELECT statement. It has no data of its own: when you query a view, the database expands its definition and runs the underlying query against the base tables, then returns the result as if it were a table. You can read more in SQL views explained.
Views deliver three main benefits:
SELECT * FROM SalesSummary instead of repeating the whole query.The key limits to mention are:
INSERT, UPDATE, or DELETE through a view only when it maps cleanly to one base table. Views using JOIN, GROUP BY, DISTINCT, aggregates, or UNION are generally read only unless you add an INSTEAD OF trigger.Because a view is just a saved query, it is closely related to a CTE: a CTE is scoped to a single statement, while a view is a reusable database object.
CREATE VIEW dbo.CustomerOrderSummary
AS
SELECT c.customer_id,
c.name,
COUNT(o.order_id) AS order_count,
SUM(o.total) AS lifetime_value
FROM Customers AS c
LEFT JOIN Orders AS o
ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name;
The view stores only this definition, not the rows; the query runs against the base tables each time it is used.
SELECT name, lifetime_value
FROM dbo.CustomerOrderSummary
WHERE order_count > 0
ORDER BY lifetime_value DESC;
Callers treat the view as a virtual table and never repeat the join or GROUP BY.
Related: What is a common table expression (CTE)? · What is an index and how does it speed up queries? · How do you enforce referential integrity?
An index is a separate, sorted data structure that the database keeps alongside a table so it can find rows quickly. The classic analogy is the index at the back of a book: instead of reading every page to find a topic, you look it up in the sorted index and jump straight to the right page. Without an index the engine must do a table scan, reading every row to check the filter.
Most indexes are stored as a B-tree (balanced tree). The keys are held in sorted order across a shallow tree of pages, so the engine starts at the root and follows a few pointers down to the leaf that holds the value it wants. Because the tree stays balanced and shallow, a seek touches only a handful of pages even in a table with millions of rows, turning a linear scan into something close to a logarithmic lookup. See how SQL indexes work for a deeper walk through.
SQL Server has two broad kinds. A clustered index defines the physical order of the table itself, so the table data is the leaf level of the tree; a table can have only one. A non-clustered index is a separate structure whose leaves hold the key plus a pointer back to the row, and a table can have many. A non-clustered index that lacks a needed column may still trigger a lookup back into the table, which is why INCLUDEd columns and covering indexes matter.
Indexes are a trade-off. Reads that filter or sort on the indexed columns get much faster, but every INSERT, UPDATE and DELETE now has to maintain the index too, so writes are slower and each index costs extra storage. The rule of thumb is to index the columns you actually query on and no more.
WHERE (an equality or range that returns a small fraction of rows) benefits most.JOIN columns benefit, because the engine can seek into the other table instead of scanning it.ORDER BY and GROUP BY can skip a sort when the data is already in index order.For more on measuring the payoff, see how SQL indexes improve performance.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
ON Orders (customer_id)
INCLUDE (order_date, total);
The key lets the engine seek by customer_id; INCLUDE stores order_date and total in the leaf so the query is covered without a lookup back to the table.
SELECT order_date, total
FROM Orders
WHERE customer_id = 4271
ORDER BY order_date;
The selective WHERE lets the engine seek straight to one customer, and the index order supplies the ORDER BY for free, so no table scan or sort is needed.
Related: What is the difference between a clustered and a non-clustered index? · How does an index improve query performance? · What is a composite key and when should you use one?
Both are B-tree structures that speed up lookups, but they differ in what sits at the leaf level. A clustered index is not a copy of the table sitting beside it; it is the table, physically stored and sorted by the index key. Because a table can only be laid out in one physical order, you get at most one clustered index per table. Its leaf pages hold the full data rows in key order.
A non-clustered index is a separate structure that stores the index key columns in sorted order, and each leaf entry carries a row locator that points back to the underlying row. If the table has a clustered index, that locator is the clustered key; if the table is a heap (no clustered index), it is a physical RID (file, page, slot). You can create many non-clustered indexes on one table. See the SQL indexes guide for a fuller walk-through.
PRIMARY KEY creates a clustered index by default (unless you ask for NONCLUSTERED), and a UNIQUE constraint defaults to non-clustered.The practical cost of a non-clustered index shows up as the key lookup (or RID lookup): when a query filters using the index but also selects columns the index does not contain, the engine must follow the locator back to the clustered index or heap to fetch the missing columns. Each lookup is a separate read, so this gets expensive over many rows.
You avoid lookups with included columns. Adding INCLUDE (col1, col2) stores extra non-key columns at the leaf level of the non-clustered index, so the index can satisfy the query on its own. When an index contains every column a query needs, it becomes a covering index and the lookup disappears.
CREATE CLUSTERED INDEX IX_Orders_OrderDate
ON Orders (order_date);
The rows of Orders are now stored sorted by order_date; there can be only one such index per table.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
ON Orders (customer_id)
INCLUDE (order_date, total);
The key is customer_id; INCLUDE stores order_date and total at the leaf so a query selecting those columns needs no lookup.
Related: What is an index and how does it speed up queries? · What is a covering index and how does an included column help? · How do you order the columns in a composite index?
A transaction is a group of one or more statements treated as a single, indivisible unit of work. Either every statement succeeds and the changes are made permanent with COMMIT, or something fails and every change is undone with ROLLBACK. The classic example is transferring money between two accounts: debiting one and crediting the other must both happen, or neither should.
The guarantees a transaction provides are summed up by the acronym ACID:
In T-SQL you open a transaction with BEGIN TRAN, confirm it with COMMIT, or undo it with ROLLBACK. In production code you normally wrap the work in a TRY...CATCH block so that any error rolls the whole thing back instead of leaving a partial write. See transactions, locks and deadlocks for how this interacts with concurrency.
For a deeper walkthrough with examples, read SQL transactions and ACID explained. Getting ACID right is the difference between a database you can trust with money and one you cannot.
BEGIN TRAN;
UPDATE Accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE Accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;
Both updates commit together. If you issued ROLLBACK instead of COMMIT, neither update would take effect.
BEGIN TRY
BEGIN TRAN;
UPDATE Accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE Accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0
ROLLBACK;
THROW;
END CATCH;
If either statement errors, control jumps to CATCH, the transaction is rolled back, and THROW re-raises the error to the caller.
Related: What are the transaction isolation levels and what anomalies do they prevent? · How do you use a transaction inside a stored procedure? · How do you handle errors in T-SQL with TRY CATCH?
Both IN and EXISTS let you filter a query using the result of a subquery, but they ask different questions. IN asks "is this value equal to any value in this list?" and works over a set of scalar values. EXISTS asks "does this correlated subquery return at least one row?" and only cares about presence, not about the values themselves.
Because EXISTS only checks for the existence of a row, it short-circuits: the moment the subquery finds one matching row it stops and returns true. That is why an EXISTS subquery often uses SELECT 1 instead of selecting real columns. IN instead materializes the full list of values before comparing, which is why it pairs naturally with a subquery that returns a single column.
The most important difference in an interview is how they handle NULLs. A NOT IN list that contains even one NULL can silently return no rows at all, because SQL cannot prove a value is "not equal to" an unknown. NOT EXISTS does not have this problem: it evaluates to true or false per row and treats a non-matching correlated subquery as simply "no row found".
NULL in the list = zero rows returned, a classic and hard-to-spot bug.On performance the two are usually similar on modern optimizers, which often rewrite one form into the other or into a semi join. The practical rule is: reach for EXISTS / NOT EXISTS when the subquery is correlated or when negation and NULLs are in play, and use IN for a short static list or a simple uncorrelated subquery. See the EXISTS documentation on Microsoft Learn for the exact semantics.
SELECT c.name
FROM Customers AS c
WHERE EXISTS (
SELECT 1
FROM Orders AS o
WHERE o.customer_id = c.customer_id
);
The correlated subquery references c.customer_id and returns true the instant one matching order is found.
-- If any returned customer_id is NULL, this returns ZERO rows:
SELECT c.name
FROM Customers AS c
WHERE c.customer_id NOT IN (
SELECT o.customer_id FROM Orders AS o
);
-- Safe rewrite: finds customers with no orders, NULLs and all:
SELECT c.name
FROM Customers AS c
WHERE NOT EXISTS (
SELECT 1
FROM Orders AS o
WHERE o.customer_id = c.customer_id
);
A single NULL in the NOT IN list makes every comparison unknown, so no row qualifies. NOT EXISTS is unaffected.
Related: What is a correlated subquery? · What is an anti join and how do you find rows with no match? · What is a NULL value and how is it different from zero or an empty string?
COALESCE and ISNULL both replace a NULL with a fallback value, but they are not the same function. COALESCE is part of the ANSI SQL standard and works across most database engines, while ISNULL is proprietary to SQL Server. You can read the full reference on the COALESCE function page.
COALESCE(a, b, c, ...) accepts two or more arguments and returns the first one that is not NULL. ISNULL(check, replacement) takes exactly two arguments.COALESCE uses data type precedence rules across all of its arguments (it is essentially shorthand for a CASE expression). ISNULL returns the data type of the first argument, which can silently truncate the replacement value.ISNULL is often marked NOT NULL, whereas one built with COALESCE is usually treated as NULLable. This matters when a SELECT INTO creates a new table from the expression.A subtle trap: because COALESCE expands to a CASE expression, an argument that contains a subquery can be evaluated more than once. If that subquery is expensive or non-deterministic, COALESCE may run it twice, while ISNULL evaluates each argument once. For simple two argument NULL replacement inside SQL Server, ISNULL is fine; when you need portability, more than two fallbacks, or standard behaviour, prefer COALESCE.
See the official Microsoft Learn COALESCE reference for the full precedence rules.
SELECT
emp_id,
COALESCE(mobile_phone, home_phone, work_phone, 'no phone on file') AS contact
FROM Employees;
COALESCE scans the arguments left to right and returns the first one that is not NULL, so it can chain many fallbacks in a single expression.
SELECT
order_id,
ISNULL(discount, 0) AS discount
FROM Orders;
ISNULL takes exactly two arguments and returns the data type of the first, so a longer replacement string could be truncated to that column length.
Related: What is a NULL value and how is it different from zero or an empty string? · What is the CASE expression and how do you use it? · Count rows per group including groups with zero matches.
The CASE expression is SQL's way of writing if/then/else logic inline. It evaluates a list of conditions in order and returns the value from the first branch that is true, which makes it the go-to tool for deriving a column, bucketing values, or reshaping data without leaving the query.
There are two forms. A searched CASE writes a full boolean test after each WHEN (for example WHEN price > 100), so it can compare ranges, use AND/OR, and test different columns. A simple CASE names an expression once after CASE and then lists values to match (CASE status WHEN 1 THEN ...); it only checks equality against that single expression. The searched form is more flexible and is what you will use most often.
Because CASE is an expression, not a statement, it can appear almost anywhere a value is allowed: in the SELECT list, in WHERE, in ORDER BY for custom sort orders, in GROUP BY, and inside aggregate functions. If none of the WHEN branches match and there is no ELSE, the result is NULL, so add an ELSE when you want a defined default.
WHEN wins, so order your conditions from most specific to most general.SUM(CASE WHEN region = 'East' THEN 1 ELSE 0 END), lets you count or total only the rows that match a condition. This trick is how you pivot rows into columns and build cross-tab reports in one pass.CASE also underpins several everyday patterns: it is the manual equivalent of COALESCE for NULL handling, and understanding it is the foundation for pivoting rows to columns. See the CASE function reference for the full syntax and more examples.
SELECT
customer_id,
total,
CASE
WHEN total >= 1000 THEN 'VIP'
WHEN total >= 100 THEN 'Regular'
ELSE 'New'
END AS tier
FROM Orders
ORDER BY total DESC;
Conditions are checked top to bottom; the first true WHEN wins, and ELSE catches everything else.
SELECT
customer_id,
SUM(CASE WHEN status = 'Shipped' THEN 1 ELSE 0 END) AS shipped,
SUM(CASE WHEN status = 'Pending' THEN 1 ELSE 0 END) AS pending,
SUM(CASE WHEN status = 'Cancelled' THEN 1 ELSE 0 END) AS cancelled
FROM Orders
GROUP BY customer_id;
Wrapping CASE in SUM turns rows into columns, counting only the rows that match each condition in a single pass.
Related: What is the difference between COALESCE and ISNULL? · How do you turn rows into columns (pivot monthly sales)? · What are aggregate functions in SQL?
Start by deciding what "duplicate" means. Two rows are duplicates when the columns you care about hold the same values, even if other columns (like a surrogate id) differ. Once you know those key columns, finding duplicates is a counting problem: group by the key columns and keep only the groups that appear more than once.
The classic approach is GROUP BY with HAVING. You group on the key columns and filter with HAVING COUNT(*) > 1. Remember that HAVING filters groups after aggregation, whereas WHERE filters individual rows before grouping, so the count test has to live in HAVING. This tells you which value combinations are duplicated and how many times.
The limitation is that GROUP BY collapses each group into a single output row, so you see the duplicated values but not the individual offending rows. When you need the actual rows (for example to inspect or delete them), use ROW_NUMBER() with PARTITION BY on the key columns. Each partition is numbered from 1, so any row with rn > 1 is a repeat of one you have already seen.
PARTITION BY for the columns that define a duplicate and ORDER BY for the tie-breaker that decides which copy to keep.Once you have identified them, removing duplicates is usually done with the same ROW_NUMBER() query wrapped in a CTE and a DELETE. See the walkthrough on how to delete duplicate rows for the safe pattern that keeps exactly one copy of each.
SELECT email, COUNT(*) AS copies
FROM Customers
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY copies DESC;
Groups rows by email and returns only the values that occur more than once, with a count of how many times.
WITH Ranked AS (
SELECT
customer_id,
email,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY customer_id
) AS rn
FROM Customers
)
SELECT customer_id, email, rn
FROM Ranked
WHERE rn > 1;
PARTITION BY email restarts the counter for each email; rows with rn > 1 are the extra copies. Swap the final SELECT for DELETE to remove them.
Related: How do you delete duplicate rows but keep one copy? · What does DISTINCT do and how do you remove duplicate rows? · What does the GROUP BY clause do?
An anti join is the opposite of an inner join. Where an inner join keeps only rows that match, an anti join keeps only the rows in one table that have no matching row in the other. The classic interview phrasing is "find the customers who have never placed an order" or "list products that were never sold". SQL has no dedicated ANTI JOIN keyword, so you express the pattern with one of three idioms.
1. LEFT JOIN ... WHERE right.key IS NULL. Do a LEFT JOIN so every left row survives, then filter to the rows where the right side came back NULL. Those NULLs are exactly the rows that found no match. It is readable and works everywhere, but you must filter on a right column that is never NULL in real data, such as its primary key.
2. NOT EXISTS (preferred). A correlated NOT EXISTS subquery is usually the best choice: it reads as "keep this row when no matching row exists", the optimizer turns it into an efficient anti semi join, and it is NULL-safe. EXISTS only cares whether a row is returned, so NULLs in the joined column do not corrupt the result.
3. NOT IN (use with care). NOT IN (subquery) looks clean but has a notorious trap: if the subquery returns even one NULL, the whole NOT IN evaluates to unknown and the query returns zero rows. This is because x NOT IN (1, NULL) can never be proven true. Only use NOT IN when the column is guaranteed NOT NULL, otherwise reach for NOT EXISTS.
Which to prefer: default to NOT EXISTS for correctness and performance. Use LEFT JOIN ... IS NULL when it reads more naturally to your team or you already have the join. Avoid NOT IN on nullable columns. See also EXISTS vs IN and INNER vs LEFT JOIN.
SELECT c.customer_id, c.name
FROM Customers AS c
LEFT JOIN Orders AS o
ON o.customer_id = c.customer_id
WHERE o.customer_id IS NULL;
Every customer survives the LEFT JOIN; keeping only the NULL right-side rows leaves customers with no order.
SELECT c.customer_id, c.name
FROM Customers AS c
WHERE NOT EXISTS (
SELECT 1
FROM Orders AS o
WHERE o.customer_id = c.customer_id
);
NOT EXISTS returns a customer only when no matching order exists, and NULLs in Orders cannot break it.
Related: What is the difference between an INNER JOIN and a LEFT JOIN? · What is the difference between EXISTS and IN? · Find all customers who have never placed an order.
Explore the other interview categories, or drill the fundamentals with the functions library and the SQL Server error library.