SQL Practice Problem

Cumulative count of new customers

Hard CTEs & Recursion

Schema and sample data

Orders
order_idcustomer_idorder_date
112024-01-01
222024-01-01
312024-01-02
432024-01-02
522024-01-03
642024-01-03
752024-01-03
Show setup SQL (copy to run)
CREATE TABLE Orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date  DATE NOT NULL
);

INSERT INTO Orders (order_id, customer_id, order_date) VALUES
(1, 1, '2024-01-01'),
(2, 2, '2024-01-01'),
(3, 1, '2024-01-02'),
(4, 3, '2024-01-02'),
(5, 2, '2024-01-03'),
(6, 4, '2024-01-03'),
(7, 5, '2024-01-03');
Your task

The Orders table records every order placed, and the same customer_id can appear on several different days. For each day on which a customer placed their first ever order, return three columns: first_order_date, new_customers (the number of customers whose very first order falls on that day), and cumulative_customers (the running total of distinct customers seen from the earliest day up to and including that day). Sort the result by first_order_date ascending.

Expected result

Your query should return
first_order_datenew_customerscumulative_customers
2024-01-0122
2024-01-0213
2024-01-0325
Show hint

Do it in two steps. First collapse each customer to a single row holding their earliest order date with MIN(order_date) grouped by customer_id. Then group that result by the first date to count how many customers start on each day, and add a running SUM() window function ordered by the date.

Try to write the query yourself before you open this.

Show solution and explanation
WITH first_orders AS (
    SELECT customer_id, MIN(order_date) AS first_date
    FROM Orders
    GROUP BY customer_id
),
daily_new AS (
    SELECT first_date, COUNT(*) AS new_customers
    FROM first_orders
    GROUP BY first_date
)
SELECT
    first_date AS first_order_date,
    new_customers,
    SUM(new_customers) OVER (ORDER BY first_date) AS cumulative_customers
FROM daily_new
ORDER BY first_order_date;

The trick is to think "first seen, then cumulate". A customer is only new on the day of their earliest order, so the first CTE, first_orders, reduces every customer to exactly one row using MIN(order_date) GROUP BY customer_id. In the sample data customer 1 orders on Jan 1 and Jan 2, but MIN pins their first date to Jan 1; customer 2 orders on Jan 1 and Jan 3, so their first date is Jan 1 as well.

The second CTE, daily_new, groups those one-row-per-customer results by first_date and counts them. That gives 2 new customers on Jan 1 (customers 1 and 2), 1 on Jan 2 (customer 3), and 2 on Jan 3 (customers 4 and 5). Because each customer contributes to exactly one day, summing new_customers can never double count anyone.

Finally SUM(new_customers) OVER (ORDER BY first_date) turns the per-day counts into a running total: 2, then 2+1=3, then 3+2=5. The default window frame for an ordered SUM is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which is exactly the "everything up to today" cumulative behaviour we want. See SUM for the running-total pattern, and ROW_NUMBER for an alternative way to flag each customer's first order.

The most common mistake is to skip the first CTE and instead run COUNT(DISTINCT customer_id) per day. That counts a customer as "new" on every day they order, so customer 1 would be counted again on Jan 2 and the cumulative total would overshoot. Collapsing to the first order date first is what keeps the count of distinct customers honest.

Keep practising

Work through more Hard exercises, or test yourself with the SQL interview questions.