Denormalization: When, Why, and How
Normalization eliminates redundancy. Denormalization intentionally adds it back — storing duplicate or pre-calculated data to speed up read queries at the cost of more complex writes.
Why Denormalize?
A fully normalized database might require a 5-table JOIN just to display a product page. On a high-traffic site, that JOIN runs millions of times per day. Denormalization pre-computes the result so the query becomes a simple, fast lookup.
| Scenario | Normalized | Denormalized |
|---|---|---|
| Show order with customer name | JOIN orders + customers every time | Store customer_name in orders table |
| Display product rating | AVG(rating) FROM reviews per request | Store avg_rating column on products |
| Count orders per customer | COUNT(*) GROUP BY every time | Store order_count on customers, update via trigger |
Practical Example

```sql
-- Before: the dashboard aggregate runs a multi-table JOIN on every request
SELECT c.name, COUNT(o.id) AS total_orders, SUM(o.amount) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;

-- Add summary columns + keep them updated with a trigger
ALTER TABLE customers
  ADD COLUMN total_orders INT DEFAULT 0,
  ADD COLUMN total_spent DECIMAL(12,2) DEFAULT 0;

DELIMITER //
CREATE TRIGGER trg_update_stats AFTER INSERT ON orders
FOR EACH ROW
BEGIN
  UPDATE customers
  SET total_orders = total_orders + 1,
      total_spent  = total_spent + NEW.amount
  WHERE id = NEW.customer_id;
END //
DELIMITER ;
```
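The insert trigger alone is not enough if orders can ever be deleted: the counters would drift upward. A sketch of a companion delete trigger, assuming the same schema (the trigger name trg_update_stats_del is illustrative):

```sql
DELIMITER //
CREATE TRIGGER trg_update_stats_del AFTER DELETE ON orders
FOR EACH ROW
BEGIN
  -- OLD holds the deleted row; subtract its contribution
  UPDATE customers
  SET total_orders = total_orders - 1,
      total_spent  = total_spent - OLD.amount
  WHERE id = OLD.customer_id;
END //
DELIMITER ;
```

An UPDATE trigger would be needed as well if order amounts can change after insert.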
```sql
-- Dashboard query is now instant:
SELECT name, total_orders, total_spent FROM customers;
```

When to Denormalize vs Stay Normalized
| Denormalize When | Stay Normalized When |
|---|---|
| Read-heavy workloads (dashboards, search) | Write-heavy workloads (frequent inserts) |
| Complex JOINs cause performance issues | Data consistency is critical (finance) |
| The same aggregate is recalculated repeatedly | Data changes frequently and sync is complex |
Common Patterns

- Pre-calculated columns — total_orders, avg_rating maintained by triggers
- Materialized views (PostgreSQL) — a view that stores its results, refreshed on a schedule
- Summary tables — daily/monthly aggregates rebuilt by a scheduled procedure
- Embedding data — storing customer_name in orders to avoid JOINs
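The materialized-view pattern above can be sketched in PostgreSQL for the product-rating scenario (the view and column names are illustrative):

```sql
-- PostgreSQL: store the aggregate once instead of recomputing per request
CREATE MATERIALIZED VIEW product_ratings AS
SELECT product_id,
       AVG(rating) AS avg_rating,
       COUNT(*)    AS review_count
FROM reviews
GROUP BY product_id;

-- Refresh on a schedule (e.g. via cron or pg_cron).
-- CONCURRENTLY lets readers keep querying during the refresh,
-- but requires a unique index on the view.
CREATE UNIQUE INDEX ON product_ratings (product_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY product_ratings;
```

The trade-off is staleness: the view reflects the data as of the last refresh, not the live tables.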
Golden rule: Normalize first, denormalize second. Start with a properly normalized schema. Only denormalize when you have measured performance problems. Premature denormalization creates data inconsistency bugs that are very hard to track down.
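Because those inconsistency bugs are hard to spot by eye, it helps to check for drift explicitly. One approach, assuming the summary columns from the example above, is a periodic reconciliation query:

```sql
-- Find customers whose stored counter disagrees with the source of truth
SELECT c.id, c.total_orders, COUNT(o.id) AS actual_orders
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.total_orders
HAVING c.total_orders <> COUNT(o.id);
```

An empty result means the denormalized counts are in sync; any rows returned point at customers whose counters need repair.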
Key Takeaways
- Denormalization trades write complexity for read speed
- Common patterns: pre-calculated columns, summary tables, embedded data
- Use triggers or scheduled jobs to keep denormalized data in sync
- Normalize first, then denormalize only what's proven slow