CodeWithSQL.com
HomeSQL Intermediate CourseDatabase Normalization

Database Normalization

Master database normalization from First Normal Form (1NF) through Boyce-Codd Normal Form (BCNF). Learn to design efficient, redundancy-free database schemas.

2 Topics Chapter 2 of 6 Intermediate

1 Introduction to Normalization

Top

Normalization is the process of organizing a database to reduce redundancy (duplicate data) and prevent anomalies (inconsistencies when inserting, updating, or deleting data). It's a set of rules — called normal forms — that guide how you split data across tables.

The Problem: An Unnormalized Table

Imagine storing everything in one flat table:

Unnormalized: orders_flat
order_id | customer | customer_email       | product       | category    | price | qty
---------|----------|----------------------|---------------|-------------|-------|----
1        | Sara     | sara@email.com       | SQL Course    | Education   | 49.99 | 1
2        | Sara     | sara@email.com       | Mentorship    | Education   | 199   | 1
3        | James    | james@email.com      | SQL Course    | Education   | 49.99 | 2
4        | Sara     | sara_NEW@email.com   | Database Book | Books       | 29.99 | 1

Spot the problems?

  • Redundancy: Sara's name appears 3 times. "SQL Course" and its price/category are repeated for every order.
  • Update anomaly: Sara's email changed in row 4 but not in rows 1-2. Which one is correct?
  • Insert anomaly: You can't add a new product unless someone orders it (no order_id to use).
  • Delete anomaly: If James cancels order 3, you lose the fact that James exists entirely.

The Solution: Split Into Related Tables

SQL-- Normalized into 3 tables CREATE TABLE customers ( id INT PRIMARY KEY, name VARCHAR(100), email VARCHAR(150) UNIQUE ); CREATE TABLE products ( id INT PRIMARY KEY, name VARCHAR(100), category VARCHAR(50), price DECIMAL(8,2) ); CREATE TABLE orders ( id INT PRIMARY KEY, customer_id INT REFERENCES customers(id), product_id INT REFERENCES products(id), quantity INT );

Now Sara's email exists in one place. Update it once, and every query that JOINs to the customers table gets the correct value. Products can exist without orders. Deleting an order doesn't delete the customer.

Think of it this way: Normalization is like the "Don't Repeat Yourself" (DRY) principle in programming. If the same data exists in two places, it will eventually become inconsistent. Store each fact exactly once.

Key Takeaways

  • Normalization organizes data to eliminate redundancy and prevent anomalies
  • Unnormalized tables lead to update, insert, and delete anomalies
  • The solution is splitting data into related tables connected by foreign keys
  • Each fact should be stored exactly once in the database

2 Understanding 1NF, 2NF, 3NF, BCNF

Top

Normal forms are progressive levels of organization. Each level builds on the previous one. In practice, most databases aim for 3NF — it eliminates the vast majority of problems without overcomplicating the schema.

First Normal Form (1NF)

Rule: Every column must contain atomic (single, indivisible) values. No repeating groups, no lists in a cell.

Violates 1NF
student_id | name  | courses
-----------|-------|---------------------------
1          | Sara  | SQL, Python, Java        <-- multiple values in one cell!
2          | James | SQL, JavaScript
Fixed: 1NF Compliant
student_id | name  | course
-----------|-------|----------
1          | Sara  | SQL
1          | Sara  | Python
1          | Sara  | Java
2          | James | SQL
2          | James | JavaScript

Or better — a separate enrollment table:

SQLCREATE TABLE students (id INT PRIMARY KEY, name VARCHAR(100)); CREATE TABLE enrollments ( student_id INT REFERENCES students(id), course VARCHAR(50), PRIMARY KEY (student_id, course) );

Second Normal Form (2NF)

Rule: Must be in 1NF, and every non-key column must depend on the entire primary key — not just part of it. This only matters for composite (multi-column) primary keys.

Violates 2NF (composite key: student_id + course_id)
student_id | course_id | student_name | course_name | grade
-----------|-----------|--------------|-------------|------
1          | 101       | Sara         | SQL Basics  | A
1          | 102       | Sara         | Python      | B
2          | 101       | James        | SQL Basics  | A+

student_name depends only on student_id, not the full key. course_name depends only on course_id. Only grade depends on the full composite key.

Fixed: 2NF-- Split into 3 tables CREATE TABLE students (id INT PRIMARY KEY, name VARCHAR(100)); CREATE TABLE courses (id INT PRIMARY KEY, name VARCHAR(100)); CREATE TABLE grades ( student_id INT REFERENCES students(id), course_id INT REFERENCES courses(id), grade CHAR(2), PRIMARY KEY (student_id, course_id) );

Third Normal Form (3NF)

Rule: Must be in 2NF, and no non-key column can depend on another non-key column. Every non-key column must depend directly on the primary key.

Violates 3NF
employee_id | name  | dept_id | dept_name    | dept_location
------------|-------|---------|--------------|-------------
1           | Sara  | 10      | Engineering  | Floor 3
2           | James | 20      | Marketing    | Floor 5
3           | Priya | 10      | Engineering  | Floor 3

dept_name and dept_location depend on dept_id, not on employee_id. This is a transitive dependency: employee → dept_id → dept_name.

Fixed: 3NFCREATE TABLE departments ( id INT PRIMARY KEY, name VARCHAR(50), location VARCHAR(50) ); CREATE TABLE employees ( id INT PRIMARY KEY, name VARCHAR(100), dept_id INT REFERENCES departments(id) );

The 3NF rule in plain English: "Every non-key column must provide a fact about the key, the whole key, and nothing but the key — so help me Codd." (A famous saying in the database community, referring to Edgar F. Codd who invented the relational model.)

Boyce-Codd Normal Form (BCNF)

BCNF is a stricter version of 3NF. The rule: every determinant must be a candidate key. In simpler terms, if column A determines column B, then A must be a key (or part of a key).

BCNF violations are rare in practice. They typically occur with overlapping composite keys. Here's the classic example:

Violates BCNF
student | subject | professor
--------|---------|----------
Sara    | SQL     | Dr. Khan       -- Dr. Khan teaches SQL
Sara    | Python  | Dr. Lee        -- Dr. Lee teaches Python
James   | SQL     | Dr. Khan       -- Each professor teaches only ONE subject

Problem: professor → subject (each professor teaches one subject), but professor isn't a key. Fix: split into two tables.

Fixed: BCNFCREATE TABLE professor_subjects ( professor VARCHAR(100) PRIMARY KEY, subject VARCHAR(50) ); CREATE TABLE enrollments ( student VARCHAR(100), professor VARCHAR(100) REFERENCES professor_subjects(professor), PRIMARY KEY (student, professor) );

Quick Reference: Normal Forms Summary

FormRuleFixesExample
1NFAtomic values only, no repeating groupsLists in cells"SQL, Python" → separate rows
2NFNo partial dependencies on composite keysColumns depending on part of the keystudent_name depends only on student_id
3NFNo transitive dependenciesNon-key → non-key dependenciesdept_name depends on dept_id, not employee_id
BCNFEvery determinant is a candidate keyOverlapping composite key issuesprofessor determines subject, but isn't a key

Don't over-normalize. Aiming for 3NF is the sweet spot for most applications. Going beyond (4NF, 5NF) splits tables so aggressively that queries require many JOINs, hurting performance. Sometimes strategic denormalization is the right call — we cover that in the Advanced course.

Key Takeaways

  • 1NF: No lists in cells — every value must be atomic
  • 2NF: No partial dependencies — non-key columns depend on the whole key
  • 3NF: No transitive dependencies — non-key columns depend only on the key
  • BCNF: Every determinant is a candidate key (stricter 3NF)
  • Aim for 3NF in practice — it catches 95% of design issues without over-splitting

What's next? Now that you know how to structure your data properly, the next chapter covers Transactions & Concurrency — how databases keep data consistent when multiple operations happen at once (ACID properties, COMMIT, ROLLBACK).