Strategies for Handling Large Databases
When your database grows beyond a few million rows, strategies that worked at small scale start to break down. Here's how to keep things fast.
Data Partitioning
As covered in the section on partitioning, splitting a large table into smaller physical segments lets the database scan only the partitions relevant to a query.
```sql
-- Partition a 500M-row logs table by quarter
ALTER TABLE access_logs
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p_2024_q1 VALUES LESS THAN (TO_DAYS('2024-04-01')),
    PARTITION p_2024_q2 VALUES LESS THAN (TO_DAYS('2024-07-01')),
    PARTITION p_2024_q3 VALUES LESS THAN (TO_DAYS('2024-10-01')),
    PARTITION p_2024_q4 VALUES LESS THAN (TO_DAYS('2025-01-01')),
    PARTITION p_2025_q1 VALUES LESS THAN (TO_DAYS('2025-04-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Drop old data instantly (vs DELETE, which logs every row)
ALTER TABLE access_logs DROP PARTITION p_2024_q1;
```
Indexing at Scale
| Challenge | Solution |
|---|---|
| Too many indexes slow writes | Audit unused indexes: `SELECT * FROM sys.schema_unused_indexes` |
| Index creation locks the table | Use `ALTER TABLE ... ADD INDEX ... ALGORITHM=INPLACE, LOCK=NONE` (InnoDB online DDL) |
| Queries still slow despite indexes | Check for covering indexes and composite index column order |
| Full-text search on large text | Use `FULLTEXT` indexes or external search (Elasticsearch) |
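To see what covering indexes and composite column order buy you, here's a minimal sketch that uses SQLite's query planner as a stand-in for MySQL's (the `orders` table and index names are illustrative). Listing the equality column first, the range column second, and the selected column last lets the index alone answer the query, with no table lookup:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, "
    "order_date TEXT, total REAL)"
)
# Equality column first, range column second, then the selected column,
# so the index covers the whole query.
conn.execute(
    "CREATE INDEX idx_cust_date_total ON orders (customer_id, order_date, total)"
)
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT total FROM orders "
    "WHERE customer_id = 42 AND order_date >= '2024-01-01'"
).fetchall()
# The plan's detail column reports a covering-index search.
print(plan[0][3])
```

Reversing the first two index columns would force the planner to scan every `order_date` entry instead of jumping straight to one customer.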
Archiving Old Data
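Archiving is a two-step pattern: copy old rows into an archive table, then delete them from the hot table in small batches so no single statement holds locks for long. Driven from application code, the loop looks like this minimal sketch (SQLite stands in for MySQL, and the `orders`/`orders_archive` schema is assumed; standard SQLite lacks `DELETE ... LIMIT`, so the sketch deletes by rowid):

```python
import sqlite3

def archive_old_orders(conn, cutoff, batch_size=10000):
    """Copy rows older than cutoff into orders_archive, then delete them
    from orders in batches so each statement stays short-lived."""
    conn.execute(
        "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
        (cutoff,),
    )
    total_deleted = 0
    while True:
        # Standard SQLite has no DELETE ... LIMIT, so target rowids instead.
        cur = conn.execute(
            "DELETE FROM orders WHERE rowid IN "
            "(SELECT rowid FROM orders WHERE order_date < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # commit each batch to release locks
        if cur.rowcount == 0:
            break
        total_deleted += cur.rowcount
    return total_deleted
```

The raw SQL equivalent follows below.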
```sql
-- Move old orders to an archive table
INSERT INTO orders_archive
SELECT * FROM orders
WHERE order_date < DATE_SUB(CURDATE(), INTERVAL 2 YEAR);

-- Delete in batches to avoid long locks; repeat until 0 rows affected
DELETE FROM orders
WHERE order_date < DATE_SUB(CURDATE(), INTERVAL 2 YEAR)
LIMIT 10000;
```
Other Strategies
- Read replicas — Route SELECT queries to read-only replicas to offload the primary server
- Connection pooling — Reuse database connections instead of creating new ones per request
- Query caching — Cache frequent query results in Redis or Memcached
- Sharding — Split data across multiple database servers (last resort — adds significant complexity)
- Denormalization — Pre-calculate aggregates to avoid expensive JOINs
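Query caching in the list above usually means a cache-aside pattern: check the cache first, fall back to the database on a miss, and store the result with a TTL. A minimal sketch, with an in-process dict standing in for Redis (the class and parameter names are illustrative):

```python
import time

class QueryCache:
    """Cache-aside sketch: an in-process dict with TTLs stands in for Redis."""

    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        entry = self._store.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]  # hit: skip the database entirely
        value = loader()  # miss: run the real query
        self._store[key] = (time.monotonic() + self.ttl, value)
        return value
```

With Redis the same shape becomes `GET`/`SETEX` calls; invalidating entries when the underlying rows change is the hard part and needs its own strategy.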
Scaling order: (1) Optimize queries and add indexes. (2) Add caching (Redis). (3) Add read replicas. (4) Partition large tables. (5) Archive old data. (6) Shard only when everything else isn't enough. Most applications never need to go past step 3.
Key Takeaways
- Partition time-series data by date — enables fast pruning and instant old-data drops
- Audit and remove unused indexes to improve write performance
- Archive old data to keep active tables lean
- Scale in order: indexes → cache → replicas → partitions → sharding