CodeWithSQL.com

Working with Large Databases

Learn strategies for handling large databases including data partitioning, indexing at scale, managing locks, and database backup & restore strategies.

2 Topics Chapter 12 of 12 Advanced

1 Strategies for Handling Large Databases


When your database grows beyond a few million rows, strategies that worked at small scale start to break down. Here's how to keep things fast.

Data Partitioning

As covered in partitioning, splitting a large table into smaller physical segments lets the database scan only relevant partitions.

SQL

-- Partition a 500M-row logs table by quarter
ALTER TABLE access_logs
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p_2024_q1 VALUES LESS THAN (TO_DAYS('2024-04-01')),
    PARTITION p_2024_q2 VALUES LESS THAN (TO_DAYS('2024-07-01')),
    PARTITION p_2024_q3 VALUES LESS THAN (TO_DAYS('2024-10-01')),
    PARTITION p_2024_q4 VALUES LESS THAN (TO_DAYS('2025-01-01')),
    PARTITION p_2025_q1 VALUES LESS THAN (TO_DAYS('2025-04-01')),
    PARTITION p_future  VALUES LESS THAN MAXVALUE
);

-- Drop old data instantly (vs DELETE, which logs every row)
ALTER TABLE access_logs DROP PARTITION p_2024_q1;
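Why is DROP PARTITION so much cheaper than DELETE? A minimal Python sketch of the idea (a toy model, not real MySQL internals): when rows are bucketed by quarter, discarding a quarter is a single operation on the bucket, while an unpartitioned DELETE must visit every matching row.

```python
from collections import defaultdict
from datetime import date

# Toy model of range partitioning: rows are bucketed by quarter,
# so dropping a whole quarter is one O(1) dict operation, while an
# unpartitioned DELETE would have to scan and remove rows one by one.
def quarter_key(d: date) -> str:
    return f"{d.year}_q{(d.month - 1) // 3 + 1}"

partitions = defaultdict(list)
rows = [
    (1, date(2024, 2, 10)),
    (2, date(2024, 5, 3)),
    (3, date(2024, 11, 20)),
]
for row_id, created_at in rows:
    partitions[quarter_key(created_at)].append(row_id)

# Analog of "ALTER TABLE access_logs DROP PARTITION p_2024_q1":
dropped = partitions.pop("2024_q1", [])
print(sorted(partitions))  # partitions that remain
print(dropped)             # rows discarded in a single step
```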

Indexing at Scale

Common challenges and their solutions:

  • Challenge: Too many indexes slow writes
    Solution: Audit unused indexes: SELECT * FROM sys.schema_unused_indexes
  • Challenge: Index creation locks the table
    Solution: Use ALTER TABLE ... ADD INDEX ... ALGORITHM=INPLACE, LOCK=NONE (InnoDB online DDL)
  • Challenge: Queries still slow despite indexes
    Solution: Check for covering indexes and composite index column order
  • Challenge: Full-text search on large text columns
    Solution: Use FULLTEXT indexes or external search (Elasticsearch)
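To see what a covering index buys you, here is a small sketch using Python's built-in sqlite3 module (SQLite rather than MySQL, purely for a self-contained illustration; in MySQL you would look for "Using index" in EXPLAIN output). The table name, columns, and index name are made up for the example.

```python
import sqlite3

# Illustration only (SQLite, not MySQL): a covering index contains every
# column the query needs, so the engine can answer from the index alone
# without ever touching the table rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
# Composite index covering both the filter column and the selected column
conn.execute("CREATE INDEX idx_cust_total ON orders (customer_id, total)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT total FROM orders WHERE customer_id = 42"
).fetchone()
print(plan[3])  # the plan detail mentions a COVERING INDEX
```

If the query also selected a column that is not in the index (say, `id`), the plan would fall back to an ordinary index search plus table lookups, which is exactly the extra cost a covering index avoids.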

Archiving Old Data

SQL

-- Move old orders to an archive table
INSERT INTO orders_archive
SELECT * FROM orders
WHERE order_date < DATE_SUB(CURDATE(), INTERVAL 2 YEAR);

-- Delete in batches to avoid long locks
-- (repeat until 0 rows are affected)
DELETE FROM orders
WHERE order_date < DATE_SUB(CURDATE(), INTERVAL 2 YEAR)
LIMIT 10000;
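The archive-then-batch-delete pattern can be sketched end to end in Python with the stdlib sqlite3 module (SQLite stands in for MySQL here; table names and the cutoff date are invented for the demo). The loop mirrors repeatedly issuing DELETE ... LIMIT n until no rows are affected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT)")
conn.execute("CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, order_date TEXT)")
# Seed 10,000 rows: odd ids are "old" (2021), even ids are recent (2025)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, "2021-01-01" if i % 2 else "2025-01-01") for i in range(1, 10001)],
)

CUTOFF = "2023-01-01"
BATCH = 1000  # small batches keep each transaction (and its locks) short

# 1. Copy old rows to the archive first, so nothing is lost mid-way.
conn.execute(
    "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
    (CUTOFF,),
)
# 2. Delete in batches until no old rows remain
#    (the MySQL analog is DELETE ... LIMIT n in a loop).
while True:
    cur = conn.execute(
        "DELETE FROM orders WHERE rowid IN "
        "(SELECT rowid FROM orders WHERE order_date < ? LIMIT ?)",
        (CUTOFF, BATCH),
    )
    conn.commit()
    if cur.rowcount == 0:
        break
```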

Other Strategies

  • Read replicas — Route SELECT queries to read-only replicas to offload the primary server
  • Connection pooling — Reuse database connections instead of creating new ones per request
  • Query caching — Cache frequent query results in Redis or Memcached
  • Sharding — Split data across multiple database servers (last resort — adds significant complexity)
  • Denormalization — Pre-calculate aggregates to avoid expensive JOINs
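The query-caching idea from the list above can be sketched in a few lines. This is a minimal in-process stand-in for Redis or Memcached (the class name, TTL value, and sample query are all invented for the example): cache results keyed by the SQL text plus its parameters, and expire entries after a time-to-live.

```python
import time

# Minimal query-result cache with a TTL, standing in for Redis/Memcached.
# Keys are (sql, params); entries expire after ttl seconds.
class QueryCache:
    def __init__(self, ttl: float = 60.0):
        self.ttl = ttl
        self._store = {}  # (sql, params) -> (expires_at, result)

    def get(self, sql: str, params: tuple):
        entry = self._store.get((sql, params))
        if entry and entry[0] > time.monotonic():
            return entry[1]
        return None  # miss or expired

    def put(self, sql: str, params: tuple, result):
        self._store[(sql, params)] = (time.monotonic() + self.ttl, result)

cache = QueryCache(ttl=30)
key = ("SELECT name FROM users WHERE id = ?", (7,))
if cache.get(*key) is None:
    result = [("alice",)]  # imagine this came from the database
    cache.put(*key, result)
print(cache.get(*key))
```

The hard part in production is invalidation: when the underlying rows change, the cached entry must be expired or overwritten, which is why short TTLs are the usual default.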

Scaling order: (1) Optimize queries and add indexes. (2) Add caching (Redis). (3) Add read replicas. (4) Partition large tables. (5) Archive old data. (6) Shard only when everything else isn't enough. Most applications never need to go past step 3.

Key Takeaways

  • Partition time-series data by date — enables fast pruning and instant old-data drops
  • Audit and remove unused indexes to improve write performance
  • Archive old data to keep active tables lean
  • Scale in order: indexes → cache → replicas → partitions → sharding

2 Database Backup and Restore


Your database will eventually face a disaster — accidental DELETE, hardware failure, ransomware, or a bad deployment. Backups are your insurance.

Backup Types

Type                 | What It Does                               | Speed  | Size   | Use Case
Full backup          | Copies everything                          | Slow   | Large  | Weekly/monthly baseline
Incremental          | Only changes since last backup             | Fast   | Small  | Daily/hourly
Logical (mysqldump)  | SQL statements to recreate data            | Slow   | Medium | Small databases, migrations
Physical             | Copies raw data files                      | Fast   | Large  | Large databases, fast restore
Point-in-time        | Full + binary logs = restore to any second | Medium | Large  | Production systems

MySQL Backup Commands

Bash

# Logical backup (small/medium databases)
mysqldump -u root -p --single-transaction \
  --routines --triggers --events \
  mydb > mydb_backup_2025-04-04.sql

# Restore from logical backup
mysql -u root -p mydb < mydb_backup_2025-04-04.sql

# Physical backup (large databases — use MySQL Enterprise Backup or Percona XtraBackup)
xtrabackup --backup --target-dir=/backups/full/

# Restore physical backup
xtrabackup --prepare --target-dir=/backups/full/
xtrabackup --copy-back --target-dir=/backups/full/

Point-in-Time Recovery (PITR)

Bash

# Enable binary logging (add to my.cnf):
#   [mysqld]
#   log_bin       = /var/log/mysql/mysql-bin
#   server-id     = 1
#   binlog_format = ROW

# Restore to a specific point in time:
# 1. Restore the last full backup
mysql -u root -p mydb < full_backup.sql

# 2. Replay binary logs up to the moment before the disaster
mysqlbinlog --stop-datetime="2025-04-04 14:30:00" \
  /var/log/mysql/mysql-bin.000042 | mysql -u root -p
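The mechanics of PITR can be modeled in a few lines of Python with sqlite3 (a toy model, not the MySQL binlog format): restore a base backup, then replay a timestamped statement log, stopping just before the bad statement, which is exactly the role mysqlbinlog --stop-datetime plays. The statements and timestamps below are invented for the demo.

```python
import sqlite3

# Toy model of point-in-time recovery: base backup + ordered statement log,
# replayed up to (but not including) the disaster.
binlog = [
    ("2025-04-04 14:10:00", "INSERT INTO accounts VALUES (1, 100)"),
    ("2025-04-04 14:20:00", "INSERT INTO accounts VALUES (2, 250)"),
    ("2025-04-04 14:35:00", "DELETE FROM accounts"),  # the disaster
]
STOP = "2025-04-04 14:30:00"  # replay cutoff, like --stop-datetime

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, balance INTEGER)")  # "full backup"
for ts, stmt in binlog:
    if ts >= STOP:  # timestamps in this format compare correctly as strings
        break
    conn.execute(stmt)

print(conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0])  # rows survive
```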

The 3-2-1 Backup Rule

  • 3 copies of your data (production + 2 backups)
  • 2 different storage types (local disk + cloud/tape)
  • 1 offsite copy (different data center, cloud region, or physical location)

A backup you haven't tested is not a backup. Schedule regular restore drills. Verify that your backups actually produce a working database. The worst time to discover your backup is corrupt is during an actual disaster.

Production checklist: (1) Automated daily backups with mysqldump --single-transaction. (2) Binary logging enabled for point-in-time recovery. (3) Weekly backup verification (restore to a test server). (4) Offsite copy to cloud storage (S3, Azure Blob). (5) Monitor backup job success/failure alerts.
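One piece of the checklist that is easy to automate is backup retention: keep only the N newest dated dumps. A minimal sketch, assuming a filename format of mydb_backup_YYYY-MM-DD.sql (the format, directory, and keep count are all assumptions for the example; the demo uses empty throwaway files instead of real mysqldump output).

```python
from pathlib import Path
import tempfile

# Retention sketch: keep only the `keep` newest dated dumps.
# Assumes filenames like mydb_backup_2025-04-04.sql, where the
# ISO date makes lexicographic order equal chronological order.
def prune_backups(backup_dir: Path, keep: int = 7):
    dumps = sorted(backup_dir.glob("mydb_backup_*.sql"))
    for old in (dumps[:-keep] if keep else dumps):
        old.unlink()  # delete everything older than the newest `keep`
    return sorted(backup_dir.glob("mydb_backup_*.sql"))

# Demo with throwaway files standing in for real dump files
with tempfile.TemporaryDirectory() as d:
    backup_dir = Path(d)
    for day in range(1, 11):
        (backup_dir / f"mydb_backup_2025-04-{day:02d}.sql").touch()
    remaining = prune_backups(backup_dir, keep=7)
    print([p.name for p in remaining])  # the 7 most recent dumps
```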

Key Takeaways

  • Use logical backups (mysqldump) for small/medium databases; physical for large ones
  • Enable binary logging for point-in-time recovery (restore to any second)
  • Follow the 3-2-1 rule: 3 copies, 2 storage types, 1 offsite
  • Test your restores regularly — an untested backup is no backup at all

Congratulations! You've completed the entire SQL Advanced Course and the complete CodeWithSQL curriculum. You now have deep knowledge of window functions, stored procedures, triggers, normalization, locks, security, performance tuning, query optimization, and large database strategies. Need personalized help with a specific SQL challenge? Our SQL Mentorship program offers 1-on-1 sessions with experienced professionals.