Strategies for Handling Large Databases
When your database grows beyond a few million rows, strategies that worked at small scale start to break down. Here's how to keep things fast.
Data Partitioning
As covered in the section on partitioning, splitting a large table into smaller physical segments lets the database scan only the partitions relevant to a query.
```sql
-- Partition a 500M-row logs table by quarter
ALTER TABLE access_logs
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p_2024_q1 VALUES LESS THAN (TO_DAYS('2024-04-01')),
    PARTITION p_2024_q2 VALUES LESS THAN (TO_DAYS('2024-07-01')),
    PARTITION p_2024_q3 VALUES LESS THAN (TO_DAYS('2024-10-01')),
    PARTITION p_2024_q4 VALUES LESS THAN (TO_DAYS('2025-01-01')),
    PARTITION p_2025_q1 VALUES LESS THAN (TO_DAYS('2025-04-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Drop old data instantly (vs DELETE, which logs every row)
ALTER TABLE access_logs DROP PARTITION p_2024_q1;
```
Indexing at Scale
| Challenge | Solution |
|---|---|
| Too many indexes slow writes | Audit unused indexes: `SELECT * FROM sys.schema_unused_indexes` |
| Index creation locks the table | Use `ALTER TABLE ... ADD INDEX ... ALGORITHM=INPLACE, LOCK=NONE` (InnoDB online DDL) |
| Queries still slow despite indexes | Check for covering indexes and composite index column order |
| Full-text search on large text | Use `FULLTEXT` indexes or external search (Elasticsearch) |
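To see what covering indexes and composite column order buy you, here's a minimal sketch that uses SQLite's query planner as a stand-in for MySQL's (the `orders` table and index names are illustrative). Listing the equality column first, the range column second, and the selected column last lets the index alone answer the query, with no table lookup:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, "
    "order_date TEXT, total REAL)"
)
# Equality column first, range column second, then the selected column,
# so the index covers the whole query.
conn.execute(
    "CREATE INDEX idx_cust_date_total ON orders (customer_id, order_date, total)"
)
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT total FROM orders "
    "WHERE customer_id = 42 AND order_date >= '2024-01-01'"
).fetchall()
# The plan's detail column reports a covering-index search.
print(plan[0][3])
```

Reversing the first two index columns would force the planner to scan every `order_date` entry instead of jumping straight to one customer.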
Archiving Old Data
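Archiving is a two-step pattern: copy old rows into an archive table, then delete them from the hot table in small batches so no single statement holds locks for long. Driven from application code, the loop looks like this minimal sketch (SQLite stands in for MySQL, and the `orders`/`orders_archive` schema is assumed; standard SQLite lacks `DELETE ... LIMIT`, so the sketch deletes by rowid):

```python
import sqlite3

def archive_old_orders(conn, cutoff, batch_size=10000):
    """Copy rows older than cutoff into orders_archive, then delete them
    from orders in batches so each statement stays short-lived."""
    conn.execute(
        "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
        (cutoff,),
    )
    total_deleted = 0
    while True:
        # Standard SQLite has no DELETE ... LIMIT, so target rowids instead.
        cur = conn.execute(
            "DELETE FROM orders WHERE rowid IN "
            "(SELECT rowid FROM orders WHERE order_date < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # commit each batch to release locks
        if cur.rowcount == 0:
            break
        total_deleted += cur.rowcount
    return total_deleted
```

The raw SQL equivalent follows below.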
```sql
-- Move old orders to an archive table
INSERT INTO orders_archive
SELECT * FROM orders
WHERE order_date < DATE_SUB(CURDATE(), INTERVAL 2 YEAR);

-- Delete in batches to avoid long locks; repeat until 0 rows affected
DELETE FROM orders
WHERE order_date < DATE_SUB(CURDATE(), INTERVAL 2 YEAR)
LIMIT 10000;
```
Other Strategies
- Read replicas — Route SELECT queries to read-only replicas to offload the primary server
- Connection pooling — Reuse database connections instead of creating new ones per request
- Query caching — Cache frequent query results in Redis or Memcached
- Sharding — Split data across multiple database servers (last resort — adds significant complexity)
- Denormalization — Pre-calculate aggregates to avoid expensive JOINs
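Query caching in the list above usually means a cache-aside pattern: check the cache first, fall back to the database on a miss, and store the result with a TTL. A minimal sketch, with an in-process dict standing in for Redis (the class and parameter names are illustrative):

```python
import time

class QueryCache:
    """Cache-aside sketch: an in-process dict with TTLs stands in for Redis."""

    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        entry = self._store.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]  # hit: skip the database entirely
        value = loader()  # miss: run the real query
        self._store[key] = (time.monotonic() + self.ttl, value)
        return value
```

With Redis the same shape becomes `GET`/`SETEX` calls; invalidating entries when the underlying rows change is the hard part and needs its own strategy.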
Scaling order: (1) Optimize queries and add indexes. (2) Add caching (Redis). (3) Add read replicas. (4) Partition large tables. (5) Archive old data. (6) Shard only when everything else isn't enough. Most applications never need to go past step 3.
Key Takeaways
- Partition time-series data by date — enables fast pruning and instant old-data drops
- Audit and remove unused indexes to improve write performance
- Archive old data to keep active tables lean
- Scale in order: indexes → cache → replicas → partitions → sharding