Database Indexing Optimization: Complete Guide

The Importance of Database Indexing

Database indexes are critical for query performance. Proper indexing can improve query speed by orders of magnitude, while poor indexing can lead to slow queries, high resource consumption, and scalability issues. Understanding indexing strategies is essential for any developer working with databases.

Indexing Analogy: Think of a database index like a book's index. Without an index, you'd have to scan every page to find information. With an index, you can go directly to the relevant pages.
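The analogy can be checked directly by asking a database how it plans to run a query. The following is a minimal sketch, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for a production database; the users table and its sample rows are made up for illustration, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

# In-memory database with a hypothetical users table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def query_plan(sql, params=()):
    """Return SQLite's plan for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)  # column 3 holds the plan text


lookup = "SELECT name FROM users WHERE email = ?"
args = ("user500@example.com",)

plan_without_index = query_plan(lookup, args)  # reads every row ("every page")
conn.execute("CREATE INDEX idx_users_email ON users(email)")
plan_with_index = query_plan(lookup, args)     # jumps to the entry via the index

print(plan_without_index)
print(plan_with_index)
```

The first plan should report a scan of the whole table, and the second a search that names idx_users_email.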

Types of Database Indexes

B-Tree Indexes

The most common index type, built on a balanced tree structure:

• Default index in most databases
• Efficient for equality and range queries
• Supports sorting operations
• Works well with most data types

-- PostgreSQL/MySQL
CREATE INDEX idx_users_email ON users(email);
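To illustrate the three capabilities listed above (equality, range, and sorting), here is a small sketch that again uses SQLite from Python as an assumed stand-in, since SQLite's ordinary indexes are also B-trees. The table contents are hypothetical and the plan text depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i:04d}@example.com",) for i in range(1000)],
)
conn.execute("CREATE INDEX idx_users_email ON users(email)")


def query_plan(sql):
    """Return SQLite's plan for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[3] for row in rows)


range_query = "SELECT id FROM users WHERE email >= 'user0100' AND email < 'user0200'"

equality_plan = query_plan("SELECT id FROM users WHERE email = 'user0042@example.com'")
range_plan = query_plan(range_query)
sorted_plan = query_plan("SELECT email FROM users ORDER BY email")

# The range query returns the 100 rows user0100 .. user0199.
range_count = len(conn.execute(range_query).fetchall())

for plan in (equality_plan, range_plan, sorted_plan):
    print(plan)
```

All three plans should mention idx_users_email, and the ORDER BY plan should not need a separate sort step because the index already stores the values in order.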
                                

Hash Indexes

Fast for equality lookups only:

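• Fast for exact-match (=) lookups
• Doesn't support range queries or sorting
• In PostgreSQL, cannot enforce uniqueness (only B-tree indexes can be declared UNIQUE)
• In MySQL, user-defined hash indexes are limited to the MEMORY and NDB storage engines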

-- PostgreSQL
CREATE INDEX idx_users_id_hash ON users USING hash(id);
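The equality-only limitation follows from how hashing works: a hash scatters keys, so neighbouring values are not stored next to each other. The sketch below is a conceptual model only, not how any database implements it: a Python dict plays the role of the hash index and a sorted key list plays the role of an ordered (B-tree-like) index. All names and rows are hypothetical.

```python
import bisect

# Hypothetical rows: (id, name) with ids 0, 5, 10, ..., 95.
rows = [(user_id, f"user{user_id}") for user_id in range(0, 100, 5)]

# Hash-style index: key -> row position. Supports exact matches only.
hash_index = {user_id: pos for pos, (user_id, _) in enumerate(rows)}

# Ordered index: sorted keys, which also support range lookups.
sorted_keys = sorted(hash_index)


def find_exact(user_id):
    """Exact-match lookup through the hash index."""
    pos = hash_index.get(user_id)
    return rows[pos] if pos is not None else None


def find_range(low, high):
    """Rows with low <= id <= high, found by binary search on the ordered keys."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return [rows[hash_index[key]] for key in sorted_keys[start:end]]


print(find_exact(25))
print(find_range(12, 30))
```

Answering the range query with only hash_index would mean probing every possible key or scanning all entries, which is why range predicates need an ordered structure.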
                                

GIN (Generalized Inverted) Indexes

For array and full-text search:

• Optimized for array operations
• Excellent for full-text search
• Supports JSON/JSONB queries
• Larger than B-tree indexes

-- PostgreSQL (assumes tags is a JSONB or array column)
CREATE INDEX idx_users_tags_gin ON users USING gin(tags);
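"Inverted" means the index maps each element (a tag, a word, a JSON key) to the rows that contain it, rather than mapping each row to its value. The following is a simplified in-memory sketch of that idea, not PostgreSQL's actual implementation; the user ids and tags are invented for illustration.

```python
from collections import defaultdict

# Hypothetical data: user id -> list of tags.
users = {1: ["admin", "beta"], 2: ["beta"], 3: ["admin", "staff"], 4: []}

# Inverted index: tag -> set of user ids that carry the tag.
inverted = defaultdict(set)
for user_id, tags in users.items():
    for tag in tags:
        inverted[tag].add(user_id)


def users_with_all_tags(*tags):
    """Ids of users whose tag list contains every requested tag."""
    postings = [inverted.get(tag, set()) for tag in tags]
    return sorted(set.intersection(*postings)) if postings else []


print(users_with_all_tags("admin"))
print(users_with_all_tags("admin", "beta"))
```

A containment query such as "tags contains both admin and beta" becomes an intersection of two small id sets instead of a check against every row.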
                                

BRIN (Block Range) Indexes

For large, naturally ordered data:

• Very small storage footprint
• Efficient for time-series data
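• Works best when values follow the physical row order (for example, append-only logs)
• Less precise than B-tree: it narrows the search to block ranges, which are then scanned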

-- PostgreSQL time-series
CREATE INDEX idx_logs_timestamp_brin ON logs USING brin(timestamp);
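The small footprint and the lower precision both come from storing one summary per block of rows instead of one entry per row. Below is a toy model of that idea, assuming a min/max summary per block; the block size and timestamps are arbitrary illustrative values, and real BRIN indexes work on ranges of disk pages.

```python
BLOCK_SIZE = 4  # rows per block (arbitrary for this sketch)

# Hypothetical, naturally ordered timestamps: 100, 101, ..., 131.
timestamps = list(range(100, 132))
blocks = [timestamps[i:i + BLOCK_SIZE] for i in range(0, len(timestamps), BLOCK_SIZE)]

# The "index": one (min, max) pair per block, far smaller than the data itself.
summaries = [(min(block), max(block)) for block in blocks]


def range_scan(low, high):
    """Return matching values and how many blocks had to be read."""
    matches, blocks_read = [], 0
    for (block_min, block_max), block in zip(summaries, blocks):
        if block_max < low or block_min > high:
            continue  # the summary proves this block has no matches
        blocks_read += 1
        matches.extend(t for t in block if low <= t <= high)
    return matches, blocks_read


matches, blocks_read = range_scan(110, 113)
print(matches, blocks_read, "of", len(blocks), "blocks read")
```

If the timestamps were stored in random order, every block's min/max range would overlap most queries and almost nothing could be skipped, which is why this index type depends on ordered data.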
                                

Index Design Strategies

1. Composite Indexes

Index multiple columns together for specific query patterns:

-- Query: WHERE status = 'active' AND created_at > '2024-01-01'
-- Good: Composite index on (status, created_at)
CREATE INDEX idx_users_status_created ON users(status, created_at);

-- Query: WHERE status = 'active' ORDER BY created_at DESC
-- Also works well with the same index

-- Query: WHERE created_at > '2024-01-01' AND status = 'active'
-- Still works (the order of predicates in the WHERE clause doesn't matter)

-- Query: WHERE created_at > '2024-01-01'
-- Does NOT use the index efficiently (leading column missing)
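The leading-column rule can be observed in a query plan. This is a minimal sketch, assuming SQLite from Python as a stand-in; other databases apply the same principle but print different plans, and some can still use such an index in less efficient ways. The table definition is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, status TEXT, created_at TEXT)"
)
conn.execute("CREATE INDEX idx_users_status_created ON users(status, created_at)")


def query_plan(sql):
    """Return SQLite's plan for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[3] for row in rows)


both_columns = query_plan(
    "SELECT name FROM users WHERE status = 'active' AND created_at > '2024-01-01'"
)
swapped_predicates = query_plan(
    "SELECT name FROM users WHERE created_at > '2024-01-01' AND status = 'active'"
)
leading_missing = query_plan(
    "SELECT name FROM users WHERE created_at > '2024-01-01'"
)

print(both_columns)        # index search on status and created_at
print(swapped_predicates)  # same index, regardless of predicate order
print(leading_missing)     # falls back to scanning the table
```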
                        

2. Covering Indexes

Include all columns needed by the query in the index:

-- Query: SELECT id, name, email FROM users WHERE status = 'active'
-- Without covering index: Index lookup + table access
CREATE INDEX idx_users_status ON users(status);

-- With covering index: Index-only scan
CREATE INDEX idx_users_status_covering ON users(status) INCLUDE (id, name, email);

-- PostgreSQL (11+): INCLUDE syntax shown above
-- MySQL: No INCLUDE clause; add the columns to the index key instead
-- SQL Server: INCLUDE clause available
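Whether a query is answered from the index alone shows up in its plan. The sketch below assumes SQLite from Python as a stand-in; SQLite has no INCLUDE clause, so the extra columns go into the index key, which is the same approach noted for MySQL above. The table is hypothetical and the plan wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, status TEXT)"
)
query = "SELECT id, name, email FROM users WHERE status = 'active'"


def query_plan(sql):
    """Return SQLite's plan for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[3] for row in rows)


# Index on the filter column only: each match still needs a table lookup.
conn.execute("CREATE INDEX idx_users_status ON users(status)")
plain_plan = query_plan(query)

# Index that also carries name and email: the table is never touched.
# (id is the rowid here, which SQLite stores in every index entry.)
conn.execute("DROP INDEX idx_users_status")
conn.execute("CREATE INDEX idx_users_status_covering ON users(status, name, email)")
covering_plan = query_plan(query)

print(plain_plan)
print(covering_plan)
```

The trade-off is index size: every column added to the index is stored a second time and must be maintained on writes.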
                        

3. Partial/Filtered Indexes

Index only a subset of rows:

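-- Index only active users (smaller, faster)
-- Index a column you search on; the WHERE clause decides which rows are indexed
CREATE INDEX idx_users_active ON users(created_at) WHERE status = 'active';

-- Index only recent data
-- The predicate must be immutable, so use a fixed cutoff (CURRENT_DATE is rejected)
-- and recreate the index periodically to move the window forward
CREATE INDEX idx_orders_recent ON orders(created_at)
WHERE created_at > '2024-01-01';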

-- Index only non-deleted records
CREATE INDEX idx_comments_active ON comments(post_id) 
WHERE deleted_at IS NULL;
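A partial index is only considered when the query's own WHERE clause guarantees that every row it needs is in the index. Here is a small sketch of that behaviour, assuming SQLite from Python as a stand-in (SQLite also supports partial indexes); the comments table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT, deleted_at TEXT)"
)
conn.execute(
    "CREATE INDEX idx_comments_active ON comments(post_id) WHERE deleted_at IS NULL"
)


def query_plan(sql):
    """Return SQLite's plan for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[3] for row in rows)


# Repeats the index predicate, so the partial index qualifies.
matching_plan = query_plan(
    "SELECT body FROM comments WHERE post_id = 7 AND deleted_at IS NULL"
)

# Omits the predicate: deleted rows may be needed, so the index cannot be used.
non_matching_plan = query_plan("SELECT body FROM comments WHERE post_id = 7")

print(matching_plan)
print(non_matching_plan)
```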
                        

Query Optimization with Indexes

1. Index Selection Rules

How databases choose which index to use:

Index Selection Factors:

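Selectivity: The fewer rows a predicate matches, the more an index helps; in composite indexes, put equality columns before range columns
Cardinality: Columns with many distinct values usually make better index keys
Query Patterns: Match the columns used in WHERE, JOIN, and ORDER BY
Index Size: Smaller indexes are cheaper to scan and to keep in memory
Maintenance Cost: Every index adds work to INSERT, UPDATE, and DELETE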
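Cardinality can be estimated before creating an index by comparing the number of distinct values with the number of rows. The following is a rough sketch using SQLite from Python; the distinct_ratio helper, the table, and the data are all hypothetical, and real query planners rely on their own collected statistics rather than a calculation like this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO users (email, status) VALUES (?, ?)",
    [(f"user{i}@example.com", "active" if i % 4 else "inactive") for i in range(1000)],
)


def distinct_ratio(table, column):
    """Distinct values divided by total rows; 1.0 means every value is unique.

    The table and column names are interpolated into the SQL text,
    so pass only trusted identifiers.
    """
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    return distinct / total if total else 0.0


email_ratio = distinct_ratio("users", "email")    # every email is unique
status_ratio = distinct_ratio("users", "status")  # only two distinct values

print(email_ratio, status_ratio)
```

A ratio near 1.0 suggests a strong index candidate, while a ratio near 0 suggests the column is better used as a trailing column or as a partial-index predicate.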

2. Common Anti-Patterns

Indexing mistakes to avoid:

-- 1. Over-indexing (too many indexes)
-- Each index adds write overhead
CREATE INDEX idx_a ON my_table(a); -- Often redundant: idx_a_b below also serves lookups on a
CREATE INDEX idx_b ON my_table(b);
CREATE INDEX idx_c ON my_table(c);
CREATE INDEX idx_a_b ON my_table(a, b);

-- 2. Indexing low-cardinality columns
CREATE INDEX idx_gender ON users(gender); -- Only 2-3 values

-- 3. Indexing frequently updated columns
CREATE INDEX idx_last_login ON users(last_login); -- Updates on every login

-- 4. Wrong column order in composite indexes
-- Query: WHERE status = 'active' AND created_at > '2024-01-01'
CREATE INDEX idx_wrong_order ON users(created_at, status); -- Wrong!
CREATE INDEX idx_correct_order ON users(status, created_at); -- Correct!
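The over-indexing case can be detected mechanically: an index whose columns are a leading prefix of a wider index on the same table is usually redundant, because the wider index can serve the same lookups. Below is a small sketch of such a check; find_redundant_indexes is a hypothetical helper, and it assumes all indexes belong to one table and that none of them enforces a UNIQUE constraint (a unique index should be kept even when a wider index exists).

```python
def find_redundant_indexes(indexes):
    """Map each index to a wider index that starts with the same columns.

    `indexes` maps index name -> tuple of column names, in key order.
    """
    redundant = {}
    for name, cols in indexes.items():
        for other, other_cols in indexes.items():
            is_prefix = len(cols) < len(other_cols) and other_cols[:len(cols)] == cols
            if name != other and is_prefix:
                redundant[name] = other
                break
    return redundant


# The index set from the over-indexing example above.
indexes = {
    "idx_a": ("a",),
    "idx_b": ("b",),
    "idx_c": ("c",),
    "idx_a_b": ("a", "b"),
}

redundant = find_redundant_indexes(indexes)
print(redundant)
```

Here idx_a is reported as covered by idx_a_b, while idx_b is not, because b is not the leading column of any wider index.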
                        

Index Monitoring and Maintenance

1. Identifying Unused Indexes

Find and remove indexes that aren't being used:

-- PostgreSQL: Check index usage (least-used first)
SELECT 
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan AS index_scans
FROM pg_stat_user_indexes
ORDER BY idx_scan;

-- MySQL: Check index usage
SELECT 
    object_schema,
    object_name,
    index_name,
    count_read,
    count_fetch
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
ORDER BY count_read; -- Least-read indexes first

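-- SQL Server: Check index usage
-- Start from sys.indexes: an index that has not been used since the last
-- restart has no row in the usage DMV and shows NULL counters here
SELECT 
    object_name(i.object_id) as table_name,
    i.name as index_name,
    s.user_seeks,
    s.user_scans,
    s.user_lookups,
    s.user_updates
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats s ON s.object_id = i.object_id
    AND s.index_id = i.index_id
    AND s.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1;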
                        

2. Index Fragmentation and Rebuilding

Maintain index performance over time:

-- PostgreSQL: REINDEX (CONCURRENTLY requires PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_users_email;
REINDEX TABLE users; -- Blocks writes to the table while it runs

-- MySQL: OPTIMIZE TABLE
OPTIMIZE TABLE users;

-- SQL Server: Rebuild/Reorganize
-- Rebuild (more thorough)
ALTER INDEX idx_users_email ON users REBUILD;

-- Reorganize (online, less disruptive)
ALTER INDEX idx_users_email ON users REORGANIZE;

-- Check fragmentation
SELECT 
    object_name(ips.object_id) as table_name,
    i.name as index_name,
    ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
    DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
INNER JOIN sys.indexes i ON ips.object_id = i.object_id
    AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 30;
                        

Database-Specific Optimization

PostgreSQL

• Use partial indexes
• Consider BRIN for time-series
• GIN for JSON/arrays
• Run ANALYZE regularly to keep planner statistics current

MySQL

• Use covering indexes
• Consider prefix indexes
• Monitor index usage
• Use EXPLAIN ANALYZE (MySQL 8.0.18+)

SQL Server

• Use filtered indexes
• Consider columnstore
• Monitor fragmentation
• Use included columns

Conclusion and Best Practices

Indexing Best Practices Checklist

✓ Index columns used in WHERE, JOIN, ORDER BY
✓ Create composite indexes for common query patterns
✓ Use covering indexes to avoid table access
✓ Consider partial indexes for filtered data
✓ Monitor and remove unused indexes
✓ Rebuild fragmented indexes regularly
✓ Test index changes with realistic data
✓ Use database-specific optimization features

Effective database indexing requires understanding your data, query patterns, and database capabilities. Start with the most critical queries, monitor performance, and iterate. Remember that indexing is a balance between read performance and write overhead. Regular monitoring and adjustment are key to maintaining optimal database performance.
