Database schema design is crucial for building efficient, maintainable, and scalable applications. Understanding normalization helps eliminate data redundancy and maintain data integrity, while strategic denormalization can improve query performance. This tutorial covers database design principles, normalization forms, and practical decisions between normalized and denormalized approaches.

Database Design Fundamentals

Entity-Relationship (ER) Modeling



sql

-- Example: E-commerce Database Schema Design
-- Entities: Users, Products, Orders, Categories, Reviews

-- Users entity
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    -- Store only a password hash (bcrypt/argon2); never the plain password
    password_hash VARCHAR(255) NOT NULL,
    phone VARCHAR(20),
    date_of_birth DATE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    -- NOTE(review): DEFAULT fires only on INSERT; keep updated_at current
    -- with an ON UPDATE trigger (not shown here)
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Categories entity (hierarchical: parent_category_id is a self-reference;
-- NULL parent marks a root category)
CREATE TABLE categories (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    slug VARCHAR(100) UNIQUE NOT NULL,
    description TEXT,
    parent_category_id INTEGER REFERENCES categories(id),
    image_url VARCHAR(255),
    is_active BOOLEAN DEFAULT true,
    sort_order INTEGER DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Products entity
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    slug VARCHAR(255) UNIQUE NOT NULL,
    description TEXT,
    short_description VARCHAR(500),
    sku VARCHAR(100) UNIQUE NOT NULL,
    -- DECIMAL (never FLOAT) for money; CHECKs keep amounts non-negative
    price DECIMAL(10,2) NOT NULL CHECK (price >= 0),
    cost_price DECIMAL(10,2) CHECK (cost_price >= 0),
    weight DECIMAL(8,3),
    dimensions_length DECIMAL(8,2),
    dimensions_width DECIMAL(8,2),
    dimensions_height DECIMAL(8,2),
    stock_quantity INTEGER DEFAULT 0 CHECK (stock_quantity >= 0),
    low_stock_threshold INTEGER DEFAULT 10,
    is_active BOOLEAN DEFAULT true,
    is_featured BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Many-to-many relationship: Products and Categories
-- (junction table; the composite PK also makes both FK columns NOT NULL)
CREATE TABLE product_categories (
    product_id INTEGER REFERENCES products(id) ON DELETE CASCADE,
    category_id INTEGER REFERENCES categories(id) ON DELETE CASCADE,
    is_primary BOOLEAN DEFAULT false,
    PRIMARY KEY (product_id, category_id)
);

Relationships and Constraints



sql

-- One-to-Many: Users to Addresses
-- (ON DELETE CASCADE: removing a user removes their saved addresses)
CREATE TABLE addresses (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    type VARCHAR(20) CHECK (type IN ('shipping', 'billing', 'both')),
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    company VARCHAR(100),
    address_line_1 VARCHAR(255) NOT NULL,
    address_line_2 VARCHAR(255),
    city VARCHAR(100) NOT NULL,
    state_province VARCHAR(100),
    postal_code VARCHAR(20),
    country_code CHAR(2) NOT NULL DEFAULT 'US',
    -- NOTE(review): nothing enforces a single default address per user;
    -- a partial unique index on (user_id) WHERE is_default would
    is_default BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- One-to-Many: Orders
-- (no ON DELETE CASCADE from users: orders must survive account deletion)
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    order_number VARCHAR(50) UNIQUE NOT NULL,
    status VARCHAR(20) CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled', 'refunded')),
    payment_status VARCHAR(20) CHECK (payment_status IN ('pending', 'paid', 'failed', 'refunded')),
    
    -- Order totals
    -- NOTE(review): total_amount is not constrained to equal
    -- subtotal + tax + shipping - discount; consider a table-level CHECK
    subtotal DECIMAL(10,2) NOT NULL CHECK (subtotal >= 0),
    tax_amount DECIMAL(10,2) DEFAULT 0 CHECK (tax_amount >= 0),
    shipping_amount DECIMAL(10,2) DEFAULT 0 CHECK (shipping_amount >= 0),
    discount_amount DECIMAL(10,2) DEFAULT 0 CHECK (discount_amount >= 0),
    total_amount DECIMAL(10,2) NOT NULL CHECK (total_amount >= 0),
    
    -- Addresses (denormalized for historical accuracy: the order keeps the
    -- address as it was at checkout even if the user later edits/deletes it)
    shipping_first_name VARCHAR(100),
    shipping_last_name VARCHAR(100),
    shipping_address_line_1 VARCHAR(255),
    shipping_address_line_2 VARCHAR(255),
    shipping_city VARCHAR(100),
    shipping_state VARCHAR(100),
    shipping_postal_code VARCHAR(20),
    shipping_country_code CHAR(2),
    
    billing_first_name VARCHAR(100),
    billing_last_name VARCHAR(100),
    billing_address_line_1 VARCHAR(255),
    billing_address_line_2 VARCHAR(255),
    billing_city VARCHAR(100),
    billing_state VARCHAR(100),
    billing_postal_code VARCHAR(20),
    billing_country_code CHAR(2),
    
    notes TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Order Items (Many-to-Many with attributes)
-- CASCADE from orders (an item is meaningless without its order) but not
-- from products (historical lines must outlive a discontinued product)
CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
    product_id INTEGER NOT NULL REFERENCES products(id),
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    unit_price DECIMAL(10,2) NOT NULL CHECK (unit_price >= 0),
    total_price DECIMAL(10,2) NOT NULL CHECK (total_price >= 0),
    
    -- Product snapshot (denormalized for historical accuracy)
    product_name VARCHAR(255),
    product_sku VARCHAR(100),
    product_weight DECIMAL(8,3),
    
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Normalization Forms

First Normal Form (1NF)



sql

-- VIOLATION of 1NF: Multiple values in single column
-- (the "Jaywalking" antipattern: CSV-in-a-column forces LIKE searches
-- and makes updates/deletes of a single number error-prone)
CREATE TABLE bad_users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    phone_numbers VARCHAR(255) -- '555-1234, 555-5678, 555-9012'
);

-- 1NF COMPLIANT: Atomic values only
CREATE TABLE users_1nf (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),  -- Separated from full name
    last_name VARCHAR(50),
    email VARCHAR(255)
);

-- One row per phone number: each value is atomic and individually
-- insertable, updatable, deletable, and indexable
CREATE TABLE user_phone_numbers (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users_1nf(id),
    phone_number VARCHAR(20),
    phone_type VARCHAR(20) -- 'mobile', 'home', 'work'
);

-- Example data that satisfies 1NF
INSERT INTO users_1nf (first_name, last_name, email) VALUES
('John', 'Doe', 'john@example.com'),
('Jane', 'Smith', 'jane@example.com');

-- NOTE(review): hard-coded user_id values 1 and 2 assume a fresh SERIAL
-- sequence; fine for a tutorial, fragile in real migrations
INSERT INTO user_phone_numbers (user_id, phone_number, phone_type) VALUES
(1, '555-1234', 'mobile'),
(1, '555-5678', 'home'),
(2, '555-9012', 'mobile');

Second Normal Form (2NF)



sql

-- VIOLATION of 2NF: Partial dependency on composite key
-- (product_name/product_price depend on only part of the PK, so they are
-- duplicated for every order containing the product and can drift apart)
CREATE TABLE bad_order_items (
    order_id INTEGER,
    product_id INTEGER,
    product_name VARCHAR(255),    -- Depends only on product_id
    product_price DECIMAL(10,2),  -- Depends only on product_id
    quantity INTEGER,             -- Depends on both order_id and product_id
    PRIMARY KEY (order_id, product_id)
);

-- 2NF COMPLIANT: Separate entities, eliminate partial dependencies
CREATE TABLE orders_2nf (
    id SERIAL PRIMARY KEY,
    user_id INTEGER,
    total_amount DECIMAL(10,2),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE products_2nf (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    price DECIMAL(10,2),
    description TEXT
);

-- Junction table keeps only attributes that depend on the FULL key
CREATE TABLE order_items_2nf (
    order_id INTEGER REFERENCES orders_2nf(id),
    product_id INTEGER REFERENCES products_2nf(id),
    quantity INTEGER,
    unit_price DECIMAL(10,2), -- Price at time of order (may differ from current)
    PRIMARY KEY (order_id, product_id)
);

Third Normal Form (3NF)



sql

-- VIOLATION of 3NF: Transitive dependency
-- (employee -> department_id -> department_name/location: a department
-- rename would have to touch every employee row)
CREATE TABLE bad_employees (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    department_id INTEGER,
    department_name VARCHAR(100),    -- Depends on department_id, not employee_id
    department_location VARCHAR(100) -- Depends on department_id, not employee_id
);

-- 3NF COMPLIANT: Remove transitive dependencies
CREATE TABLE departments (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    location VARCHAR(100),
    budget DECIMAL(12,2),
    -- NOTE(review): no FK on manager_id because employees_3nf does not
    -- exist yet (circular reference); add it with ALTER TABLE afterwards
    manager_id INTEGER
);

CREATE TABLE employees_3nf (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(255),
    department_id INTEGER REFERENCES departments(id),
    salary DECIMAL(10,2),
    hire_date DATE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Now department information is stored once
INSERT INTO departments (name, location, budget) VALUES
('Engineering', 'Building A', 500000.00),
('Marketing', 'Building B', 200000.00),
('Sales', 'Building C', 300000.00);

INSERT INTO employees_3nf (first_name, last_name, email, department_id, salary, hire_date) VALUES
('Alice', 'Johnson', 'alice@company.com', 1, 75000.00, '2023-01-15'),
('Bob', 'Smith', 'bob@company.com', 1, 80000.00, '2022-11-01'),
('Carol', 'Davis', 'carol@company.com', 2, 60000.00, '2023-03-10');

Boyce-Codd Normal Form (BCNF)



sql

-- Example of 3NF but not BCNF violation
-- Student-Subject-Professor relationship with constraints:
-- 1. Each student takes each subject with exactly one professor
-- 2. Each professor teaches exactly one subject
-- 3. A subject can be taught by multiple professors

-- The BCNF problem: the functional dependency professor_id -> subject_code
-- holds (rule 2), but professor_id is not a candidate key of this table,
-- so that dependency cannot be enforced here
CREATE TABLE student_enrollments_bcnf (
    student_id INTEGER,
    subject_code VARCHAR(10),
    professor_id INTEGER,
    grade CHAR(2),
    PRIMARY KEY (student_id, subject_code),
    UNIQUE (student_id, professor_id) -- A student appears at most once per professor; does NOT enforce professor -> subject
);

-- Supporting tables for BCNF
CREATE TABLE students (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(255)
);

CREATE TABLE subjects (
    code VARCHAR(10) PRIMARY KEY,
    name VARCHAR(100),
    credits INTEGER
);

CREATE TABLE professors (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(255),
    subject_code VARCHAR(10) REFERENCES subjects(code) -- Each professor teaches one subject
);

-- BCNF compliant alternative: Break into smaller relations
-- (the subject is now derived via professors.subject_code, so the
-- professor -> subject dependency lives where its determinant is the key)
CREATE TABLE enrollments (
    student_id INTEGER REFERENCES students(id),
    professor_id INTEGER REFERENCES professors(id),
    grade CHAR(2),
    enrollment_date DATE,
    PRIMARY KEY (student_id, professor_id)
);

Strategic Denormalization

When to Denormalize



sql

-- Example: Blog system with comment counts
-- Normalized approach (requires JOINs and aggregations)
CREATE TABLE posts_normalized (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255),
    content TEXT,
    author_id INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE comments_normalized (
    id SERIAL PRIMARY KEY,
    post_id INTEGER REFERENCES posts_normalized(id),
    author_id INTEGER,
    content TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Expensive query to get posts with comment counts
-- (scans every comment on every run; COUNT(c.id) — not COUNT(*) —
-- so posts with no comments count as 0 under the LEFT JOIN)
SELECT 
    p.id,
    p.title,
    COUNT(c.id) as comment_count
FROM posts_normalized p
LEFT JOIN comments_normalized c ON p.id = c.post_id
GROUP BY p.id, p.title
ORDER BY comment_count DESC;

-- Denormalized approach: Store comment count directly
CREATE TABLE posts_denormalized (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255),
    content TEXT,
    author_id INTEGER,
    comment_count INTEGER DEFAULT 0, -- Denormalized field
    last_comment_at TIMESTAMP,       -- Additional denormalized field
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE comments_denormalized (
    id SERIAL PRIMARY KEY,
    post_id INTEGER REFERENCES posts_denormalized(id),
    author_id INTEGER,
    content TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Fast query with denormalized data (no JOIN, no aggregation; the cost
-- moved to write time via the trigger defined below)
SELECT id, title, comment_count 
FROM posts_denormalized 
ORDER BY comment_count DESC;

-- Triggers to maintain denormalized data.
-- Keeps posts_denormalized.comment_count and last_comment_at in sync
-- with comments_denormalized on every insert/delete.
-- FIX: PL/pgSQL function bodies must be dollar-quoted with $$ (or a
-- tagged variant); the original bare '$' is a syntax error in PostgreSQL.
CREATE OR REPLACE FUNCTION update_post_comment_count()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE posts_denormalized 
        SET comment_count = comment_count + 1,
            last_comment_at = NEW.created_at
        WHERE id = NEW.post_id;
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE posts_denormalized 
        -- GREATEST guards against counter drift going negative
        SET comment_count = GREATEST(comment_count - 1, 0)
        WHERE id = OLD.post_id;
        RETURN OLD;
    END IF;
    -- Unreachable for this trigger (INSERT/DELETE only), kept as a guard
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- AFTER row-level trigger: fires once per inserted or deleted comment
CREATE TRIGGER comment_count_trigger
    AFTER INSERT OR DELETE ON comments_denormalized
    FOR EACH ROW EXECUTE FUNCTION update_post_comment_count();

Materialized Views for Read Performance



sql

-- Complex aggregation query that's expensive to run repeatedly.
-- FIX: the original joined posts AND comments to users in a single query,
-- so each post row was duplicated once per comment by the same user.
-- COUNT(DISTINCT ...) masked the fan-out for the counts, but AVG(p.views)
-- and SUM(p.views) were silently inflated. Pre-aggregating each child
-- table before joining removes the fan-out entirely.
-- NOTE(review): posts_normalized as defined earlier has no views column;
-- assumed to exist in the full schema — confirm before running.
CREATE MATERIALIZED VIEW user_stats AS
SELECT
    u.id,
    u.username,
    u.email,
    COALESCE(p.post_count, 0) AS post_count,
    COALESCE(c.comment_count, 0) AS comment_count,
    p.avg_post_views,
    p.last_post_date,
    c.last_comment_date,
    COALESCE(p.total_views, 0) AS total_views
FROM users u
LEFT JOIN (
    -- one row per author: post statistics
    SELECT author_id,
           COUNT(*) AS post_count,
           AVG(views) AS avg_post_views,
           MAX(created_at) AS last_post_date,
           SUM(views) AS total_views
    FROM posts_normalized
    GROUP BY author_id
) p ON p.author_id = u.id
LEFT JOIN (
    -- one row per author: comment statistics
    SELECT author_id,
           COUNT(*) AS comment_count,
           MAX(created_at) AS last_comment_date
    FROM comments_normalized
    GROUP BY author_id
) c ON c.author_id = u.id;

-- FIX: REFRESH ... CONCURRENTLY (used below) requires at least one
-- UNIQUE index on the materialized view — the original never created one.
CREATE UNIQUE INDEX idx_user_stats_id ON user_stats(id);

-- Create indexes on materialized view for common filters/sorts
CREATE INDEX idx_user_stats_post_count ON user_stats(post_count);
CREATE INDEX idx_user_stats_total_views ON user_stats(total_views);

-- Fast queries using materialized view
SELECT id, username, post_count FROM user_stats WHERE post_count > 10;
SELECT id, username, total_views FROM user_stats ORDER BY total_views DESC LIMIT 10;

-- Refresh materialized view (blocks concurrent reads; can be scheduled)
REFRESH MATERIALIZED VIEW user_stats;

-- Or refresh concurrently (non-blocking; needs the unique index above)
REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats;

Historical Data and Snapshots



sql

-- Order system: Prices change over time, but orders need historical accuracy
-- Temporal pricing table: [valid_from, valid_to) interval per price;
-- a NULL valid_to marks the currently effective price
CREATE TABLE product_prices (
    id SERIAL PRIMARY KEY,
    product_id INTEGER REFERENCES products(id),
    price DECIMAL(10,2),
    cost_price DECIMAL(10,2),
    valid_from TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    valid_to TIMESTAMP,
    created_by INTEGER REFERENCES users(id)
);

-- Denormalized order items with price snapshots
CREATE TABLE historical_order_items (
    id SERIAL PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    
    -- Current product reference
    quantity INTEGER,
    
    -- Historical snapshot (denormalized): copied at order time so later
    -- product edits never rewrite order history
    product_name VARCHAR(255),     -- Name at time of order
    product_sku VARCHAR(100),      -- SKU at time of order
    unit_price DECIMAL(10,2),      -- Price at time of order
    product_description TEXT,       -- Description at time of order
    
    total_price DECIMAL(10,2),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Function to create an order item, snapshotting the product's current
-- name, SKU, price, and description so later product edits do not
-- rewrite order history. Returns the new historical_order_items.id.
-- FIX: PL/pgSQL bodies must be dollar-quoted with $$; the original
-- bare '$' is a syntax error in PostgreSQL.
CREATE OR REPLACE FUNCTION create_order_item(
    p_order_id INTEGER,
    p_product_id INTEGER,
    p_quantity INTEGER
) RETURNS INTEGER AS $$
DECLARE
    v_product products%ROWTYPE;
    v_order_item_id INTEGER;
BEGIN
    -- Get current product data
    SELECT * INTO v_product FROM products WHERE id = p_product_id;

    -- FIX: fail loudly on an unknown product id instead of silently
    -- inserting a line item full of NULL snapshot columns
    IF NOT FOUND THEN
        RAISE EXCEPTION 'product % not found', p_product_id;
    END IF;

    -- Insert with current snapshot
    INSERT INTO historical_order_items (
        order_id, product_id, quantity,
        product_name, product_sku, unit_price, product_description,
        total_price
    ) VALUES (
        p_order_id, p_product_id, p_quantity,
        v_product.name, v_product.sku, v_product.price, v_product.description,
        v_product.price * p_quantity
    ) RETURNING id INTO v_order_item_id;

    RETURN v_order_item_id;
END;
$$ LANGUAGE plpgsql;

Performance Considerations

Indexing Strategies



sql

-- Single column indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_posts_created_at ON posts_normalized(created_at);

-- Composite indexes (order matters! leftmost-prefix rule: this serves
-- queries on author_id alone or author_id+created_at, but NOT
-- created_at alone)
CREATE INDEX idx_posts_author_created ON posts_normalized(author_id, created_at);
CREATE INDEX idx_comments_post_created ON comments_normalized(post_id, created_at);

-- Partial indexes for specific conditions
CREATE INDEX idx_active_products ON products(name) WHERE is_active = true;

-- FIX: a partial-index predicate must use only IMMUTABLE expressions.
-- CURRENT_DATE is STABLE, so PostgreSQL rejects the original predicate
-- "created_at > CURRENT_DATE - INTERVAL '30 days'". Use a fixed cutoff
-- and rebuild the index periodically to roll the window forward.
CREATE INDEX idx_recent_posts ON posts_normalized(created_at) 
WHERE created_at > DATE '2023-01-01';

-- Functional indexes (queries must use the identical expression,
-- e.g. WHERE LOWER(email) = LOWER($1), for the index to be usable)
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
CREATE INDEX idx_posts_title_length ON posts_normalized(LENGTH(title));

-- Index for full-text search (PostgreSQL specific)
ALTER TABLE posts_normalized ADD COLUMN search_vector tsvector;

-- NOTE: a one-off UPDATE leaves search_vector stale after later edits;
-- maintain it with a trigger or a generated column (PostgreSQL 12+)
UPDATE posts_normalized 
SET search_vector = to_tsvector('english', title || ' ' || content);

CREATE INDEX idx_posts_search ON posts_normalized USING gin(search_vector);

-- Full-text search query
SELECT id, title FROM posts_normalized 
WHERE search_vector @@ to_tsquery('english', 'database & design');

Partitioning for Large Tables



sql

-- Partition by date range (PostgreSQL 10+)
-- The partition key (created_at) must be part of the primary key,
-- hence the composite PK instead of just (id)
CREATE TABLE posts_partitioned (
    id SERIAL,
    title VARCHAR(255),
    content TEXT,
    author_id INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- Create partitions (ranges are half-open: FROM inclusive, TO exclusive)
-- NOTE(review): inserts outside these ranges will error; add a DEFAULT
-- partition or create future partitions ahead of time
CREATE TABLE posts_2023_q1 PARTITION OF posts_partitioned
    FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');

CREATE TABLE posts_2023_q2 PARTITION OF posts_partitioned
    FOR VALUES FROM ('2023-04-01') TO ('2023-07-01');

CREATE TABLE posts_2023_q3 PARTITION OF posts_partitioned
    FOR VALUES FROM ('2023-07-01') TO ('2023-10-01');

-- Hash partitioning for even distribution
-- (again the partition key user_id must appear in the PK)
CREATE TABLE user_activities (
    id SERIAL,
    user_id INTEGER,
    activity_type VARCHAR(50),
    activity_data JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id, user_id)
) PARTITION BY HASH (user_id);

-- Four buckets; rows are routed by hash(user_id) mod 4
CREATE TABLE user_activities_0 PARTITION OF user_activities
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE user_activities_1 PARTITION OF user_activities
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE user_activities_2 PARTITION OF user_activities
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE user_activities_3 PARTITION OF user_activities
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

Design Patterns and Anti-Patterns

Entity-Attribute-Value (EAV) Pattern



sql

-- EAV Pattern: Flexible but often problematic
-- (no type safety per attribute, no NOT NULL/CHECK per attribute, and
-- reassembling one entity requires a join or pivot per attribute)
CREATE TABLE entities (
    id SERIAL PRIMARY KEY,
    entity_type VARCHAR(50),
    name VARCHAR(255)
);

CREATE TABLE attributes (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    data_type VARCHAR(20) -- 'string', 'integer', 'decimal', 'date'
);

-- One typed column per data_type; only the column matching data_type
-- should be non-NULL (nothing here enforces that)
CREATE TABLE entity_attribute_values (
    entity_id INTEGER REFERENCES entities(id),
    attribute_id INTEGER REFERENCES attributes(id),
    value_text TEXT,
    value_integer INTEGER,
    value_decimal DECIMAL(15,4),
    value_date DATE,
    PRIMARY KEY (entity_id, attribute_id)
);

-- Better alternative: JSONB for flexible attributes
CREATE TABLE products_flexible (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    category_id INTEGER REFERENCES categories(id),
    base_price DECIMAL(10,2) NOT NULL,
    
    -- Core attributes in columns (queried/constrained relationally)
    weight DECIMAL(8,3),
    dimensions JSONB,
    
    -- Flexible attributes in JSONB
    attributes JSONB,
    
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- JSONB indexes for performance (GIN supports @> containment queries)
CREATE INDEX idx_products_attributes ON products_flexible USING gin(attributes);

-- Example data
INSERT INTO products_flexible (name, category_id, base_price, weight, dimensions, attributes) 
VALUES (
    'Laptop Pro 15"',
    1,
    1299.99,
    2.1,
    '{"length": 35.79, "width": 24.71, "height": 1.55}',
    '{
        "brand": "TechCorp",
        "model": "LP15-2023",
        "screen_size": "15.6",
        "ram": "16GB",
        "storage": "512GB SSD",
        "color": "Space Gray",
        "warranty_years": 2,
        "specifications": {
            "processor": "M2 Pro",
            "graphics": "Integrated",
            "ports": ["USB-C", "HDMI", "Audio Jack"]
        }
    }'
);

-- Query JSONB attributes (->> extracts as text; -> keeps JSONB)
SELECT name, attributes->>'brand' as brand, attributes->>'ram' as ram
FROM products_flexible 
WHERE attributes->>'brand' = 'TechCorp';

-- Containment (@>) can use the GIN index above; ->> comparisons cannot
SELECT name FROM products_flexible 
WHERE attributes @> '{"warranty_years": 2}';

Audit and Versioning Patterns



sql

-- Audit log pattern: one append-only row per INSERT/UPDATE/DELETE on any
-- audited table, with before/after row images captured as JSONB
CREATE TABLE audit_log (
    id SERIAL PRIMARY KEY,
    table_name VARCHAR(50),
    record_id INTEGER,
    operation VARCHAR(10), -- 'INSERT', 'UPDATE', 'DELETE'
    old_values JSONB,      -- NULL for INSERT
    new_values JSONB,      -- NULL for DELETE
    changed_by INTEGER REFERENCES users(id),
    changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Generic audit trigger function: writes one audit_log row per modified
-- row, reusable across tables via TG_TABLE_NAME.
-- FIX: PL/pgSQL bodies must be dollar-quoted with $$; the original bare
-- '$' is a syntax error. Also switched row_to_json (returns json) to
-- to_jsonb, matching the JSONB audit_log columns.
-- NOTE(review): NEW.updated_by / OLD.updated_by assumes every audited
-- table has an updated_by column — none of the tables shown here do;
-- add that column or pass the actor via a session setting
-- (current_setting('app.user_id')) before attaching this trigger.
CREATE OR REPLACE FUNCTION audit_trigger_function()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, record_id, operation, new_values, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, TG_OP, to_jsonb(NEW), NEW.updated_by);
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, record_id, operation, old_values, new_values, changed_by)
        VALUES (TG_TABLE_NAME, NEW.id, TG_OP, to_jsonb(OLD), to_jsonb(NEW), NEW.updated_by);
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO audit_log (table_name, record_id, operation, old_values, changed_by)
        VALUES (TG_TABLE_NAME, OLD.id, TG_OP, to_jsonb(OLD), OLD.updated_by);
        RETURN OLD;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Version control pattern: every edit appends a new immutable version row;
-- the latest version is flagged with is_current
CREATE TABLE document_versions (
    id SERIAL PRIMARY KEY,
    document_id INTEGER,
    version_number INTEGER,
    title VARCHAR(255),
    content TEXT,
    created_by INTEGER REFERENCES users(id),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    -- NOTE(review): nothing enforces at most one current version per
    -- document; a partial unique index on (document_id) WHERE is_current
    -- would close that gap
    is_current BOOLEAN DEFAULT false,
    UNIQUE (document_id, version_number)
);

-- Current documents view
CREATE VIEW current_documents AS
SELECT * FROM document_versions WHERE is_current = true;

Migration and Schema Evolution

Schema Migration Best Practices



sql

-- Migration script example: Adding new column safely
-- Step 1: Add nullable column (metadata-only change; no table rewrite)
ALTER TABLE users ADD COLUMN middle_name VARCHAR(50);

-- Step 2: Populate with default/calculated values (if needed)
-- UPDATE users SET middle_name = '' WHERE middle_name IS NULL;

-- Step 3: Make NOT NULL if required (in separate migration)
-- ALTER TABLE users ALTER COLUMN middle_name SET NOT NULL;

-- Safe index creation (non-blocking in PostgreSQL)
-- NOTE(review): CONCURRENTLY cannot run inside a transaction block,
-- so this statement must live outside any BEGIN/COMMIT wrapper
CREATE INDEX CONCURRENTLY idx_users_middle_name ON users(middle_name);

-- Renaming columns safely (expand/contract pattern)
-- Step 1: Add new column
ALTER TABLE products ADD COLUMN product_name VARCHAR(255);

-- Step 2: Copy data
-- NOTE(review): writes between this backfill and the app switchover can
-- leave the columns out of sync; a temporary sync trigger closes the gap
UPDATE products SET product_name = name;

-- Step 3: Update application to use new column
-- Step 4: Drop old column (after deployment)
-- ALTER TABLE products DROP COLUMN name;

-- Table restructuring with minimal downtime
-- Create new table structure
CREATE TABLE users_new (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    username VARCHAR(50) UNIQUE NOT NULL,
    full_name VARCHAR(200), -- Combined first_name + last_name
    profile_data JSONB, -- Flexible profile information
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Migrate data, preserving the original ids
-- (CONCAT_WS skips NULL name parts without leaving stray spaces)
INSERT INTO users_new (id, email, username, full_name, profile_data, created_at)
SELECT 
    id, 
    email, 
    username,
    CONCAT_WS(' ', first_name, last_name),
    jsonb_build_object(
        'first_name', first_name,
        'last_name', last_name,
        'phone', phone,
        'date_of_birth', date_of_birth
    ),
    created_at
FROM users;

-- FIX: inserting explicit ids does not advance users_new's SERIAL
-- sequence, so the first post-cutover insert would collide with an
-- existing id. Align the sequence with the migrated data.
SELECT setval(pg_get_serial_sequence('users_new', 'id'),
              COALESCE((SELECT MAX(id) FROM users_new), 1));

-- Switch tables (requires downtime or complex orchestration)
-- BEGIN;
-- ALTER TABLE users RENAME TO users_old;
-- ALTER TABLE users_new RENAME TO users;
-- COMMIT;

Decision Framework: Normalize vs Denormalize

Analysis Criteria



sql

-- Example decision matrix for comment counts:

-- NORMALIZED approach:
-- ✓ Data consistency guaranteed
-- ✓ Single source of truth
-- ✓ Easy to maintain
-- ✗ Expensive JOINs for display
-- ✗ Complex aggregation queries
-- ✗ Poor performance at scale

-- DENORMALIZED approach:
-- ✓ Fast read queries
-- ✓ Simple SELECT statements
-- ✓ Better performance at scale
-- ✗ Data consistency requires careful management
-- ✗ Storage overhead
-- ✗ Complex update logic

-- Hybrid approach: Materialized views + triggers
CREATE MATERIALIZED VIEW post_summary AS
SELECT 
    p.id,
    p.title,
    p.content,
    p.author_id,
    p.created_at,
    COUNT(c.id) as comment_count,
    MAX(c.created_at) as last_comment_at,
    -- FILTER keeps comment-less posts from producing a {NULL} array
    ARRAY_AGG(DISTINCT c.author_id) FILTER (WHERE c.author_id IS NOT NULL) as commenters
FROM posts_normalized p
LEFT JOIN comments_normalized c ON p.id = c.post_id
GROUP BY p.id, p.title, p.content, p.author_id, p.created_at;

-- FIX: the scheduled REFRESH ... CONCURRENTLY below fails unless the
-- materialized view has at least one UNIQUE index
CREATE UNIQUE INDEX idx_post_summary_id ON post_summary(id);

-- Refresh strategy based on requirements:
-- Real-time: Use triggers
-- Near real-time: Refresh every few minutes
-- Batch: Refresh nightly

-- Scheduled refresh every 5 minutes (requires the pg_cron extension)
SELECT cron.schedule('refresh-post-summary', '*/5 * * * *', 
    'REFRESH MATERIALIZED VIEW CONCURRENTLY post_summary;');

Conclusion

Schema design is about balancing consistency, performance, and maintainability. Start with a normalized design to ensure data integrity, then strategically denormalize based on actual performance requirements and usage patterns. Use tools like materialized views, proper indexing, and partitioning to optimize performance while maintaining data quality.

Remember that premature optimization is often counterproductive - measure actual performance bottlenecks before denormalizing, and always consider the maintenance cost of complex denormalized structures.