Session 10

Chained Matrix Multiplication

This session explains why the order of multiplying matrices matters. Matrix multiplication is associative, so the final result is the same for any valid parenthesization, but the number of scalar multiplications can change drastically.

Objectives

Understand why matrix-chain order affects multiplication cost.
Find all possible orders for multiplying five matrices.
Use dynamic programming to find optimal parenthesization.

Given Dimensions

Matrix	Dimension
A	`10 x 4`
B	`4 x 5`
C	`5 x 20`
D	`20 x 2`
E	`2 x 50`

Dimension array:

p = [10, 4, 5, 20, 2, 50]

Concept

Matrix multiplication is associative:

		(A * B) * C = A * (B * C)

The result is the same, but the number of scalar multiplications can be very different.

Dynamic Programming Recurrence

		m[i][j] = min(m[i][k] + m[k+1][j] + p[i-1] * p[k] * p[j])

where i <= k < j.

Question 1

Problem Statement

List different orders for evaluating the product of A, B, C, D, E matrices.

Explanation / Approach

For a fixed sequence of matrices $A \times B \times C \times D \times E$ , the order of evaluation refers to how you place parentheses. Due to the associative property of matrix multiplication, all valid parenthesizations yield the same final matrix, but they require vastly different amounts of computational work.

For 5 matrices, there are exactly 14 valid parenthesization orders (given by the 4th Catalan number, $C_{4} = 14$ ). Below is the complete list, along with the scalar multiplication cost for each, which highlights why evaluation order matters in practice.

#	Evaluation Order (Parenthesization)	Scalar Multiplication Cost
1	((((A·B)·C)·D)·E)	"2,600"
2	(((A·(B·C))·D)·E)	"2,600"
3	((A·((B·C)·D))·E)	"1,640"
4	((A·(B·(C·D)))·E)	"1,320"
5	(((A·B)·(C·D))·E)	"1,500"
6	((A·B)·((C·D)·E))	"3,400"
7	((A·B)·(C·(D·E)))	"9,700"
8	((A·(B·C))·(D·E))	"13,200"
9	(A·(((B·C)·D)·E))	"2,960"
10	(A·((B·(C·D))·E))	"2,640"
11	(A·((B·C)·(D·E)))	"8,400"
12	(A·(B·((C·D)·E)))	"3,700"
13	(A·(B·(C·(D·E))))	"10,000"
14	(((A·B)·C)·(D·E))	"13,200"

Cost Calculation: Multiplying an $m \times n$ matrix by an $n \times p$ matrix requires $m \cdot n \cdot p$ scalar multiplications. The total cost is the sum of costs at each multiplication step.
Performance Gap: The worst valid orders (#8 and #14) require 10× more operations than the optimal order (#4).
Why #4 is optimal: It computes the narrowest intermediate matrices first (C·D → 5×2, then B·(C·D) → 4×2, then A·... → 10×2), keeping the inner dimension small before multiplying by the large 50 column of E.

Question 2

Problem Statement

Find the optimal order for evaluating A * B * C * D * E using the given dimensions.

Explanation / Approach

The optimal parenthesization is: $((A · (B · (C · D))) · E)$

Minimum scalar multiplication cost: 1,320

Step	Operation	Input Dimensions	Output Dimensions	Cost (m×n×p)
1	C · D	5×20 × 20×2	5×2	5×20×2 = 200
2	B · (CD)	4×5 × 5×2	4×2	4×5×2 = 40
3	A · (B(CD))	10×4 × 4×2	10×2	10×4×2 = 80
4	(A(B(CD))) · E	10×2 × 2×50	10×50	10×2×50 = 1,000
Total				1,320

Why This Order is Optimal

Matrix chain multiplication is all about controlling the size of intermediate matrices. $C · D$ collapses the large 20 inner dimension early, producing a skinny $5 \times 2$ matrix.
Subsequent multiplications with B and A keep the intermediate column dimension at 2, avoiding expensive operations with the 50 columns of E until the very last step.
Multiplying the final $10 \times 2$ intermediate by E ( $2 \times 50$ ) is cheap because the shared dimension is only 2.

Any other parenthesization forces a larger inner dimension to be multiplied earlier, drastically increasing the total operation count (the worst valid order requires 13,200 operations, exactly 10× more).

Question 3

Problem Statement

Implement chained matrix multiplication and print the optimal parentheses. Study the performance on different problem instances.

Answer

Aspect	Complexity / Behavior
Time Complexity	O(n³) due to 3 nested loops (chain_len, i, k)
Space Complexity	O(n²) for DP tables m and s
Empirical Scaling	Matches O(n³): doubling n roughly increases runtime by 8× (e.g., 50→100: ~1.3ms → ~9.8ms)
Pattern Impact on Cost	Dimension distribution drastically changes optimal cost, but not DP runtime. Increasing/Decreasing chains have highly structured optimal splits, while random chains require full DP evaluation.
Practical Limit	Python's pure loops cap out around n ≈ 300-400 (~1-2 sec). For n > 1000, use C/C++ extensions or iterative reconstruction to avoid recursion limits.

Implementation

Python

floud-warshall-algorithm.py

		import random
import time
from typing import List, Tuple
 
 
def matrix_chain_order(dims: List[int]) -> Tuple[List[List[int]], List[List[int]]]:
    """
    Computes minimum multiplication cost and optimal split points using DP.
 
    Args:
        dims: List of dimensions where matrix i has shape dims[i] x dims[i+1]
              Length must be n + 1 for n matrices.
 
    Returns:
        m: DP table for minimum costs
        s: DP table for optimal split indices
    """
    n = len(dims) - 1  # Number of matrices
    m = [[0] * n for _ in range(n)]
    s = [[0] * n for _ in range(n)]
 
    # l = chain length (subproblem size)
    for l in range(2, n + 1):
        # i = left boundary of subchain
        for i in range(n - l + 1):
            j = i + l - 1  # j = right boundary
            m[i][j] = float("inf")
 
            # k = split point
            for k in range(i, j):
                # Cost = left + right + merge
                cost = m[i][k] + m[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                if cost < m[i][j]:
                    m[i][j] = cost
                    s[i][j] = k
 
    return m, s
 
 
def construct_parens(s: List[List[int]], i: int, j: int) -> str:
    """Reconstructs optimal parenthesization string from split table."""
    if i == j:
        return f"M{i + 1}"
    k = s[i][j]
    return f"({construct_parens(s, i, k)} · {construct_parens(s, k + 1, j)})"
 
 
def run_performance_study():
    """Benchmarks algorithm across varying chain lengths."""
    print(f"{'N':<5} | {'Pattern':<12} | {'Cost':<10} | {'Time(ms)':<8}")
    print("-" * 45)
 
    test_cases = [
        (5, "Fixed", [10, 4, 5, 20, 2, 50]),
        (20, "Uniform", [random.randint(5, 100) for _ in range(21)]),
        (50, "Uniform", [random.randint(5, 100) for _ in range(51)]),
        (100, "Uniform", [random.randint(5, 100) for _ in range(101)]),
    ]
 
    for n, pattern, dims in test_cases:
        start = time.perf_counter()
        m, s = matrix_chain_order(dims)
        end = time.perf_counter()
        cost = m[0][n - 1]
        parens = construct_parens(s, 0, n - 1)
 
        print(f"{n:<5} | {pattern:<12} | {cost:<10} | {(end - start) * 1000:.3f}")
 
        if n == 5:
            print(f"\n🔍 Verified: {parens} (Cost: {cost})\n")
 
 
if __name__ == "__main__":
    run_performance_study()

Practical Limits

Overhead of list-of-lists and interpreter slows n > 300. Use numpy arrays or lru_cache for memoization if recursion is preferred.

C Language

floud-warshall-algorithm.c

		#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <time.h>
 
/**
 * Matrix Chain Multiplication - Optimal Parenthesization
 *
 * DP State:
 *   m[i][j] = minimum scalar multiplications to compute product A[i]...A[j]
 *   s[i][j] = index k where the optimal split occurs (A[i]...A[k]) · (A[k+1]...A[j])
 *
 * Complexity: O(n³) time, O(n²) space
 * Note: We use long long for costs to prevent integer overflow on larger chains.
 */
void matrix_chain_order(const int* dims, int n, long long** m, int** s) {
    // Base case: multiplying a single matrix costs 0
    for (int i = 0; i < n; i++) {
        m[i][i] = 0;
    }
 
    // l = chain length (number of matrices in current subproblem)
    for (int l = 2; l <= n; l++) {
        // i = starting matrix index
        for (int i = 0; i <= n - l; i++) {
            int j = i + l - 1; // j = ending matrix index
            m[i][j] = LLONG_MAX;
 
            // Try every possible split point k between i and j
            for (int k = i; k < j; k++) {
                // Cost = left subchain + right subchain + final merge
                // dims[i] x dims[k+1]  *  dims[k+1] x dims[j+1]  →  dims[i] x dims[j+1]
                long long cost = m[i][k] + m[k+1][j] + (long long)dims[i] * dims[k+1] * dims[j+1];
 
                if (cost < m[i][j]) {
                    m[i][j] = cost;
                    s[i][j] = k;
                }
            }
        }
    }
}
 
/**
 * Recursively reconstruct optimal parentheses using the split table `s`.
 * Uses 1-based naming (M1, M2, ...) for readability.
 */
void print_optimal_parens(const int* const* s, int i, int j) {
    if (i == j) {
        printf("M%d", i + 1);
    } else {
        printf("(");
        print_optimal_parens(s, i, s[i][j]);
        printf(" · ");
        print_optimal_parens(s, s[i][j] + 1, j);
        printf(")");
    }
}
 
/**
 * Safe 2D array deallocation
 */
void free_dp_tables(long long** m, int** s, int n) {
    for (int i = 0; i < n; i++) {
        free(m[i]);
        free(s[i]);
    }
    free(m);
    free(s);
}
 
int main() {
    // Example: A(10x4), B(4x5), C(5x20), D(20x2), E(2x50)
    int dims[] = {10, 4, 5, 20, 2, 50};
    int n = sizeof(dims)/sizeof(dims[0]) - 1; // n = 5 matrices
 
    // Allocate DP tables on the heap
    long long** m = malloc(n * sizeof(long long*));
    int** s = malloc(n * sizeof(int*));
    for (int i = 0; i < n; i++) {
        m[i] = malloc(n * sizeof(long long));
        s[i] = malloc(n * sizeof(int));
    }
 
    // Run DP
    matrix_chain_order(dims, n, m, s);
 
    // Output results
    printf("Optimal Parenthesization: ");
    print_optimal_parens((const int* const*)s, 0, n - 1);
    printf("\nMinimum Scalar Multiplications: %lld\n", m[0][n-1]);
 
    // Cleanup
    free_dp_tables(m, s, n);
    return 0;
}

Practical Limits

Manual allocation is fastest. For n > 1000, consider 1D flattened arrays m[i*n + j] to improve cache locality.

Rust

floud-warshall-algorithm.rs

		use std::time::Instant;
 
/// Computes minimum scalar multiplication cost and optimal split points.
///
/// DP State:
///   m[i][j] = min cost to multiply matrices i..j
///   s[i][j] = optimal split index k
///
/// Complexity: O(n³) time, O(n²) space
/// Uses u64 for costs to safely handle large chains without overflow.
fn matrix_chain_order(dims: &[usize]) -> (Vec<Vec<u64>>, Vec<Vec<usize>>) {
    let n = dims.len() - 1;
    // Initialize DP tables with 0s
    let mut m = vec![vec![0u64; n]; n];
    let mut s = vec![vec![0usize; n]; n];
 
    // Iterate over chain lengths from 2 to n
    for length in 2..=n {
        // i = left index of the chain
        for i in 0..=(n - length) {
            let j = i + length - 1; // j = right index
            m[i][j] = u64::MAX;
 
            // Test every valid split point k
            for k in i..j {
                let cost = m[i][k] + m[k + 1][j] + (dims[i] * dims[k + 1] * dims[j + 1]) as u64;
                if cost < m[i][j] {
                    m[i][j] = cost;
                    s[i][j] = k;
                }
            }
        }
    }
 
    (m, s)
}
 
/// Recursively builds the optimal parenthesization string.
fn construct_parens(s: &[Vec<usize>], i: usize, j: usize) -> String {
    if i == j {
        // Base case: single matrix
        format!("M{}", i + 1)
    } else {
        // Recursive case: split at optimal k and wrap in parentheses
        let k = s[i][j];
        format!(
            "({} · {})",
            construct_parens(s, i, k),
            construct_parens(s, k + 1, j)
        )
    }
}
 
fn main() {
    // Example dimensions: A(10x4), B(4x5), C(5x20), D(20x2), E(2x50)
    let dims = vec![10, 4, 5, 20, 2, 50];
    let n = dims.len() - 1;
 
    let start = Instant::now();
    let (m, s) = matrix_chain_order(&dims);
    let duration = start.elapsed();
 
    let optimal_parens = construct_parens(&s, 0, n - 1);
    let min_cost = m[0][n - 1];
 
    println!("Optimal Parenthesization: {}", optimal_parens);
    println!("Minimum Scalar Multiplications: {}", min_cost);
    println!(
        "DP Execution Time: {:.3} ms",
        duration.as_secs_f64() * 1000.0
    );
}

Practical Limits

Zero-cost abstractions make it ~2-5× faster than Python. For extreme n, replace Vec<Vec<>> with vec![0u64; nn] and compute index in + j.

Sample Output

		N     | Pattern      | Cost       | Time(ms)
---------------------------------------------
5     | Fixed        | 1320       | 0.012
 
🔍 Verified: ((M1 · (M2 · (M3 · M4))) · M5) (Cost: 1320)
 
20    | Uniform      | 241405     | 0.176
50    | Uniform      | 929424     | 2.231
100   | Uniform      | 1535972    | 16.081

Optimizations

Knuth Optimization: If the split table satisfies monotonicity ( $s [i] [j - 1] \leq s [i] [j] \leq s [i + 1] [j]$ ), time can be reduced to O(n²). This holds for many real-world dimension sequences but requires proof per instance.

Session 9

Floyd and Warshall's Algorithm for All Pair Shortest Path Algorithms

Session 11

Optimal Binary Search Tree