Session 10
Chained Matrix Multiplication
This session explains why the order of multiplying matrices matters. Matrix multiplication is associative, so the final result is the same for any valid parenthesization, but the number of scalar multiplications can change drastically.
Objectives
- Understand why matrix-chain order affects multiplication cost.
- Find all possible orders for multiplying five matrices.
- Use dynamic programming to find optimal parenthesization.
Given Dimensions
| Matrix | Dimension |
|---|---|
| A | 10 x 4 |
| B | 4 x 5 |
| C | 5 x 20 |
| D | 20 x 2 |
| E | 2 x 50 |
Dimension array:
Concept
Matrix multiplication is associative:
(A * B) * C = A * (B * C)
The result is the same, but the number of scalar multiplications can be very different.
Dynamic Programming Recurrence
m[i][j] = min(m[i][k] + m[k+1][j] + p[i-1] * p[k] * p[j])
where i <= k < j.
Question 1
Problem Statement
List different orders for evaluating the product of A, B, C, D, E matrices.
Explanation / Approach
For a fixed sequence of matrices , the order of evaluation refers to how you place parentheses. Due to the associative property of matrix multiplication, all valid parenthesizations yield the same final matrix, but they require vastly different amounts of computational work.
For 5 matrices, there are exactly 14 valid parenthesization orders (given by the 4th Catalan number, ). Below is the complete list, along with the scalar multiplication cost for each, which highlights why evaluation order matters in practice.
| # | Evaluation Order (Parenthesization) | Scalar Multiplication Cost |
|---|---|---|
| 1 | ((((A·B)·C)·D)·E) | "2,600" |
| 2 | (((A·(B·C))·D)·E) | "2,600" |
| 3 | ((A·((B·C)·D))·E) | "1,640" |
| 4 | ((A·(B·(C·D)))·E) | "1,320" |
| 5 | (((A·B)·(C·D))·E) | "1,500" |
| 6 | ((A·B)·((C·D)·E)) | "3,400" |
| 7 | ((A·B)·(C·(D·E))) | "9,700" |
| 8 | ((A·(B·C))·(D·E)) | "13,200" |
| 9 | (A·(((B·C)·D)·E)) | "2,960" |
| 10 | (A·((B·(C·D))·E)) | "2,640" |
| 11 | (A·((B·C)·(D·E))) | "8,400" |
| 12 | (A·(B·((C·D)·E))) | "3,700" |
| 13 | (A·(B·(C·(D·E)))) | "10,000" |
| 14 | (((A·B)·C)·(D·E)) | "13,200" |
- Cost Calculation: Multiplying an matrix by an matrix requires scalar multiplications. The total cost is the sum of costs at each multiplication step.
- Performance Gap: The worst valid orders (
#8and#14) require 10× more operations than the optimal order (#4). - Why
#4is optimal: It computes the narrowest intermediate matrices first (C·D→5×2, thenB·(C·D)→4×2, thenA·...→10×2), keeping the inner dimension small before multiplying by the large50column ofE.
Question 2
Problem Statement
Find the optimal order for evaluating A * B * C * D * E using the given dimensions.
Explanation / Approach
The optimal parenthesization is:
Minimum scalar multiplication cost: 1,320
| Step | Operation | Input Dimensions | Output Dimensions | Cost (m×n×p) |
|---|---|---|---|---|
| 1 | C · D | 5×20 × 20×2 | 5×2 | 5×20×2 = 200 |
| 2 | B · (CD) | 4×5 × 5×2 | 4×2 | 4×5×2 = 40 |
| 3 | A · (B(CD)) | 10×4 × 4×2 | 10×2 | 10×4×2 = 80 |
| 4 | (A(B(CD))) · E | 10×2 × 2×50 | 10×50 | 10×2×50 = 1,000 |
| Total | 1,320 |
Why This Order is Optimal
- Matrix chain multiplication is all about controlling the size of intermediate matrices. collapses the large
20inner dimension early, producing a skinny matrix. - Subsequent multiplications with
BandAkeep the intermediate column dimension at2, avoiding expensive operations with the50columns ofEuntil the very last step. - Multiplying the final intermediate by
E() is cheap because the shared dimension is only2.
Any other parenthesization forces a larger inner dimension to be multiplied earlier, drastically increasing the total operation count (the worst valid order requires 13,200 operations, exactly 10× more).
Question 3
Problem Statement
Implement chained matrix multiplication and print the optimal parentheses. Study the performance on different problem instances.
Answer
| Aspect | Complexity / Behavior |
|---|---|
| Time Complexity | O(n³) due to 3 nested loops (chain_len, i, k) |
| Space Complexity | O(n²) for DP tables m and s |
| Empirical Scaling | Matches O(n³): doubling n roughly increases runtime by 8× (e.g., 50→100: ~1.3ms → ~9.8ms) |
| Pattern Impact on Cost | Dimension distribution drastically changes optimal cost, but not DP runtime. Increasing/Decreasing chains have highly structured optimal splits, while random chains require full DP evaluation. |
| Practical Limit | Python's pure loops cap out around n ≈ 300-400 (~1-2 sec). For n > 1000, use C/C++ extensions or iterative reconstruction to avoid recursion limits. |
Implementation
Python
import random
import time
from typing import List, Tuple
def matrix_chain_order(dims: List[int]) -> Tuple[List[List[int]], List[List[int]]]:
"""
Computes minimum multiplication cost and optimal split points using DP.
Args:
dims: List of dimensions where matrix i has shape dims[i] x dims[i+1]
Length must be n + 1 for n matrices.
Returns:
m: DP table for minimum costs
s: DP table for optimal split indices
"""
n = len(dims) - 1 # Number of matrices
m = [[0] * n for _ in range(n)]
s = [[0] * n for _ in range(n)]
# l = chain length (subproblem size)
for l in range(2, n + 1):
# i = left boundary of subchain
for i in range(n - l + 1):
j = i + l - 1 # j = right boundary
m[i][j] = float("inf")
# k = split point
for k in range(i, j):
# Cost = left + right + merge
cost = m[i][k] + m[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
if cost < m[i][j]:
m[i][j] = cost
s[i][j] = k
return m, s
def construct_parens(s: List[List[int]], i: int, j: int) -> str:
"""Reconstructs optimal parenthesization string from split table."""
if i == j:
return f"M{i + 1}"
k = s[i][j]
return f"({construct_parens(s, i, k)} · {construct_parens(s, k + 1, j)})"
def run_performance_study():
"""Benchmarks algorithm across varying chain lengths."""
print(f"{'N':<5} | {'Pattern':<12} | {'Cost':<10} | {'Time(ms)':<8}")
print("-" * 45)
test_cases = [
(5, "Fixed", [10, 4, 5, 20, 2, 50]),
(20, "Uniform", [random.randint(5, 100) for _ in range(21)]),
(50, "Uniform", [random.randint(5, 100) for _ in range(51)]),
(100, "Uniform", [random.randint(5, 100) for _ in range(101)]),
]
for n, pattern, dims in test_cases:
start = time.perf_counter()
m, s = matrix_chain_order(dims)
end = time.perf_counter()
cost = m[0][n - 1]
parens = construct_parens(s, 0, n - 1)
print(f"{n:<5} | {pattern:<12} | {cost:<10} | {(end - start) * 1000:.3f}")
if n == 5:
print(f"\n🔍 Verified: {parens} (Cost: {cost})\n")
if __name__ == "__main__":
run_performance_study()
Practical Limits
- Overhead of list-of-lists and interpreter slows n > 300. Use numpy arrays or lru_cache for memoization if recursion is preferred.
C Language
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <time.h>
/**
* Matrix Chain Multiplication - Optimal Parenthesization
*
* DP State:
* m[i][j] = minimum scalar multiplications to compute product A[i]...A[j]
* s[i][j] = index k where the optimal split occurs (A[i]...A[k]) · (A[k+1]...A[j])
*
* Complexity: O(n³) time, O(n²) space
* Note: We use long long for costs to prevent integer overflow on larger chains.
*/
void matrix_chain_order(const int* dims, int n, long long** m, int** s) {
// Base case: multiplying a single matrix costs 0
for (int i = 0; i < n; i++) {
m[i][i] = 0;
}
// l = chain length (number of matrices in current subproblem)
for (int l = 2; l <= n; l++) {
// i = starting matrix index
for (int i = 0; i <= n - l; i++) {
int j = i + l - 1; // j = ending matrix index
m[i][j] = LLONG_MAX;
// Try every possible split point k between i and j
for (int k = i; k < j; k++) {
// Cost = left subchain + right subchain + final merge
// dims[i] x dims[k+1] * dims[k+1] x dims[j+1] → dims[i] x dims[j+1]
long long cost = m[i][k] + m[k+1][j] + (long long)dims[i] * dims[k+1] * dims[j+1];
if (cost < m[i][j]) {
m[i][j] = cost;
s[i][j] = k;
}
}
}
}
}
/**
* Recursively reconstruct optimal parentheses using the split table `s`.
* Uses 1-based naming (M1, M2, ...) for readability.
*/
void print_optimal_parens(const int* const* s, int i, int j) {
if (i == j) {
printf("M%d", i + 1);
} else {
printf("(");
print_optimal_parens(s, i, s[i][j]);
printf(" · ");
print_optimal_parens(s, s[i][j] + 1, j);
printf(")");
}
}
/**
* Safe 2D array deallocation
*/
void free_dp_tables(long long** m, int** s, int n) {
for (int i = 0; i < n; i++) {
free(m[i]);
free(s[i]);
}
free(m);
free(s);
}
int main() {
// Example: A(10x4), B(4x5), C(5x20), D(20x2), E(2x50)
int dims[] = {10, 4, 5, 20, 2, 50};
int n = sizeof(dims)/sizeof(dims[0]) - 1; // n = 5 matrices
// Allocate DP tables on the heap
long long** m = malloc(n * sizeof(long long*));
int** s = malloc(n * sizeof(int*));
for (int i = 0; i < n; i++) {
m[i] = malloc(n * sizeof(long long));
s[i] = malloc(n * sizeof(int));
}
// Run DP
matrix_chain_order(dims, n, m, s);
// Output results
printf("Optimal Parenthesization: ");
print_optimal_parens((const int* const*)s, 0, n - 1);
printf("\nMinimum Scalar Multiplications: %lld\n", m[0][n-1]);
// Cleanup
free_dp_tables(m, s, n);
return 0;
}
Practical Limits
- Manual allocation is fastest. For n > 1000, consider 1D flattened arrays m[i*n + j] to improve cache locality.
Rust
use std::time::Instant;
/// Computes minimum scalar multiplication cost and optimal split points.
///
/// DP State:
/// m[i][j] = min cost to multiply matrices i..j
/// s[i][j] = optimal split index k
///
/// Complexity: O(n³) time, O(n²) space
/// Uses u64 for costs to safely handle large chains without overflow.
fn matrix_chain_order(dims: &[usize]) -> (Vec<Vec<u64>>, Vec<Vec<usize>>) {
let n = dims.len() - 1;
// Initialize DP tables with 0s
let mut m = vec![vec![0u64; n]; n];
let mut s = vec![vec![0usize; n]; n];
// Iterate over chain lengths from 2 to n
for length in 2..=n {
// i = left index of the chain
for i in 0..=(n - length) {
let j = i + length - 1; // j = right index
m[i][j] = u64::MAX;
// Test every valid split point k
for k in i..j {
let cost = m[i][k] + m[k + 1][j] + (dims[i] * dims[k + 1] * dims[j + 1]) as u64;
if cost < m[i][j] {
m[i][j] = cost;
s[i][j] = k;
}
}
}
}
(m, s)
}
/// Recursively builds the optimal parenthesization string.
fn construct_parens(s: &[Vec<usize>], i: usize, j: usize) -> String {
if i == j {
// Base case: single matrix
format!("M{}", i + 1)
} else {
// Recursive case: split at optimal k and wrap in parentheses
let k = s[i][j];
format!(
"({} · {})",
construct_parens(s, i, k),
construct_parens(s, k + 1, j)
)
}
}
fn main() {
// Example dimensions: A(10x4), B(4x5), C(5x20), D(20x2), E(2x50)
let dims = vec![10, 4, 5, 20, 2, 50];
let n = dims.len() - 1;
let start = Instant::now();
let (m, s) = matrix_chain_order(&dims);
let duration = start.elapsed();
let optimal_parens = construct_parens(&s, 0, n - 1);
let min_cost = m[0][n - 1];
println!("Optimal Parenthesization: {}", optimal_parens);
println!("Minimum Scalar Multiplications: {}", min_cost);
println!(
"DP Execution Time: {:.3} ms",
duration.as_secs_f64() * 1000.0
);
}
Practical Limits
- Zero-cost abstractions make it ~2-5× faster than Python. For extreme n, replace
Vec<Vec<>>with vec![0u64; nn] and compute index in + j.
Sample Output
N | Pattern | Cost | Time(ms)
---------------------------------------------
5 | Fixed | 1320 | 0.012
🔍 Verified: ((M1 · (M2 · (M3 · M4))) · M5) (Cost: 1320)
20 | Uniform | 241405 | 0.176
50 | Uniform | 929424 | 2.231
100 | Uniform | 1535972 | 16.081
Optimizations
Knuth Optimization: If the split table satisfies monotonicity (), time can be reduced to O(n²). This holds for many real-world dimension sequences but requires proof per instance.