Comparing Naive and Strassen’s Matrix Multiplication Algorithms

Matrix multiplication is a fundamental operation in many fields, including computer graphics, scientific computing, machine learning and more. This post compares two matrix multiplication algorithms — the traditional (naive) matrix multiplication and Strassen’s matrix multiplication — to understand their differences in terms of time complexity, space complexity, execution time, and memory usage.

Prerequisite: Strassen’s Algorithm for Matrix Multiplication

What is Matrix Multiplication? Why is it Important?

Matrix multiplication is an operation where two matrices are combined to produce a new matrix. If you have two matrices A and B of dimensions n x m and m x p, their product C, is a matrix of dimensions n x p. The value of each element C[i][j] is computed as follows:

C[i][j] = \sum_{k=0}^{m-1} A[i][k] \times B[k][j]

Applications of Matrix Multiplication

Matrix multiplication is used extensively in various domains:

Linear Algebra: Operations involving linear transformations, eigenvalues, and vector spaces.
Machine Learning: Essential for operations involving neural networks and optimization algorithms.
Data Science: Performing matrix operations for large datasets in data analysis.
Physics Simulations: Modeling and simulating physical systems.
Computer Graphics: Applying transformations like translation, rotation, and scaling to 3D models.
Scientific Computing: Employed in solving linear equations and simulations.

Traditional Matrix Multiplication vs Strassen’s Matrix Multiplication

Naive (Traditional) Matrix Multiplication

The naive algorithm for multiplying two n x n matrices involves iterating through each element of the resulting matrix and computing the sum of products. The time complexity of this method is O(n³).

Pseudocode

NAIVE_MATRIX_MULTIPLY(A, B):
    n = number of rows/columns in A (assuming square matrix)
    Initialize C as an n x n matrix with all elements set to 0

    for i from 0 to n-1:
        for j from 0 to n-1:
            C[i][j] = 0
            for k from 0 to n-1:
                C[i][j] = C[i][j] + (A[i][k] * B[k][j])
    
    return C

Strassen’s Matrix Multiplication

Strassen’s algorithm improves the naive method by reducing the number of multiplications required. It divides each matrix into smaller submatrices, multiplying them in a recursive manner using fewer operations. The time complexity of Strassen’s algorithm is O(n^2.81), which is faster than O(n³) for large matrices.

Pseudocode

STRASSEN_MATRIX_MULTIPLY(A, B):
    n = number of rows/columns in A (assuming square matrix)
    if n == 1:
        return A[0][0] * B[0][0]
    
    Partition A into 4 submatrices:
        A11, A12, A21, A22 (each of size n/2 x n/2)
    Partition B into 4 submatrices:
        B11, B12, B21, B22 (each of size n/2 x n/2)

    M1 = STRASSEN_MATRIX_MULTIPLY(A11 + A22, B11 + B22)
    M2 = STRASSEN_MATRIX_MULTIPLY(A21 + A22, B11)
    M3 = STRASSEN_MATRIX_MULTIPLY(A11, B12 - B22)
    M4 = STRASSEN_MATRIX_MULTIPLY(A22, B21 - B11)
    M5 = STRASSEN_MATRIX_MULTIPLY(A11 + A12, B22)
    M6 = STRASSEN_MATRIX_MULTIPLY(A21 - A11, B11 + B12)
    M7 = STRASSEN_MATRIX_MULTIPLY(A12 - A22, B21 + B22)

    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6

    Combine C11, C12, C21, C22 to form C (n x n matrix)

    return C

Time and Space Complexity

Naive Method

Time Complexity: The algorithm has O(n³) time complexity due to three nested loops.
Space Complexity: O(n²) is used to store the result matrix.

Strassen’s Method

Time Complexity: Reduced to O(n^2.81) because it optimizes the number of multiplications required.
Space Complexity: O(n²), as additional space is needed for temporary and sub matrices generated during the recursive splitting.

Comparing Execution Time

Let’s implement the two algorithms in C++ and compare their execution time.

Note: The results obtained may not accurately reflect exact execution times due to factors such as platform differences, caching mechanisms, and system optimizations.

Code Implementation

#include <iostream>
#include <vector>
#include <chrono>
using namespace std;
using namespace chrono;

// Function to print the matrix
void printMatrix(const vector<vector<int>>& matrix) {
    int n = (int) matrix.size();
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            cout << matrix[i][j] << " ";
        }
        cout << endl;
    }
}

// Function to add two matrices
vector<vector<int>> add(const vector<vector<int>>& A, const vector<vector<int>>& B) {
    int n = (int) A.size();
    vector<vector<int>> result(n, vector<int>(n));
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            result[i][j] = A[i][j] + B[i][j];
        }
    }
    
    return result;
}

// Function to subtract two matrices
vector<vector<int>> subtract(const vector<vector<int>>& A, const vector<vector<int>>& B) {
    int n = (int) A.size();
    vector<vector<int>> result(n, vector<int>(n));
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            result[i][j] = A[i][j] - B[i][j];
        }
    }
    
    return result;
}

// Function to split a matrix into 4 submatrices
void splitMatrix(const vector<vector<int>>& original, vector<vector<int>>& A11, vector<vector<int>>& A12,
                 vector<vector<int>>& A21, vector<vector<int>>& A22) {
    int newSize = (int) (original.size() / 2);
    for (int i = 0; i < newSize; i++) {
        for (int j = 0; j < newSize; j++) {
            A11[i][j] = original[i][j];
            A12[i][j] = original[i][j + newSize];
            A21[i][j] = original[i + newSize][j];
            A22[i][j] = original[i + newSize][j + newSize];
        }
    }
}

// Function to join 4 submatrices into a matrix
void joinMatrices(const vector<vector<int>>& A11, const vector<vector<int>>& A12,
                  const vector<vector<int>>& A21, const vector<vector<int>>& A22,
                  vector<vector<int>>& result) {
    int newSize = (int) A11.size();
    for (int i = 0; i < newSize; i++) {
        for (int j = 0; j < newSize; j++) {
            result[i][j] = A11[i][j];
            result[i][j + newSize] = A12[i][j];
            result[i + newSize][j] = A21[i][j];
            result[i + newSize][j + newSize] = A22[i][j];
        }
    }
}


// Strassen's matrix multiplication (for simplicity, this is for square matrices of size n)
vector<vector<int>> strassenMultiply(const vector<vector<int>>& A, const vector<vector<int>>& B) {
    int n = (int) A.size();
    
    // Base case
    if (n == 1) {
        return {{A[0][0] * B[0][0]}};
    }

    // Splitting matrices into submatrices
    int newSize = n / 2;
    vector<vector<int>> A11(newSize, vector<int>(newSize));
    vector<vector<int>> A12(newSize, vector<int>(newSize));
    vector<vector<int>> A21(newSize, vector<int>(newSize));
    vector<vector<int>> A22(newSize, vector<int>(newSize));
    vector<vector<int>> B11(newSize, vector<int>(newSize));
    vector<vector<int>> B12(newSize, vector<int>(newSize));
    vector<vector<int>> B21(newSize, vector<int>(newSize));
    vector<vector<int>> B22(newSize, vector<int>(newSize));

    splitMatrix(A, A11, A12, A21, A22);
    splitMatrix(B, B11, B12, B21, B22);

    // Computing the 7 products using Strassen's method
    vector<vector<int>> M1 = strassenMultiply(add(A11, A22), add(B11, B22));
    vector<vector<int>> M2 = strassenMultiply(add(A21, A22), B11);
    vector<vector<int>> M3 = strassenMultiply(A11, subtract(B12, B22));
    vector<vector<int>> M4 = strassenMultiply(A22, subtract(B21, B11));
    vector<vector<int>> M5 = strassenMultiply(add(A11, A12), B22);
    vector<vector<int>> M6 = strassenMultiply(subtract(A21, A11), add(B11, B12));
    vector<vector<int>> M7 = strassenMultiply(subtract(A12, A22), add(B21, B22));

    // Calculating C submatrices
    vector<vector<int>> C11 = add(subtract(add(M1, M4), M5), M7);
    vector<vector<int>> C12 = add(M3, M5);
    vector<vector<int>> C21 = add(M2, M4);
    vector<vector<int>> C22 = add(subtract(add(M1, M3), M2), M6);

    // Joining the 4 submatrices into the result matrix
    vector<vector<int>> C(n, vector<int>(n));
    joinMatrices(C11, C12, C21, C22, C);

    return C;
}


// Naive matrix multiplication
vector<vector<int>> naiveMultiply(const vector<vector<int>>& A, const vector<vector<int>>& B) {
    int n = (int) A.size();
    vector<vector<int>> C(n, vector<int>(n, 0));

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            for (int k = 0; k < n; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
    
    return C;
}

// Function to measure execution time
void measureExecutionTime(const string& name, vector<vector<int>> (*multiplyFunc)(const vector<vector<int>>&, const vector<vector<int>>&), const vector<vector<int>>& A, const vector<vector<int>>& B) {
    auto start = high_resolution_clock::now();
    vector<vector<int>> C = multiplyFunc(A, B);
    auto end = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>(end - start).count();
    cout << endl << name << " took " << duration << " microseconds.\n";

    cout << "Result Matrix C:" << endl;
    printMatrix(C);

}

int main() {
    int n = 4; // Example size, should be a power of 2 for Strassen's method
    vector<vector<int>> A(n, vector<int>(n, 1));
    vector<vector<int>> B(n, vector<int>(n, 1));
    
    cout << "Matrix A:" << endl;
    printMatrix(A);
    
    cout << endl << "Matrix B:" << endl;
    printMatrix(B);

    measureExecutionTime("Naive Multiplication", naiveMultiply, A, B);
    measureExecutionTime("Strassen's Multiplication", strassenMultiply, A, B);

    return 0;
}

Output

Matrix A:
1 1 1 1 
1 1 1 1 
1 1 1 1 
1 1 1 1 

Matrix B:
1 1 1 1 
1 1 1 1 
1 1 1 1 
1 1 1 1 

Naive Multiplication took 89 microseconds.
Result Matrix C:
4 4 4 4 
4 4 4 4 
4 4 4 4 
4 4 4 4 

Strassen's Multiplication took 347 microseconds.
Result Matrix C:
4 4 4 4 
4 4 4 4 
4 4 4 4 
4 4 4 4

While asymptotic analysis using time complexity notation, such as O(n³) for the naive method and O(n^2.81) for Strassen’s method, provides a theoretical measure of an algorithm’s efficiency, it may not accurately reflect the actual execution time and real-world performance.

In practice, factors such as constant overheads, memory access patterns, and system architecture play a significant role in determining how quickly an algorithm runs. Even though Strassen’s algorithm has a theoretically lower time complexity than the naive method, its overhead and complexity in implementation may result in slower execution, particularly for smaller matrices.

This highlights the importance of considering both theoretical complexity and practical execution characteristics when evaluating algorithms.

Comparing Memory Usage

Let’s now compare the memory usage of these matrix multiplication algorithms using C++.

However, note that the memory usage values are approximate and serve as rough estimates. The results may not reflect exact memory usage due to platform differences. For precise and detailed memory profiling, it’s recommended to use platform-specific profilers.

Code Implementation

#include <iostream>
#include <vector>
#include <chrono>
#include <sys/resource.h>
using namespace std;
using namespace chrono;

// Function to print the matrix
void printMatrix(const vector<vector<int>>& matrix) {
    int n = (int) matrix.size();
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            cout << matrix[i][j] << " ";
        }
        cout << endl;
    }
}

// Function to add two matrices
vector<vector<int>> add(const vector<vector<int>>& A, const vector<vector<int>>& B) {
    int n = (int) A.size();
    vector<vector<int>> result(n, vector<int>(n));
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            result[i][j] = A[i][j] + B[i][j];
        }
    }
    
    return result;
}

// Function to subtract two matrices
vector<vector<int>> subtract(const vector<vector<int>>& A, const vector<vector<int>>& B) {
    int n = (int) A.size();
    vector<vector<int>> result(n, vector<int>(n));
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            result[i][j] = A[i][j] - B[i][j];
        }
    }
    
    return result;
}

// Function to split a matrix into 4 submatrices
void splitMatrix(const vector<vector<int>>& original, vector<vector<int>>& A11, vector<vector<int>>& A12,
                 vector<vector<int>>& A21, vector<vector<int>>& A22) {
    int newSize = (int) (original.size() / 2);
    for (int i = 0; i < newSize; i++) {
        for (int j = 0; j < newSize; j++) {
            A11[i][j] = original[i][j];
            A12[i][j] = original[i][j + newSize];
            A21[i][j] = original[i + newSize][j];
            A22[i][j] = original[i + newSize][j + newSize];
        }
    }
}

// Function to join 4 submatrices into a matrix
void joinMatrices(const vector<vector<int>>& A11, const vector<vector<int>>& A12,
                  const vector<vector<int>>& A21, const vector<vector<int>>& A22,
                  vector<vector<int>>& result) {
    int newSize = (int) A11.size();
    for (int i = 0; i < newSize; i++) {
        for (int j = 0; j < newSize; j++) {
            result[i][j] = A11[i][j];
            result[i][j + newSize] = A12[i][j];
            result[i + newSize][j] = A21[i][j];
            result[i + newSize][j + newSize] = A22[i][j];
        }
    }
}


// Strassen's matrix multiplication (for simplicity, this is for square matrices of size n)
vector<vector<int>> strassenMultiply(const vector<vector<int>>& A, const vector<vector<int>>& B) {
    int n = (int) A.size();
    
    // Base case
    if (n == 1) {
        return {{A[0][0] * B[0][0]}};
    }

    // Splitting matrices into submatrices
    int newSize = n / 2;
    vector<vector<int>> A11(newSize, vector<int>(newSize));
    vector<vector<int>> A12(newSize, vector<int>(newSize));
    vector<vector<int>> A21(newSize, vector<int>(newSize));
    vector<vector<int>> A22(newSize, vector<int>(newSize));
    vector<vector<int>> B11(newSize, vector<int>(newSize));
    vector<vector<int>> B12(newSize, vector<int>(newSize));
    vector<vector<int>> B21(newSize, vector<int>(newSize));
    vector<vector<int>> B22(newSize, vector<int>(newSize));

    splitMatrix(A, A11, A12, A21, A22);
    splitMatrix(B, B11, B12, B21, B22);

    // Computing the 7 products using Strassen's method
    vector<vector<int>> M1 = strassenMultiply(add(A11, A22), add(B11, B22));
    vector<vector<int>> M2 = strassenMultiply(add(A21, A22), B11);
    vector<vector<int>> M3 = strassenMultiply(A11, subtract(B12, B22));
    vector<vector<int>> M4 = strassenMultiply(A22, subtract(B21, B11));
    vector<vector<int>> M5 = strassenMultiply(add(A11, A12), B22);
    vector<vector<int>> M6 = strassenMultiply(subtract(A21, A11), add(B11, B12));
    vector<vector<int>> M7 = strassenMultiply(subtract(A12, A22), add(B21, B22));

    // Calculating C submatrices
    vector<vector<int>> C11 = add(subtract(add(M1, M4), M5), M7);
    vector<vector<int>> C12 = add(M3, M5);
    vector<vector<int>> C21 = add(M2, M4);
    vector<vector<int>> C22 = add(subtract(add(M1, M3), M2), M6);

    // Joining the 4 submatrices into the result matrix
    vector<vector<int>> C(n, vector<int>(n));
    joinMatrices(C11, C12, C21, C22, C);

    return C;
}


// Naive matrix multiplication
vector<vector<int>> naiveMultiply(const vector<vector<int>>& A, const vector<vector<int>>& B) {
    int n = (int) A.size();
    vector<vector<int>> C(n, vector<int>(n, 0));

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            for (int k = 0; k < n; k++) {
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }
    
    return C;
}


// Function to get memory usage
size_t getMemoryUsage() {
    struct rusage usage;
    getrusage(RUSAGE_SELF, &usage);
    return usage.ru_maxrss; // Memory usage in kilobytes (KB)
}


void measureMemoryUsage(const string& name, vector<vector<int>> (*multiplyFunc)(const vector<vector<int>>&, const vector<vector<int>>&), const vector<vector<int>>& A, const vector<vector<int>>& B) {
    size_t initialMemory = getMemoryUsage();
    vector<vector<int>> C = multiplyFunc(A, B);
    size_t finalMemory = getMemoryUsage();
    cout << endl << name << " memory usage: " << (finalMemory - initialMemory) << " KB\n";
    
    cout << "Result Matrix C:" << endl;
    printMatrix(C);
}

int main() {
    int n = 4; // Example size, should be a power of 2 for simplicity when using Strassen's method
    vector<vector<int>> A(n, vector<int>(n, 1));
    vector<vector<int>> B(n, vector<int>(n, 1));
    
    cout << "Matrix A:" << endl;
    printMatrix(A);
    
    cout << endl << "Matrix B:" << endl;
    printMatrix(B);

    measureMemoryUsage("Naive Multiplication", naiveMultiply, A, B);
    measureMemoryUsage("Strassen's Multiplication", strassenMultiply, A, B);

    return 0;
}

Output

Matrix A:
1 1 1 1 
1 1 1 1 
1 1 1 1 
1 1 1 1 

Matrix B:
1 1 1 1 
1 1 1 1 
1 1 1 1 
1 1 1 1 

Naive Multiplication memory usage: 0 KB
Result Matrix C:
4 4 4 4 
4 4 4 4 
4 4 4 4 
4 4 4 4 

Strassen's Multiplication memory usage: 4096 KB
Result Matrix C:
4 4 4 4 
4 4 4 4 
4 4 4 4 
4 4 4 4

In above example, the memory usage for naive matrix multiplication was minimal (0 KB reported). This result may vary with different systems and platforms, but generally, the naive approach uses less memory because it doesn’t require additional storage beyond the result matrix.
Strassen’s algorithm on the other hand showed significantly higher memory usage (4096 KB). This is because Strassen’s method requires additional space for intermediate matrices generated at each recursive level. And each recursive call further splits these matrices, increasing memory demand. Further, for very large matrices, Strassen’s method can result in much higher memory usage compared to the naive method.

In practice, Strassen’s algorithm becomes more efficient for larger matrices. However, this can vary based on hardware and specific optimizations (e.g., parallel computing or memory access optimizations). Generally, many implementations use a hybrid approach, combining Strassen’s method with the naive method. They switch to the naive multiplication when the matrix size becomes small enough to avoid the overhead of further recursion.

The naive matrix multiplication algorithm is generally more memory-efficient than Strassen’s algorithm for small to medium-sized matrices. However, as matrix size increases, the time efficiency of Strassen’s algorithm can become more advantageous despite the increased memory usage.

To conclude, both Strassen’s and Naive matrix multiplication algorithms have their own advantages and use cases. The naive method is simple, making it useful for smaller computations. Strassen’s algorithm, on the other hand, reduces computational complexity but adds complexity to implementation and memory usage.

What is Matrix Multiplication? Why is it Important?

Traditional Matrix Multiplication vs Strassen’s Matrix Multiplication

Time and Space Complexity

Comparing Execution Time

Comparing Memory Usage

Share this:

Leave a Reply Cancel reply

Leave a Reply