Monday, July 7, 2025

Motivation of matrix multiplication

Inspired by this tweet.

Explain this claim: "matrix multiplication is Like That because it is a definition of multiplication over matrices that satisfies some of the same properties we care about over real numbers"

Matrix multiplication isn’t “mystical” or arbitrary—it’s chosen so that when you represent linear transformations as matrices, composing those transformations corresponds exactly to multiplying their matrices, and so that the algebraic rules you like for real numbers carry over. Here’s why the definition looks the way it does:


1. Matrices as linear maps

  • Interpretation: An $m\times n$ matrix $A$ represents a linear map

    $$ A: \mathbb R^n \to \mathbb R^m, $$

    sending each vector $x\in\mathbb R^n$ to $Ax\in\mathbb R^m$.

  • Composition: If you first apply another linear map $B: \mathbb R^p \to \mathbb R^n$ and then apply $A$, you get

    $$ A\circ B: \mathbb R^p \xrightarrow{\;B\;} \mathbb R^n \xrightarrow{\;A\;} \mathbb R^m. $$

    We want the matrix of $A\circ B$ to be exactly the product of the matrices of $A$ and $B$.
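
To make that requirement concrete, here is a small NumPy sketch (my illustration, not part of the original post; the shapes are arbitrary) checking that applying $B$ and then $A$ to a vector agrees with a single application of the product matrix $AB$:

```python
import numpy as np

# Arbitrary example shapes: A is 2x3 (a map R^3 -> R^2), B is 3x4 (a map R^4 -> R^3)
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
x = rng.standard_normal(4)

composed = A @ (B @ x)   # apply B first, then A: the composition (A o B)(x)
product = (A @ B) @ x    # one application of the product matrix AB

print(np.allclose(composed, product))  # True
```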


2. Desired algebraic properties

To mirror the familiar arithmetic of real numbers, we impose the following on “multiplication”:

  1. Associativity: $(AB)C = A(BC)$, so you can compose several maps without worrying about how you parenthesize.

  2. Distributivity: $A(B + C) = AB + AC,\quad (A + B)C = AC + BC$, so matrix multiplication plays nicely with matrix addition.

  3. Identity element: There should be an “$I$” so that $IA = A = AI$, corresponding to the identity map on $\mathbb R^n$.

  4. Compatibility with scalars: $(\lambda A)B = \lambda(AB) = A(\lambda B)$ for any real $\lambda$.

Together, these axioms make the set of all $n\times n$ matrices into a ring (in fact, a non-commutative algebra) much like the real numbers.
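
As a quick numerical sanity check (my own sketch with random $3\times 3$ matrices, not from the original post), all four axioms hold for NumPy's `@` product, while commutativity generally fails:

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))
I = np.eye(3)
lam = 2.5

print(np.allclose((A @ B) @ C, A @ (B @ C)))        # associativity
print(np.allclose(A @ (B + C), A @ B + A @ C))      # left distributivity
print(np.allclose((A + B) @ C, A @ C + B @ C))      # right distributivity
print(np.allclose(I @ A, A) and np.allclose(A @ I, A))   # identity
print(np.allclose((lam * A) @ B, lam * (A @ B)) and
      np.allclose(lam * (A @ B), A @ (lam * B)))    # scalar compatibility
print(np.allclose(A @ B, B @ A))                    # almost surely False: no commutativity
```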


3. Deriving the “row-by-column” rule

Let

$$ A = \bigl[a_{ij}\bigr]_{m\times n},\quad B = \bigl[b_{jk}\bigr]_{n\times p}. $$

We want the $(i,k)$-entry of the product $C = AB$ to be exactly the result of doing:

  1. Take the $i$th row of $A$: $\bigl(a_{i1},a_{i2},\dots,a_{in}\bigr)$.

  2. Take the $k$th column of $B$: $\bigl(b_{1k},b_{2k},\dots,b_{nk}\bigr)^{T}$.

  3. Compute their dot-product:

    $$ c_{ik} = \sum_{j=1}^n a_{ij}\,b_{jk}. $$

Why a dot product? Because applying $B$ to the $k$th standard basis vector $e_k$ picks out the $k$th column of $B$, and when $A$ then acts on that column, the $i$th entry of the result is the dot product of the $i$th row of $A$ with it, which is exactly the $c_{ik}$ above.
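
As an illustration (code of mine, not from the post), here is the row-by-column rule spelled out directly on plain Python lists, so the triple-index sum is visible:

```python
def matmul(A, B):
    """Row-by-column product of two matrices stored as lists of lists."""
    m, n = len(A), len(A[0])
    assert len(B) == n, "inner dimensions must agree"
    p = len(B[0])
    # c[i][k] = sum over j of a[i][j] * b[j][k]
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(p)]
            for i in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]           # 3 x 2

print(matmul(A, B))      # [[58, 64], [139, 154]]
```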


4. What you “get for free”

Because of this definition:

  • Composition ↔ multiplication: The matrix of $A\circ B$ is $AB$.

  • All the ring axioms hold: You can add, multiply, and scale matrices just like numbers, except that multiplication may not commute ($AB\neq BA$ in general).

  • Powerful algebraic structure: You can talk about invertibility (when there’s a two-sided “matrix reciprocal”), eigen-theory, determinant, trace, etc., all of which generalize properties of numbers.
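
A tiny example of my own (not from the post): a shear and a 90-degree rotation do not commute, while an invertible matrix has a genuine two-sided “reciprocal”:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])    # a shear
B = np.array([[0.0, -1.0],
              [1.0,  0.0]])   # a 90-degree rotation

print(np.allclose(A @ B, B @ A))           # False: the order of transformations matters

A_inv = np.linalg.inv(A)                   # the two-sided "matrix reciprocal" of A
print(np.allclose(A_inv @ A, np.eye(2)))   # True
print(np.allclose(A @ A_inv, np.eye(2)))   # True
```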


In a nutshell

Matrix multiplication is “like that” because it’s defined to be the operation on arrays of numbers which (a) models composition of linear maps and (b) satisfies the familiar algebraic laws (associativity, distributivity, identity, scalar compatibility) we depend on when working with real numbers.
