Linear Transformations
How matrices act as transformations of space — rotation, scaling, reflection, projection — and what makes a transformation linear.
What a Linear Transformation Is
A transformation T takes vectors as input and produces vectors as output: T: ℝⁿ → ℝᵐ.
It’s linear if it satisfies two conditions:
T(u + v) = T(u) + T(v) (additivity)
T(cu) = c T(u) (homogeneity)
Combined: T(cu + dv) = cT(u) + dT(v) for any scalars c, d.
What this rules out: any transformation that shifts the origin. T(0) must equal 0 for a linear transformation. So translation (moving every point by a fixed vector) is not linear — but it’s affine.
Key fact: every linear transformation from ℝⁿ to ℝᵐ can be represented by an m×n matrix. The transformation is the matrix.
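A quick numerical check makes both conditions concrete. A minimal NumPy sketch; the matrix and vectors here are arbitrary choices for illustration:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])          # an arbitrary matrix, standing in for T

u = np.array([1.0, -2.0])
v = np.array([0.5, 4.0])
c, d = 3.0, -1.5

# Combined condition: T(cu + dv) = cT(u) + dT(v)
lhs = A @ (c * u + d * v)
rhs = c * (A @ u) + d * (A @ v)
print(np.allclose(lhs, rhs))        # True: matrix multiplication is linear

# Translation fails the test: it moves the origin
b = np.array([1.0, 1.0])
translate = lambda x: x + b         # affine, not linear
print(np.allclose(translate(np.zeros(2)), np.zeros(2)))  # False: T(0) != 0
```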
The Matrix of a Transformation
To find the matrix A representing transformation T:
- Apply T to each standard basis vector
- The results are the columns of A
If T(î) = (2, 0) and T(ĵ) = (0, 3):
A = [2 0]
    [0 3]
To transform any vector v, compute Av:
A × (x, y)ᵀ = (2x, 3y)ᵀ
This stretches x by 2 and y by 3.
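In code, the recipe is exactly that: stack the images of the basis vectors as columns. A minimal NumPy sketch of the example above:

```python
import numpy as np

# Images of the standard basis vectors under T
T_i = np.array([2.0, 0.0])          # T(i-hat)
T_j = np.array([0.0, 3.0])          # T(j-hat)

# Those images become the columns of A
A = np.column_stack([T_i, T_j])

v = np.array([5.0, -1.0])
print(A @ v)                        # [10. -3.] -> (2x, 3y), as claimed
```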
Geometric Transformations in 2D
Scaling
A = [sx  0]    scales x by sx, y by sy
    [ 0 sy]
Rotation by angle θ (counterclockwise)
A = [cos θ  −sin θ]
    [sin θ   cos θ]
90° CCW: A = [0 −1]    (x → y axis, y → −x axis)
             [1  0]
Why this works: the columns of A are where î and ĵ land after rotation.
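A small sketch of this, building the rotation matrix from an angle and confirming where the basis vectors land (NumPy, angles in radians):

```python
import numpy as np

def rotation(theta):
    """Counterclockwise rotation of the plane by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

R = rotation(np.pi / 2)                      # 90° CCW
print(np.round(R @ np.array([1.0, 0.0])))   # [ 0.  1.]  i-hat lands on the y-axis
print(np.round(R @ np.array([0.0, 1.0])))   # [-1.  0.]  j-hat lands on -x
```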
Reflection over x-axis
A = [1  0]    (x unchanged, y negated)
    [0 −1]
Reflection over y-axis
A = [−1 0]
    [ 0 1]
Reflection over y = x
A = [0 1]    (swap x and y)
    [1 0]
Shear (horizontal)
A = [1 k]    (x → x + ky, y unchanged)
    [0 1]
Shear slides layers of space parallel to an axis, like pushing the top of a deck of cards sideways.
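To see these matrices act on actual points, here is a sketch applying a reflection and a shear to the corners of the unit square (the points and the shear factor k = 0.5 are arbitrary choices):

```python
import numpy as np

# Corners of the unit square, one point per column
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])

reflect_x = np.array([[1.0,  0.0],
                      [0.0, -1.0]])  # reflection over the x-axis
shear     = np.array([[1.0,  0.5],
                      [0.0,  1.0]])  # horizontal shear with k = 0.5

print(reflect_x @ square)            # y-coordinates negated
print(shear @ square)                # top edge slides right by 0.5
```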
Composition of Transformations
Applying transformation B then A is the same as applying the single transformation AB:
A(Bv) = (AB)v
Matrix multiplication is function composition. This is why the order matters — B then A ≠ A then B.
Rotate 45°, then scale by 2:
    = Scale × Rotate
    = [2 0] × [cos 45° −sin 45°]
      [0 2]   [sin 45°  cos 45°]
A single matrix encodes the combined effect of many sequential transformations.
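A sketch of this in NumPy, verifying A(Bv) = (AB)v and showing that order matters once the scaling is non-uniform (all matrices here are illustrative choices):

```python
import numpy as np

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotate 45° CCW
S = np.diag([2.0, 2.0])                          # scale by 2

v = np.array([1.0, 0.0])
print(np.allclose(S @ (R @ v), (S @ R) @ v))     # True: A(Bv) = (AB)v

# With a non-uniform scale, the order visibly matters
S2 = np.diag([2.0, 1.0])
print(np.allclose(S2 @ R, R @ S2))               # False
```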
Projection
Projection onto a line (or subspace) maps every vector to its closest point in that subspace.
Projection of v onto unit vector û:
proj_û(v) = (v · û) û
Projection matrix onto a subspace spanned by the columns of A (assuming A has full column rank, so AᵀA is invertible):
P = A(AᵀA)⁻¹Aᵀ
Properties of a projection matrix:
- P² = P (projecting twice = projecting once)
- Pᵀ = P (symmetric)
Projection appears in: least squares regression (projecting b onto the column space of A), PCA (projecting data onto principal components), signal decomposition.
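A sketch verifying the two properties on a concrete projection, here onto a single line in ℝ³ (the line is an arbitrary choice; the full-column-rank assumption from above applies):

```python
import numpy as np

# Line in R^3 spanned by the single column of A (arbitrary choice);
# A has full column rank, so A^T A is invertible.
A = np.array([[1.0],
              [1.0],
              [0.0]])

P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P @ P, P))   # True: projecting twice = projecting once
print(np.allclose(P.T, P))     # True: symmetric

b = np.array([3.0, 1.0, 5.0])
print(P @ b)                   # [2. 2. 0.] -> closest point on the line
```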
Null Space and Column Space
Column space (image): the set of all possible outputs Av, i.e. every linear combination of A’s columns. Dimension = rank(A).
Null space (kernel): all vectors v such that Av = 0 — inputs that get “crushed” to zero. Dimension = nullity(A).
Rank-nullity theorem: rank + nullity = n (number of columns).
If A is 3×4 with rank 2:
    nullity = 4 − 2 = 2
so the null space is 2-dimensional.
Geometric meaning: the null space is the “lost” information — all directions that get collapsed to zero. A singular matrix has a non-trivial null space; it squashes some dimension flat.
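A sketch computing rank and a null-space basis for a concrete 3×4, rank-2 matrix (an illustrative example; assumes SciPy is available for null_space):

```python
import numpy as np
from scipy.linalg import null_space

# 3x4 matrix with rank 2: the third row is the sum of the first two
A = np.array([[1.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, 3.0],
              [1.0, 1.0, 3.0, 4.0]])

r = np.linalg.matrix_rank(A)
N = null_space(A)                 # orthonormal basis for the null space

print(r)                          # 2
print(N.shape[1])                 # 2 = 4 - 2, matching rank-nullity
print(np.allclose(A @ N, 0))      # True: null-space vectors map to zero
```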
The Four Fundamental Subspaces
Every matrix A has four associated subspaces (Strang’s framework):
| Subspace | Definition | Dimension |
|---|---|---|
| Column space C(A) | span of the columns of A | r (the rank) |
| Null space N(A) | solutions to Ax = 0 | n − r |
| Row space C(Aᵀ) | span of the rows of A | r |
| Left null space N(Aᵀ) | solutions to Aᵀy = 0 | m − r |
The column space and left null space are orthogonal complements in ℝᵐ; the row space and null space are orthogonal complements in ℝⁿ. Together, the four subspaces split the domain and codomain into orthogonal pieces.
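A sketch checking the orthogonality claims numerically, reusing the same illustrative 3×4 matrix (again assuming SciPy):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, 3.0],
              [1.0, 1.0, 3.0, 4.0]])   # same rank-2 example as above

N  = null_space(A)     # N(A), a subspace of R^4
Nl = null_space(A.T)   # N(A^T), a subspace of R^3

# Row space ⟂ null space: each row of A is orthogonal to each null vector
print(np.allclose(A @ N, 0))           # True
# Column space ⟂ left null space: each column of A is orthogonal to N(A^T)
print(np.allclose(A.T @ Nl, 0))        # True

# Dimensions: r + (n - r) = n and r + (m - r) = m
m, n = A.shape
r = np.linalg.matrix_rank(A)
print(r + N.shape[1] == n, r + Nl.shape[1] == m)   # True True
```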
Affine Transformations
A linear transformation followed by a translation:
T(v) = Av + b
Not linear (shifts the origin), but the most general transformation that preserves straight lines and parallelism. Used everywhere in computer graphics and robotics.
Homogeneous coordinates represent affine transformations as linear ones by embedding ℝⁿ in ℝⁿ⁺¹:
[x']   [ A | b ] [x]
[y'] = [       ] [y]
[1 ]   [ 0 | 1 ] [1]
Now translation is also a matrix multiplication. All transformations compose cleanly.
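A sketch of the embedding: a hypothetical helper affine_matrix packs A and b into one 3×3 matrix, so rotation-plus-translation becomes a single multiplication:

```python
import numpy as np

def affine_matrix(A, b):
    """Pack T(v) = Av + b into a single 3x3 homogeneous-coordinate matrix."""
    M = np.eye(3)
    M[:2, :2] = A
    M[:2, 2] = b
    return M

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])       # rotate 90° CCW
b = np.array([5.0, 0.0])          # then translate by (5, 0)

M = affine_matrix(A, b)
v = np.array([1.0, 0.0, 1.0])     # the point (1, 0) in homogeneous coordinates
print(M @ v)                      # [5. 1. 1.] -> (1,0) rotates to (0,1), shifts to (5,1)
```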
Why Linearity Matters
Linear transformations preserve the structure of vector spaces — they map lines to lines, the origin to the origin, and parallel lines to parallel lines. This makes them tractable — you can understand them completely by knowing what they do to a basis.
Non-linear transformations are vastly harder to analyse. Most of applied mathematics is, at its core, the art of approximating non-linear things with linear ones locally. Derivatives are linear approximations to functions. Neural networks stack linear transformations with non-linear activations between them.
Understanding linear transformations is understanding the local geometry of everything else.