Math 511: Linear Algebra
7.5 Singular Value Decomposition
7.5.1 The Important Pieces of a Matrix¶
$$ \require{color} \definecolor{brightblue}{rgb}{.267, .298, .812} \definecolor{darkblue}{rgb}{0.0, 0.0, 1.0} \definecolor{palepink}{rgb}{1, .73, .8} \definecolor{softmagenta}{rgb}{.99,.34,.86} \definecolor{blueviolet}{rgb}{.537,.192,.937} \definecolor{jonquil}{rgb}{.949,.792,.098} \definecolor{shockingpink}{rgb}{1, 0, .741} \definecolor{royalblue}{rgb}{0, .341, .914} \definecolor{alien}{rgb}{.529,.914,.067} \definecolor{crimson}{rgb}{1, .094, .271} \def\ihat{\mathbf{\hat{\unicode{x0131}}}} \def\jhat{\mathbf{\hat{\unicode{x0237}}}} \def\khat{\mathrm{\hat{k}}} \def\tombstone{\unicode{x220E}} \def\contradiction{\unicode{x2A33}} $$
When analyzing our linear mathematical models we often need to describe an elaborate linear transformation in terms of a composition of simple linear transformations. The three simple linear transformations, illustrated numerically after this list, are:
- A rotation in $\mathbb{R}^n$ represented by an orthogonal matrix.
- A reflection with respect to a hyperplane is also represented by an orthogonal matrix.
- A scaling, that is, a transformation that dilates or contracts the magnitude of vectors along a set of axes. A diagonal matrix represents a scaling transformation. (A diagonal matrix may be orthogonal too.)
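As a quick numerical illustration (a minimal NumPy sketch, not part of the course text), the following builds one matrix of each type and checks the defining properties: rotations and reflections are orthogonal, and a scaling matrix is diagonal.

```python
import numpy as np

theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation by theta
F = np.array([[1.0,  0.0],
              [0.0, -1.0]])                       # reflection across the x-axis
S = np.diag([3.0, 0.5])                           # scaling: dilate x, contract y

print(np.allclose(R.T @ R, np.eye(2)))            # True: a rotation is orthogonal
print(np.allclose(F.T @ F, np.eye(2)))            # True: a reflection is orthogonal
print(np.linalg.det(R), np.linalg.det(F))         # +1 for a rotation, -1 for a reflection
```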
7.5.2 Generalizing Diagonalization¶
Our goal is to decompose a matrix, and thus the linear transformation it represents, into a composition of these three simple transformations. This will give us a very clear idea of the geometry of a linear transformation.
We need to review and define some vocabulary to make our discussion as understandable as possible.
Definition of a Complete Set of Orthonormal Eigenvectors¶
A matrix $A\in\mathbb{C}^{n\times n}$ that has $n$ orthonormal eigenvectors associated with its eigenvalues is said to have a complete set of orthonormal eigenvectors.
This is an important property of a matrix. We will use this to construct the Singular Value Decomposition.
Definition of Diagonalizable¶
Matrix $A\in\mathbb{C}^{n\times n}$ is called diagonalizable if there exists a nonsingular matrix $X$ and a diagonal matrix $D$ such that
$$ X^{-1}AX = D $$
We say that matrix $X$ diagonalizes matrix $A$.
Not all matrices are diagonalizable. Matrices that are not diagonalizable are called defective. Even defective matrices can still be decomposed into a composition of simple transformations. This is why we prefer the Singular Value Decomposition to diagonalization: every matrix has a singular value decomposition.
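For example, here is a minimal NumPy sketch showing that a defective matrix (a Jordan block, which has only one linearly independent eigenvector) still has a perfectly good singular value decomposition.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])      # defective: eigenvalue 0 repeated, only one eigenvector

U, s, Vt = np.linalg.svd(A)
print(s)                                     # [1. 0.]  singular values exist anyway
print(np.allclose(U @ np.diag(s) @ Vt, A))   # True: A = U Sigma V^T
```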
Definition of skew-symmetric and skew-Hermitian matrices¶
- A matrix $A\in\mathbb{C}^{n\times n}$ is called skew-symmetric if and only if $A^T = -A$.
- A matrix $A\in\mathbb{C}^{n\times n}$ is called skew-Hermitian if and only if $A^H = -A$.
Recall¶
A matrix that is equal to its own transpose, $A = A^T$, is called symmetric, and a matrix that is equal to its own Hermitian transpose, $A = A^H$, is called a Hermitian matrix.
Definition of Normal Matrices¶
A matrix $A\in\mathbb{C}^{n\times n}$ is called a normal matrix if $AA^H = A^HA$.
A normal matrix commutes with its Hermitian transpose. We discuss normal matrices because there are non-Hermitian matrices that possess a complete set of orthonormal eigenvectors. Real skew-symmetric and skew-Hermitian matrices, for example, have a complete set of orthonormal eigenvectors.
Theorem 7.5.1¶
A matrix $A\in\mathbb{C}^{n\times n}$ is normal if and only if it possesses a complete orthonormal set of eigenvectors.
Proof¶
$\Longleftarrow$
Any matrix with a complete set of orthonormal eigenvectors $\left\{\mathbf{\xi}_1,\ \mathbf{\xi}_2,\ \dots,\ \mathbf{\xi}_n\right\}$ has a unitary diagonalizing matrix $U$ whose columns are the $n$ orthonormal eigenvectors.
$$ \begin{align*} U &= \begin{bmatrix} \mathbf{\xi}_1 & \mathbf{\xi}_2 & \dots & \mathbf{\xi}_n \end{bmatrix} \\ \\ A &= UDU^H \end{align*} $$
However, the diagonal matrix $D$ need not be Hermitian, since its diagonal entries, the eigenvalues, may be complex. Hence in general
$$ A^H = \left(UDU^H\right)^H = UD^HU^H \neq A $$
However,
$$ \begin{align*} AA^H &= \left(UDU^H\right)\left(UDU^H\right)^H = UDU^HUD^HU^H = UDD^HU^H \\ \\ A^HA &= \left(UDU^H\right)^H\left(UDU^H\right) = UD^HU^HUDU^H = UD^HDU^H \end{align*} $$
We have that the product
$$ \begin{align*} D^HD &= \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix}^H \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix} \\ \\ &= \begin{bmatrix} \overline{\lambda_1} & & & \\ & \overline{\lambda_2} & & \\ & & \ddots & \\ & & & \overline{\lambda_n} \end{bmatrix} \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix} \\ \\ &= \begin{bmatrix} |\lambda_1|^2 & & & \\ & |\lambda_2|^2 & & \\ & & \ddots & \\ & & & |\lambda_n|^2 \end{bmatrix} = DD^H \end{align*} $$
Hence $AA^H = UDD^HU^H = UD^HDU^H = A^HA$, and $A$ is a normal matrix.
$\Longrightarrow$ If $n\times n$ matrix $A$ is normal, then $AA^H = A^HA$. Using the Schur Decomposition Theorem 6.4.3, we have that there is a unitary matrix $U$ such that $T = U^HAU$ is an upper triangular matrix. Hence
$$ T^HT = U^HA^HUU^HAU = U^HA^HAU = U^HAA^HU = U^HAUU^HA^HU = TT^H $$ Thus matrix $T$ is a normal matrix as well. If
$$ T = \begin{bmatrix} t_{11} & t_{12} & \dots & t_{1n} \\ 0 & t_{22} & \dots & t_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & t_{nn} \end{bmatrix}, $$
then
$$ \begin{align*} T^HT &= \begin{bmatrix} t_{11} & t_{12} & t_{13} & \dots & t_{1n} \\ 0 & t_{22} & t_{23} & \dots & t_{2n} \\ 0 & 0 & t_{33} & \dots & t_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & t_{nn} \end{bmatrix}^H \begin{bmatrix} t_{11} & t_{12} & t_{13} & \dots & t_{1n} \\ 0 & t_{22} & t_{23} & \dots & t_{2n} \\ 0 & 0 & t_{33} & \dots & t_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & t_{nn} \end{bmatrix} \\ \\ &= \begin{bmatrix} t_{11} & t_{12} & t_{13} & \dots & t_{1n} \\ 0 & t_{22} & t_{23} & \dots & t_{2n} \\ 0 & 0 & t_{33} & \dots & t_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & t_{nn} \end{bmatrix} \begin{bmatrix} t_{11} & t_{12} & t_{13} & \dots & t_{1n} \\ 0 & t_{22} & t_{23} & \dots & t_{2n} \\ 0 & 0 & t_{33} & \dots & t_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & t_{nn} \end{bmatrix}^H = TT^H \\ \end{align*} $$
Comparing the diagonal elements of each product we have
$$ \begin{align*} |t_{11}|^2 + |t_{12}|^2 + |t_{13}|^2 + \cdots + |t_{1n}|^2 &= |t_{11}|^2 \\ |t_{22}|^2 + |t_{23}|^2 + \cdots + |t_{2n}|^2 &= |t_{12}|^2 + |t_{22}|^2 \\ |t_{33}|^2 + \cdots + |t_{3n}|^2 &= |t_{13}|^2 + |t_{23}|^2 + |t_{33}|^2 \\ &\vdots \\ |t_{nn}|^2 &= |t_{1n}|^2 + |t_{2n}|^2 + \cdots + |t_{(n-1)n}|^2 + |t_{nn}|^2 \\ \end{align*} $$
Subtracting $|t_{kk}|^2$ from both sides of equation $k$, for $1\le k\le n$ yields
$$ \begin{align*} |t_{12}|^2 + |t_{13}|^2 + \cdots + |t_{1n}|^2 &= 0 \\ |t_{23}|^2 + \cdots + |t_{2n}|^2 &= |t_{12}|^2 \\ |t_{34}|^2 + \cdots + |t_{3n}|^2 &= |t_{13}|^2 + |t_{23}|^2 \\ &\vdots \\ 0 &= |t_{1n}|^2 + |t_{2n}|^2 + |t_{3n}|^2 + \cdots + |t_{(n-1)n}|^2 \\ \end{align*} $$
The first equation forces $t_{12} = t_{13} = \cdots = t_{1n} = 0$. Substituting these zeros into the second equation forces $t_{23} = \cdots = t_{2n} = 0$, and continuing down the list shows that $t_{ij}=0$ for all $1\le i\neq j\le n$. Therefore $T$ is a diagonal matrix and $U$ diagonalizes $A$. Using matrix multiplication, we have
$$ \begin{align*} AU &= A\begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \dots & \mathbf{u}_n \end{bmatrix} \\ \\ &= \begin{bmatrix} A\mathbf{u}_1 & A\mathbf{u}_2 & \dots & A\mathbf{u}_n \end{bmatrix} \\ \\ &= UT = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \dots & \mathbf{u}_n \end{bmatrix} \begin{bmatrix} t_{11} & 0 & \dots & 0 \\ 0 & t_{22} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & t_{nn} \end{bmatrix} \\ \\ &= \begin{bmatrix} t_{11}\mathbf{u}_1 & t_{22}\mathbf{u}_2 & \dots & t_{nn}\mathbf{u}_n \end{bmatrix} \end{align*} $$
Thus for $1\le j\le n$ we have $A\mathbf{u}_j = t_{jj}\mathbf{u}_j$, and $\left\{ \mathbf{u}_1,\ \mathbf{u}_2,\ \dots,\ \mathbf{u}_n\right\}$ is an orthonormal set of vectors because $U$ is a unitary matrix. Hence $\left\{ \mathbf{u}_1,\ \mathbf{u}_2,\ \dots,\ \mathbf{u}_n\right\}$ is a complete set of orthonormal eigenvectors. $\tombstone$
This means that the linear transformation represented by a normal matrix $A$ is a composition of a rotation or reflection, followed by a scaling transformation, followed by another rotation or reflection.
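The following NumPy sketch checks Theorem 7.5.1 numerically for a real skew-symmetric matrix (the entries are arbitrary illustrative values): the matrix is normal, and its eigenvector matrix is unitary up to round-off, so the eigenvectors form a complete orthonormal set.

```python
import numpy as np

A = np.array([[ 0.0,  2.0, -1.0],
              [-2.0,  0.0,  3.0],
              [ 1.0, -3.0,  0.0]])            # real skew-symmetric: A^T = -A

print(np.allclose(A @ A.conj().T, A.conj().T @ A))   # True: A is normal

D, U = np.linalg.eig(A)                       # complex eigenvalues and eigenvectors
print(np.allclose(U.conj().T @ U, np.eye(3)))        # True: U is unitary (up to round-off)
print(np.allclose(U @ np.diag(D) @ U.conj().T, A))   # True: A = U D U^H
```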
Not all matrices are square matrices, normal matrices, or Hermitian.¶
7.5.3 Positive Definite Matrices¶
Definition of Positive Definite¶
A symmetric (or Hermitian) matrix whose eigenvalues are all non-negative is called positive semi-definite; if its eigenvalues are all positive, it is called positive definite.
Let $A\in\mathbb{R}^{m\times n}$ be a real $m\times n$ matrix. While this matrix may not be Hermitian, symmetric, or even square, the matrix $A^TA\in\mathbb{R}^{n\times n}$ is real and symmetric. Even if matrix $A$ is defective, the product $A^TA$ is real and symmetric. Remember that real symmetric matrices are also Hermitian. Therefore the eigenvalues of $A^TA$ are all real, and the eigenvectors of distinct eigenvalues are orthogonal. Furthermore, there is a real unitary matrix $V$ that diagonalizes matrix $A^TA$.
What is even more interesting is that the eigenvalues of matrix $A^TA$ are all non-negative. Matrix $A^TA$ is positive semi-definite, and positive definite when $A$ has full column rank.
No matter how ill-behaved a real $m\times n$ matrix may get, $A^TA$ is a very nice real symmetric matrix with a real unitary diagonalizing matrix, and its eigenvalues are all non-negative.
We can order the eigenvalues of $A^TA$ in decreasing value so that we have eigenvalues
$$ \lambda_n \le \lambda_{n-1} \le \dots \le \lambda_2 \le \lambda_1 $$
We always want to order the eigenvalues of $A^TA$ from largest $\lambda_1$ to smallest $\lambda_n$. Let us label the real unitary diagonalizing matrix of $A^TA$ as $V$. Thus
$$ A^TA = V\Lambda V^T = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \dots & \mathbf{v}_n \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n \end{bmatrix}\begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_n^T \end{bmatrix} $$
Likewise matrix $AA^T\in\mathbb{R}^{m\times m}$ is a real symmetric, positive semi-definite matrix. Its eigenvalues are real and non-negative, and the eigenvectors of distinct eigenvalues are orthogonal. Thus there is an orthogonal diagonalizing matrix $U$ such that
$$ AA^T = U\Lambda' U^T = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \dots & \mathbf{u}_m \end{bmatrix}\begin{bmatrix} \lambda_1' & 0 & \dots & 0 \\ 0 & \lambda_2' & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_m' \end{bmatrix}\begin{bmatrix} \mathbf{u}_1^T \\ \mathbf{u}_2^T \\ \vdots \\ \mathbf{u}_m^T \end{bmatrix} $$
Again we order the eigenvalues from largest $\lambda_1'$ to smallest $\lambda_m'$.
If $\lambda$ is a nonzero eigenvalue for $A^TA$ with associated eigenvector $\mathbf{x}$, then $A^TA\mathbf{x} = \lambda\mathbf{x}$. Multiplying both sides of this equation by matrix $A$ yields
$$ AA^T\left(A\mathbf{x}\right) = A\left(A^TA\mathbf{x}\right) = A\left(\lambda\mathbf{x}\right) = \lambda\left(A\mathbf{x}\right) $$
Thus the nonzero value $\lambda$ is also an eigenvalue of matrix $AA^T$, with associated eigenvector $A\mathbf{x}$ (which is nonzero because $A^TA\mathbf{x} = \lambda\mathbf{x}\neq\mathbf{0}$). The same argument with $A$ and $A^T$ exchanged shows that $A^TA$ and $AA^T$ share the same nonzero eigenvalues, so the extra eigenvalues when $m\neq n$ are all zeros.
$$ \Lambda = \begin{bmatrix} \lambda_1 &\ &\ &\ \\ &\ \lambda_2 &\ &\ \\ &\ &\ \ddots &\ \\ &\ &\ &\ \lambda_r \\&\ &\ &\ &\ 0_{r+1} \\ &\ &\ &\ &\ &\ \ddots \\ &\ &\ &\ &\ &\ &\ 0_n \end{bmatrix}\neq \begin{bmatrix} \lambda_1 &\ &\ &\ \\ &\ \lambda_2 &\ &\ \\ &\ &\ \ddots &\ \\ &\ &\ &\ \lambda_r \\&\ &\ &\ &\ 0_{r+1} \\ &\ &\ &\ &\ &\ \ddots \\ &\ &\ &\ &\ &\ &\ 0_m \end{bmatrix} = \Lambda' $$
That is $\Lambda$ has $n-r$ zero eigenvalues along the diagonal and $\Lambda'$ has $m-r$ zero eigenvalues along the diagonal. $\Lambda = \Lambda'$ only when $m = n$.
Remember also that these eigenvalues of $A^TA$ and $AA^T$ are all non-negative. Therefore we can compute the square root of each eigenvalue of $A^TA$ and $AA^T$.
Definition¶
The square roots of the eigenvalues $\lambda_i$ of both $A^TA$ and $AA^T$ are called the singular values $\sigma_i$ of matrix $A$.
$$ \sigma_i = \sqrt{\lambda_i},\qquad 1\le i\le \min\left\{ m, n \right\} $$
It should be clear now that matrix $A$ has $r = \text{rank}(A)$ nonzero singular values. These are the square roots of the nonzero eigenvalues of both $A^TA$ and $AA^T$. It also may have $\min\left\{ m, n \right\} - r$ zero singular values. The number of zero singular values is limited by the number of rows and columns of the $m\times n$ matrix $A$.
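A minimal NumPy sketch of this relationship, using an arbitrary random $5\times 3$ matrix: the singular values returned by `np.linalg.svd` are the square roots of the eigenvalues of $A^TA$, and the nonzero eigenvalues of $AA^T$ are the same.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))                        # a generic 5 x 3 real matrix

lam_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # 3 eigenvalues, descending
lam_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]   # 5 eigenvalues, two are ~0

sigma = np.linalg.svd(A, compute_uv=False)             # singular values, descending
print(np.allclose(sigma, np.sqrt(lam_AtA)))            # True
print(np.allclose(sigma, np.sqrt(lam_AAt[:3])))        # True: nonzero eigenvalues agree
```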
7.5.4 A Vision of Linear Algebra¶
Dr. Gilbert Strang calls this decomposition of any matrix into simple transformations A Vision of Linear Algebra.
Let us suppose for simplicity that matrix $A$ has full rank and $m\gt n$; that is, all of the columns of $A$ are linearly independent. Then we want to find a set of orthonormal vectors $\left\{\mathbf{v}_1, \mathbf{v}_2,\dots,\mathbf{v}_n\right\}\subset\mathbb{R}^n$ and orthonormal vectors $\left\{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_m\right\}\subset\mathbb{R}^m$ so that
$$ \begin{align*} A\mathbf{v}_1 &= \sigma_1\mathbf{u}_1 \\ \\ A\mathbf{v}_2 &= \sigma_2\mathbf{u}_2 \\ \\ &\ddots \\ \\ A\mathbf{v}_r &= \sigma_r\mathbf{u}_r \\ \\ A\mathbf{v}_{r+1} &= 0\mathbf{u}_{r+1} = \mathbf{0} \\ \\ &\ddots \\ \\ A\mathbf{v}_n &= 0\mathbf{u}_n = \mathbf{0} \end{align*} $$
Thus
$$ A\begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \dots & \mathbf{v}_r \end{bmatrix} = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \dots & \mathbf{u}_r \end{bmatrix}\begin{bmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_r \end{bmatrix} $$
This results in
$$ AV' = U'\Sigma' $$
Here $\Sigma'$ is an $r\times r$ diagonal matrix with the nonzero singular values along its diagonal. We want to expand this equation to include an orthogonal basis for the null space of $AA^T$. Recall that $N(AA^T) = N(A^T)$. Thus if $\left\{\mathbf{u}_{r+1}, \dots, \mathbf{u}_m\right\}$ is a basis for $N(A^T)$, and we add enough rows of zeros to $\Sigma'$, then we will have our orthogonal matrix
$$ U = \begin{bmatrix} \mathbf{u}_1 &\ \dots &\ \mathbf{u}_r &\ \mathbf{u}_{r+1} &\ \dots &\ \mathbf{u}_m \end{bmatrix} $$
Similarly we expand the equation to include an orthogonal basis for the null space of $A^TA$. As $N(A^TA) = N(A)$, if $\left\{\mathbf{v}_{r+1}, \dots, \mathbf{v}_n\right\}$ is a basis for $N(A)$, and we add enough columns of zeros to $\Sigma'$, then we have our orthogonal matrix
$$ V = \begin{bmatrix} \mathbf{v}_1 &\ \dots &\ \mathbf{v}_r &\ \mathbf{v}_{r+1} &\ \dots &\ \mathbf{v}_n \end{bmatrix} $$
and our equation becomes
$$ AV = U\Sigma $$
where the $m\times n$ matrix $\Sigma$ is $\Sigma'$ with rows and columns of zeros added for the null spaces of $A^T$ and $A$.
Since the sets of vectors $\left\{\mathbf{u}_1, \dots, \mathbf{u}_r\right\}$ and $\left\{\mathbf{u}_{r+1}, \dots, \mathbf{u}_m\right\}$ are orthonormal bases for orthogonal subspaces of $\mathbb{R}^m$, we have that $\left\{\mathbf{u}_1, \dots, \mathbf{u}_m\right\}$ is an orthonormal set and matrix $U$ is an orthogonal matrix.
Since the sets of vectors $\left\{\mathbf{v}_1, \dots, \mathbf{v}_r\right\}$ and $\left\{\mathbf{v}_{r+1}, \dots, \mathbf{v}_n\right\}$ are orthonormal bases for orthogonal subspaces of $\mathbb{R}^n$, we have that $\left\{\mathbf{v}_1, \dots, \mathbf{v}_n\right\}$ is an orthonormal set and matrix $V$ is an orthogonal matrix.
Multiplying both sides of $AV = U\Sigma$ on the right by $V^{-1} = V^T$ results in
$$ A = U\Sigma V^T $$
Theorem 7.5.2¶
The SVD Theorem¶
If $A\in\mathbb{R}^{m\times n}$, then $A$ has a singular value decomposition.
$$ A = U\Sigma V^T $$
- The columns of matrix $V$ are called right singular vectors.
- The columns of matrix $U$ are called the left singular vectors.
- The elements on the diagonal of $\Sigma$ are called the singular values of matrix $A$. They are the square roots of the eigenvalues of $A^TA$ and $AA^T$.
Proof¶
We showed in chapter 2 that $A^TA$ is a symmetric $n\times n$ matrix. Likewise $AA^T$ is an $m\times m$ symmetric matrix.
$$ \left(AA^T\right)^T = (A^T)^TA^T = AA^T\ \Large{\color{green}{\checkmark}} $$
This completes the proof because we already know that every real symmetric, positive semi-definite matrix has a real unitary (orthogonal) diagonalizing matrix, so the construction above yields $U$, $\Sigma$, and $V$ for any real $m\times n$ matrix. $\tombstone$
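Here is a minimal NumPy sketch of the construction behind the theorem, assuming a random $6\times 4$ matrix with full column rank: diagonalize $A^TA$ to get $V$ and the singular values, set $\mathbf{u}_k = A\mathbf{v}_k/\sigma_k$, and check that the factors reproduce $A$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))              # m > n, full column rank almost surely
m, n = A.shape

lam, V = np.linalg.eigh(A.T @ A)             # symmetric eigen-decomposition (ascending)
idx = np.argsort(lam)[::-1]                  # reorder largest to smallest
lam, V = lam[idx], V[:, idx]

sigma = np.sqrt(lam)                         # singular values
U_r = (A @ V) / sigma                        # u_k = A v_k / sigma_k, column by column

print(np.allclose(U_r @ np.diag(sigma) @ V.T, A))        # True: A = U_r Sigma' V^T
print(np.allclose(U_r.T @ U_r, np.eye(n)))               # True: columns are orthonormal
print(np.allclose(sigma, np.linalg.svd(A, compute_uv=False)))   # matches np.linalg.svd
```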
7.5.5 Bases for the Fundamental Subspaces¶
Recall that we can decompose any real $m\times n$ matrix $A$ into its singular value decomposition,
$$ A = U\Sigma V^T $$
If we multiply both sides of equation $A=U\Sigma V^T$ on the left by $A^T$ one obtains
$$ \begin{align*} A^TA &= \left(U\Sigma V^T\right)^T\left(U\Sigma V^T\right) \\ \\ &= \left(V\Sigma^TU^T\right)\left(U\Sigma V^T\right) \\ \\ &= V\Sigma^T\left(U^TU\right)\Sigma V^T \\ \\ &= V\Sigma^T\Sigma V^T \\ \\ &= V\begin{bmatrix} \sigma_1^2 &\ &\ &\ &\ &\ \\ \ &\ \ddots &\ &\ &\ \\ \ &\ &\ \sigma_r^2 &\ &\ \\ \ &\ &\ &\ 0 &\ &\ \\ \ &\ &\ &\ &\ \ddots &\ \\ \ &\ &\ &\ &\ &\ 0 \end{bmatrix}V^T \end{align*} $$
Therefore matrix $V$ is an orthogonal diagonalizing matrix for matrix $A^TA$.
The columns of orthogonal matrix $V$ are the eigenvectors of $A^TA$.
The set $\left\{\sigma_1^2, \dots, \sigma_n^2\right\}$ consists of the associated eigenvalues of $A^TA$.
The singular values of matrix $A$ are the square roots of the eigenvalues of matrix $A^TA$.
If we multiply both sides of the equation $A = U\Sigma V^T$ on the right by $A^T$, then one obtains,
$$ \begin{align*} AA^T &= \left(U\Sigma V^T\right)\left(U\Sigma V^T\right)^T \\ \\ &= \left(U\Sigma V^T\right)\left(V\Sigma^TU^T\right) \\ \\ &= U\Sigma\left(V^TV\right)\Sigma^TU^T \\ \\ &= U\Sigma \Sigma^TU^T \\ \\ &= U\begin{bmatrix} \sigma_1^2 &\ &\ &\ &\ &\ \\ \ &\ \ddots &\ &\ &\ \\ \ &\ &\ \sigma_r^2 &\ &\ \\ \ &\ &\ &\ 0 &\ &\ \\ \ &\ &\ &\ &\ \ddots &\ \\ \ &\ &\ &\ &\ &\ 0 \end{bmatrix}U^T \end{align*} $$
The columns of orthogonal matrix $U$ are the eigenvectors of $AA^T$.
The singular values $\sigma_1$, $\dots$, $\sigma_n$ are unique; however, the matrices $U$ and $V$ are not.
Theorem 7.5.3¶
If $A$ is a real $m\times n$ matrix with singular value decomposition $A = U\Sigma V^T$, then
- The first $r$ columns of matrix $U$ comprise a basis for the column space of matrix $A$, $C(A)$.
- The last $m-r$ columns of matrix $U$ comprise a basis for the null space of $A^T$, $N(A^T)$.
- The first $r$ columns of matrix $V$ comprise a basis for the row space of matrix $A$, $C(A^T)$.
- The last $n-r$ columns of matrix $V$ comprise a basis for the null space of $A$, $N(A)$.
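A minimal NumPy sketch of Theorem 7.5.3, using an arbitrary $5\times 4$ matrix built to have rank $2$ so that all four subspaces are nontrivial; the slices of $U$ and $V$ are exactly the bases the theorem describes.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # rank 2, 5 x 4

U, s, Vt = np.linalg.svd(A)                       # full SVD: U is 5x5, Vt is 4x4
tol = s.max() * max(A.shape) * np.finfo(float).eps
r = int(np.sum(s > tol))                          # numerical rank

col_space  = U[:, :r]        # basis for C(A)
left_null  = U[:, r:]        # basis for N(A^T)
row_space  = Vt[:r, :].T     # basis for C(A^T), i.e. the first r columns of V
null_space = Vt[r:, :].T     # basis for N(A)

print(r)                                             # 2
print(np.allclose(A.T @ left_null, 0, atol=1e-12))   # True: these vectors are killed by A^T
print(np.allclose(A @ null_space, 0, atol=1e-12))    # True: these vectors are killed by A
```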
7.5.6 Computing the Singular Value Decomposition¶
Example 1¶
Consider the matrix $A = \begin{bmatrix}\ \ 1\ &\ \ 1\ \\ \ \ 1\ &\ \ 1\ \\ \ \sqrt{2}\ & -\sqrt{2} \end{bmatrix}$
If $A$ is $m\times n$, and $m < n$, compute the singular value decomposition of $A^T$ using this method and transpose your result.
Since matrix $A$ is $m\times n$ with $m \ge n$, we proceed as follows.
1. Compute an orthogonal diagonalization for $A^TA$.¶
$$ A^TA = \begin{bmatrix}\ \ 1\ &\ \ 1\ &\ \sqrt{2} \\ \ \ 1\ &\ \ 1\ & -\sqrt{2} \end{bmatrix}\begin{bmatrix}\ \ 1\ &\ \ 1\ \\ \ \ 1\ &\ \ 1\ \\ \ \sqrt{2}\ & -\sqrt{2} \end{bmatrix} = \begin{bmatrix} 4\ & 0\ \\ 0\ &\ 4 \end{bmatrix} $$
In this example matrix $A^TA$ is already a diagonal matrix. The eigenvalue $\lambda=4$ has algebraic multiplicity two. The matrix $A^TA-4I$ is the zero matrix, so the null space of $A^TA-4I$ is all of $\mathbb{R}^2$. An orthonormal basis for $N(A^TA - 4I)$ is given by $\left\{\mathbf{e}_1, \mathbf{e}_2\right\}$ and
$$ V = \begin{bmatrix} 1 &\ 0 \\ 0 &\ 1 \end{bmatrix} $$
If the basis you obtain is not an orthonormal basis, then you must use Gram-Schmidt to obtain an orthonormal basis for the columns of $V$. This primarily occurs when one finds an eigenvalue of $A^TA$ with algebraic multiplicity greater than one.
2. Compute the singular values of $A$ and the factor $\Sigma$.¶
The singular values for matrix $A$ are $\sigma_1 = \sqrt{4} = 2$ and $\sigma_2 = \sqrt{4}= 2$.
Since $A$ is a $3\times 2$ matrix our diagonal matrix $\Sigma$ must also be a $3\times 2$ matrix. We usually write the diagonalization of $A^TA$ as $D$
$$ D = \begin{bmatrix} 4 & 0 \\ 0 & 4 \end{bmatrix}, $$
and our diagonal matrix of singular values $\Sigma'$
$$ \Sigma' = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} $$
To create matrix $\Sigma$ we must add enough rows of zeros so that $\Sigma$ has the same dimensions as our original matrix $A$.
$$ \Sigma = \begin{bmatrix} 2 & 0 \\ 0 & 2 \\ 0 & 0 \end{bmatrix}\in\mathbb{R}^{3\times 2} $$
3. Compute the first $r = \text{rank}(A)$ columns of $m\times m$ factor $U$.¶
In order to compute the first few columns of matrix $U$ we start with our algebraic expression for the singular value decomposition
$$ A = U\Sigma V^T $$
As matrix $V$ is orthogonal, $V^T=V^{-1}$. Multiplying both sides of this equation on the right by $V$ yields
$$ \begin{align*} AV &= U\Sigma \\ \\ A\begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \dots & \mathbf{v}_n \end{bmatrix} &= \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \dots & \mathbf{u}_m \end{bmatrix}\begin{bmatrix} \sigma_1 & 0 & 0 & \dots & 0 & \dots & 0 \\ 0 & \sigma_2 & 0 & \dots & 0 & \dots & 0 \\ 0 & 0 & \ddots & 0 & 0 & \dots & 0 \\ 0 & 0 & \dots & \sigma_r & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & 0 & \ddots & \dots & 0 \\ 0 & 0 & \dots & 0 & 0 & \dots & 0 \end{bmatrix} \\ \\ &= \begin{bmatrix} \sigma_1\mathbf{u}_1 & \sigma_2\mathbf{u}_2 & \dots & \sigma_r\mathbf{u}_r & 0\mathbf{u}_{r+1} & \dots & 0\mathbf{u}_n \end{bmatrix} \end{align*} $$
Remember that we may or may not have columns of zeros on the right or rows of zeros on the bottom of matrix $\Sigma$; that depends on the number of zero eigenvalues of $A^TA$ and $AA^T$, respectively. We have $r\le n\le m$, so if $r = n$ there are no vectors $0\mathbf{u}_k$ at all.
In any case, for the first $r$ nonzero singular values we have for $1\le k\le r$,
$$ A\mathbf{v}_k = \sigma_k\mathbf{u}_k $$
Since $1\le k\le r$, $\sigma_k\neq 0$, so
$$ \begin{align*} \mathbf{u}_k &= \frac{1}{\sigma_k}A\mathbf{v}_k \\ \\ \mathbf{u}_1 &= \frac{1}{\sigma_1}A\mathbf{v}_1 = \frac{1}{2}\begin{bmatrix}\ \ 1\ &\ \ 1\ \\ \ \ 1\ &\ \ 1\ \\ \ \sqrt{2}\ & -\sqrt{2} \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \\ \\ \mathbf{u}_2 &= \frac{1}{\sigma_2}A\mathbf{v}_2 = \frac{1}{2}\begin{bmatrix}\ \ 1\ &\ \ 1\ \\ \ \ 1\ &\ \ 1\ \\ \ \sqrt{2}\ & -\sqrt{2} \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \\ -\frac{1}{\sqrt{2}} \end{bmatrix} \end{align*} $$
Notice that these vectors are already orthonormal.
4. Compute the remaining column vectors of matrix $U$, if any.¶
The remaining column vectors of $U$ are an orthonormal basis for $N\left(AA^T\right) = N\left(A^T\right)$.
$$ \begin{align*} A^T &= \begin{bmatrix} 1 & 1 & \sqrt{2} \\ 1 & 1 & -\sqrt{2} \end{bmatrix}\rightarrow\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \\ \\ \xi_1 &= -t \\ \xi_2 &= t\in\mathbb{R}\qquad\text{$\xi_2$ is a free variable} \\ \xi_3 &= 0 \\ \\ \mathbf{\xi} &\in\text{Span}\left\{ \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \right\} \end{align*} $$
However, this spanning vector is not a unit vector, so we normalize it to obtain
$$ \mathbf{u}_3 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \end{bmatrix} $$
5. Our Singular Value Decomposition¶
$$ \begin{align*} U\Sigma V^T &= \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & \frac{1}{\sqrt{2}} \\ \frac{1}{2} & \frac{1}{2} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 2 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}^T = \begin{bmatrix}\ \ 1\ &\ \ 1\ \\ \ \ 1\ &\ \ 1\ \\ \ \sqrt{2}\ & -\sqrt{2} \end{bmatrix} = A \end{align*} $$
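We can check this hand computation with NumPy (a sketch, not required for the course): the assembled factors reproduce $A$, and `np.linalg.svd` reports the same singular values (its $U$ and $V$ may differ from ours because the singular value $2$ is repeated).

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [np.sqrt(2), -np.sqrt(2)]])

U = np.array([[0.5,            0.5,            1/np.sqrt(2)],
              [0.5,            0.5,           -1/np.sqrt(2)],
              [1/np.sqrt(2),  -1/np.sqrt(2),   0.0]])        # hand-computed u1, u2, u3
Sigma = np.array([[2.0, 0.0],
                  [0.0, 2.0],
                  [0.0, 0.0]])
V = np.eye(2)

print(np.allclose(U @ Sigma @ V.T, A))       # True: U Sigma V^T = A
print(np.linalg.svd(A, compute_uv=False))    # [2. 2.]  the same singular values
```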
7.5.7 The Four Fundamental Subspaces¶
Determine the rank and bases for the four fundamental subspaces of matrix $A$ in Example 1.¶
The rank of matrix $A$ is the number of nonzero singular values.
$$ \text{rank}(A) = 2 $$
We have
$$ A\mathbf{x} = U\Sigma V^T\mathbf{x} = \mathbf{b} $$
Since matrices $U$ and $V$ are orthogonal matrices, they represent rotations and/or reflections in the codomain and domain, respectively. They are in fact also change of variables, or transition, matrices. Matrix $\Sigma$ is a diagonal matrix, so it represents a scaling linear transformation. It dilates or contracts vectors in the domain and embeds them in the higher dimensional vector space of the codomain.
If our original matrix $A$ was underdetermined, then we compute the singular value decomposition of $A^T$. This yields
$$ A^T = U\Sigma V^T $$
so,
$$ A = \left(A^T\right)^T = \left(U\Sigma V^T\right)^T = V\Sigma^T U^T $$
Thus matrix $\Sigma^T$ scales vectors from the domain and maps them to lower dimensional vectors in the codomain. We are obliged to rename each of the matrices so that, in general, we denote our singular value decomposition
$$ A = U\Sigma V^T $$
so that
- matrix $U:\mathbb{R}^m\rightarrow\mathbb{R}^m$ is a transition matrix on the codomain,
- matrix $V:\mathbb{R}^n\rightarrow\mathbb{R}^n$ is a transition matrix on the domain, and
- matrix $\Sigma:\mathbb{R}^n\rightarrow\mathbb{R}^m$ is a diagonal, or scaling, matrix from $\mathbb{R}^n$ to $\mathbb{R}^m$.
Furthermore, the first $r = \text{rank}(A)$ columns of matrix $U$ form an orthonormal basis for the column space of $A$, $C(A)$.
$$ \begin{align*} C(A) &= \text{Span}\left\{\mathbf{u}_1,\ \mathbf{u}_2,\ \dots,\ \mathbf{u}_r \right\} = \text{Span}\left\{\begin{bmatrix}\ \frac{1}{2}\ \\ \ \frac{1}{2}\ \\ \ \frac{1}{\sqrt{2}}\ \end{bmatrix},\ \begin{bmatrix}\ \frac{1}{2}\ \\ \ \frac{1}{2}\ \\ -\frac{1}{\sqrt{2}}\ \end{bmatrix} \right\} \end{align*} $$
The remaining $m-r$ (if any) columns of matrix $U$ form a basis for the null space of $A^T$.
$$ \begin{align*} N(A^T) &= \text{Span}\left\{ \mathbf{u}_{r+1},\ \dots,\ \mathbf{u}_m \right\} = \text{Span}\left\{ \begin{bmatrix}\ \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ \ \ 0 \end{bmatrix} \right\} \end{align*} $$
The first $r$ columns of matrix $V$ (not $V^T$!) form a basis for the row space of matrix $A$.
$$ \begin{align*} \\ C(A^T) &= \text{Span}\left\{ \mathbf{v}_1,\ \mathbf{v}_2,\ \dots,\ \mathbf{v}_r \right\} = \text{Span}\left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix},\ \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\} \end{align*} $$
The remaining $n-r$ (if any) columns of matrix $V$ form a basis for the null space of $A$.
$$ N(A) = \left\{ \mathbf{0} \right\} $$
7.5.8 Applications¶
The most common applications of the singular value decomposition include robotics, data science, data compression, control systems, and statistical analysis. The singular value decomposition decomposes a linear transformation into a composition of a basis transition in the domain, followed by a scaling, followed by a basis transition in the codomain. This factorization identifies the principal natural basis (or axes), like the diagonalization, and it shows the scaling factor along each principal vector, like the diagonalization. This allows us to identify the vectors or phenomena that are scaled the most, as well as the vectors or phenomena that contribute the least to the result of the linear transformation.
Let us consider an inner product on the vector space $\mathbb{R}^{m\times n}$ of $m\times n$ matrices. The most familiar inner product looks like the Euclidean inner product on $\mathbb{R}^n$. For $A,B\in\mathbb{R}^{m\times n}$,
$$ \langle A,B \rangle := \displaystyle\sum_{j=1}^m\displaystyle\sum_{k=1}^n a_{jk}b_{jk} $$
The corresponding norm would be
$$ \left\|A\right\|_F := \langle A, A \rangle^{\frac{1}{2}} = \left(\displaystyle\sum_{j=1}^m\displaystyle\sum_{k=1}^n a_{jk}^2 \right)^{\frac{1}{2}} $$
The subscript $F$ refers to the mathematician Frobenius, and this norm is called the Frobenius norm on $\mathbb{R}^{m\times n}$. We have not had time to discuss norms on vector spaces of linear transformations much. We did discuss the Euclidean or $2$-norm on the vector space $\mathbb{R}^n$ given by
$$ \left\|\mathbf{v}\right\|_2 = \langle \mathbf{v},\mathbf{v} \rangle^{\frac{1}{2}} = \left( \displaystyle\sum_{k=1}^n v_k^2 \right)^{\frac{1}{2}} $$
We can combine these definitions to obtain the identity
$$ \left\|A\right\|_F^2 = \displaystyle\sum_{j=1}^m\displaystyle\sum_{k=1}^n a_{jk}^2 = \displaystyle\sum_{k=1}^n \left( \displaystyle\sum_{j=1}^m a_{jk}^2 \right) = \displaystyle\sum_{k=1}^n \left\|\mathbf{a}_k\right\|_2^2 $$
where $\mathbf{a}_k$ denotes the $k$th column of $A$.
If we recall that matrix $U$ of our singular value decomposition is an orthogonal matrix, and that multiplying a vector by an orthogonal matrix preserves its $2$-norm, we have
$$ \left\|U\Sigma\right\|_F^2 = \displaystyle\sum_{j=1}^n \left\|U\mathbf{\sigma}_j\right\|_2^2 = \displaystyle\sum_{j=1}^n \left\|\mathbf{\sigma}_j \right\|_2^2 = \left\|\Sigma\right\|_F^2 $$
where $\mathbf{\sigma}_j$ denotes the $j$th column of $\Sigma$.
Likewise matrix $V$ is an orthogonal matrix so,
$$ \left\|\Sigma V^T\right\|_F^2 = \left\|\left(\Sigma V^T\right)^T\right\|_F^2 = \left\|V\Sigma^T\right\|_F^2 = \left\|\Sigma^T\right\|_F^2 = \left\|\Sigma\right\|_F^2 $$
Moreover, since $\Sigma$ and $\Sigma^T$ are diagonal matrices,
$$ \left\|\Sigma\right\|_F^2 = \displaystyle\sum_{j=1}^r \sigma_j^2 = \left\|\Sigma^T\right\|_F^2 $$
Hence we have the identity
$$ \left\|A\right\|_F = \left\|U\Sigma V^T\right\|_F = \left\|\Sigma V^T\right\|_F = \left\|\Sigma\right\|_F = \left(\displaystyle\sum_{j=1}^r \sigma_j^2\right)^{\frac{1}{2}} $$
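A minimal NumPy sketch of this identity for an arbitrary random matrix: the Frobenius norm computed entrywise agrees with the square root of the sum of the squared singular values.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))

sigma = np.linalg.svd(A, compute_uv=False)

fro_direct   = np.linalg.norm(A, 'fro')       # sqrt of the sum of all a_jk^2
fro_from_svd = np.sqrt(np.sum(sigma**2))      # sqrt of the sum of sigma_j^2
print(np.allclose(fro_direct, fro_from_svd))  # True
```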
Example 2¶
Consider an upper triangular matrix whose diagonal elements are $1$'s,
$$ A = \begin{bmatrix} 1 & -1 & -1 & \dots & -1 & -1 \\ 0 & 1 & -1 & \dots & -1 & -1 \\ 0 & 0 & 1 & \dots & -1 & -1 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & 1 & -1 \\ 0 & 0 & 0 & \dots & 0 & 1 \end{bmatrix} $$
Notice that the determinant of matrix $A$ is $1$; however, if $n$ is very large, this matrix is almost singular, meaning it is very close to a singular matrix. To see this, we compute the residual, or distance, from matrix $A$ to the matrix
$$ B = \begin{bmatrix} 1 & -1 & -1 & \dots & -1 & -1 \\ 0 & 1 & -1 & \dots & -1 & -1 \\ 0 & 0 & 1 & \dots & -1 & -1 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \dots & 1 & -1 \\ -\left(\frac{1}{2}\right)^{n-2} & 0 & 0 & \dots & 0 & 1 \end{bmatrix} $$
Matrix $B$ is singular because
$$ \begin{align*} B\begin{bmatrix} 2^{n-2} \\ 2^{n-3} \\ \vdots \\ 2^0 \\ 1 \end{bmatrix} &= \begin{bmatrix} 1 & -1 & -1 & \dots & -1 & -1 \\ 0 & 1 & -1 & \dots & -1 &-1 \\ 0 & 0 & 1 & \dots & -1 & -1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 & -1 \\ -\left(\frac{1}{2}\right)^{n-2} & 0 & 0 & \dots & 0 & 1 \end{bmatrix}\begin{bmatrix} 2^{n-2} \\ 2^{n-3} \\ \vdots \\ 2^0 \\ 1 \end{bmatrix} \\ \\ &= \begin{bmatrix} 2^{n-2} - 2^{n-3} - 2^{n-4} - \dots - 2^0 - 1 \\ 0\ \ \ \ + 2^{n-3} - 2^{n-4} - \dots - 2^0 - 1 \\ 0\ \ \ \ + 0\ \ \ \ + 2^{n-4} - \dots -2^0 - 1 \\ \ddots \\ 0\ \ \ \ + 0\ \ \ \ + 0\ \ \ \ + \dots + 1\ \ - 1 \\ -1\ \ \ \ + 0\ \ \ \ + 0\ \ \ \ + \dots + 0\ \ + 1 \end{bmatrix} = \mathbf{0} \end{align*} $$
Thus $B\mathbf{x}=\mathbf{0}$ has a nontrivial solution and $B$ is a singular matrix. The distance from matrix $A$ to matrix $B$ is
$$ \left\|A - B\right\| = \frac{1}{2^{n-2}} $$
As $n$ gets very large, the distance between matrix $A$ and a singular matrix becomes very small.
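The following NumPy sketch builds the matrices $A$ and $B$ of this example for an illustrative size $n = 30$ and checks the claims numerically: $\det A = 1$, yet the Frobenius distance from $A$ to the singular matrix $B$ is $2^{-(n-2)}$, and the smallest singular value of $A$ is correspondingly tiny.

```python
import numpy as np

n = 30
A = np.triu(-np.ones((n, n)), k=1) + np.eye(n)   # 1's on the diagonal, -1's above it
B = A.copy()
B[-1, 0] = -0.5**(n - 2)                         # the single perturbed entry makes B singular

print(np.linalg.det(A))                          # ~1.0 (product of the diagonal entries)
print(np.linalg.norm(A - B, 'fro'))              # 2**-(n-2), about 3.7e-9
print(np.linalg.svd(A, compute_uv=False)[-1])    # smallest singular value of A is also tiny
```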
7.5.9 Lower Rank Approximations¶
Example 3¶
Suppose that our matrix $A$ has the following singular value decomposition
$$ A = U\Sigma V^T = U\begin{bmatrix} 100 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & .001 & \dots & 0 \\ 0 & 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & \dots & 10^{-12} \end{bmatrix}V^T $$
Notice that the principal dilation has a factor of $100$. The other principal directions (columns of $V$) are scaled by $1$ or by very small factors. We can approximate matrix $A$ with a matrix of rank 1,
$$ \begin{align*} B &= U\Sigma_1V^T = U\begin{bmatrix} 100 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & \dots & 0 \end{bmatrix}V^T \\ \\ &= 100\cdot U\begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & \dots & 0 \end{bmatrix}V^T \\ \\ &= 100\cdot U\begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{0}^T \\ \mathbf{0}^T \\ \vdots \\ \mathbf{0}^T \end{bmatrix} = 100\cdot\mathbf{u}_1\mathbf{v}_1^T = \sigma_1\mathbf{u}_1\mathbf{v}_1^T \\ \end{align*} $$
The $m\times n$ matrix product $\ \mathbf{u}\mathbf{v}^T$ is called the outer product of vectors $\mathbf{u}$ and $\mathbf{v}$.
Recall that the scalar $\mathbf{v}^T\mathbf{u} = \langle \mathbf{u},\mathbf{v} \rangle$ is the inner product, or scalar product.
The distance between $A$ and $B$ becomes
$$ \begin{align*} \left\|A-B\right\| &= \left\|U\Sigma V^T - U\Sigma_1V^T\right\| = \left\|U(\Sigma - \Sigma_1)V^T\right\| \\ \\ &= \left\|\Sigma - \Sigma_1\right\| = \left( 1^2 + .001^2 + \dots + 10^{-24} \right)^{\frac{1}{2}} \approx 1 \end{align*} $$
In general, for $1\le k < r$, where $r$ is the rank of matrix $A$, the closest rank-$k$ approximation is given by
$$ B = \displaystyle\sum_{j=1}^k \sigma_j\mathbf{u}_j\mathbf{v}_j^T $$
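A minimal NumPy sketch of the truncated-SVD approximation, with an arbitrary random matrix and an illustrative choice $k = 2$: the Frobenius error of the rank-$k$ approximation is exactly $\left(\sigma_{k+1}^2 + \cdots + \sigma_r^2\right)^{1/2}$.

```python
import numpy as np

def rank_k_approximation(A, k):
    """Rank-k approximation of A built from its truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 6))
k = 2

B = rank_k_approximation(A, k)
s = np.linalg.svd(A, compute_uv=False)

print(np.linalg.matrix_rank(B))                       # 2
print(np.allclose(np.linalg.norm(A - B, 'fro'),
                  np.sqrt(np.sum(s[k:]**2))))         # True: error = sqrt(sum of discarded sigma^2)
```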
7.5.10 Principal Component Analysis¶
For matrix $A$ with singular value decomposition $A = U\Sigma V^T$, one attempts to improve the performance of data analysis of a system with a large number of inputs. If the system has $n$ inputs and $m$ outputs, our matrix $A\in\mathbb{R}^{m\times n}$ represents the measured outputs for each input. These inputs can have noise associated with their measurement. Noise can produce a small spurious correlation from an input even when that input does not affect the output at all.
The singular value decomposition identifies the inputs in matrix $V$ by their singular value. Inputs that provide the most information about the output will have a large associated singular value. Inputs with a negligible effect on the output will correspond to a small singular value.
The covariance matrix is $S = \frac{1}{n-1}A^TA$, and the variance of a sample of input/output measurements $\mathbf{y}=A\mathbf{x}$ has the value
$$ \text{var}(\mathbf{y}) := \frac{(A\mathbf{x})^T(A\mathbf{x})}{n-1} = \frac{\mathbf{x}^TA^TA\mathbf{x}}{n-1} = \mathbf{x}^TS\mathbf{x} $$
In order to maximize the variance over all unit vectors in the domain of $A$, we choose a unit vector associated with the largest eigenvalue $\lambda_1$ of matrix $A^TA$. This is the vector $\mathbf{v}_1$, the first right singular vector, and first column of matrix $V$ in the singular value decomposition associated with the singular value $\sigma_1 = \sqrt{\lambda_1}$.
The corresponding left singular vector $\mathbf{u}_1$ gives us our first principal component vector,
$$ \mathbf{y}_1 = A\mathbf{v}_1 = \sigma_1\mathbf{u}_1 $$
with variance
$$ \text{var}(\mathbf{y}_1) = \frac{(A\mathbf{v}_1)^TA\mathbf{v}_1}{n-1} = \frac{\mathbf{v}_1^TA^TA\mathbf{v}_1}{n-1} = \frac{\mathbf{v}_1^T\lambda_1\mathbf{v}_1}{n-1} = \frac{\lambda_1\|\mathbf{v}_1\|^2}{n-1} = \frac{\lambda_1}{n-1} = \frac{\sigma_1^2}{n-1} $$
To find the second principal component vector, one needs to find a unit vector orthogonal to $\mathbf{v}_1$ that maximizes the variance. The second right singular vector yields $\mathbf{y}_2 = A\mathbf{v}_2 = \sigma_2\mathbf{u}_2$. This vector maximizes the remaining variance, and the principal component vectors are orthogonal because
$$ \langle\mathbf{y}_1,\mathbf{y}_2\rangle = \langle\sigma_1\mathbf{u}_1,\sigma_2\mathbf{u}_2\rangle = \sigma_1\sigma_2\langle\mathbf{u}_1,\mathbf{u}_2\rangle = \sigma_1\sigma_2\cdot 0 = 0 $$
In general, the singular value decomposition solves the principal component analysis problem for all principal component vectors.
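A minimal NumPy sketch of these formulas (using a synthetic data matrix with arbitrary scaling factors, and following the text's conventions rather than the usual practice of mean-centering the columns first): the principal component vectors are $A\mathbf{v}_j = \sigma_j\mathbf{u}_j$, and they are mutually orthogonal.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 50, 4
A = rng.standard_normal((m, n)) @ np.diag([5.0, 2.0, 0.5, 0.1])   # synthetic measurements

U, s, Vt = np.linalg.svd(A, full_matrices=False)

v1 = Vt[0, :]                                  # first right singular vector
y1 = A @ v1                                    # first principal component vector
print(np.allclose(y1, s[0] * U[:, 0]))         # True: y1 = sigma_1 u1
print(np.isclose(y1 @ y1 / (n - 1), s[0]**2 / (n - 1)))   # var(y1) = sigma_1^2 / (n-1)

Y = A @ Vt.T                                   # all principal component vectors A v_j
print(np.allclose(Y.T @ Y, np.diag(s**2)))     # True: the y_j are mutually orthogonal
```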
Your use of this self-initiated mediated course material is subject to our Creative Commons License 4.0