
Math 511: Linear Algebra

7.5 Singular Value Decomposition


7.5.1 The Important Pieces of a Matrix

$\require{color}$ $\definecolor{brightblue}{rgb}{.267, .298, .812}$ $\definecolor{darkblue}{rgb}{0.0, 0.0, 1.0}$ $\definecolor{palepink}{rgb}{1, .73, .8}$ $\definecolor{softmagenta}{rgb}{.99,.34,.86}$ $\def\ihat{\mathbf{\hat{\unicode{x0131}}}}$ $\def\jhat{\mathbf{\hat{\unicode{x0237}}}}$ $\def\khat{\mathbf{\hat{k}}}$ $\def\tombstone{\unicode{x220E}}$

When analyzing our linear mathematical models we often need to describe an elaborate linear transformation in terms of a composition of simple linear transformations (a short numerical sketch follows this list). The three simple linear transformations are:

  1. A rotation in $\mathbb{R}^n$ represented by an orthogonal matrix.


  2. A reflection with respect to a hyperplane is also represented by an orthogonal matrix.

    (Figure: reflection in three-dimensional space with respect to the $xy$-plane.)
  3. A scaling, that is, a transformation that dilates or contracts the magnitude of vectors along a set of axes. A diagonal matrix represents a scaling transformation. (A diagonal matrix may be orthogonal too.)

    (Figure: scaling the unit sphere by one half along the $x$-axis, and by two along both the $y$-axis and $z$-axis.)
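As a concrete illustration of these three building blocks, here is a minimal NumPy sketch (not part of the original notes): it builds a rotation, a reflection, and a scaling in $\mathbb{R}^3$, checks that the first two are orthogonal, and composes them into a more elaborate transformation. The angle and scale factors are arbitrary example choices.

```python
# A minimal NumPy sketch illustrating the three simple transformations:
# a rotation, a reflection, and a scaling in R^3.
import numpy as np

theta = np.pi / 6  # hypothetical rotation angle about the z-axis

# Rotation about the z-axis: an orthogonal matrix with determinant +1.
Q_rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])

# Reflection with respect to the xy-plane: orthogonal with determinant -1.
Q_ref = np.diag([1.0, 1.0, -1.0])

# Scaling by 1/2 along x and by 2 along y and z: a diagonal matrix.
S = np.diag([0.5, 2.0, 2.0])

for Q in (Q_rot, Q_ref):
    # Orthogonal matrices satisfy Q^T Q = I.
    assert np.allclose(Q.T @ Q, np.eye(3))

# An "elaborate" transformation built as a composition of the simple ones.
A = Q_rot @ S @ Q_ref
print(A)
```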

7.5.2 Generalizing Diagonalization


Definition of a Complete Set of Orthonormal Eigenvectors

A matrix $A\in\mathbb{C}^{n\times n}$ that has $n$ orthonormal eigenvectors associated with its eigenvalues is said to have a complete set of orthonormal eigenvectors.

This is an important property of a matrix. We will use this to construct the Singular Value Decomposition.

Definition of Diagonalizable

Matrix $A\in\mathbb{C}^{n\times n}$ is called diagonalizable if there exists a nonsingular matrix $X$ and a diagonal matrix $D$ such that $$ X^{-1}AX = D $$

We say that matrix $X$ diagonalizes matrix $A$.
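As a quick numerical illustration of this definition (a sketch, with the matrix chosen only as an example), we can let `numpy.linalg.eig` supply a diagonalizing matrix $X$ and check that $X^{-1}AX$ is diagonal.

```python
# A hedged NumPy sketch of the definition: numerically diagonalize a matrix
# and check that X^{-1} A X is (approximately) diagonal.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # any diagonalizable example will do

eigenvalues, X = np.linalg.eig(A)   # columns of X are eigenvectors
D = np.linalg.inv(X) @ A @ X        # should equal diag(eigenvalues)

print(np.round(D, 12))
print(np.allclose(D, np.diag(eigenvalues)))  # True when A is diagonalizable
```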

Definition of skew-symmetric and skew-Hermitian matrices

A matrix $A\in\mathbb{C}^{n\times n}$ is called skew-symmetric if and only if $A^T = -A$. A matrix $A\in\mathbb{C}^{n\times n}$ is called skew-Hermitian if and only if $A^H = -A$.

Recall that a matrix that is equal to its own transpose, $A = A^T$, is called symmetric, and a matrix that is equal to its own conjugate transpose, $A = A^H$, is called a Hermitian matrix.

Definition of Normal Matrices

A matrix $A\in\mathbb{C}^{n\times n}$ is called a normal matrix if $AA^H = A^HA$.

A normal matrix commutes with its conjugate transpose. We discuss normal matrices because there are non-Hermitian matrices that possess a complete set of orthonormal eigenvectors; in particular, skew-symmetric and skew-Hermitian matrices have a complete set of orthonormal eigenvectors.
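A small NumPy check of that last claim's first ingredient (an illustrative sketch on a randomly generated matrix): a skew-symmetric matrix commutes with its conjugate transpose, so it is normal.

```python
# A small NumPy check that a skew-symmetric matrix commutes with its
# (conjugate) transpose, i.e. that it is normal.
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M - M.T                      # A^T = -A, so A is skew-symmetric

lhs = A @ A.conj().T             # A A^H
rhs = A.conj().T @ A             # A^H A
print(np.allclose(lhs, rhs))     # True: skew-symmetric matrices are normal
```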

Theorem 7.5.1

A matrix $A\in\mathbb{C}^{n\times n}$ is normal if and only if it possesses a complete orthonormal set of eigenvectors.

Proof

$\Longleftarrow$

Any matrix with a complete set of orthonormal eigenvectors $\left\{\mathbf{\xi}_1,\ \mathbf{\xi}_2,\ \dots,\ \mathbf{\xi}_n\right\}$ has a unitary diagonalizing matrix $U$ whose columns are the $n$ orthonormal eigenvectors,

$$ \begin{align*} U &= \begin{bmatrix} \mathbf{\xi}_1 & \mathbf{\xi}_2 & \dots & \mathbf{\xi}_n \end{bmatrix} \\ \\ A &= UDU^H \end{align*} $$
However, it is not necessary that the diagonal matrix be Hermitian. Most diagonalizable matrices are not Hermitian: they have a diagonalizing matrix $X$, but their eigenvalues need not all be real, so $A^H\neq A$. If the eigenvalues are not all real, then the diagonal matrix is not Hermitian,

$$ D^H \neq D $$
Furthermore, eigenvectors associated with distinct eigenvalues will be linearly independent, but not necessarily orthogonal. This means that the diagonalizing matrix $X$ need not be a unitary matrix, and in general

$$ A^H = \left(XDX^{-1}\right)^H = \left(X^{-1}\right)^HD^HX^H \neq A $$
For a Hermitian matrix we have $A^H=A$, so

$$ A^H = \left(XDX^{-1}\right)^H = \left(X^{-1}\right)^HD^HX^H {\color{red}\mathbf{=}} A = XDX^{-1} $$

In that case all three factors satisfy

$$ \begin{align*} X^H &= X^{-1} \\ D^H &= D \\ \left(X^{-1}\right)^H &= X \end{align*} $$

For our matrix $A = UDU^H$, however, we only know that the diagonalizing matrix $U$ is unitary; the diagonal matrix may have complex eigenvalues on the diagonal, so in general

$$ A^H = \left(UDU^H\right)^H = UD^HU^H \neq A $$

Nevertheless, we will have that the product

$$ \begin{align*} D^HD &= \begin{bmatrix} \lambda_1 & & & & \\ & \lambda_2 & & & \\ & & \dots & & \\ & & & & \lambda_n \end{bmatrix}^H \begin{bmatrix} \lambda_1 & & & & \\ & \lambda_2 & & & \\ & & \dots & & \\ & & & & \lambda_n \end{bmatrix} \\ \\ &= \begin{bmatrix} \lambda_1^H & & & & \\ & \lambda_2^H & & & \\ & & \dots & & \\ & & & & \lambda_n^H \end{bmatrix} \begin{bmatrix} \lambda_1 & & & & \\ & \lambda_2 & & & \\ & & \dots & & \\ & & & & \lambda_n \end{bmatrix} \\ \\ &= \begin{bmatrix} |\lambda_1|^2 & & & & \\ & |\lambda_2|^2 & & & \\ & & \dots & & \\ & & & & |\lambda_n|^2 \end{bmatrix} = DD^H \end{align*} $$
We see that even though $D$ is not Hermitian, it is a diagonal matrix, and when we multiply $D$ by its conjugate transpose (Hermitian) we obtain a real diagonal matrix with the squared modulus of each eigenvalue on the diagonal. Recall that matrices usually do not commute; that is, for matrices $B$ and $C$, in general $BC\neq CB$. However, our diagonal matrices of eigenvalues do commute. Hence,

$$ \begin{align*} AA^H &= \left(UDU^H\right)\left(UDU^H\right)^H = UDU^HUD^HU^H = U{\color{red}DD^H}U^H \\ \\ &= U{\color{red}D^HD}U^H = \left(UD^HU^H\right)\left(UDU^H\right) = \left(UDU^H\right)^H\left(UDU^H\right) = A^HA \end{align*} $$
So even though $D$ and $A$ are not Hermitian, we have shown that $AA^H = A^HA$, and matrix $A$ is normal.

$\Longrightarrow$

If $n\times n$ matrix $A$ is normal, then $AA^H = A^HA$. Using the Schur Decomposition Theorem 6.4.3, we have that there is a unitary matrix $U$ such that $T = U^HAU$ is an upper triangular matrix. Hence

$$ T^HT = U^HA^HUU^HAU = U^HA^HAU = U^HAA^HU = U^HAUU^HA^HU = TT^H $$

Thus matrix $T$ is a normal matrix as well. If

$$ T = \begin{bmatrix} t_{11} & t_{12} & \dots & t_{1n} \\ 0 & t_{22} & \dots & t_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & t_{nn} \end{bmatrix}, $$

then

$$ \begin{align*} T^HT &= \begin{bmatrix} t_{11} & t_{12} & t_{13} & \dots & t_{1n} \\ 0 & t_{22} & t_{23} & \dots & t_{2n} \\ 0 & 0 & t_{33} & \dots & t_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & t_{nn} \end{bmatrix}^H \begin{bmatrix} t_{11} & t_{12} & t_{13} & \dots & t_{1n} \\ 0 & t_{22} & t_{23} & \dots & t_{2n} \\ 0 & 0 & t_{33} & \dots & t_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & t_{nn} \end{bmatrix} \\ \\ &= \begin{bmatrix} t_{11} & t_{12} & t_{13} & \dots & t_{1n} \\ 0 & t_{22} & t_{23} & \dots & t_{2n} \\ 0 & 0 & t_{33} & \dots & t_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & t_{nn} \end{bmatrix} \begin{bmatrix} t_{11} & t_{12} & t_{13} & \dots & t_{1n} \\ 0 & t_{22} & t_{23} & \dots & t_{2n} \\ 0 & 0 & t_{33} & \dots & t_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & t_{nn} \end{bmatrix}^H = TT^H \\ \end{align*} $$

Comparing the diagonal elements of each product we have

$$ \begin{align*} |t_{11}|^2 + |t_{12}|^2 + |t_{13}|^2 + \cdots + |t_{1n}|^2 &= |t_{11}|^2 \\ |t_{22}|^2 + |t_{23}|^2 + \cdots + |t_{2n}|^2 &= |t_{12}|^2 + |t_{22}|^2 \\ |t_{33}|^2 + \cdots + |t_{3n}|^2 &= |t_{13}|^2 + |t_{23}|^2 + |t_{33}|^2 \\ &\vdots \\ |t_{nn}|^2 &= |t_{1n}|^2 + |t_{2n}|^2 + \cdots + |t_{(n-1)n}|^2 + |t_{nn}|^2 \\ \end{align*} $$

Subtracting $|t_{kk}|^2$ from both sides of equation $k$, for $1\le k\le n$ yields

$$ \begin{align*} |t_{12}|^2 + |t_{13}|^2 + \cdots + |t_{1n}|^2 &= 0 \\ |t_{23}|^2 + \cdots + |t_{2n}|^2 &= |t_{12}|^2 \\ |t_{34}|^2 + \cdots + |t_{3n}|^2 &= |t_{13}|^2 + |t_{23}|^2 \\ &\vdots \\ 0 &= |t_{1n}|^2 + |t_{2n}|^2 + |t_{3n}|^2 + \cdots + |t_{(n-1)n}|^2 \\ \end{align*} $$

From the first equation, $t_{12} = t_{13} = \cdots = t_{1n} = 0$. Substituting this into the second equation gives $t_{23} = \cdots = t_{2n} = 0$, and continuing down the list we conclude that $t_{ij}=0$ for all $1\le i\neq j\le n$. Therefore $T$ is a diagonal matrix and $U$ diagonalizes $A$. Using matrix multiplication we have

$$ \begin{align*} AU &= A\begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \dots & \mathbf{u}_n \end{bmatrix} \\ \\ &= \begin{bmatrix} A\mathbf{u}_1 & A\mathbf{u}_2 & \dots & A\mathbf{u}_n \end{bmatrix} \\ \\ &= UT = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \dots & \mathbf{u}_n \end{bmatrix} \begin{bmatrix} t_{11} & 0 & \dots & 0 \\ 0 & t_{22} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & t_{nn} \end{bmatrix} \\ \\ &= \begin{bmatrix} t_{11}\mathbf{u}_1 & t_{22}\mathbf{u}_2 & \dots & t_{nn}\mathbf{u}_n \end{bmatrix} \end{align*} $$

Thus for $1\le j\le n$ we have $A\mathbf{u}_j = t_{jj}\mathbf{u}_j$, and $\left\{ \mathbf{u}_1,\ \mathbf{u}_2,\ \dots,\ \mathbf{u}_n\right\}$ is an orthonormal set of vectors because $U$ is a unitary matrix. Hence $\left\{ \mathbf{u}_1,\ \mathbf{u}_2,\ \dots,\ \mathbf{u}_n\right\}$ is a complete set of orthonormal eigenvectors. $\tombstone$

This means that the linear transformation represented by a normal matrix $A$ is a composition of a reflection or rotation, followed by a scaling transformation, followed by another reflection or rotation.

However, not all matrices are square, normal, or Hermitian.
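The following sketch illustrates the theorem numerically, assuming SciPy is available for the Schur decomposition used in the proof: for a normal (here skew-symmetric) matrix, the Schur factor $T$ comes out diagonal, so the columns of the unitary factor $U$ form a complete set of orthonormal eigenvectors.

```python
# A numerical illustration of Theorem 7.5.1 (a sketch, assuming SciPy is
# available): for a normal matrix, the Schur form T = U^H A U is diagonal.
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M - M.T                              # skew-symmetric, hence normal

T, U = schur(A, output='complex')        # A = U T U^H with U unitary
off_diagonal = T - np.diag(np.diag(T))

print(np.allclose(off_diagonal, 0))            # True: T is diagonal
print(np.allclose(U.conj().T @ U, np.eye(4)))  # columns of U are orthonormal
print(np.allclose(U @ T @ U.conj().T, A))      # A = U T U^H
```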

7.5.3 Singular Value Decomposition


Definition of Positive Definite

A symmetric (or Hermitian) matrix whose eigenvalues are all nonnegative is called positive semi-definite; if its eigenvalues are all positive it is called positive definite.

Let $A\in\mathbb{R}^{m\times n}$ be a real $m\times n$ matrix. While this matrix may not be Hermitian, symmetric, or even square, the matrix $A^TA\in\mathbb{R}^{n\times n}$ is real and symmetric. Furthermore, the eigenvalues of $A^TA$ are nonnegative, and the eigenvectors associated with distinct eigenvalues are orthogonal. So there is an orthogonal matrix $V$ that diagonalizes matrix $A^TA$.

Since all of the eigenvalues of matrix $A^TA$ are nonnegative, $A^TA$ is a positive semi-definite matrix.

$$ A^TA = V\Lambda V^T = \begin{bmatrix} \mathbf{v}_1 &\ \mathbf{v}_2 &\ \dots &\ \mathbf{v}_n \end{bmatrix}\begin{bmatrix} \lambda_1 &\ 0 &\ \dots &\ 0 \\ 0 &\ \lambda_2 &\ \dots &\ 0 \\ \vdots &\ \vdots &\ \ddots &\ \vdots \\ 0 &\ 0 &\ \dots &\ \lambda_n \end{bmatrix}\begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_n^T \end{bmatrix} $$


Likewise matrix $AA^T\in\mathbb{R}^{m\times m}$ is a real, symmetric, positive semi-definite matrix. Its eigenvalues are real and nonnegative, and the eigenvectors associated with distinct eigenvalues are orthogonal. Thus there is an orthogonal diagonalizing matrix $U$ such that

$$ AA^T = U\Lambda' U^T = \begin{bmatrix} \mathbf{u}_1 &\ \mathbf{u}_2 &\ \dots &\ \mathbf{u}_m \end{bmatrix}\begin{bmatrix} \lambda_1' &\ 0 &\ \dots &\ 0 \\ 0 &\ \lambda_2' &\ \dots &\ 0 \\ \vdots &\ \vdots &\ \ddots &\ \vdots \\ 0 &\ 0 &\ \dots &\ \lambda_m' \end{bmatrix}\begin{bmatrix} \mathbf{u}_1^T \\ \mathbf{u}_2^T \\ \vdots \\ \mathbf{u}_m^T \end{bmatrix} $$
If $\lambda$ is a nonzero eigenvalue of $A^TA$ with associated eigenvector $\mathbf{x}$, then $A^TA\mathbf{x} = \lambda\mathbf{x}$. Multiplying both sides of this equation by matrix $A$ yields

$$ AA^T\left(A\mathbf{x}\right) = A\left(A^TA\mathbf{x}\right) = A\left(\lambda\mathbf{x}\right) = \lambda\left(A\mathbf{x}\right) $$

Since $A^TA\mathbf{x} = \lambda\mathbf{x}\neq\mathbf{0}$, the vector $A\mathbf{x}$ is nonzero, so the nonzero value $\lambda$ is an eigenvalue of matrix $AA^T$ with associated eigenvector $A\mathbf{x}$. Thus the extra eigenvalues when $m\neq n$ are all zeros.

$$ \Lambda = \begin{bmatrix} \lambda_1 &\ &\ &\ \\ &\ \lambda_2 &\ &\ \\ &\ &\ \ddots &\ \\ &\ &\ &\ \lambda_r \\&\ &\ &\ &\ 0_{r+1} \\ &\ &\ &\ &\ &\ \ddots \\ &\ &\ &\ &\ &\ &\ 0_n \end{bmatrix}\neq \begin{bmatrix} \lambda_1 &\ &\ &\ \\ &\ \lambda_2 &\ &\ \\ &\ &\ \ddots &\ \\ &\ &\ &\ \lambda_r \\&\ &\ &\ &\ 0_{r+1} \\ &\ &\ &\ &\ &\ \ddots \\ &\ &\ &\ &\ &\ &\ 0_m \end{bmatrix} = \Lambda' $$
That is, $\Lambda$ has $n-r$ zero eigenvalues along the diagonal and $\Lambda'$ has $m-r$ zero eigenvalues along the diagonal; $\Lambda = \Lambda'$ only when $m = n$.

Let us suppose for simplicity that $m\ge n$ and that matrix $A$ has rank $r$. We want to find a set of orthonormal vectors $\left\{\mathbf{v}_1, \mathbf{v}_2,\dots,\mathbf{v}_n\right\}\subset\mathbb{R}^n$ and orthonormal vectors $\left\{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_m\right\}\subset\mathbb{R}^m$ so that, with singular values $\sigma_i = \sqrt{\lambda_i}$ for $1\le i\le r$,

$$ \begin{align*} A\mathbf{v}_1 &= \sigma_1\mathbf{u}_1 \\ \\ A\mathbf{v}_2 &= \sigma_2\mathbf{u}_2 \\ \\ &\ddots \\ \\ A\mathbf{v}_r &= \sigma_r\mathbf{u}_r \\ \\ A\mathbf{v}_{r+1} &= 0\mathbf{u}_{r+1} = \mathbf{0} \\ \\ &\ddots \\ \\ A\mathbf{v}_n &= 0\mathbf{u}_n = \mathbf{0} \end{align*} $$
Thus

$$ A\begin{bmatrix} \mathbf{v}_1 &\ \mathbf{v}_2 &\ \dots &\ \mathbf{v}_r \end{bmatrix} = \begin{bmatrix} \mathbf{u}_1 &\ \mathbf{u}_2 &\ \dots &\ \mathbf{u}_r \end{bmatrix}\begin{bmatrix} \sigma_1 &\ &\ &\ \\ \ &\ \sigma_2 &\ &\ \\ \ &\ &\ \ddots &\ \\ \ &\ &\ &\ \sigma_r \end{bmatrix} $$
This results in

$$ AV' = U'\Sigma' $$
Here $\Sigma'$ is an $r\times r$ diagonal matrix with the nonzero singular values along the diagonal. We want to expand this equation to include an orthogonal basis for the null space of $AA^T$. Recall that $N(AA^T) = N(A^T)$. Thus if $\left\{\mathbf{u}_{r+1}, \dots, \mathbf{u}_m\right\}$ is an orthonormal basis for $N(A^T)$, and we add enough rows of zeros to $\Sigma'$, then we will have our orthogonal matrix

$$ U = \begin{bmatrix} \mathbf{u}_1 &\ \dots &\ \mathbf{u}_r &\ \mathbf{u}_{r+1} &\ \dots &\ \mathbf{u}_m \end{bmatrix} $$
Similarly, we expand the equation to include an orthogonal basis for the null space of $A^TA$. As $N(A^TA) = N(A)$, if $\left\{\mathbf{v}_{r+1}, \dots, \mathbf{v}_n\right\}$ is an orthonormal basis for $N(A)$, and we add enough columns of zeros to $\Sigma'$, then we have our orthogonal matrix

$$ V = \begin{bmatrix} \mathbf{v}_1 &\ \dots &\ \mathbf{v}_r &\ \mathbf{v}_{r+1} &\ \dots &\ \mathbf{v}_n \end{bmatrix} $$
and our equation becomes


$$ AV = U\Sigma $$
where the $m\times n$ matrix $\Sigma$ is $\Sigma'$ with the rows and columns of zeros added for the null spaces of $A^T$ and $A$, respectively.

  1. Since the sets of vectors $\left\{\mathbf{u}_1, \dots, \mathbf{u}_r\right\}$ and $\left\{\mathbf{u}_{r+1}, \dots, \mathbf{u}_m\right\}$ are orthonormal bases for orthogonal subspaces of $\mathbb{R}^m$, we have that $\left\{\mathbf{u}_1, \dots, \mathbf{u}_m\right\}$ is an orthonormal set and matrix $U$ is an orthogonal matrix.

  2. Since the sets of vectors $\left\{\mathbf{v}_1, \dots, \mathbf{v}_r\right\}$ and $\left\{\mathbf{v}_{r+1}, \dots, \mathbf{v}_n\right\}$ are orthonormal bases for orthogonal subspaces of $\mathbb{R}^n$, we have that $\left\{\mathbf{v}_1, \dots, \mathbf{v}_n\right\}$ is an orthonormal set and matrix $V$ is an orthogonal matrix.

Multiplying both sides of $AV = U\Sigma$ on the right by $V^{-1} = V^T$ results in

$$ A = U\Sigma V^T $$
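Before stating the theorem, here is a minimal NumPy sketch of this factorization on an arbitrary (hypothetical) $5\times 3$ matrix: `numpy.linalg.svd` returns $U$, the singular values, and $V^T$, and we check that $A = U\Sigma V^T$ with $U$ and $V$ orthogonal.

```python
# A minimal NumPy sketch of the factorization A = U Sigma V^T.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))          # m = 5, n = 3

U, s, Vt = np.linalg.svd(A)              # full SVD: U is 5x5, Vt is 3x3
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)               # embed the singular values in m x n

print(np.allclose(U @ Sigma @ Vt, A))    # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(5)))   # U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(3))) # V is orthogonal
```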

7.5.4 The SVD Theorem

Theorem 7.5.2

The SVD Theorem

If $A\in\mathbb{R}^{m\times n}$, then $A$ has a singular value decomposition.

Proof

We showed in chapter 2 that $A^TA$ is a symmetric $n\times n$ matrix. Likewise $AA^T$ is an $m\times m$ symmetric matrix.

$$ \left(AA^T\right)^T = (A^T)^TA^T = AA^T\ \Large{\color{green}{\checkmark}} $$

$\tombstone$

If we multiply both sides of equation $A=U\Sigma V^T$ on the left by $A^T$ one obtains

$$ \begin{align*} A^TA &= \left(U\Sigma V^T\right)^T\left(U\Sigma V^T\right) \\ \\ &= \left(V\Sigma^TU^T\right)\left(U\Sigma V^T\right) \\ \\ &= V\Sigma^T\left(U^TU\right)\Sigma V^T \\ \\ &= V\Sigma^T\Sigma V^T \\ \\ &= V\begin{bmatrix} \sigma_1^2 &\ &\ &\ &\ &\ \\ \ &\ \ddots &\ &\ &\ \\ \ &\ &\ \sigma_r^2 &\ &\ \\ \ &\ &\ &\ 0 &\ &\ \\ \ &\ &\ &\ &\ \ddots &\ \\ \ &\ &\ &\ &\ &\ 0 \end{bmatrix}V^T \end{align*} $$
Therefore matrix $V$ is an orthogonal diagonalizing matrix for matrix $A^TA$.

  1. The columns of orthogonal matrix $V$ are the eigenvectors of $A^TA$.

  2. The values $\sigma_1^2, \dots, \sigma_n^2$ are the associated eigenvalues of $A^TA$.

  3. The singular values of matrix $A$ are the square roots of the eigenvalues of matrix $A^TA$.

If we multiply both sides of the equation $A = U\Sigma V^T$ on the right by $A^T$, then one obtains,

$$ \begin{align*} AA^T &= \left(U\Sigma V^T\right)\left(U\Sigma V^T\right)^T \\ \\ &= \left(U\Sigma V^T\right)\left(V\Sigma^TU^T\right) \\ \\ &= U\Sigma\left(V^TV\right)\Sigma^TU^T \\ \\ &= U\Sigma \Sigma^TU^T \\ \\ &= U\begin{bmatrix} \sigma_1^2 &\ &\ &\ &\ &\ \\ \ &\ \ddots &\ &\ &\ \\ \ &\ &\ \sigma_r^2 &\ &\ \\ \ &\ &\ &\ 0 &\ &\ \\ \ &\ &\ &\ &\ \ddots &\ \\ \ &\ &\ &\ &\ &\ 0 \end{bmatrix}U^T \end{align*} $$

  1. The columns of orthogonal matrix $U$ are the eigenvectors of $AA^T$.

  2. The singular values $\sigma_1$, $\dots$, $\sigma_n$ are unique; however, the matrices $U$ and $V$ are not. (These observations are checked numerically after this list.)
  3. The first $r$ columns of matrix $U$ comprise a basis for the column space of matrix $A$, $C(A)$.

  4. The last $m-r$ columns of matrix $U$ comprise a basis for the null space of $A^T$, $N(A^T)$.

  5. The first $r$ columns of matrix $V$ comprise a basis for the row space of matrix $A$, $C(A^T)$.

  6. The last $n-r$ columns of matrix $V$ comprise a basis for the null space of $A$, $N(A)$.
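A numerical check of the observations in the list above (a sketch on a randomly generated matrix, not from the notes): the squared singular values match the eigenvalues of $A^TA$ and $AA^T$, and the columns of $V$ and $U$ are the corresponding eigenvectors.

```python
# Check: sigma_i^2 are the nonzero eigenvalues of A^T A and A A^T, and the
# columns of V and U are their eigenvectors.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A)
V = Vt.T

# Eigenvalues of A^T A, sorted to match the descending singular values.
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(eig_AtA, s**2))                    # sigma_i^2 = lambda_i

# Each column of V is an eigenvector of A^T A: (A^T A) v_i = sigma_i^2 v_i.
print(np.allclose(A.T @ A @ V, V * s**2))

# Each of the first r columns of U is an eigenvector of A A^T.
print(np.allclose(A @ A.T @ U[:, :3], U[:, :3] * s**2))
```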

7.5.5 Computing the SVD


Example 1


Consider the matrix $A = \begin{bmatrix}\ \ 1\ &\ \ 1\ \\ \ \ 1\ &\ \ 1\ \\ \ \sqrt{2}\ & -\sqrt{2} \end{bmatrix}$

If $A$ is $m\times n$, and $m < n$, compute the singular value decomposition of $A^T$ using this method and transpose your result.

Since matrix $A$ is $m\times n$ and $m \ge n$,

  1. Compute an orthogonal diagonalization for $A^TA$.

    $$ A^TA = \begin{bmatrix}\ \ 1\ &\ \ 1\ &\ \sqrt{2} \\ \ \ 1\ &\ \ 1\ & -\sqrt{2} \end{bmatrix}\begin{bmatrix}\ \ 1\ &\ \ 1\ \\ \ \ 1\ &\ \ 1\ \\ \ \sqrt{2}\ & -\sqrt{2} \end{bmatrix} = \begin{bmatrix} 4\ & 0\ \\ 0\ &\ 4 \end{bmatrix} $$
    Matrix $A^TA$ is already a diagonal matrix. The eigenvalue $\lambda=4$ has algebraic multiplicity two. The matrix $A^TA-4I$ is the zero matrix, so the null space of $A^TA-4I$ is all of $\mathbb{R}^2$. An orthonormal basis for $N(A^TA-4I)$ is given by $\left\{\mathbf{e}_1, \mathbf{e}_2\right\}$ and

    $$ V = \begin{bmatrix} 1 &\ 0 \\ 0 &\ 1 \end{bmatrix} $$
  • If the basis you obtain is not an orthonormal basis, then you must use Gram-Schmidt to obtain an orthonormal basis for the columns of $V$.
  • The columns of matrix $V$ are called right singular vectors.
  2. Compute the singular values of $A$ and the factor $\Sigma$.
    The singular values for matrix $A$ are $\sigma_1 = \sqrt{4} = 2$ and $\sigma_2 = \sqrt{4}= 2$.
    Since $A$ is a $3\times 2$ matrix our diagonal matrix $\Sigma$ must also be a $3\times 2$ matrix. We usually write the diagonalization of $A^TA$ as $D$
    $$ D = \begin{bmatrix} 4 & 0 \\ 0 & 4 \end{bmatrix}, $$
    and our diagonal matrix of singular values $\Sigma'$
    $$ \Sigma' = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} $$
    To create matrix $\Sigma$ we must add enough rows of zeros so that $\Sigma$ has the same dimensions as our original matrix $A$.
    $$ \Sigma = \begin{bmatrix} 2 & 0 \\ 0 & 2 \\ 0 & 0 \end{bmatrix}\in\mathbb{R}^{3\times 2} $$

Matrix $V^{-1} = V^T$ is an $n\times n$ matrix, so matrix $\Sigma$ must be $m\times n$ so that the product has the same dimensions as our original matrix $A$.

  3. Compute the first $r = \text{rank}(A)$ columns of the $m\times m$ factor $U$.
    In order to compute the first few columns of matrix $U$ we start with our algebraic expression for the singular value decomposition
    $$ A = U\Sigma V^T $$
    As matrix $V$ is orthogonal, $V^T=V^{-1}$. Multiplying both sides of this equation on the right by $V$ yields
    $$ \begin{align*} AV &= U\Sigma \\ \\ A\begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \dots & \mathbf{v}_n \end{bmatrix} &= \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \dots & \mathbf{u}_m \end{bmatrix}\begin{bmatrix} \sigma_1 & 0 & 0 & \dots & 0 & \dots & 0 \\ 0 & \sigma_2 & 0 & \dots & 0 & \dots & 0 \\ 0 & 0 & \ddots & 0 & 0 & \dots & 0 \\ 0 & 0 & \dots & \sigma_r & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & 0 & \ddots & \dots & 0 \\ 0 & 0 & \dots & 0 & 0 & \dots & 0 \end{bmatrix} \\ \\ &= \begin{bmatrix} \sigma_1\mathbf{u}_1 & \sigma_2\mathbf{u}_2 & \dots & \sigma_r\mathbf{u}_r & 0\mathbf{u}_{r+1} & \dots & 0\mathbf{u}_n \end{bmatrix} \end{align*} $$
    Remember that we may or may not have columns of zeros on the right, or rows of zeros on the bottom, of matrix $\Sigma$. That depends on the number of zero eigenvalues $A^TA$ and $AA^T$ have, respectively. We have $r\le n\le m$, so the vectors $\mathbf{u}_k$ for $n+1\le k\le m$ do not appear in this product at all.
    In any case, for the first $r$ nonzero singular values we have for $1\le k\le r$,
    $$ A\mathbf{v}_k = \sigma_k\mathbf{u}_k $$
    Since $1\le k\le r$, $\sigma_k\neq 0$, so
    $$ \begin{align*} \mathbf{u}_k &= \frac{1}{\sigma_k}A\mathbf{v}_k \\ \\ \mathbf{u}_1 &= \frac{1}{\sigma_1}A\mathbf{v}_1 = \frac{1}{2}\begin{bmatrix}\ \ 1\ &\ \ 1\ \\ \ \ 1\ &\ \ 1\ \\ \ \sqrt{2}\ & -\sqrt{2} \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \\ \\ \mathbf{u}_2 &= \frac{1}{\sigma_2}A\mathbf{v}_2 = \frac{1}{2}\begin{bmatrix}\ \ 1\ &\ \ 1\ \\ \ \ 1\ &\ \ 1\ \\ \ \sqrt{2}\ & -\sqrt{2} \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \\ -\frac{1}{\sqrt{2}} \end{bmatrix} \end{align*} $$

Notice that these vectors are already orthonormal.

  4. Compute the remaining column vectors of matrix $U$, if any.
    The remaining column vectors of $U$ are an orthonormal basis for $N\left(AA^T\right) = N\left(A^T\right)$.
    $$ \begin{align*} A^T &= \begin{bmatrix} 1 & 1 & \sqrt{2} \\ 1 & 1 & -\sqrt{2} \end{bmatrix}\rightarrow\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \\ \\ \xi_1 &= -t \\ \xi_2 &= t\in\mathbb{R}\qquad\text{$\xi_2$ is a free variable} \\ \xi_3 &= 0 \\ \\ \mathbf{\xi} &\in\text{Span}\left\{ \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \right\} \end{align*} $$
    However, this basis vector is not a unit vector. Normalizing it gives
    $$ \mathbf{u}_3 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \end{bmatrix} $$
  • If the nullity of $A^T$ is greater than $1$, then you will need to use Gram-Schmidt to compute an orthonormal basis.
  • The columns of matrix $U$ are called the left singular vectors.
  5. Our Singular Value Decomposition (checked numerically below)
    $$ \begin{align*} U\Sigma V^T &= \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & \frac{1}{\sqrt{2}} \\ \frac{1}{2} & \frac{1}{2} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 2 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1 &\ 0 \\ 0 &\ 1 \end{bmatrix}^T \end{align*} $$
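We can check Example 1 with NumPy (an illustrative sketch). Keep in mind the earlier remark: the singular values are unique, but the singular vectors are not, so `numpy.linalg.svd` may return different, equally valid, columns for $U$ and $V$ than the hand computation.

```python
# Checking Example 1 numerically.
import numpy as np

A = np.array([[1.0,          1.0],
              [1.0,          1.0],
              [np.sqrt(2.0), -np.sqrt(2.0)]])

U, s, Vt = np.linalg.svd(A)
print(s)                                  # [2. 2.]  (both singular values are 2)

Sigma = np.zeros((3, 2))
Sigma[:2, :2] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))     # True: any valid SVD reconstructs A

# The hand-computed factors from this example also reconstruct A.
U_hand = np.array([[0.5,            0.5,            1/np.sqrt(2)],
                   [0.5,            0.5,           -1/np.sqrt(2)],
                   [1/np.sqrt(2),  -1/np.sqrt(2),   0.0]])
print(np.allclose(U_hand @ Sigma @ np.eye(2), A))
```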

7.5.6 The Four Main Subspaces

Determine the rank and bases for the four main subspaces of matrix $A$ in Example 1.

The rank of matrix $A$ is the number of nonzero singular values.
$$ \text{rank}(A) = 2 $$
We have
$$ A\mathbf{x} = U\Sigma V^T\mathbf{x} = \mathbf{b} $$
Since matrices $U$ and $V$ are orthogonal matrices, they represent rotations and/or reflections in the codomain and domain, respectively. They are in fact also change of basis, or transition, matrices. Matrix $\Sigma$ is a diagonal matrix, so it represents a scaling linear transformation. It dilates or contracts vectors in the domain and embeds them in the (possibly higher dimensional) vector space of the codomain.
If our original matrix $A$ was underdetermined, that is $m\lt n$, then we compute the singular value decomposition of $A^T$. This yields
$$ A^T = U\Sigma V^T $$
so,
$$ A = \left(A^T\right)^T = \left(U\Sigma V^T\right)^T = V\Sigma^T U^T $$
Thus matrix $\Sigma^T$ scales vectors from the domain and maps them to lower dimensional vectors in the codomain. We are obliged to rename each of the matrices so that we will in general denote our singular value decomposition
$$ A = U\Sigma V^T $$
so that

  1. matrix $U:\mathbb{R}^m\rightarrow\mathbb{R}^m$ is a transition matrix on the codomain,
  2. matrix $V:\mathbb{R}^n\rightarrow\mathbb{R}^n$ is a transition matrix on the domain, and
  3. matrix $\Sigma:\mathbb{R}^n\rightarrow\mathbb{R}^m$ is a diagonal, or scaling, matrix from $\mathbb{R}^n$ to $\mathbb{R}^m$.

Furthermore, the first $r = \text{rank}(A)$ columns of matrix $U$ form an orthonormal basis for the column space of $A$, $C(A)$.
$$ \begin{align*} C(A) &= \text{Span}\left\{\mathbf{u}_1,\ \mathbf{u}_2,\ \dots,\ \mathbf{u}_r \right\} = \text{Span}\left\{\begin{bmatrix}\ \frac{1}{2}\ \\ \ \frac{1}{2}\ \\ \ \frac{1}{\sqrt{2}}\ \end{bmatrix},\ \begin{bmatrix}\ \frac{1}{2}\ \\ \ \frac{1}{2}\ \\ -\frac{1}{\sqrt{2}}\ \end{bmatrix} \right\} \end{align*} $$
The remaining $m-r$ (if any) columns of matrix $U$ form a basis for the null space of $A^T$.
$$ \begin{align*} N(A^T) &= \text{Span}\left\{ \mathbf{u}_{r+1},\ \dots,\ \mathbf{u}_m \right\} = \text{Span}\left\{ \begin{bmatrix}\ \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ \ \ 0 \end{bmatrix} \right\} \end{align*} $$
The first $r$ columns of matrix $V$ ( not $V^T$!) form a basis for the row space of matrix $A$.
$$ \begin{align*} \\ C(A^T) &= \text{Span}\left\{ \mathbf{v}_1,\ \mathbf{v}_2,\ \dots,\ \mathbf{v}_r \right\} = \text{Span}\left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix},\ \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\} \end{align*} $$
The remaining $n-r$ (if any) columns of matrix $V$ form a basis for the null space of $A$.
$$ N(A) = \left\{ \mathbf{0} \right\} $$
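The following sketch reads the rank and the four fundamental subspaces directly off `numpy.linalg.svd` for the matrix of Example 1; the tolerance used to decide which singular values count as nonzero is a common heuristic choice, not something fixed by the notes.

```python
# Reading the four fundamental subspaces off the SVD for the matrix of Example 1.
import numpy as np

A = np.array([[1.0,          1.0],
              [1.0,          1.0],
              [np.sqrt(2.0), -np.sqrt(2.0)]])

U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s.max()
r = int(np.sum(s > tol))                 # rank(A) = 2 here

col_space  = U[:, :r]      # orthonormal basis for C(A)
left_null  = U[:, r:]      # orthonormal basis for N(A^T)
row_space  = Vt[:r, :].T   # orthonormal basis for C(A^T)
null_space = Vt[r:, :].T   # orthonormal basis for N(A); empty here

print(r)                                      # 2
print(np.allclose(A.T @ left_null, 0))        # these columns lie in N(A^T)
print(null_space.shape)                       # (2, 0): N(A) = {0}
```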

7.5.7 Applications

Among the most common applications of the singular value decomposition are robotics, data science, data compression, control systems, and statistical analysis. The singular value decomposition decomposes a linear transformation into a composition of a basis transition in the domain, followed by a scaling, followed by a basis transition in the codomain. This factorization identifies the principal natural basis (or axes), like the diagonalization does. It also shows the value of the scaling along each principal vector, again like the diagonalization. This allows us to identify the vectors or phenomena that are scaled the most, as well as the vectors or phenomena that contribute the least to the result of the linear transformation.

Let us consider an inner product on the vector space $\mathbb{R}^{m\times n}$ of $m\times n$ matrices. The most familiar inner product looks like the Euclidean inner product on $\mathbb{R}^n$. For $A,B\in\mathbb{R}^{m\times n}$,
$$ \langle A,B \rangle := \displaystyle\sum_{i=1}^m\displaystyle\sum_{j=1}^n a_{ij}b_{ij} $$
The corresponding norm would be
$$ \left\|A\right\|_F := \langle A, A \rangle^{\frac{1}{2}} = \left(\displaystyle\sum_{i=1}^m\displaystyle\sum_{j=1}^n a_{ij}^2 \right)^{\frac{1}{2}} $$
The subscript refers to the mathematician Ferdinand Georg Frobenius; this norm is called the Frobenius norm on $\mathbb{R}^{m\times n}$. We have not had time to discuss norms on vector spaces of linear transformations much. We did discuss the Euclidean or $2$-norm on the vector space $\mathbb{R}^n$ given by
$$ \left\|\mathbf{v}\right\|_2 = \langle \mathbf{v},\mathbf{v} \rangle^{\frac{1}{2}} = \left( \displaystyle\sum_{k=1}^n v_k^2 \right)^{\frac{1}{2}} $$
We can combine these definitions to obtain the identity
$$ \left\|A\right\|_F^2 = \displaystyle\sum_{i=1}^m\displaystyle\sum_{j=1}^n a_{ij}^2 = \displaystyle\sum_{j=1}^n \left( \displaystyle\sum_{i=1}^m a_{ij}^2 \right) = \displaystyle\sum_{j=1}^n \left\|\mathbf{a}_j\right\|_2^2 $$
where $\mathbf{a}_j$ denotes the $j^{\text{th}}$ column of $A$.
If we recall that matrix $U$ of our singular value decomposition is an orthogonal matrix, so that $\left\|U\mathbf{x}\right\|_2 = \left\|\mathbf{x}\right\|_2$ for every $\mathbf{x}$, we have
$$ \left\|U\Sigma\right\|_F^2 = \displaystyle\sum_{j=1}^n \left\|U\mathbf{\sigma}_j\right\|_2^2 = \displaystyle\sum_{j=1}^n \left\|\mathbf{\sigma}_j \right\|_2^2 = \left\|\Sigma\right\|_F^2 $$
where $\mathbf{\sigma}_j$ denotes the $j^{\text{th}}$ column of $\Sigma$.
Likewise matrix $V$ is an orthogonal matrix so,
$$ \left\|\Sigma V^T\right\|_F^2 = \left\|\left(\Sigma V^T\right)^T\right\|_F^2 = \left\|V\Sigma^T\right\|_F^2 = \left\|\Sigma^T\right\|_F^2 = \left\|\Sigma\right\|_F^2 $$
Moreover, since $\Sigma$ and $\Sigma^T$ are diagonal matrices,
$$ \left\|\Sigma\right\|_F^2 = \displaystyle\sum_{j=1}^r \sigma_j^2 = \left\|\Sigma^T\right\|_F^2 $$
Hence we have the identity
$$ \left\|A\right\|_F = \left\|U\Sigma V^T\right\|_F = \left\|\Sigma V^T\right\|_F = \left\|\Sigma\right\|_F = \left(\displaystyle\sum_{j=1}^r \sigma_j^2\right)^{\frac{1}{2}} $$
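A quick numerical confirmation of this identity (a sketch on a randomly generated matrix): the Frobenius norm computed entrywise agrees with the square root of the sum of the squared singular values.

```python
# Check the identity ||A||_F = (sum of sigma_j^2)^(1/2).
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 4))

s = np.linalg.svd(A, compute_uv=False)    # singular values only
frob_direct = np.linalg.norm(A, 'fro')    # (sum of squared entries)^(1/2)
frob_from_svd = np.sqrt(np.sum(s**2))

print(np.isclose(frob_direct, frob_from_svd))   # True
```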

Example 2

Consider an upper triangular matrix whose diagonal elements are $1$'s,
$$ A = \begin{bmatrix} 1 & -1 & -1 & \dots & -1 & -1 \\ 0 & 1 & -1 & \dots & -1 &-1 \\ 0 & 0 & 1 & \dots & -1 & -1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 & -1 \\ 0 & 0 & 0 & \dots & 0 & 1 \end{bmatrix} $$
Notice that the determinant of matrix $A$ is $1$; however, if $n$ is very large, this matrix is almost singular, meaning that it is very close to a singular matrix. To see this we compute the residual, or distance, from matrix $A$ to the matrix
$$ B = \begin{bmatrix} 1 & -1 & -1 & \dots & -1 & -1 \\ 0 & 1 & -1 & \dots & -1 &-1 \\ 0 & 0 & 1 & \dots & -1 & -1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 & -1 \\ -\left(\frac{1}{2}\right)^{n-2} & 0 & 0 & \dots & 0 & 1 \end{bmatrix} $$
Matrix $B$ is singular because
$$ \begin{align*} B\begin{bmatrix} 2^{n-2} \\ 2^{n-3} \\ \vdots \\ 2^0 \\ 1 \end{bmatrix} &= \begin{bmatrix} 1 & -1 & -1 & \dots & -1 & -1 \\ 0 & 1 & -1 & \dots & -1 &-1 \\ 0 & 0 & 1 & \dots & -1 & -1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 & -1 \\ -\left(\frac{1}{2}\right)^{n-2} & 0 & 0 & \dots & 0 & 1 \end{bmatrix}\begin{bmatrix} 2^{n-2} \\ 2^{n-3} \\ \vdots \\ 2^0 \\ 1 \end{bmatrix} \\ \\ &= \begin{bmatrix} 2^{n-2} - 2^{n-3} - 2^{n-4} - \dots - 2^0 - 1 \\ 0\ \ \ \ + 2^{n-3} - 2^{n-4} - \dots - 2^0 - 1 \\ 0\ \ \ \ + 0\ \ \ \ + 2^{n-4} - \dots -2^0 - 1 \\ \ddots \\ 0\ \ \ \ + 0\ \ \ \ + 0\ \ \ \ + \dots + 1\ \ - 1 \\ -1\ \ \ \ + 0\ \ \ \ + 0\ \ \ \ + \dots + 0\ \ + 1 \end{bmatrix} = \mathbf{0} \end{align*} $$
Thus $B\mathbf{x}=\mathbf{0}$ has a nontrivial solution and $B$ is a singular matrix. The distance from matrix $A$ to matrix $B$ is
$$ \left\|A - B\right\| = \frac{1}{2^{n-2}} $$
As $n$ gets very large, the distance between matrix $A$ and a singular matrix becomes very small.
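A short numerical look at Example 2 (a sketch): for the $n\times n$ matrix $A$ above, the determinant stays at $1$ while the smallest singular value, which in the $2$-norm measures the distance to the nearest singular matrix, collapses toward zero as $n$ grows.

```python
# The smallest singular value of the matrix of Example 2 shrinks rapidly with n.
import numpy as np

for n in (5, 10, 20, 30):
    A = np.eye(n) - np.triu(np.ones((n, n)), k=1)   # 1's on the diagonal, -1's above
    s = np.linalg.svd(A, compute_uv=False)
    print(n, np.linalg.det(A), s[-1])               # determinant is 1, but sigma_min is tiny
```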

7.5.8 Lower Rank Approximations

Example 3

Suppose that our matrix $A$ has the following singular value decomposition
$$ A = U\Sigma V^T = U\begin{bmatrix} 100 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & .001 & \dots & 0 \\ 0 & 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & \dots & 10^{-12} \end{bmatrix}V^T $$
Notice that the principal dilation has a factor of $100$. The other principal vectors (columns of $V$) are scaled by a factor of one or by severe contractions. We can approximate matrix $A$ with a matrix of rank $1$,
$$ \begin{align*} B &= U\Sigma_1V^T = U\begin{bmatrix} 100 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & \dots & 0 \end{bmatrix}V^T \\ \\ &= 100\cdot U\begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & \dots & 0 \end{bmatrix}V^T \\ \\ &= 100\cdot U\begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{0}^T \\ \mathbf{0}^T \\ \vdots \\ \mathbf{0}^T \end{bmatrix} = 100\cdot\mathbf{u}_1\mathbf{v}_1^T = \sigma_1\mathbf{u}_1\mathbf{v}_1^T \\ \end{align*} $$
The $m\times n$ matrix product $\ \mathbf{u}\mathbf{v}^T$ is called the outer product of vectors $\mathbf{u}$ and $\mathbf{v}$.

Recall that the scalar $\mathbf{v}^T\mathbf{u} = \langle \mathbf{u},\mathbf{v} \rangle$ is the inner product, or scalar product.

The distance between $A$ and $B$ becomes
$$ \begin{align*} \left\|A-B\right\| &= \left\|U\Sigma V^T - U\Sigma_1V^T\right\| = \left\|U(\Sigma - \Sigma_1)V^T\right\| \\ \\ &= \left\|\Sigma - \Sigma_1\right\| = \left( 1^2 + .001^2 + \dots + 10^{-24} \right)^{\frac{1}{2}} \approx 1 \end{align*} $$
In general, for $1\le k < r$, where $r$ is the rank of matrix $A$, the closest rank-$k$ approximation to $A$ is given by
$$ B = \displaystyle\sum_{j=1}^k \sigma_j\mathbf{u}_j\mathbf{v}_j^T $$
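Here is a sketch of this construction in NumPy on a randomly generated matrix: the rank-$k$ matrix $B$ is assembled from the first $k$ singular triples, and the Frobenius error $\|A-B\|_F$ equals the norm of the discarded singular values, consistent with the identity derived above.

```python
# Build the rank-k approximation B = sum_{j<=k} sigma_j u_j v_j^T from the SVD.
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD

k = 2
B = (U[:, :k] * s[:k]) @ Vt[:k, :]        # sum of the first k outer products

# The Frobenius error is the norm of the discarded singular values.
err = np.linalg.norm(A - B, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[k:]**2))))   # True
print(np.linalg.matrix_rank(B) == k)                # B has rank k
```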

7.5.9 Principal Component Analysis

For a matrix $A$ with singular value decomposition $A = U\Sigma V^T$, one attempts to improve the performance of data analysis for a system with a large number of inputs. If the system has $n$ inputs and $m$ outputs, our matrix $A\in\mathbb{R}^{m\times n}$ represents the measured outputs for each input. These inputs can have noise associated with their measurement. The noise can produce a small amount of apparent correlation from an input even when the input really doesn't affect the output at all.

The singular value decomposition identifies the inputs in matrix $V$ by their singular value. Inputs that provide the most information about the output will have a correspondingly large associated singular value. Inputs with negligible effect on the output will correspond to a small singular value.

The covariance matrix (for mean-centered data) is $S = \frac{1}{n-1}A^TA$, and the variance of a sample of input/output measurements $\mathbf{y}=A\mathbf{x}$ has the value
$$ \text{var}(\mathbf{y}) := \frac{(A\mathbf{x})^T(A\mathbf{x})}{n-1} = \frac{\mathbf{x}^TA^TA\mathbf{x}}{n-1} = \mathbf{x}^TS\mathbf{x} $$
In order to maximize the variance over all unit vectors in the domain of $A$, we choose a unit vector associated with the largest eigenvalue $\lambda_1$ of matrix $A^TA$. This is the vector $\mathbf{v}_1$, the first right singular vector, and first column of matrix $V$ in the singular value decomposition associated with the singular value $\sigma_1 = \sqrt{\lambda_1}$.

The corresponding left singular vector $\mathbf{u}_1$ gives us our first principal component vector,
$$ \mathbf{y}_1 = A\mathbf{v}_1 = \sigma_1\mathbf{u}_1 $$
with variance
$$ \text{var}(\mathbf{y}_1) = \frac{(A\mathbf{v}_1)^TA\mathbf{v}_1}{n-1} = \frac{\mathbf{v}_1^TA^TA\mathbf{v}_1}{n-1} = \frac{\mathbf{v}_1^T\lambda_1\mathbf{v}_1}{n-1} = \frac{\lambda_1\|\mathbf{v}_1\|^2}{n-1} = \frac{\lambda_1}{n-1} = \frac{\sigma_1^2}{n-1} $$

To find the second principal component vector, one needs to find a unit vector orthogonal to $\mathbf{y}_1$ that maximizes the variance. The second right singular vector yields $\mathbf{y}_2 = A\mathbf{v}_2 = \sigma_2\mathbf{u}_2$. This vector maximizes the remaining variance, and the two are orthogonal since
$$ \langle\mathbf{y}_1,\mathbf{y}_2\rangle = \langle\sigma_1\mathbf{u}_1,\sigma_2\mathbf{u}_2\rangle = \sigma_1\sigma_2\langle\mathbf{u}_1,\mathbf{u}_2\rangle = \sigma_1\sigma_2\cdot 0 = 0 $$
In general, the singular value decomposition solves the principal component analysis problem for all principal component vectors.
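The sketch below carries out principal component analysis via the SVD on a small synthetic data matrix (a hypothetical example). Its rows are treated as samples and its columns as inputs, the columns are mean-centered as the covariance formula above assumes, and the principal components $\mathbf{y}_j = A\mathbf{v}_j = \sigma_j\mathbf{u}_j$ and their variances $\sigma_j^2/(N-1)$ are computed, where $N$ is the number of samples.

```python
# Principal component analysis via the SVD on a synthetic data matrix.
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 3)) @ np.array([[3.0, 0.0, 0.0],
                                              [0.0, 1.0, 0.0],
                                              [0.0, 0.0, 0.1]])

A = X - X.mean(axis=0)                    # center the columns (mean zero)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

variances = s**2 / (A.shape[0] - 1)       # var(y_j) = sigma_j^2 / (N - 1)
Y = U * s                                 # principal components y_j = sigma_j u_j

print(variances)                          # dominated by the first component
print(np.allclose(A @ Vt.T, Y))           # y_j = A v_j = sigma_j u_j
```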



Your use of this self-initiated mediated course material is subject to our Creative Commons License .


Creative Commons Attribution-NonCommercial-ShareAlike 4.0

Attribution
You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Noncommercial
You may not use the material for commercial purposes.

Share Alike
You are free to share, copy and redistribute the material in any medium or format. If you adapt, remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.