$\require{color}$ $\definecolor{brightblue}{rgb}{.267, .298, .812}$ $\definecolor{darkblue}{rgb}{0.0, 0.0, 1.0}$ $\definecolor{palepink}{rgb}{1, .73, .8}$ $\definecolor{softmagenta}{rgb}{.99,.34,.86}$ $\def\ihat{\mathbf{\hat{\unicode{x0131}}}}$ $\def\jhat{\mathbf{\hat{\unicode{x0237}}}}$ $\def\khat{\mathbf{\hat{k}}}$ $\def\tombstone{\unicode{x220E}}$
When analyzing our linear mathematical models we often need to describe an elaborate linear transformation as a composition of simple linear transformations. The three simple linear transformations are rotations, reflections, and scalings.
Definition of a Complete Set of Orthonormal Eigenvectors ¶
A matrix $A\in\mathbb{C}^{n\times n}$ that has $n$ orthonormal eigenvectors associated with its eigenvalues is said to have a complete set of orthonormal eigenvectors.
This is an important property of a matrix. We will use this to construct the Singular Value Decomposition.
Definition of Diagonalizable ¶
Matrix $A\in\mathbb{C}^{n\times n}$ is called diagonalizable if there exists a nonsingular matrix $X$ and a diagonal matrix $D$ such that $$ X^{-1}AX = D $$
We say that matrix $X$ diagonalizes matrix $A$.
Definition of skew-symmetric and skew-Hermitian matrices ¶
A matrix $A\in\mathbb{C}^{n\times n}$ is called skew-symmetric if and only if $A^T = -A$. A matrix $A\in\mathbb{C}^{n\times n}$ is called skew-Hermitian if and only if $A^H = -A$.
Recall that a matrix that is equal to its own transpose, $A = A^T$, is called symmetric, and a matrix that is equal to its own conjugate transpose, $A = A^H$, is called a Hermitian matrix.
Definition of Normal Matrices ¶
A matrix $A\in\mathbb{C}^{n\times n}$ is called a normal matrix if $AA^H = A^HA$.
A normal matrix commutes with its Hermitian counterpart. We discuss normal matrices because there are non-Hermitian matrices that possess a complete set of orthonormal eigenvectors. Skew-symmetric and skew-Hermitian matrices have a complete set of orthonormal eigenvectors.
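The following is a minimal numerical sketch, assuming NumPy is available and using an illustrative $2\times 2$ skew-Hermitian matrix (not taken from the text), of the claim that a skew-Hermitian matrix commutes with its conjugate transpose and is therefore normal.

```python
# Minimal check (assuming NumPy) that a skew-Hermitian matrix is normal.
import numpy as np

# An illustrative 2x2 skew-Hermitian matrix: A^H = -A.
A = np.array([[1j, 2 + 1j],
              [-2 + 1j, -3j]])

AH = A.conj().T
print(np.allclose(AH, -A))          # skew-Hermitian: True
print(np.allclose(A @ AH, AH @ A))  # normal, A A^H = A^H A: True
```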
Theorem 7.5.1 ¶
A matrix $A\in\mathbb{C}^{n\times n}$ is normal if and only if it possesses a complete set of orthonormal eigenvectors .
$\Longleftarrow$
Any matrix with a complete set of orthonormal eigenvectors $\left\{\mathbf{\xi}_1,\ \mathbf{\xi}_2,\ \dots,\ \mathbf{\xi}_n\right\}$ has a unitary diagonalizing matrix $U$ whose columns are the $n$ orthonormal eigenvectors,
$$
\begin{align*}
U &= \begin{bmatrix} \mathbf{\xi}_1 & \mathbf{\xi}_2 & \dots & \mathbf{\xi}_n \end{bmatrix} \\
\\
A &= UDU^H
\end{align*}
$$
However, it is not necessary that the diagonal matrix be Hermitian. Most diagonalizable matrices are not Hermitian: they have a diagonalizing matrix $X$, but their eigenvalues will not all be real; that is, $A^H\neq A$. If the eigenvalues are not all real, then the diagonal matrix will not be Hermitian,
$$
D^H \neq D
$$
Furthermore, the eigenvectors of distinct eigenvalues will be linearly independent, but not necessarily orthogonal. This means that the diagonalizing matrix $X$ will not, in general, be a unitary matrix.
$$
A^H = \left(XDX^{-1}\right)^H = \left(X^{-1}\right)^HD^HX^H \neq A
$$
For a Hermitian matrix we have $A^H=A$, so
$$
A^H = \left(XDX^{-1}\right)^H = \left(X^{-1}\right)^HD^HX^H {\color{red}\mathbf{=}} A = XDX^{-1}
$$
In this case all three factors satisfy
$$
\begin{align*}
X^H &= X^{-1} \\
D^H &= D \\
\left(X^{-1}\right)^H &= X
\end{align*}
$$
For a matrix with a complete set of orthonormal eigenvectors that is not Hermitian, the diagonalizing matrix is the unitary matrix $U$, but $D^H \neq D$, so
$$
A^H = \left(UDU^H\right)^H = UD^HU^H \neq A
$$
Our unitary diagonalizing matrix produces a diagonal matrix with possibly complex eigenvalues on the diagonal; however, the product satisfies
$$
\begin{align*}
D^HD &= \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix}^H \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix} \\
\\
&= \begin{bmatrix} \bar{\lambda}_1 & & & \\ & \bar{\lambda}_2 & & \\ & & \ddots & \\ & & & \bar{\lambda}_n \end{bmatrix} \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix} \\
\\
&= \begin{bmatrix} |\lambda_1|^2 & & & \\ & |\lambda_2|^2 & & \\ & & \ddots & \\ & & & |\lambda_n|^2 \end{bmatrix} = DD^H
\end{align*}
$$
We see that even though $D$ is not Hermitian, it is a diagonal matrix, and when we multiply $D$ by its conjugate transpose (Hermitian) we obtain a real diagonal matrix with the squared modulus of each eigenvalue on the diagonal. Recall that matrices usually do not commute; that is, for matrices $B$ and $C$ in general $BC\neq CB$. However, our diagonal matrices of eigenvalues do commute. Hence
$$
\begin{align*}
AA^H &= \left(UDU^H\right)\left(UDU^H\right)^H = UDU^HUD^HU^H = U{\color{red}DD^H}U^H \\
\\
&= U{\color{red}D^HD}U^H = \left(UD^HU^H\right)\left(UDU^H\right) = \left(UDU^H\right)^H\left(UDU^H\right) = A^HA
\end{align*}
$$
So even though $D$ and $A$ are not Hermitian, we have still shown that $AA^H = A^HA$ and matrix $A$ is normal.
$\Longrightarrow$
If $n\times n$ matrix $A$ is normal, then $AA^H = A^HA$. Using the Schur Decomposition Theorem 6.4.3, we have that there is a unitary matrix $U$ such that $T = U^HAU$ is an upper triangular matrix. Hence
$$ T^HT = U^HA^HUU^HAU = U^HA^HAU = U^HAA^HU = U^HAUU^HA^HU = TT^H $$Thus matrix $T$ is a normal matrix as well. If
$$ T = \begin{bmatrix} t_{11} & t_{12} & \dots & t_{1n} \\ 0 & t_{22} & \dots & t_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & t_{nn} \end{bmatrix}, $$then
$$ \begin{align*} T^HT &= \begin{bmatrix} t_{11} & t_{12} & t_{13} & \dots & t_{1n} \\ 0 & t_{22} & t_{23} & \dots & t_{2n} \\ 0 & 0 & t_{33} & \dots & t_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & t_{nn} \end{bmatrix}^H \begin{bmatrix} t_{11} & t_{12} & t_{13} & \dots & t_{1n} \\ 0 & t_{22} & t_{23} & \dots & t_{2n} \\ 0 & 0 & t_{33} & \dots & t_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & t_{nn} \end{bmatrix} \\ \\ &= \begin{bmatrix} t_{11} & t_{12} & t_{13} & \dots & t_{1n} \\ 0 & t_{22} & t_{23} & \dots & t_{2n} \\ 0 & 0 & t_{33} & \dots & t_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & t_{nn} \end{bmatrix} \begin{bmatrix} t_{11} & t_{12} & t_{13} & \dots & t_{1n} \\ 0 & t_{22} & t_{23} & \dots & t_{2n} \\ 0 & 0 & t_{33} & \dots & t_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & t_{nn} \end{bmatrix}^H = TT^H \\ \end{align*} $$Comparing the diagonal elements of each product we have
$$ \begin{align*} |t_{11}|^2 + |t_{12}|^2 + |t_{13}|^2 + \cdots + |t_{1n}|^2 &= |t_{11}|^2 \\ |t_{22}|^2 + |t_{23}|^2 + \cdots + |t_{2n}|^2 &= |t_{12}|^2 + |t_{22}|^2 \\ |t_{33}|^2 + \cdots + |t_{3n}|^2 &= |t_{13}|^2 + |t_{23}|^2 + |t_{33}|^2 \\ &\vdots \\ |t_{nn}|^2 &= |t_{1n}|^2 + |t_{2n}|^2 + \cdots + |t_{(n-1)n}|^2 + |t_{nn}|^2 \\ \end{align*} $$Subtracting $|t_{kk}|^2$ from both sides of equation $k$, for $1\le k\le n$, yields
$$ \begin{align*} |t_{12}|^2 + |t_{13}|^2 + \cdots + |t_{1n}|^2 &= 0 \\ |t_{23}|^2 + \cdots + |t_{2n}|^2 &= |t_{12}|^2 \\ |t_{34}|^2 + \cdots + |t_{3n}|^2 &= |t_{13}|^2 + |t_{23}|^2 \\ &\vdots \\ 0 &= |t_{1n}|^2 + |t_{2n}|^2 + |t_{3n}|^2 + \cdots + |t_{(n-1)n}|^2 \\ \end{align*} $$The first equation forces $t_{12} = t_{13} = \cdots = t_{1n} = 0$; substituting these zeros into the second equation forces the off-diagonal entries of the second row to vanish, and continuing down the rows we conclude that $t_{ij}=0$ for all $i\neq j$. Therefore $T$ is a diagonal matrix and $U$ diagonalizes $A$. Using matrix multiplication we have
$$ \begin{align*} AU &= A\begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \dots & \mathbf{u}_n \end{bmatrix} \\ \\ &= \begin{bmatrix} A\mathbf{u}_1 & A\mathbf{u}_2 & \dots & A\mathbf{u}_n \end{bmatrix} \\ \\ &= UT = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \dots & \mathbf{u}_n \end{bmatrix} \begin{bmatrix} t_{11} & 0 & \dots & 0 \\ 0 & t_{22} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & t_{nn} \end{bmatrix} \\ \\ &= \begin{bmatrix} t_{11}\mathbf{u}_1 & t_{22}\mathbf{u}_2 & \dots & t_{nn}\mathbf{u}_n \end{bmatrix} \end{align*} $$Thus for $1\le j\le n$ we have $A\mathbf{u}_j = t_{jj}\mathbf{u}_j$, and $\left\{ \mathbf{u}_1,\ \mathbf{u}_2,\ \dots,\ \mathbf{u}_n\right\}$ is an orthonormal set of vectors because $U$ is a unitary matrix. Hence $\left\{ \mathbf{u}_1,\ \mathbf{u}_2,\ \dots,\ \mathbf{u}_n\right\}$ is a complete set of orthonormal eigenvectors. $\tombstone$
This means that the linear transformation represented by a normal matrix $A$ is a composition of a rotation or reflection, followed by a scaling transformation, followed by another rotation or reflection.
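The following sketch, assuming NumPy and using an illustrative real skew-symmetric matrix, checks numerically that a normal, non-Hermitian matrix has orthonormal eigenvectors and is unitarily diagonalizable, even though its eigenvalues are complex.

```python
# Check (assuming NumPy) that a normal, non-Hermitian matrix is unitarily
# diagonalizable: A = U D U^H with U unitary and D diagonal.
import numpy as np

# An illustrative real skew-symmetric matrix: A^T = -A, so A is normal.
A = np.array([[ 0.0, -2.0,  1.0],
              [ 2.0,  0.0, -3.0],
              [-1.0,  3.0,  0.0]])

print(np.allclose(A @ A.T, A.T @ A))            # normality: True

eigvals, U = np.linalg.eig(A)                   # eigenvalues are 0 and +/- i*sqrt(14)
# For a normal matrix with distinct eigenvalues, the unit eigenvectors returned
# by np.linalg.eig are orthogonal, so U is numerically unitary.
print(np.allclose(U.conj().T @ U, np.eye(3)))   # True
print(np.allclose(U @ np.diag(eigvals) @ U.conj().T, A))  # A = U D U^H: True
```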
However, not all matrices are square, normal, or Hermitian.
Definition of Positive Definite ¶
A symmetric matrix whose eigenvalues are all nonnegative is called positive semi-definite; if the eigenvalues are all positive, the matrix is called positive definite.
Let $A\in\mathbb{R}^{m\times n}$ be a real $m\times n$ matrix. While this matrix may not be Hermitian, symmetric, or even square, the matrix $A^TA\in\mathbb{R}^{n\times n}$ is real and symmetric. Furthermore, the eigenvalues of $A^TA$ are nonnegative, so $A^TA$ is a positive semi-definite matrix, and the eigenvectors of distinct eigenvalues are orthogonal. Thus there is an orthogonal matrix $V$ that diagonalizes matrix $A^TA$,
$$ A^TA = V\Lambda V^T = \begin{bmatrix} \mathbf{v}_1 &\ \mathbf{v}_2 &\ \dots &\ \mathbf{v}_n \end{bmatrix}\begin{bmatrix} \lambda_1 &\ 0 &\ \dots &\ 0 \\ 0 &\ \lambda_2 &\ \dots &\ 0\ \\ 0 &\ 0 &\ \ddots &\ 0 \\ 0 &\ 0 &\ \dots &\ \lambda_n \end{bmatrix}\begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_n^T \end{bmatrix} $$
Likewise matrix $AA^T\in\mathbb{R}^{m\times m}$ is a real symmetric, positive semi-definite matrix. Its eigenvalues are real and nonnegative, and the eigenvectors of distinct eigenvalues are orthogonal. Thus there is an orthogonal diagonalizing matrix $U$ such that,
$$
AA^T = U\Lambda' U^T = \begin{bmatrix} \mathbf{u}_1 &\ \mathbf{u}_2 &\ \dots &\ \mathbf{u}_m \end{bmatrix}\begin{bmatrix} \lambda_1' &\ 0 &\ \dots &\ 0 \\ 0 &\ \lambda_2' &\ \dots &\ 0\ \\ 0 &\ 0 &\ \ddots &\ 0 \\ 0 &\ 0 &\ \dots &\ \lambda_m' \end{bmatrix}\begin{bmatrix} \mathbf{u}_1^T \\ \mathbf{u}_2^T \\ \vdots \\ \mathbf{u}_m^T \end{bmatrix}
$$
If $\lambda$ is a nonzero eigenvalue of $A^TA$ with associated eigenvector $\mathbf{x}$, then $A^TA\mathbf{x} = \lambda\mathbf{x}$. Multiplying both sides of this equation by matrix $A$ yields
$$
AA^T\left(A\mathbf{x}\right) = A\left(A^TA\mathbf{x}\right) = A\left(\lambda\mathbf{x}\right) = \lambda\left(A\mathbf{x}\right)
$$
Thus the nonzero value $\lambda$ is an eigenvalue of matrix $AA^T$ with associated eigenvector $A\mathbf{x}$. Thus the extra eigenvalues when $m\neq n$ are all zeros.
$$
\Lambda = \begin{bmatrix} \lambda_1 &\ &\ &\ \\ &\ \lambda_2 &\ &\ \\ &\ &\ \ddots &\ \\ &\ &\ &\ \lambda_r \\&\ &\ &\ &\ 0_{r+1} \\ &\ &\ &\ &\ &\ \ddots \\ &\ &\ &\ &\ &\ &\ 0_n \end{bmatrix}\neq \begin{bmatrix} \lambda_1 &\ &\ &\ \\ &\ \lambda_2 &\ &\ \\ &\ &\ \ddots &\ \\ &\ &\ &\ \lambda_r \\&\ &\ &\ &\ 0_{r+1} \\ &\ &\ &\ &\ &\ \ddots \\ &\ &\ &\ &\ &\ &\ 0_m \end{bmatrix} = \Lambda'
$$
That is $\Lambda$ has $n-r$ zero eigenvalues along the diagonal and $\Lambda'$ has $m-r$ zero eigenvalues along the diagonal. $\Lambda = \Lambda'$ only when $m = n$.
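A short sketch, assuming NumPy and a randomly generated $5\times 3$ matrix for illustration, of the fact that $A^TA$ and $AA^T$ share their nonzero eigenvalues while the extra eigenvalues are zero.

```python
# A^T A and A A^T share their nonzero eigenvalues (assuming NumPy).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))                      # m = 5, n = 3

lam_n = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # n = 3 eigenvalues, descending
lam_m = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]   # m = 5 eigenvalues, descending

print(np.round(lam_n, 6))            # three nonnegative eigenvalues
print(np.round(lam_m, 6))            # the same three values, then two (near) zeros
print(np.allclose(lam_m[:3], lam_n)) # True
```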
Let us suppose for simplicity that $m\gt n$, and let $r = \text{rank}(A)$; when $A$ has full rank, all of the columns of $A$ are linearly independent and $r = n$. We want to find a set of orthonormal vectors $\left\{\mathbf{v}_1, \mathbf{v}_2,\dots,\mathbf{v}_n\right\}\subset\mathbb{R}^n$ and orthonormal vectors $\left\{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_m\right\}\subset\mathbb{R}^m$ so that, with $\sigma_i = \sqrt{\lambda_i}$ for each eigenvalue $\lambda_i$ of $A^TA$,
$$
\begin{align*}
A\mathbf{v}_1 &= \sigma_1\mathbf{u}_1 \\
\\
A\mathbf{v}_2 &= \sigma_2\mathbf{u}_2 \\
\\
&\ddots \\
\\
A\mathbf{v}_r &= \sigma_r\mathbf{u}_r \\
\\
A\mathbf{v}_{r+1} &= 0\mathbf{u}_{r+1} = \mathbf{0} \\
\\
&\ddots \\
\\
A\mathbf{v}_n &= 0\mathbf{u}_n = \mathbf{0}
\end{align*}
$$
Thus
$$
A\begin{bmatrix} \mathbf{v}_1 &\ \mathbf{v}_2 &\ \dots &\ \mathbf{v}_r \end{bmatrix} = \begin{bmatrix} \mathbf{u}_1 &\ \mathbf{u}_2 &\ \dots &\ \mathbf{u}_r \end{bmatrix}\begin{bmatrix} \sigma_1 &\ &\ &\ \\ \ &\ \sigma_2 &\ &\ \\ \ &\ &\ \ddots &\ \\ \ &\ &\ &\ \sigma_r \end{bmatrix}
$$
This results in
$$
AV' = U'\Sigma'
$$
Here $\Sigma'$ is an $r\times r$ diagonal matrix with the nonzero singular values along the diagonal. We want to expand this equation to include an orthonormal basis for the null space of $AA^T$. Recall that $N(AA^T) = N(A^T)$. Thus if $\left\{\mathbf{u}_{r+1}, \dots, \mathbf{u}_m\right\}$ is an orthonormal basis for $N(A^T)$, and we add enough rows of zeros to $\Sigma'$, then we will have our orthogonal matrix
$$
U = \begin{bmatrix} \mathbf{u}_1 &\ \dots &\ \mathbf{u}_r &\ \mathbf{u}_{r+1} &\ \dots &\ \mathbf{u}_m \end{bmatrix}
$$
Similarly we expand the equation to include an orthonormal basis for the null space of $A^TA$. As $N(A^TA) = N(A)$, if $\left\{\mathbf{v}_{r+1}, \dots, \mathbf{v}_n\right\}$ is an orthonormal basis for $N(A)$, and we add enough columns of zeros to $\Sigma'$, then we have our orthogonal matrix
$$
V = \begin{bmatrix} \mathbf{v}_1 &\ \dots &\ \mathbf{v}_r &\ \mathbf{v}_{r+1} &\ \dots &\ \mathbf{v}_n \end{bmatrix}
$$
and our equation becomes
$$
AV = U\Sigma
$$
where the $m\times n$ matrix $\Sigma$ is $\Sigma'$ with rows and columns of zeros added for the null spaces of $A^T$ and $A$.
Multiplying both sides of $AV = U\Sigma$ on the right by $V^{-1} = V^T$ results in
$$
A = U\Sigma V^T
$$
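The construction above can be carried out numerically. The sketch below, assuming NumPy and an illustrative random tall matrix, diagonalizes $A^TA$ to obtain $V$ and the singular values, builds the columns of $U$ from $A\mathbf{v}_j/\sigma_j$, and compares the result with np.linalg.svd.

```python
# Build an SVD of a tall, full-column-rank matrix from the eigenvalue
# decomposition of A^T A (assuming NumPy), and compare with np.linalg.svd.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))            # illustrative m x n matrix, m > n

lam, V = np.linalg.eigh(A.T @ A)           # eigenvalues (ascending) and orthonormal V
order = np.argsort(lam)[::-1]              # reorder so sigma_1 >= sigma_2 >= ...
lam, V = lam[order], V[:, order]
sigma = np.sqrt(lam)                       # singular values are sqrt(eigenvalues)

U_r = (A @ V) / sigma                      # u_j = A v_j / sigma_j  (reduced U, m x n)
print(np.allclose(U_r.T @ U_r, np.eye(3))) # the u_j are orthonormal: True
print(np.allclose((U_r * sigma) @ V.T, A)) # A = U_r diag(sigma) V^T: True

# np.linalg.svd returns the same singular values (U and V may differ in sign).
print(np.allclose(np.linalg.svd(A, compute_uv=False), sigma))
```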
Theorem 7.5.1 ¶
The SVD Theorem
If $A\in\mathbb{R}^{m\times n}$, then $A$ has a singular value decomposition.
We showed in chapter 2 that $A^TA$ is a symmetric $n\times n$ matrix. Likewise $AA^T$ is an $m\times m$ symmetric matrix.
$$ \left(AA^T\right)^T = (A^T)^TA^T = AA^T\ \Large{\color{green}{\checkmark}} $$
$\tombstone$
If we multiply both sides of equation $A=U\Sigma V^T$ on the left by $A^T$ one obtains
$$
\begin{align*}
A^TA &= \left(U\Sigma V^T\right)^T\left(U\Sigma V^T\right) \\
\\
&= \left(V\Sigma^TU^T\right)\left(U\Sigma V^T\right) \\
\\
&= V\Sigma^T\left(U^TU\right)\Sigma V^T \\
\\
&= V\Sigma^T\Sigma V^T \\
\\
&= V\begin{bmatrix} \sigma_1^2 &\ &\ &\ &\ &\ \\ \ &\ \ddots &\ &\ &\ \\ \ &\ &\ \sigma_r^2 &\ &\ \\ \ &\ &\ &\ 0 &\ &\ \\ \ &\ &\ &\ &\ \ddots &\ \\ \ &\ &\ &\ &\ &\ 0 \end{bmatrix}V^T
\end{align*}
$$
Therefore matrix $V$ is an orthogonal diagonalizing matrix for matrix $A^TA$.
- The columns of orthogonal matrix $V$ are the eigenvectors of $A^TA$.
- The scalars $\sigma_1^2, \dots, \sigma_n^2$ are the associated eigenvalues of $A^TA$.
- The singular values of matrix $A$ are the square roots of the eigenvalues of matrix $A^TA$.
If we multiply both sides of the equation $A = U\Sigma V^T$ on the right by $A^T$, then one obtains,
$$
\begin{align*}
AA^T &= \left(U\Sigma V^T\right)\left(U\Sigma V^T\right)^T \\
\\
&= \left(U\Sigma V^T\right)\left(V\Sigma^TU^T\right) \\
\\
&= U\Sigma\left(V^TV\right)\Sigma^TU^T \\
\\
&= U\Sigma \Sigma^TU^T \\
\\
&= U\begin{bmatrix} \sigma_1^2 &\ &\ &\ &\ &\ \\ \ &\ \ddots &\ &\ &\ \\ \ &\ &\ \sigma_r^2 &\ &\ \\ \ &\ &\ &\ 0 &\ &\ \\ \ &\ &\ &\ &\ \ddots &\ \\ \ &\ &\ &\ &\ &\ 0 \end{bmatrix}U^T
\end{align*}
$$
- The columns of orthogonal matrix $U$ are the eigenvectors of $AA^T$.
- The singular values $\sigma_1$, $\dots$, $\sigma_n$ are unique; however, the matrices $U$ and $V$ are not.
- The first $r$ columns of matrix $U$ comprise a basis for the column space of matrix $A$, $C(A)$.
- The last $m-r$ columns of matrix $U$ comprise a basis for the null space of $A^T$, $N(A^T)$.
- The first $r$ columns of matrix $V$ comprise a basis for the row space of matrix $A$, $C(A^T)$.
- The last $n-r$ columns of matrix $V$ comprise a basis for the null space of $A$, $N(A)$.
Consider the matrix $A = \begin{bmatrix}\ \ 1\ &\ \ 1\ \\ \ \ 1\ &\ \ 1\ \\ \ \sqrt{2}\ & -\sqrt{2} \end{bmatrix}$
If $A$ is $m\times n$, and $m < n$, compute the singular value decomposition of $A^T$ using this method and transpose your result.
Since matrix $A$ is $m\times n$ with $m \ge n$, we can apply the construction above directly: diagonalize $A^TA$ to find $V$ and the singular values.
- If the basis you obtain is not an orthonormal basis, then you must use Gram-Schmidt to obtain an orthonormal basis for the columns of $V$.
- The columns of matrix $V$ are called right singular vectors.
Matrix $V^{-1} = V^T$ is an $n\times n$ matrix, so matrix $\Sigma$ must be $m\times n$ so that the product has the same dimensions as our original matrix $A$.
Notice that these vectors are already orthonormal.
- If the nullity of $A^T$ is greater than $1$, then you will need to use Gram-Schmidt to compute an orthonormal basis.
- The columns of matrix $U$ are called the left singular vectors.
The rank of matrix $A$ is the number of nonzero singular values.
$$
\text{rank}(A) = 2
$$
We have
$$
A\mathbf{x} = U\Sigma V^T\mathbf{x} = \mathbf{b}
$$
Since matrices $U$ and $V$ are orthogonal matrices, they represent rotations and/or reflections in the codomain and domain, respectively. They are in fact also change of variables, or transition, matrices. Matrix $\Sigma$ is a diagonal matrix, so it represents a scaling linear transformation. It dilates or contracts vectors in the domain and embeds them in the higher dimensional vector space of the codomain.
If our original matrix $A$ is underdetermined, that is $m \lt n$, then we compute the singular value decomposition of $A^T$. This yields
$$
A^T = U\Sigma V^T
$$
so,
$$
A = \left(A^T\right)^T = \left(U\Sigma V^T\right)^T = V\Sigma^T U^T
$$
Thus matrix $\Sigma^T$ scales vectors from the domain and maps them to lower dimensional vectors in the codomain. We then rename the matrices so that in general we denote our singular value decomposition
$$
A = U\Sigma V^T
$$
Returning to our example, the first $r = \text{rank}(A)$ columns of matrix $U$ form an orthonormal basis for the column space of $A$, $C(A)$.
$$
\begin{align*}
C(A) &= \text{Span}\left\{\mathbf{u}_1,\ \mathbf{u}_2,\ \dots,\ \mathbf{u}_r \right\} = \text{Span}\left\{\begin{bmatrix}\ \frac{1}{2}\ \\ \ \frac{1}{2}\ \\ \ \frac{1}{\sqrt{2}}\ \end{bmatrix},\ \begin{bmatrix}\ \frac{1}{2}\ \\ \ \frac{1}{2}\ \\ -\frac{1}{\sqrt{2}}\ \end{bmatrix} \right\}
\end{align*}
$$
The remaining $m-r$ (if any) columns of matrix $U$ form a basis for the null space of $A^T$.
$$
\begin{align*}
N(A^T) &= \text{Span}\left\{ \mathbf{u}_{r+1},\ \dots,\ \mathbf{u}_m \right\} = \text{Span}\left\{ \begin{bmatrix}\ \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ \ \ 0 \end{bmatrix} \right\}
\end{align*}
$$
The first $r$ columns of matrix $V$ (not $V^T$!) form a basis for the row space of matrix $A$.
$$
\begin{align*}
\\
C(A^T) &= \text{Span}\left\{ \mathbf{v}_1,\ \mathbf{v}_2,\ \dots,\ \mathbf{v}_r \right\} = \text{Span}\left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix},\ \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}
\end{align*}
$$
The remaining $n-r$ (if any) columns of matrix $V$ form a basis for the null space of $A$.
$$
N(A) = \left\{ \mathbf{0} \right\}
$$
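Here is a numerical check of this worked example, a sketch assuming NumPy is available. Because the two singular values are equal, np.linalg.svd may return a different, but equally valid, orthonormal pair spanning $C(A)$, and the column signs may differ from the hand computation.

```python
# Numerical check (assuming NumPy) of the worked example above.
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [np.sqrt(2), -np.sqrt(2)]])

U, s, Vt = np.linalg.svd(A)               # full SVD: U is 3x3, Vt is 2x2
print(np.round(s, 6))                     # singular values [2. 2.], so rank(A) = 2

print(np.round(U, 6))                     # first 2 columns span C(A), last spans N(A^T)
print(np.round(Vt.T, 6))                  # columns of V span the row space C(A^T)

Sigma = np.zeros((3, 2))                  # rebuild the 3x2 matrix Sigma
Sigma[:2, :2] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))     # A = U Sigma V^T: True
```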
Among the most common applications of the singular value decomposition are robotics, data science, data compression, control systems, and statistical analysis. The singular value decomposition decomposes a linear transformation into a composition of a basis transition in the domain, a scaling, and a basis transition in the codomain. This factorization identifies the principal natural basis (or axes), like the diagonalization, and it shows the value of the scaling along each principal vector, like the diagonalization. This allows us to identify the vectors or phenomena that are scaled the most, as well as the vectors or phenomena that contribute the least to the result of the linear transformation.
Let us consider an inner product on the vector space $\mathbb{R}^{m\times n}$ of $m\times n$ matrices. The most familiar inner product resembles the Euclidean inner product on $\mathbb{R}^n$. For $A,B\in\mathbb{R}^{m\times n}$,
$$
\langle A,B \rangle := \displaystyle\sum_{i=1}^m\displaystyle\sum_{j=1}^n a_{ij}b_{ij}
$$
The corresponding norm would be
$$
\left\|A\right\|_F := \langle A, A \rangle^{\frac{1}{2}} = \left(\displaystyle\sum_{i=1}^m\displaystyle\sum_{j=1}^n a_{ij}^2 \right)^{\frac{1}{2}}
$$
The subscript indicates the name of a mathematician; this norm is called the Frobenius norm on $\mathbb{R}^{m\times n}$. We have not had time to discuss norms on vector spaces of linear transformations much. We did discuss the Euclidean or $2$-norm on the vector space $\mathbb{R}^n$ given by
$$
\left\|\mathbf{v}\right\|_2 = \langle \mathbf{v},\mathbf{v} \rangle^{\frac{1}{2}} = \left( \displaystyle\sum_{k=1}^n v_k^2 \right)^{\frac{1}{2}}
$$
We can combine these definitions, summing over the columns $\mathbf{a}_j$ of $A$, to obtain the identity
$$
\left\|A\right\|_F^2 = \displaystyle\sum_{i=1}^m\displaystyle\sum_{j=1}^n a_{ij}^2 = \displaystyle\sum_{j=1}^n \left( \displaystyle\sum_{i=1}^m a_{ij}^2 \right) = \displaystyle\sum_{j=1}^n \left\|\mathbf{a}_j\right\|_2^2
$$
If we recall that matrix $U$ of our singular value decomposition is an orthogonal matrix, and we write $\mathbf{\sigma}_j$ for the $j$th column of $\Sigma$, we have
$$
\left\|U\Sigma\right\|_F^2 = \displaystyle\sum_{j=1}^n \left\|U\mathbf{\sigma}_j\right\|_2^2 = \displaystyle\sum_{j=1}^n \left\|\mathbf{\sigma}_j \right\|_2^2 = \left\|\Sigma\right\|_F^2
$$
Likewise matrix $V$ is an orthogonal matrix so,
$$
\left\|\Sigma V^T\right\|_F^2 = \left\|\left(\Sigma V^T\right)^T\right\|_F^2 = \left\|V\Sigma^T\right\|_F^2 = \left\|\Sigma^T\right\|_F^2 = \left\|\Sigma\right\|_F^2
$$
Moreover, since $\Sigma$ and $\Sigma^T$ are diagonal matrices,
$$
\left\|\Sigma\right\|_F^2 = \displaystyle\sum_{j=1}^r \sigma_j^2 = \left\|\Sigma^T\right\|_F^2
$$
Hence we have the identity
$$
\left\|A\right\|_F = \left\|U\Sigma V^T\right\|_F = \left\|\Sigma V^T\right\|_F = \left\|\Sigma\right\|_F = \left(\displaystyle\sum_{j=1}^r \sigma_j^2\right)^{\frac{1}{2}}
$$
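A one-line numerical check of this identity, assuming NumPy and an arbitrary random matrix:

```python
# The Frobenius norm equals the 2-norm of the vector of singular values.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 7))                       # illustrative matrix

sigma = np.linalg.svd(A, compute_uv=False)
print(np.isclose(np.linalg.norm(A, 'fro'),
                 np.sqrt(np.sum(sigma**2))))          # True
```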
Consider an upper triangular matrix whose diagonal elements are $1$'s,
$$
A = \begin{bmatrix} 1 & -1 & -1 & \dots & -1 & -1 \\ 0 & 1 & -1 & \dots & -1 &-1 \\ 0 & 0 & 1 & \dots & -1 & -1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 & -1 \\ 0 & 0 & 0 & \dots & 0 & 1 \end{bmatrix}
$$
Notice that the determinant of matrix $A$ is $1$; however, if $n$ is very large, this matrix is almost singular. We mean that it is very close to a singular matrix. To see this we compute the residual, or distance, from matrix $A$ to the matrix
$$
B = \begin{bmatrix} 1 & -1 & -1 & \dots & -1 & -1 \\ 0 & 1 & -1 & \dots & -1 &-1 \\ 0 & 0 & 1 & \dots & -1 & -1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 & -1 \\ -\left(\frac{1}{2}\right)^{n-2} & 0 & 0 & \dots & 0 & 1 \end{bmatrix}
$$
Matrix $B$ is singular because
$$
\begin{align*}
B\begin{bmatrix} 2^{n-2} \\ 2^{n-3} \\ \vdots \\ 2^0 \\ 1 \end{bmatrix} &= \begin{bmatrix} 1 & -1 & -1 & \dots & -1 & -1 \\ 0 & 1 & -1 & \dots & -1 &-1 \\ 0 & 0 & 1 & \dots & -1 & -1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 & -1 \\ -\left(\frac{1}{2}\right)^{n-2} & 0 & 0 & \dots & 0 & 1 \end{bmatrix}\begin{bmatrix} 2^{n-2} \\ 2^{n-3} \\ \vdots \\ 2^0 \\ 1 \end{bmatrix} \\
\\
&= \begin{bmatrix} 2^{n-2} - 2^{n-3} - 2^{n-4} - \dots - 2^0 - 1 \\ 0\ \ \ \ + 2^{n-3} - 2^{n-4} - \dots - 2^0 - 1 \\ 0\ \ \ \ + 0\ \ \ \ + 2^{n-4} - \dots -2^0 - 1 \\ \ddots \\ 0\ \ \ \ + 0\ \ \ \ + 0\ \ \ \ + \dots + 1\ \ - 1 \\ -1\ \ \ \ + 0\ \ \ \ + 0\ \ \ \ + \dots + 0\ \ + 1 \end{bmatrix} = \mathbf{0}
\end{align*}
$$
Thus $B\mathbf{x}=\mathbf{0}$ has a nontrivial solution and $B$ is a singular matrix. The distance from matrix $A$ to matrix $B$ is
$$
\left\|A - B\right\|_F = \frac{1}{2^{n-2}}
$$
As $n$ gets very large, the distance between matrix $A$ and a singular matrix becomes very small.
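The sketch below, assuming NumPy, builds $A$ and $B$ for a few values of $n$ and reports the determinant of $A$, the smallest singular values of each matrix, and the distance $\|A-B\|_F = 2^{-(n-2)}$.

```python
# How close is the unit upper-triangular matrix A to the singular matrix B?
# (assuming NumPy; n is a free parameter)
import numpy as np

def build_A(n):
    # 1's on the diagonal, -1's strictly above it
    return np.eye(n) + np.triu(-np.ones((n, n)), k=1)

for n in (10, 20, 40):
    A = build_A(n)
    B = A.copy()
    B[-1, 0] = -0.5 ** (n - 2)              # the single perturbed entry
    sig_A = np.linalg.svd(A, compute_uv=False)
    sig_B = np.linalg.svd(B, compute_uv=False)
    print(n,
          round(np.linalg.det(A)),          # determinant is 1 for every n
          sig_A[-1],                        # smallest singular value of A: tiny
          sig_B[-1],                        # smallest singular value of B: ~ 0
          np.linalg.norm(A - B, 'fro'))     # distance ||A - B||_F = 2^-(n-2)
```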
Suppose that our matrix $A$ has the following singular value decomposition
$$
A = U\Sigma V^T = U\begin{bmatrix} 100 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & .001 & \dots & 0 \\ 0 & 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & \dots & 10^{-12} \end{bmatrix}V^T
$$
Notice that the principal dilation has a factor of $100$. The scalings of the other principal vectors (columns of $V$) are a unit scaling and increasingly large contractions. We can approximate matrix $A$ with a matrix of rank $1$,
$$
\begin{align*}
B &= U\Sigma_1V^T = U\begin{bmatrix} 100 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & \dots & 0 \end{bmatrix}V^T \\
\\
&= 100\cdot U\begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & \dots & 0 \end{bmatrix}V^T \\
\\
&= 100\cdot U\begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{0}^T \\ \mathbf{0}^T \\ \vdots \\ \mathbf{0}^T \end{bmatrix} = 100\cdot\mathbf{u}_1\mathbf{v}_1^T = \sigma_1\mathbf{u}_1\mathbf{v}_1^T \\
\end{align*}
$$
The $m\times n$ matrix product $\ \mathbf{u}\mathbf{v}^T$ is called the outer product of vectors $\mathbf{u}$ and $\mathbf{v}$. Recall that the scalar $\mathbf{v}^T\mathbf{u} = \langle \mathbf{u},\mathbf{v} \rangle$ is the inner product, or scalar product.
The distance between $A$ and $B$ becomes
$$
\begin{align*}
\left\|A-B\right\| &= \left\|U\Sigma V^T - U\Sigma_1V^T\right\| = \left\|U(\Sigma - \Sigma_1)V^T\right\| \\
\\
&= \left\|\Sigma - \Sigma_1\right\| = \left( 1^2 + .001^2 + \dots + 10^{-24} \right)^{\frac{1}{2}} \approx 1
\end{align*}
$$
In general, for $1\le k < r$, where $r$ is the rank of matrix $A$, the closest rank-$k$ approximation is given by
$$
B = \displaystyle\sum_{j=1}^k \sigma_j\mathbf{u}_j\mathbf{v}_j^T
$$
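A sketch, assuming NumPy and a random matrix for illustration, that forms the rank-$k$ approximation from the first $k$ outer products and checks that its Frobenius-norm error is $\left(\sum_{j>k}\sigma_j^2\right)^{1/2}$.

```python
# Rank-k approximation from the first k singular triples (assuming NumPy).
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 6))                  # illustrative matrix
U, s, Vt = np.linalg.svd(A)

k = 2
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # B = sum_{j<=k} sigma_j u_j v_j^T
print(np.linalg.matrix_rank(B))                  # k

# Frobenius-norm error: sqrt(sigma_{k+1}^2 + ... + sigma_r^2).
print(np.isclose(np.linalg.norm(A - B, 'fro'),
                 np.sqrt(np.sum(s[k:] ** 2))))   # True
```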
For a matrix $A$ with singular value decomposition $A = U\Sigma V^T$, we can use the decomposition to improve the data analysis of a system with a large number of inputs. If the system has $n$ inputs and $m$ outputs, our matrix $A\in\mathbb{R}^{m\times n}$ represents the measured outputs for each input. These measurements can contain noise, and the noise can produce a small apparent correlation from an input even when that input really does not affect the output at all.
The singular value decomposition identifies the inputs in matrix $V$ by their singular value. Inputs that provide the most information about the output will have a correspondingly large associated singular value. Inputs with negligible effect on the output will correspond to a small singular value.
The covariance matrix is $S = \frac{1}{n-1}A^TA$, and the variance of a sample of input/output measurements $\mathbf{y}=A\mathbf{x}$ has the value
$$
\text{var}(\mathbf{y}) := \frac{(A\mathbf{x})^T(A\mathbf{x})}{n-1} = \frac{\mathbf{x}^TA^TA\mathbf{x}}{n-1} = \mathbf{x}^TS\mathbf{x}
$$
In order to maximize the variance over all unit vectors in the domain of $A$, we choose a unit vector associated with the largest eigenvalue $\lambda_1$ of matrix $A^TA$. This is the vector $\mathbf{v}_1$, the first right singular vector and first column of matrix $V$ in the singular value decomposition, associated with the singular value $\sigma_1 = \sqrt{\lambda_1}$.
The corresponding left singular vector $\mathbf{u}_1$ gives us our first principal component vector,
$$
\mathbf{y}_1 = A\mathbf{v}_1 = \sigma_1\mathbf{u}_1
$$
with variance
$$
\text{var}(\mathbf{y}_1) = \frac{(A\mathbf{v}_1)^TA\mathbf{v}_1}{n-1} = \frac{\mathbf{v}_1^TA^TA\mathbf{v}_1}{n-1} = \frac{\mathbf{v}_1^T\lambda_1\mathbf{v}_1}{n-1} = \frac{\lambda_1\|\mathbf{v}_1\|^2}{n-1} = \frac{\lambda_1}{n-1} = \frac{\sigma_1^2}{n-1}
$$
To find the second principal component vector, one needs to find a unit vector orthogonal to $\mathbf{y}_1$ that maximizes the variance. The second right singular vector yields $\mathbf{y}_2 = A\mathbf{v}_2 = \sigma_2\mathbf{u}_2$. This vector maximizes the remaining variance, and the two principal components are orthogonal since
$$
\langle\mathbf{y}_1,\mathbf{y}_2\rangle = \langle\sigma_1\mathbf{u}_1,\sigma_2\mathbf{u}_2\rangle = \sigma_1\sigma_2\langle\mathbf{u}_1,\mathbf{u}_2\rangle = \sigma_1\sigma_2\cdot 0 = 0
$$
In general, the singular value decomposition solves the principal component analysis problem for all principal component vectors.
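The following is a sketch of principal component analysis via the SVD, assuming NumPy and using the common convention of a data matrix whose rows are samples and whose columns have been centered; the randomly generated data are purely illustrative. The sample variances of the principal components come out to $\sigma_j^2/(n-1)$, as derived above.

```python
# Principal component analysis via the SVD of a centered data matrix
# (assuming NumPy; the data here are randomly generated for illustration).
import numpy as np

rng = np.random.default_rng(4)
n = 200
X = rng.standard_normal((n, 3)) @ np.array([[3.0, 0.0, 0.0],
                                            [0.5, 1.0, 0.0],
                                            [0.2, 0.1, 0.3]])  # correlated columns
A = X - X.mean(axis=0)                    # center each column

U, s, Vt = np.linalg.svd(A, full_matrices=False)

Y = A @ Vt.T                              # principal components y_j = A v_j = sigma_j u_j
print(np.round(Y.var(axis=0, ddof=1), 4)) # sample variances of the components
print(np.round(s ** 2 / (n - 1), 4))      # equal to sigma_j^2 / (n - 1)
```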
Creative Commons Attribution-NonCommercial-ShareAlike 4.0
Attribution
You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
Noncommercial
You may not use the material for commercial purposes.
Share Alike
You are free to share, copy and redistribute the material in any medium or format. If you adapt, remix, transform, or build upon the material, you must distribute your contributions under the
same license
as the original.