Math 511: Linear Algebra
7.2 Diagonalization
7.2.1 Eigenvectors¶
$$ \require{color} \definecolor{brightblue}{rgb}{.267, .298, .812} \definecolor{darkblue}{rgb}{0.0, 0.0, 1.0} \definecolor{palepink}{rgb}{1, .73, .8} \definecolor{softmagenta}{rgb}{.99,.34,.86} \definecolor{blueviolet}{rgb}{.537,.192,.937} \definecolor{jonquil}{rgb}{.949,.792,.098} \definecolor{shockingpink}{rgb}{1, 0, .741} \definecolor{royalblue}{rgb}{0, .341, .914} \definecolor{alien}{rgb}{.529,.914,.067} \definecolor{crimson}{rgb}{1, .094, .271} \def\ihat{\mathbf{\hat{\unicode{x0131}}}} \def\jhat{\mathbf{\hat{\unicode{x0237}}}} \def\khat{\mathrm{\hat{k}}} \def\tombstone{\unicode{x220E}} \def\contradiction{\unicode{x2A33}} $$
After introducing the concept of eigenvalues and exploring their properties, let us turn our attention to eigenvectors.
- Eigenvectors $\mathbf{x}$ of a linear transformation $L$ are nonzero vectors whose image under the linear transformation is a scalar multiple of the eigenvector.
$$ L\left(\mathbf{x}\right) = \lambda\mathbf{x} $$
- Eigenvectors $\mathbf{x}$ of a matrix $A$ are nonzero vectors whose product with the matrix, $A\mathbf{x} = \lambda\mathbf{x}$, is a scalar multiple of the eigenvector.
In other words, an eigenvector $\mathbf{x}$ and its image $A\mathbf{x}$ are collinear,
$$
A\mathbf{x}\in\text{Span}\left\{\mathbf{x}\right\}
$$
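A small numerical check can make the collinearity concrete. The sketch below is not part of the course text; it uses NumPy and a matrix and eigenvector chosen purely for illustration.

```python
# A small numerical check (illustrative only): for an eigenvector x,
# the image A @ x lands in Span{x}, i.e. it is a scalar multiple of x.
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 5.0]])
x = np.array([0.0, 1.0])        # an eigenvector of A with eigenvalue 5

Ax = A @ x
print(Ax)                        # [0., 5.]  =  5 * x
print(np.allclose(Ax, 5.0 * x))  # True: A x is a scalar multiple of x
```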
The corresponding eigenvalues $\lambda_i$ are roots of the characteristic polynomial
$$
p_{\lambda} := \left|A-\lambda I\right|
$$
This characteristic polynomial of degree $n$ has $n$ roots, counting multiplicity, although some of these roots may be complex conjugate roots. The multiplicity of a root $\lambda_i$ of the characteristic polynomial is called its algebraic multiplicity. Each root of a polynomial is a solution of the equation
$$
p_{\lambda}(z) = 0.
$$
This means that for each of the $k\le n$ distinct roots of the characteristic polynomial $\lambda_1$, $\lambda_2$, $\dots$, $\lambda_k$, the matrix
$$
B_{\lambda_i} := A - \lambda_i I
$$
is a singular matrix with a nontrivial null space.
Definition¶
The null space of this matrix is called the eigenspace of the associated eigenvalue $\lambda_i$.
Every nonzero vector $\mathbf{x}\in N\left(B_{\lambda_i}\right)$ is an eigenvector as
$$
\begin{align*}
\mathbf{0} &= B_{\lambda_i}\mathbf{x} = \left(A - \lambda_i I\right)\mathbf{x} \\
\\
\mathbf{0} &= A\mathbf{x} - \lambda_i\mathbf{x} \\
\\
A\mathbf{x} &= \lambda_i\mathbf{x}
\end{align*}
$$
A basis of the eigenspace associated with eigenvalue $\lambda_i$ can be determined readily by applying the methods from Chapter 1 to matrix $B_{\lambda_i}$.
Definition¶
The nullity of matrix $B_{\lambda_i}$ is called the geometric multiplicity of the eigenvalue $\lambda_i$. The geometric multiplicity of an eigenvalue is the dimension of its associated eigenspace.
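The two multiplicities can be computed numerically. The following sketch is not part of the course text; it uses NumPy (assumed available) and a hypothetical matrix, finding eigenvalues as roots of the characteristic polynomial and the geometric multiplicity as the nullity of $B_{\lambda} = A - \lambda I$.

```python
# A numerical sketch: eigenvalues are the roots of the characteristic polynomial,
# and the geometric multiplicity of each is the nullity of B_lambda = A - lambda*I.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])       # hypothetical matrix for illustration
n = A.shape[0]

coeffs = np.poly(A)              # coefficients of the characteristic polynomial
eigvals = np.roots(coeffs)       # its roots, listed with algebraic multiplicity

for lam in np.unique(np.round(eigvals.real, 8)):
    B = A - lam * np.eye(n)
    geometric = n - np.linalg.matrix_rank(B)          # nullity of B_lambda
    algebraic = int(np.sum(np.isclose(eigvals, lam)))
    print(f"lambda = {lam}: algebraic {algebraic}, geometric {geometric}")
```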
7.2.2 Linear Independence of Eigenvectors¶
Theorem 7.2.1¶
Linear Independence of Eigenvectors
If $\lambda_1$, $\lambda_2$, $\dots$, $\lambda_k$ are distinct eigenvalues of matrix $A\in\mathbb{R}^{n\times n}$ with corresponding eigenvectors $\mathbf{x}_1$, $\mathbf{x}_2$, $\dots$, $\mathbf{x}_k$, then the set of eigenvectors $\left\{\mathbf{x}_1,\ \mathbf{x}_2,\ \dots,\ \mathbf{x}_k\right\}$ is a linearly independent set.
Proof¶
Suppose that $\lambda_1$ and $\lambda_2$ are distinct eigenvalues of matrix $A\in\mathbb{R}^{n\times n}$ with corresponding eigenvectors $\mathbf{x}_1$ and $\mathbf{x}_2$. This means that $\lambda_1\neq\lambda_2$,
$$ \begin{align*} A\mathbf{x_1} &= \lambda_1\mathbf{x_1}\qquad\qquad &A\mathbf{x}_2 &= \lambda_2\mathbf{x}_2 \\ \\ \mathbf{x_1}&\in N\left(A - \lambda_1 I\right)\qquad\qquad &\mathbf{x}_2&\in N\left(A - \lambda_2 I\right) \\ \end{align*} $$
What is the intersection of these two eigenspaces?
$$ N\left(A - \lambda_1 I\right)\cap N\left(A - \lambda_2 I\right) $$
If vector $\mathbf{x}\in N\left(A - \lambda_1 I\right)\cap N\left(A - \lambda_2 I\right)$ then
$$ \lambda_1\mathbf{x} = A\mathbf{x} = \lambda_2\mathbf{x} $$
Thus
$$ (\lambda_2 - \lambda_1)\mathbf{x} = \mathbf{0} $$
Our hypothesis is that $\lambda_1\neq \lambda_2$, so $\lambda_2-\lambda_1\neq 0$. Therefore $\mathbf{x} = \mathbf{0}$. This proves that the eigenspaces for distinct eigenvalues are subspaces of $\mathbb{R}^n$ that intersect only at the origin. Since eigenvectors are nonzero, we know from Chapter 5 that the vectors $\mathbf{x}_1\in N\left(A - \lambda_1 I\right)$ and $\mathbf{x}_2\in N\left(A - \lambda_2 I\right)$ are linearly independent.
Using this argument, the first eigenvector is linearly independent from all of the rest. Continuing in this way, all of the eigenvectors $\left\{\mathbf{x}_1,\ \mathbf{x}_2,\ \dots,\ \mathbf{x}_k\right\}$ are linearly independent.$\tombstone$
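The theorem can be illustrated numerically. The sketch below is not part of the course text; it uses NumPy and a hypothetical triangular matrix with distinct eigenvalues, and checks that the matrix whose columns are the eigenvectors has full rank.

```python
# Illustrating Theorem 7.2.1: distinct eigenvalues give linearly independent eigenvectors.
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, -1.0]])     # upper triangular: eigenvalues 4, 2, -1

eigvals, eigvecs = np.linalg.eig(A)  # columns of eigvecs are eigenvectors
print("eigenvalues:", eigvals)

# distinct eigenvalues -> eigenvectors are linearly independent,
# so the matrix of eigenvectors has full rank
print("rank of eigenvector matrix:", np.linalg.matrix_rank(eigvecs))  # expect 3
```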
7.2.3 Algebraic Multiplicity and Geometric Multiplicity¶
If an eigenvalue has algebraic multiplicity greater than one, then one expects the null space of $B_{\lambda} = A - \lambda I$ to have dimension greater than one. Ideally the algebraic multiplicity of an eigenvalue equals its geometric multiplicity. However, this does not always hold.
Example 1¶
Consider the matrix
$$ A = \begin{bmatrix} 1 &\ 1 \\ 0 &\ 1 \end{bmatrix} $$
The characteristic polynomial for matrix $A$ is given by
$$ p_{\lambda} = \begin{vmatrix} 1-\lambda &\ 1 \\ 0 & 1-\lambda \end{vmatrix} = (1-\lambda)^2 = 0 $$
Thus $\lambda = 1$ is an eigenvalue of algebraic multiplicity two. Matrix
$$ B_1 := A - I = \begin{bmatrix} 0 &\ 1 \\ 0 &\ 0 \end{bmatrix} $$
is in row echelon form and has nullity equal to one. Hence the geometric multiplicity of eigenvalue $\lambda=1$ is one. We can find only a one-dimensional eigenspace associated with this eigenvalue of algebraic multiplicity two.
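The same conclusion can be checked numerically. The sketch below is not part of the course text; it uses NumPy on the matrix of Example 1.

```python
# Checking Example 1: lambda = 1 has algebraic multiplicity 2 but geometric multiplicity 1.
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

eigvals = np.linalg.eigvals(A)
print("eigenvalues:", eigvals)                   # [1., 1.]

B1 = A - 1.0 * np.eye(2)
geometric_multiplicity = 2 - np.linalg.matrix_rank(B1)   # nullity of B_1
print("geometric multiplicity of lambda = 1:", geometric_multiplicity)  # 1
```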
Definition¶
If $A\in\mathbb{R}^{n\times n}$ is an $n\times n$ matrix with an eigenvalue whose geometric multiplicity is less than its algebraic multiplicity, then we call this matrix defective.
If a matrix $A\in\mathbb{R}^{n\times n}$ is not defective, then its characteristic polynomial has $k\le n$ distinct roots and $n$ roots counting multiplicity. For each eigenvalue we have an associated eigenspace with dimension equal to the algebraic multiplicity of the eigenvalue. This means that the sum of the geometric multiplicities of the eigenvalues will also equal $n$. Thus the direct sum of the eigenspaces will be $\mathbb{R}^n$.
$$ N(B_{\lambda_1})\oplus N(B_{\lambda_2})\oplus\dots\oplus N(B_{\lambda_k}) = \displaystyle\bigoplus_{i=1}^k N(B_{\lambda_i}) = \mathbb{R}^n $$
If we select a basis for each eigenspace, then the combined collection of these $n$ eigenvectors $\left\{\mathbf{x}_1,\ \mathbf{x}_2,\ \dots,\ \mathbf{x}_n\right\}$ will form a basis for $\mathbb{R}^n$.
Definition¶
If an $n\times n$ matrix $A$ has a set of $n$ linearly independent eigenvectors, we say that the matrix has a complete set of eigenvectors. Such a collection of linearly independent eigenvectors is called an eigenbasis or characteristic basis for $\mathbb{R}^n$.
7.2.4 Diagonalizable Matrices¶
Let matrix $A\in\mathbb{R}^{n\times n}$ have a complete set of eigenvectors so that the set of eigenvectors $\left\{ \mathbf{x}_1,\ \mathbf{x}_2,\ \dots,\ \mathbf{x}_n \right\}$ associated with eigenvalues $\left\{ \lambda_1,\ \lambda_2,\ \dots,\ \lambda_n \right\}$ forms an eigenbasis for $\mathbb{R}^n$. Remember that an eigenvalue with algebraic multiplicity greater than one will appear in the list of eigenvalues as many times as its algebraic multiplicity. Define matrix $D$ to be the diagonal matrix with the $n$ eigenvalues on the diagonal, and matrix $X$ so that the $j^{\text{th}}$ column of matrix $X$ is eigenvector $\mathbf{x}_j$.
$$ X = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \dots & \mathbf{x}_n \end{bmatrix} $$
The matrix product $AX$ yields
$$ \begin{align*} AX &= A\begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \dots & \mathbf{x}_n \end{bmatrix} \\ \\ &= \begin{bmatrix} A\mathbf{x}_1 & A\mathbf{x}_2 & \dots & A\mathbf{x}_n \end{bmatrix} \\ \\ &= \begin{bmatrix} \lambda_1\mathbf{x}_1 & \lambda_2\mathbf{x}_2 & \dots & \lambda_n\mathbf{x}_n \end{bmatrix} \\ \\ &= \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \dots & \mathbf{x}_n \end{bmatrix}\begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix} \\ \\ &= XD \end{align*} $$
Matrix $X$ is nonsingular because its columns are linearly independent, thus
$$ A = XDX^{-1}\qquad\text{and}\qquad D = X^{-1}AX $$
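This factorization is easy to confirm numerically. The sketch below is not part of the course text; it uses NumPy's `eig` on a hypothetical diagonalizable matrix to build $X$ and $D$ and checks both identities.

```python
# Verifying A = X D X^{-1} and D = X^{-1} A X for a hypothetical diagonalizable matrix.
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])                 # distinct eigenvalues 3 and 2

eigvals, X = np.linalg.eig(A)              # columns of X are eigenvectors
D = np.diag(eigvals)

print(np.allclose(A, X @ D @ np.linalg.inv(X)))   # True: A = X D X^{-1}
print(np.allclose(np.linalg.inv(X) @ A @ X, D))   # True: D = X^{-1} A X
```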
Recall¶
If $A\in\mathbb{R}^{n\times n}$ is an $n\times n$ matrix, then it represents a linear transformation on $\mathbb{R}^n$. If matrix $B\in\mathbb{R}^{n\times n}$ is similar to matrix $A$, then there is a nonsingular $n\times n$ transition matrix $S$ such that
$$ A = SBS^{-1} $$
Furthermore matrix $B$ represents the same linear transformation with respect to a different choice of basis for the domain and codomain $\mathbb{R}^n$.
From our observations, matrix $A$ and diagonal matrix $D$ are similar matrices with transition matrix $X$. Matrix $X$ is the transition matrix from the basis chosen to create matrix $A$ to the eigenbasis.
An $n\times n$ matrix $A$ is called diagonalizable if there exists a nonsingular matrix $X$ and a diagonal matrix $D$ such that
$$ D = X^{-1}AX $$
We say that matrix $X$ diagonalizes matrix $A$.
In the previous section we proved the following
Theorem 7.2.2¶
Diagonalizability Theorem
An $n\times n$ matrix $A$ is diagonalizable if and only if $A$ has a complete set of eigenvectors.
Remarks
If matrix $A$ is diagonalizable, then column vectors of the diagonalizing matrix $X$ form an eigenbasis for $\mathbb{R}^n$, and the diagonal elements of $D$ are the corresponding eigenvalues.
The diagonalizing matrix is not unique. Remember that eigenvectors are not unique; they are elements of an eigenspace. We can reorder the list of eigenvalues in matrix $D$ and this simply reorders the columns of matrix $X$.
If an $n\times n$ matrix $A$ has $n$ distinct eigenvalues, then $A$ is diagonalizable. It is not possible for an eigenvalue $\lambda$ of algebraic multiplicity one to have a trivial eigenspace because the $n\times n$ matrix $B_{\lambda} = A - \lambda I$ is a singular matrix.
If the eigenvalues of an $n\times n$ matrix $A$ are not distinct, then matrix $A$ may or may not be diagonalizable, depending on whether it is defective.
Theorem 7.2.2 implies that if matrix $A$ is diagonalizable, then it can be factored into a product $XDX^{-1}$.
7.2.5 Diagonalizing a Matrix¶
Example 2¶
Consider the matrix
$$ A = \begin{bmatrix}\ \ 0\ &\ \ 4\ \\ -3\ &\ \ 8\ \end{bmatrix} $$
The characteristic polynomial is given by
$$ p_{\lambda} = \begin{vmatrix} -\lambda\ &\ \ 4\ \\ -3\ &\ \ 8-\lambda\ \end{vmatrix} = -\lambda(8-\lambda) + 12 = \lambda^2 - 8\lambda + 12 = (\lambda - 6)(\lambda - 2) $$
The eigenvalues of matrix $A$ are $\lambda_1=6$ and $\lambda_2=2$. Matrix $B_{\lambda_1} = B_6$ is singular
$$ B_6 = \begin{bmatrix} -6\ &\ \ 4 \\ -3\ &\ \ 2\ \end{bmatrix}\rightarrow\begin{bmatrix}\ \ 3\ & -2 \\ -3\ &\ \ 2\ \end{bmatrix}\rightarrow\begin{bmatrix}\ \ 3\ & -2 \\ \ \ 0\ &\ \ 0\ \end{bmatrix} $$
Utilizing backward substitution for $B_6\mathbf{x}=\mathbf{0}$, $x_2 = \alpha\in\mathbb{R}$ and $3x_1 - 2\alpha = 0$, or $x_1 = \frac{2}{3}\alpha$. A basis for the eigenspace associated with eigenvalue $\lambda_1=6$ can be written
$$ N(B_6) = \text{Span}\left\{ \begin{bmatrix} \frac{2}{3} \\ 1 \end{bmatrix} \right\} $$
We can choose eigenvector $\mathbf{x}_1 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$. Matrix $B_{\lambda_2} = B_2$ is also singular.
$$ B_2 = \begin{bmatrix} -2\ &\ \ 4\ \\ -3\ &\ \ 6\ \end{bmatrix}\rightarrow\begin{bmatrix}\ \ 1\ & -2 \ \\ -3\ &\ \ 6\ \end{bmatrix}\rightarrow\begin{bmatrix}\ \ 1\ & -2 \ \\ \ \ 0\ &\ \ 0\ \end{bmatrix} $$
Utilizing backward substitution for $B_2\mathbf{x}=\mathbf{0}$, $x_2 = \alpha\in\mathbb{R}$ and $x_1 - 2\alpha = 0$, so $x_1 = 2\alpha$. A basis for the eigenspace associated with eigenvalue $\lambda_2=2$ can be written
$$ N(B_2) = \text{Span}\left\{ \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right\} $$
Notice that eigenvectors $\mathbf{x}_1$ and $\mathbf{x}_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ are linearly independent. Thus the eigenbasis $\{ \mathbf{x}_1, \mathbf{x}_2 \}$ forms a basis for $\mathbb{R}^2$. Choose
$$ X = \begin{bmatrix} 2 & 2 \\ 3 & 1 \end{bmatrix}\qquad\text{and}\qquad D = \begin{bmatrix} 6 & 0 \\ 0 & 2 \end{bmatrix} $$
It follows that
$$ \begin{align*} X^{-1}AX &= \dfrac{1}{4}\begin{bmatrix} -1\ &\ \ 2\ \\ \ \ 3\ & -2\ \end{bmatrix}\begin{bmatrix}\ \ 0\ &\ \ 4\ \\ -3\ &\ \ 8\ \end{bmatrix}\begin{bmatrix} 2 & 2 \\ 3 & 1 \end{bmatrix} \\ \\ &= \dfrac{1}{4}\begin{bmatrix} -1\ &\ \ 2\ \\ \ \ 3\ & -2\ \end{bmatrix}\begin{bmatrix} 12\ &\ \ 4\ \\ \ 18\ &\ \ 2\end{bmatrix} \\ \\ &= \dfrac{1}{4}\begin{bmatrix}\ 24\ &\ \ 0\ \\ \ \ 0\ &\ \ 8\ \end{bmatrix} = \begin{bmatrix} 6 & 0 \\ 0 & 2 \end{bmatrix} \end{align*} $$
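The computation in Example 2 can be reproduced numerically. The sketch below is not part of the course text; it uses NumPy with the matrices $A$, $X$, and $D$ found above.

```python
# Verifying Example 2: X^{-1} A X should be the diagonal matrix D = diag(6, 2).
import numpy as np

A = np.array([[0.0, 4.0],
              [-3.0, 8.0]])
X = np.array([[2.0, 2.0],
              [3.0, 1.0]])
D = np.diag([6.0, 2.0])

print(np.linalg.inv(X) @ A @ X)                    # expect [[6, 0], [0, 2]]
print(np.allclose(A, X @ D @ np.linalg.inv(X)))    # True: A = X D X^{-1}
```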
7.2.6 The Exponential of a Matrix¶
Whenever we have a linear transformation $L$ on a finite dimensional vector space $V$, we can create the standard matrix representation $A_L$. Matrix $A_L$ represents the linear transformation with respect to the canonical basis, so that for every vector $\mathbf{v}\in V$,
$$ L(\mathbf{v}) = A_L\mathbf{v} $$
A diagonalizable matrix $A_L$ is one that is similar to a diagonal matrix $D$. That means, if we choose a characteristic basis, then the matrix representation of $L$ with respect to the characteristic basis is a diagonal matrix
$$ L\left(\mathbf{v}_D\right) = D\mathbf{v}_D $$
A powerful use of the diagonal representation of a linear transformation emerges when we must apply the linear transformation repeatedly. This occurs when utilizing a Markov process. In this case we need to compute the emergent states of the system, and the $k^{\text{th}}$ element of the Markov chain of states with $n\times n$ transition probability matrix $P$ and initial state vector $\mathbf{x}_0$ is given by
$$ \mathbf{x}_k = P\cdot P\cdot P\dots P\cdot P\mathbf{x}_0 = P^k\mathbf{x}_0 $$
If the transition probability matrix $P$ is diagonalizable with diagonal representation $D$, then
$$ \begin{align*} P^2 &= \left(XDX^{-1}\right)\left(XDX^{-1}\right) = \left(XD\right)\left(X^{-1}X\right)\left(DX^{-1}\right) = XD^2X^{-1} \\ \\ &= X\begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix}\begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix}X^{-1} \\ \\ &= X\begin{bmatrix} \lambda_1^2 & & & \\ & \lambda_2^2 & & \\ & & \ddots & \\ & & & \lambda_n^2 \end{bmatrix}X^{-1} \end{align*} $$
In general we have that for any positive integer $k$
$$ P^k = XD^kX^{-1} = X\begin{bmatrix} \lambda_1^k & & & \\ & \lambda_2^k & & \\ & & \ddots & \\ & & & \lambda_n^k \end{bmatrix}X^{-1} $$
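The identity $P^k = XD^kX^{-1}$ is easy to test numerically. The sketch below is not part of the course text; it uses NumPy and a hypothetical $2\times 2$ transition matrix, comparing the diagonalization approach with repeated multiplication.

```python
# Comparing P^k computed via diagonalization (X D^k X^{-1}) with direct matrix power.
import numpy as np

P = np.array([[0.9, 0.2],
              [0.1, 0.8]])            # hypothetical 2x2 transition matrix (columns sum to 1)
k = 25

eigvals, X = np.linalg.eig(P)
Pk_diag = X @ np.diag(eigvals**k) @ np.linalg.inv(X)   # X D^k X^{-1}
Pk_direct = np.linalg.matrix_power(P, k)

print(np.allclose(Pk_diag, Pk_direct))   # True (up to round-off)
```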
In differential equations, stochastic processes, and statistics it is often necessary to analyze systems of linear differential equations. In this case one often obtains a solution that requires the natural exponential function $f(x)=e^x$. If the matrix representation of a linear mathematical model is diagonalizable, then the natural exponential function can still be utilized.
When (and Why) to Raise e to the Power of a Matrix
The matrix exponential of a diagonal matrix can be computed as in the video
$$ \begin{align*} \text{exp}(D) &= e^D := \displaystyle\lim_{n\rightarrow\infty} \left( I + D + \dfrac{1}{2!}D^2 + \dfrac{1}{3!}D^3 + \dfrac{1}{4!}D^4 + \cdots + \dfrac{1}{n!}D^n \right) \\ \\ &= \displaystyle\lim_{n\rightarrow\infty}\displaystyle\sum_{k=0}^n \dfrac{1}{k!}D^k = \displaystyle\lim_{n\rightarrow\infty}\displaystyle\sum_{k=0}^n \dfrac{1}{k!}\begin{bmatrix} \lambda_1^k & & & \\ & \lambda_2^k & & \\ & & \ddots & \\ & & & \lambda_n^k \end{bmatrix} \\ \\ &= \displaystyle\lim_{n\rightarrow\infty}\displaystyle\sum_{k=0}^n \begin{bmatrix} \dfrac{1}{k!}\lambda_1^k & & & \\ & \dfrac{1}{k!}\lambda_2^k & & \\ & & \ddots & \\ & & & \dfrac{1}{k!}\lambda_n^k \end{bmatrix} \\ \\ &= \begin{bmatrix} \displaystyle\lim_{n\rightarrow\infty}\displaystyle\sum_{k=0}^n\dfrac{1}{k!}\lambda_1^k & & & \\ & \displaystyle\lim_{n\rightarrow\infty}\displaystyle\sum_{k=0}^n\dfrac{1}{k!}\lambda_2^k & & & \\ & & \ddots & \\ & & & \displaystyle\lim_{n\rightarrow\infty}\displaystyle\sum_{k=0}^n\dfrac{1}{k!}\lambda_n^k \end{bmatrix} \\ \\ &= \begin{bmatrix} e^{\lambda_1} & & & \\ & e^{\lambda_2} & & \\ & & \ddots & \\ & & & e^{\lambda_n} \end{bmatrix} \end{align*} $$
Likewise the matrix exponential of a diagonalizable matrix $A$ is given by
$$ \begin{align*} \text{exp}(A) &= e^A = \displaystyle\sum_{k=0}^{\infty} \dfrac{1}{k!}A^k \\ \\ &= \displaystyle\sum_{k=0}^{\infty} \dfrac{1}{k!}\left(XDX^{-1}\right)^k \\ \\ &= \displaystyle\sum_{k=0}^{\infty} \dfrac{1}{k!}XD^kX^{-1} \\ \\ &= X\left(\displaystyle\sum_{k=0}^{\infty} \dfrac{1}{k!}D^k\right)X^{-1} \\ \\ &= Xe^DX^{-1} \end{align*} $$
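The formula $e^A = Xe^DX^{-1}$ can be checked against the defining series. The sketch below is not part of the course text; it uses NumPy and a hypothetical diagonalizable matrix, comparing $Xe^DX^{-1}$ to a truncated Taylor series.

```python
# Comparing e^A = X e^D X^{-1} with a truncated power series sum_{k} A^k / k!.
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])                 # hypothetical matrix, eigenvalues 1 and 3

eigvals, X = np.linalg.eig(A)
expA = X @ np.diag(np.exp(eigvals)) @ np.linalg.inv(X)     # X e^D X^{-1}

# partial sums of the series  sum_{k=0}^{24} A^k / k!
series = np.zeros_like(A)
term = np.eye(2)
for k in range(25):
    series += term
    term = term @ A / (k + 1)

print(np.allclose(expA, series))   # True (up to truncation and round-off)
```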
Example 3¶
For matrix $A$ in Example 2, compute $e^A$.
$$ \begin{align*} e^A &= Xe^DX^{-1} = \begin{bmatrix} 2 & 2 \\ 3 & 1 \end{bmatrix}\begin{bmatrix}\ e^6\ &\ 0\ \\ \ 0\ &\ e^2\ \end{bmatrix}\begin{bmatrix} -\frac{1}{4}\ &\ \ \frac{1}{2}\ \\ \ \ \frac{3}{4}\ & -\frac{1}{2}\ \end{bmatrix} \\ \\ &= \begin{bmatrix} 2 & 2 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} -\frac{e^6}{4}\ &\ \ \frac{e^6}{2}\ \\ \ \ \frac{3e^2}{4}\ & -\frac{e^2}{2}\ \end{bmatrix} \\ \\ &= \begin{bmatrix}\ \frac{-e^6 + 3e^2}{2}\ &\ e^6-e^2\ \\ \ \frac{-3e^6 + 3e^2}{4}\ &\ \ \frac{3e^6-e^2}{2}\ \end{bmatrix} \end{align*} $$
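The entries of this result can be confirmed numerically. The sketch below is not part of the course text; it uses NumPy with the $X$, $e^D$, and closed-form entries from Example 3.

```python
# Verifying Example 3: X e^D X^{-1} matches the closed-form entries computed above.
import numpy as np

X = np.array([[2.0, 2.0],
              [3.0, 1.0]])
eD = np.diag([np.exp(6.0), np.exp(2.0)])
expA = X @ eD @ np.linalg.inv(X)

e6, e2 = np.exp(6.0), np.exp(2.0)
expected = np.array([[(-e6 + 3*e2)/2,   e6 - e2],
                     [(-3*e6 + 3*e2)/4, (3*e6 - e2)/2]])
print(np.allclose(expA, expected))   # True
```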
Your use of this self-initiated mediated course material is subject to our Creative Commons License 4.0