Math 511: Linear Algebra
2.2 Matrix Algebra
2.2.1 Matrix Algebra¶
$$ \require{color} \definecolor{brightblue}{rgb}{.267, .298, .812} \definecolor{darkblue}{rgb}{0.0, 0.0, 1.0} \definecolor{palepink}{rgb}{1, .73, .8} \definecolor{softmagenta}{rgb}{.99,.34,.86} \definecolor{blueviolet}{rgb}{.537,.192,.937} \definecolor{jonquil}{rgb}{.949,.792,.098} \definecolor{shockingpink}{rgb}{1, 0, .741} \definecolor{royalblue}{rgb}{0, .341, .914} \definecolor{alien}{rgb}{.529,.914,.067} \definecolor{crimson}{rgb}{1, .094, .271} \def\ihat{\mathbf{\hat{\unicode{x0131}}}} \def\jhat{\mathbf{\hat{\unicode{x0237}}}} \def\khat{\mathrm{\hat{k}}} \def\tombstone{\unicode{x220E}} \def\contradiction{\unicode{x2A33}} $$
The rules of matrix algebra are similar to, but not the same as, the rules of the algebra of real numbers. For example, if $a$ and $b$ are real numbers, then
$$a + b = b + a\qquad\text{and}\qquad ab = ba $$
We say that the operations of addition and multiplication of real numbers are commutative. We have already seen that multiplication of matrices is not commutative. If
$$ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}\qquad\text{and}\qquad B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, $$
then we can compute the matrix product $AB$; however, the product $BA$ does not exist.
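We can spot-check this dimension mismatch numerically. The sketch below is only an illustration and assumes the NumPy library (not part of these notes); it builds the two matrices above and attempts both products.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])           # 2 x 3
B = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])           # 3 x 3

print(A @ B)                        # AB is defined: (2 x 3)(3 x 3) gives a 2 x 3 matrix

try:
    print(B @ A)                    # BA is not defined: inner dimensions 3 and 2 do not match
except ValueError as err:
    print("BA does not exist:", err)
```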
Addition of matrices is commutative; that is, for any two $m\times n$ matrices $A, B\in\mathbb{R}^{m\times n}$,
$$ A + B = B + A. $$
How can we show that this is a true statement about addition of two $m\times n$ matrices?
We can write out the matrices using patterns and dots,
$$A = \begin{bmatrix} \ a_{11} & \ a_{12} & \ \cdots & \ a_{1n} \\ \ a_{21} & \ a_{22} & \ \cdots &\ a_{2n} \\ \ \vdots & \ \vdots & \ \ddots & \ \vdots \\ \ a_{m1} & \ a_{m2} & \ \cdots & \ a_{mn} \end{bmatrix}\qquad\text{and}\qquad B = \begin{bmatrix} \ b_{11} & \ b_{12} & \ \cdots & \ b_{1n} \\ \ b_{21} & \ b_{22} & \ \cdots &\ b_{2n} \\ \ \vdots & \ \vdots & \ \ddots & \ \vdots \\ \ b_{m1} & \ b_{m2} & \ \cdots & \ b_{mn} \end{bmatrix}$$
so
$$\begin{align*}
A + B &= \begin{bmatrix} \ a_{11} & \ a_{12} & \ \cdots & \ a_{1n} \\ \ a_{21} & \ a_{22} & \ \cdots &\ a_{2n} \\ \ \vdots & \ \vdots & \ \ddots & \ \vdots \\ \ a_{m1} & \ a_{m2} & \ \cdots & \ a_{mn} \end{bmatrix} + \begin{bmatrix} \ b_{11} & \ b_{12} & \ \cdots & \ b_{1n} \\ \ b_{21} & \ b_{22} & \ \cdots &\ b_{2n} \\ \ \vdots & \ \vdots & \ \ddots & \ \vdots \\ \ b_{m1} & \ b_{m2} & \ \cdots & \ b_{mn} \end{bmatrix} \\
\\
&= \begin{bmatrix} \ a_{11} + b_{11} & \ a_{12} + b_{12} & \ \cdots & \ a_{1n} + b_{1n} \\ \ a_{21} + b_{21} & \ a_{22} + b_{22} & \ \cdots &\ a_{2n} + b_{2n} \\ \ \vdots & \ \vdots & \ \ddots & \ \vdots \\ \ a_{m1} + b_{m1} & \ a_{m2} + b_{m2} & \ \cdots & \ a_{mn} + b_{mn} \end{bmatrix} \\
\\
&= \begin{bmatrix} \ b_{11} + a_{11} & \ b_{12} + a_{12} & \ \cdots & \ b_{1n} + a_{1n} \\ \ b_{21} + a_{21} & \ b_{22} + a_{22} & \ \cdots &\ b_{2n} + a_{2n} \\ \ \vdots & \ \vdots & \ \ddots & \ \vdots \\ \ b_{m1} + a_{m1} & \ b_{m2} + a_{m2} & \ \cdots & \ b_{mn} + a_{mn} \end{bmatrix} \\
\\
&= \begin{bmatrix} \ b_{11} & \ b_{12} & \ \cdots & \ b_{1n} \\ \ b_{21} & \ b_{22} & \ \cdots &\ b_{2n} \\ \ \vdots & \ \vdots & \ \ddots & \ \vdots \\ \ b_{m1} & \ b_{m2} & \ \cdots & \ b_{mn} \end{bmatrix} + \begin{bmatrix} \ a_{11} & \ a_{12} & \ \cdots & \ a_{1n} \\ \ a_{21} & \ a_{22} & \ \cdots &\ a_{2n} \\ \ \vdots & \ \vdots & \ \ddots & \ \vdots \\ \ a_{m1} & \ a_{m2} & \ \cdots & \ a_{mn} \end{bmatrix} \\
\\
&= B + A
\end{align*}$$
The problem is that this is just property 1 of Theorem 2.2.1 and there are 8 more that need to be proved!
Theorem 2.2.1¶
Each of the following statements is valid for any scalars $\alpha$ and $\beta$ and for any matrices $A$, $B$, and $C$ for which the indicated operations are defined.
1. $A+B = B+A$
2. $(A+B) + C = A + (B+C)$
3. $(AB)C = A(BC)$
4. $A(B+C) = AB + AC$
5. $(A+B)C = AC + BC$
6. $(\alpha\beta)A = \alpha(\beta A)$
7. $\alpha (AB) = (\alpha A)B = A (\alpha B)$
8. $(\alpha + \beta)A = \alpha A + \beta A$
9. $\alpha (A+B) = \alpha A + \alpha B$
We talked about a more compact notation for denoting the $m\,n$ elements of an $m\times n$ matrix,
$$ A = [a_{ij}]\qquad\text{and}\qquad B = [b_{ij}],\qquad\text{for $1\le i\le m$ and $1\le j\le n$.} $$
Now we can prove property 1 of Theorem 2.2.1 as follows,
$$ A + B = [a_{ij}] + [b_{ij}] = [a_{ij} + b_{ij}] = [b_{ij} + a_{ij}] = [b_{ij}] + [a_{ij}] = B + A, $$
where $1\le i\le m$ and $1\le j\le n$. This works because both methods highlight that the sum of two matrices is just the sum of $m\cdot n$ real numbers, and each of those sums of real numbers commutes. We can show that property 2 of Theorem 2.2.1 is true as well.
If $A$, $B$ and $C$ are $m\times n$ matrices, that is, if $A,B,C\in\mathbb{R}^{m\times n}$, then
$$\begin{align*}
(A + B) + C &= \left([a_{ij}] + [b_{ij}]\right) + [c_{ij}] \\
\\
&= \left([a_{ij} + b_{ij}]\right) + [c_{ij}] \\
\\
&= [a_{ij} + b_{ij}] + [c_{ij}] \\
\\
&= \left[(a_{ij} + b_{ij}) + c_{ij}\right], \\
\end{align*}$$
where $1\le i\le m$ and $1\le j\le n$. So the sum of three $m\times n$ matrices is just $m\cdot n$ sums of three real numbers. Since addition of real numbers is associative we have that
$$\begin{align*}
\left[(a_{ij} + b_{ij}) + c_{ij}\right] &= \left[a_{ij} + (b_{ij} + c_{ij})\right] \\
\\
&= [a_{ij}] + [b_{ij} + c_{ij}] \\
\\
&= [a_{ij}] + \left([b_{ij} + c_{ij}]\right) \\
\\
&= [a_{ij}] + \left([b_{ij}] + [c_{ij}]\right) \\
\\
&= A + (B + C), \\
\end{align*}$$
where $1\le i\le m$ and $1\le j\le n$. So
$$(A+B)+C = A+(B+C)$$
and addition of $m\times n$ matrices is associative.
This demonstrates the utility of representing a matrix as an array of numbers $A = [a_{ij}]$.
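Properties 1 and 2 can also be spot-checked numerically. The sketch below is illustrative only and assumes NumPy; the three integer matrices are arbitrary stand-ins for $A$, $B$, and $C$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-9, 10, size=(3, 4))   # three arbitrary 3 x 4 integer matrices
B = rng.integers(-9, 10, size=(3, 4))
C = rng.integers(-9, 10, size=(3, 4))

# Property 1: matrix addition commutes
print(np.array_equal(A + B, B + A))                 # True

# Property 2: matrix addition is associative
print(np.array_equal((A + B) + C, A + (B + C)))     # True
```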
2.2.2 Matrix Multiplication is Associative¶
Feel free to skip to the end of this subsection to trade this very tedious proof for a much easier one!
Proving property 3 of Theorem 2.2.1 is more tedious. We resort to the ugly version of the definition of matrix multiplication. If $A\in\mathbb{R}^{m\times n}$, $B\in\mathbb{R}^{n\times r}$ and $C\in\mathbb{R}^{r\times s}$, then
$$ A = [a_{ik}],\ B = [b_{kl}]\ \text{and}\ D = AB = [d_{il}], $$
where $1\le i\le m$, $1\le k\le n$ and $1\le l\le r$. Using the definition of matrix multiplication we have that
$$ d_{il} = \displaystyle\sum_{k=1}^n a_{ik}b_{kl}, $$
where $1\le i\le m$ and $1\le l\le r$, and $D$ is an $m\times r$ matrix. So
$$ D = [d_{il}],\ C = [c_{lj}]\ \text{and}\ (AB)C = DC = F = [f_{ij}], $$
where $1\le i\le m$, $1\le l\le r$ and $1\le j\le s$. The definition of matrix multiplication gives us that
$$ f_{ij} = \displaystyle\sum_{l=1}^r d_{il}c_{lj}, $$
where $1\le i\le m$ and $1\le j\le s$, so $F$ is an $m\times s$ matrix. On the other hand, we can compute $A(BC)$ by first forming the product $BC$,
$$ B = [b_{kl}],\ C = [c_{lj}]\ \text{and}\ BC = E = [e_{kj}], $$
where $1\le k\le n$, $1\le l\le r$ and $1\le j\le s$. The definition of matrix multiplication gives us that
$$ e_{kj} = \displaystyle\sum_{l=1}^r b_{kl}c_{lj}, $$
where $1\le k\le n$ and $1\le j\le s$. Therefore
$$ A(BC) = AE = G = [g_{ij}], $$
where, using the definition of matrix multiplication once more,
$$ g_{ij} = \displaystyle\sum_{k=1}^n a_{ik}e_{kj}, $$
where $1\le i\le m$ and $1\le j\le s$. Now we are ready!
$$ \begin{align*} (AB)C &= DC = F = [f_{ij} ] = \left[ \displaystyle\sum_{l=1}^r d_{il}c_{lj} \right] \\ \\ &= \left[ \displaystyle\sum_{l=1}^r \left(\displaystyle\sum_{k=1}^n a_{ik}b_{kl}\right)c_{lj} \right] \\ \\ &= \left[ \displaystyle\sum_{l=1}^r \left(\displaystyle\sum_{k=1}^n a_{ik}b_{kl}c_{lj}\right) \right] \\ \\ &= \left[ \displaystyle\sum_{l=1}^r \displaystyle\sum_{k=1}^n a_{ik}b_{kl}c_{lj} \right] \\ \\ &= \left[ \displaystyle\sum_{k=1}^n \displaystyle\sum_{l=1}^r a_{ik}b_{kl}c_{lj} \right] \\ \\ &= \left[ \displaystyle\sum_{k=1}^n \left(\displaystyle\sum_{l=1}^r a_{ik}b_{kl}c_{lj}\right) \right] \\ \\ &= \left[ \displaystyle\sum_{k=1}^n a_{ik}\left(\displaystyle\sum_{l=1}^r b_{kl}c_{lj}\right) \right] \\ \\ &= \left[ \displaystyle\sum_{k=1}^n a_{ik}\left(e_{kj}\right) \right] \\ \\ &= \left[ \displaystyle\sum_{k=1}^n a_{ik}e_{kj} \right] = [ g_{ij} ] = G = AE = A(BC). \\ \end{align*} $$
This is very hard to read and very hard to follow.
Let's try again. This time let's remember that in section 2.1.6 we learned that matrix multiplication corresponds to function composition of the linear transformations represented by the matrices.
Proof of Property 3 of Theorem 2.2.1¶
If $A\in\mathbb{R}^{m\times n}$, $B\in\mathbb{R}^{n\times p}$ and $C\in\mathbb{R}^{p\times r}$, then let us define functions
Function | Defined by |
---|---|
$f:\mathbb{R}^n\rightarrow\mathbb{R}^m$ | $f(\mathbf{x}) = A\mathbf{x}$ |
$g:\mathbb{R}^p\rightarrow\mathbb{R}^n$ | $g(\mathbf{y}) = B\mathbf{y}$ |
$h:\mathbb{R}^r\rightarrow\mathbb{R}^p$ | $h(\mathbf{z}) = C\mathbf{z}$ |
Then for all $\mathbf{z}\in\mathbb{R}^r$ we have
$$ \begin{align*} A(BC)\mathbf{z} &= f\left( (BC)\mathbf{z} \right) \\ \\ &= f\left( (g\circ h)(\mathbf{z}) \right) \\ \\ &= \left( f\circ \left( g\circ h\right) \right)(\mathbf{z}) \\ \\ &= \left( f\circ g\circ h \right)(\mathbf{z}) \\ \\ &= \left(\left(f\circ g\right)\circ h\right)(\mathbf{z}) \\ \\ &= \left( f\circ g \right)\left( C\mathbf{z} \right) \\ \\ &= (AB)C\mathbf{z} \end{align*} $$
Since $A(BC)\mathbf{z} = (AB)C\mathbf{z}$ for all $\mathbf{z}\in\mathbb{R}^r$, the matrices $A(BC)$ and $(AB)C$ represent the same linear transformation and must be equal. $\tombstone$
This reveals the utility of thinking of matrix multiplication as function composition. We will see uses for all four of our new ways of visualizing matrix multiplication in this chapter.
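As a quick numerical illustration (a sketch assuming NumPy, with randomly chosen matrices standing in for $A$, $B$, and $C$), we can confirm that both orders of multiplication agree:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p, r = 2, 3, 4, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))
C = rng.standard_normal((p, r))

# Property 3: (AB)C = A(BC).  np.allclose is used because the two orders of
# multiplication accumulate floating point rounding error differently.
print(np.allclose((A @ B) @ C, A @ (B @ C)))        # True
```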
2.2.3 Multiplication by a Matrix Distributes over Addition¶
We will prove property 4 of Theorem 2.2.1; the remaining five properties are left as an exercise for you to prove.
If $A\in\mathbb{R}^{m\times n}$ and $B,C\in\mathbb{R}^{n\times r}$, then using the definition of matrix multiplication we have
$$A(B + C) = D = [d_{ik}],$$
where $1\le i\le m$, $1\le k\le r$ and
$$ d_{ik} = \displaystyle\sum_{j=1}^n a_{ij}\left(b_{jk} + c_{jk}\right),$$
where $1\le i\le m$, $1\le j\le n$ and $1\le k\le r$. Furthermore,
$$AB = E = [e_{ik}],\ \text{and}\ e_{ik} = \displaystyle\sum_{j=1}^n a_{ij}b_{jk}.$$
We define the product $AC$ similarly,
$$AC = F = [f_{ik}],\ \text{and}\ f_{ik} = \displaystyle\sum_{j=1}^n a_{ij}c_{jk}.$$
Therefore
$$\begin{align*}
A(B + C) &= D = [d_{ik}] = \left[ \displaystyle\sum_{j=1}^n a_{ij}\left(b_{jk} + c_{jk}\right) \right] \\
\\
&= \left[ \displaystyle\sum_{j=1}^n \left(a_{ij}b_{jk} + a_{ij}c_{jk}\right) \right] \\
\\
&= \left[ \displaystyle\sum_{j=1}^n a_{ij}b_{jk} \right] + \left[ \displaystyle\sum_{j=1}^n a_{ij}c_{jk} \right] \\
\\
&= [e_{ik}] + [f_{ik}] = E + F = AB + AC. \\
\end{align*}$$
Since matrix multiplication is not commutative we must also show that $(A + B)C = AC + BC$. The proof is very similar to this one.
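A brief numerical spot-check of the two distributive laws (a sketch assuming NumPy; the matrix shapes are chosen only so that every product is defined):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 3, 4, 2
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, r))
C = rng.standard_normal((n, r))
D = rng.standard_normal((m, n))      # a second m x n matrix for the right distributive law

# Property 4: A(B + C) = AB + AC
print(np.allclose(A @ (B + C), A @ B + A @ C))      # True

# Property 5: (A + D)B = AB + DB
print(np.allclose((A + D) @ B, A @ B + D @ B))      # True
```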
2.2.4 Positive Powers of Square Matrices¶
If $A\in\mathbb{R}^{n\times n}$, then we say that $A$ is a square matrix; it has the same number of rows and columns. We can multiply a square matrix by itself and use the familiar superscript notation for the number of times that a square matrix is multiplied by itself.
$$A^2 = AA,\ A^k = AA\cdots A\ (k\text{ times})$$
If
$$A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$
then
$$\begin{align*}
A^2 &= \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}, \\
\\
A^3 &= AAA = (AA)A = A^2A = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 4 & 4 \\ 4 & 4 \end{bmatrix} \\
\\
A^4 &= AAAA = (AAA)A = A^3A = \begin{bmatrix} 4 & 4 \\ 4 & 4 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 8 & 8 \\ 8 & 8 \end{bmatrix} \\
\\
A^k &= \begin{bmatrix} 2^{k-1} & 2^{k-1} \\ 2^{k-1} & 2^{k-1} \end{bmatrix} = 2^{k-1}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\\
\\
A^{k+1} &= A^kA = \left(2^{k-1}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\right)\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \\
\\
&= 2^{k-1}\left(\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\right) \\
\\
&= 2^{k-1}\begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} = 2^k\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = 2^kA.\\
\end{align*}$$
The last two steps form an induction argument: assuming $A^k = 2^{k-1}A$, we showed that $A^{k+1} = 2^kA$, so the formula holds for every positive integer $k$.
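We can check the closed form $A^k = 2^{k-1}A$ numerically for small $k$. This sketch assumes NumPy and uses its `matrix_power` routine:

```python
import numpy as np

A = np.ones((2, 2), dtype=int)       # the matrix of all ones from the example

# Compare A^k with the closed form 2^(k-1) * A for several exponents k
for k in range(1, 8):
    Ak = np.linalg.matrix_power(A, k)
    print(k, np.array_equal(Ak, 2**(k - 1) * A))    # True for every k
```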
Notice what happens when $A$ is a diagonal matrix,
$$\begin{align*}
A &= \begin{bmatrix} -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \\
\\
A^2 &= \begin{bmatrix} -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}\begin{bmatrix} -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \\
\\
&= \left[\ (-1)\begin{bmatrix} -1 \\ 0 \\ 0 \\ 0 \end{bmatrix}\ \ 0\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}\ \ 2\begin{bmatrix} 0 \\ 0 \\ 2 \\ 0 \end{bmatrix}\ \ 3\begin{bmatrix} 0 \\ 0 \\ 0 \\ 3 \end{bmatrix}\ \right] \\
\\
&= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 9 \end{bmatrix} \\
\\
A^3 &= A^2A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 0 & 9 \end{bmatrix}\begin{bmatrix} -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \\
\\
&= \begin{bmatrix} -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 8 & 0 \\ 0 & 0 & 0 & 27 \end{bmatrix} \\
\\
A^k &= \begin{bmatrix} (-1)^k & 0 & 0 & 0 \\ 0 & 0^k & 0 & 0 \\ 0 & 0 & 2^k & 0 \\ 0 & 0 & 0 & 3^k \end{bmatrix} \\
\end{align*}
$$
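A quick numerical check (a sketch assuming NumPy) that the $k^{\text{th}}$ power of this diagonal matrix is the diagonal matrix of $k^{\text{th}}$ powers of its entries:

```python
import numpy as np

d = np.array([-1, 0, 2, 3])
A = np.diag(d)                        # the diagonal matrix from the example

k = 3
Ak = np.linalg.matrix_power(A, k)

# A power of a diagonal matrix is the diagonal matrix of powers of its entries
print(np.array_equal(Ak, np.diag(d**k)))   # True
print(Ak)
```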
2.2.5 The Identity Matrix¶
In scalar arithmetic, the multiplicative identity element $1$ has the property,
$$
a\cdot1 = 1\cdot a = a
$$
for every real number $a\in\mathbb{R}$. Likewise, the identity function maps every element of its domain to itself.
$$
\mathscr{I}(x) = x
$$
for every $x$ in the domain of $\mathscr{I}$. If one composes the identity function $\mathscr{I}$ with another function, the composition has the property,
$$\left(\mathscr{I}\circ f\right)(x) = \left(f\circ\mathscr{I}\right)(x) = f(x),$$
for every $x$ in the domain of function $f$. Similarly, there is a square matrix $I\in\mathbb{R}^{n\times n}$, called the identity matrix, with the property that for every vector $\mathbf{x}\in\mathbb{R}^n$,
$$
I\mathbf{x} = \mathbf{x}.
$$
If one multiplies the identity matrix by any other matrix $A\in\mathbb{R}^{n\times n}$, one obtains
$$
AI = IA = A.
$$
Recall that when we multiply $AI$ we may also view this multiplication as
$$AI = A[\mathbf{i}_1\ \mathbf{i}_2\ \dots\ \mathbf{i}_n] = [A\mathbf{i}_1\ A\mathbf{i}_2\ \dots\ A\mathbf{i}_n] = A = [\mathbf{a}_1\ \mathbf{a}_2\ \dots\ \mathbf{a}_n]$$
So for $1\le j\le n$ we have
$$\begin{align*}
A\mathbf{i}_j &= \mathbf{a}_j \\
\\
[\mathbf{a}_1\ \mathbf{a}_2\ \dots\ \mathbf{a}_n]\begin{bmatrix} i_{1j} \\ i_{2j} \\ \vdots \\ i_{nj} \end{bmatrix} &= \mathbf{a}_j \\
\\
i_{1j}\mathbf{a}_1 + i_{2j}\mathbf{a}_2 + \dots + i_{jj}\mathbf{a}_j + \dots + i_{nj}\mathbf{a}_n &= \mathbf{a}_j. \\
\\
\end{align*}$$
So in column $j$ of the identity matrix $I$ we must have $i_{kj} = 0$ for every $k\neq j$ and $i_{jj}=1$. We can write the $j^{\text{th}}$ column of $I$ as
$$\mathbf{i}_j = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix},\qquad i_{kj} = \left\{\begin{array}{cc} 0,& k\neq j \\ 1,& k=j \end{array}\right. $$
Thus the identity matrix is
$$ I = [\mathbf{i}_1\ \mathbf{i}_2\ \dots \mathbf{i}_n ] = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{bmatrix}$$
We define the symbol (the Kronecker delta)
$$\delta_{ij} = \left\{\begin{array}{cc} 1, & i=j \\ 0, & i\neq j \end{array}\right.$$
In this way we can write the identity matrix as $I = [\delta_{ij}]$.
In calculus, physics, and engineering, we often denote the three basis vectors of our three-dimensional space $\mathbb{R}^3$ by
$$\ihat = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},\quad \jhat = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},\quad\mathbf{\hat{k}} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$
Our three dimensional space $\mathbb{R}^3$ is also called Euclidean $3$-space, and the identity matrix in $\mathbb{R}^3$ is
$$
I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$
For any positive integer $n$ and Euclidean $n$-space we may need more than three elementary basis vectors, so we denote these vectors
$$\mathbf{e}_1,\ \mathbf{e}_2,\ \mathbf{e}_3,\ \dots\ ,\mathbf{e}_n.$$
Here
$$\mathbf{e}_j = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = [\delta_{ij}]\qquad\text{and}\qquad I = [\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3\ \dots\ \mathbf{e}_n] = [\delta_{ij}].
$$
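A short numerical illustration of the identity matrix (a sketch assuming NumPy; the matrix $A$ below is random):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
I = np.eye(n)                          # the n x n identity matrix

print(np.allclose(I @ A, A))           # IA = A
print(np.allclose(A @ I, A))           # AI = A

# The columns of I are the elementary basis vectors e_1, ..., e_n,
# and A e_j picks out the j-th column of A
e2 = I[:, 1]                           # e_2 (zero-based index 1)
print(np.allclose(A @ e2, A[:, 1]))    # True
```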
2.2.6 The Matrix Transpose¶
The transpose of an $m\times n$ matrix $A$ is the $n\times m$ matrix whose columns are the rows of $A$ and whose rows are the columns of $A$. For example, if
$$A = \begin{bmatrix} -2 &\ \ 6 &\ \ 0 \\ \ 6 & -7 & -1 \end{bmatrix},$$
then
$$ A^T = \begin{bmatrix} -2 &\ \ 6 \\ \ \ 6 & -7 \\ \ \ 0 & -1 \end{bmatrix}.$$
Using the notation we developed for matrices
$$A^T = \begin{bmatrix} a_{ij} \end{bmatrix}^T = \begin{bmatrix} a_{ji} \end{bmatrix}.$$
Notice that in the last matrix the indices are reversed. This indicates that the element in position $(1,2)$ is now in position $(2,1)$, and in general the element in position $(i,j)$ is now in position $(j,i)$. This relationship defines the transpose operation more precisely than talking about exchanging rows and columns. Clearly the transpose of a scalar (a $1\times 1$ matrix) is just the scalar itself. Thus we have
$$\begin{align*}
\left(A^T\right)^T &= \left(\begin{bmatrix} a_{ij} \end{bmatrix}^T\right)^T = \left(\begin{bmatrix} a_{ji} \end{bmatrix}\right)^T = \begin{bmatrix} a_{ij} \end{bmatrix} = A \\
\\
A^T + B^T &= \begin{bmatrix} a_{ij} \end{bmatrix}^T + \begin{bmatrix} b_{ij} \end{bmatrix}^T = \begin{bmatrix} a_{ji} \end{bmatrix} + \begin{bmatrix} b_{ji} \end{bmatrix} \\
&= \begin{bmatrix} a_{ji} + b_{ji} \end{bmatrix} = \begin{bmatrix} a_{ij} + b_{ij} \end{bmatrix}^T = \left( A + B \right)^T \\
\\
\left(\alpha A\right)^T &= \left(\alpha\begin{bmatrix} a_{ij} \end{bmatrix}\right)^T = \left(\begin{bmatrix} \alpha a_{ij} \end{bmatrix}\right)^T = \begin{bmatrix} \alpha a_{ji} \end{bmatrix} = \alpha \begin{bmatrix} a_{ji} \end{bmatrix} = \alpha A^T
\end{align*}$$
Determining the transpose of a matrix product takes more work. The first step is to remember that the dot product of two $n\times 1$ column vectors is commutative. If $\mathbf{x}$ and $\mathbf{y}$ are $n\times 1$ vectors, then
$$\begin{align*}
\mathbf{x}\cdot\mathbf{y} &= \mathbf{x}^T\mathbf{y} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \\
\\
&= x_1y_1 + x_2y_2 + \cdots + x_ny_n \\
\\
&= y_1x_1 + y_2x_2 + \cdots + y_nx_n \\
\\
&= \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \mathbf{y}^T\mathbf{x} = \mathbf{y}\cdot\mathbf{x}
\end{align*}$$
Since the transpose of a scalar ($1\times 1$ matrix) is itself we have,
$$\mathbf{x}^T\mathbf{y} = \mathbf{x}\cdot\mathbf{y} = \left(\mathbf{x}\cdot\mathbf{y}\right)^T = \left(\mathbf{x}^T\mathbf{y}\right)^T = \mathbf{y}^T\mathbf{x}.$$
The product of an $m\times n$ matrix $A$ and an $n\times p$ matrix $B$ is the $m\times p$ matrix $C$ whose elements are given by the $m\cdot p$ row-column products
$$ c_{ij} = \mathbf{a}^i\mathbf{b}_j.$$
We know that the rows of matrix $A$ are the columns of matrix $A^T$ and the columns of matrix $B$ are the rows of matrix $B^T$.
$$\begin{align*}
A &= \begin{bmatrix} \mathbf{a}^1 \\ \mathbf{a}^2 \\ \vdots \\ \mathbf{a}^m \end{bmatrix} \qquad &A^T &= \begin{bmatrix} \left(\mathbf{a}^T\right)_1 & \left(\mathbf{a}^T\right)_2 & \cdots & \left(\mathbf{a}^T\right)_m \end{bmatrix}
\end{align*}.$$
The equation on the left denotes that matrix $A$ is a column of $m$ row vectors $\mathbf{a}^k$, and $A^T$ is a row of $m$ column vectors $\left(\mathbf{a}^T\right)_k$. Since the $k^{\text{th}}$ row of $A$ is the $k^{\text{th}}$ column of $A^T$, $\ \left(\mathbf{a}^k\right)^T = \left(\mathbf{a}^T\right)_k$. Similarly we write
$$\begin{align*}
B &= \begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_p \end{bmatrix} \qquad &B^T &= \begin{bmatrix} \left(\mathbf{b}^T\right)^1 \\ \left(\mathbf{b}^T\right)^2 \\ \vdots \\ \left(\mathbf{b}^T\right)^p \end{bmatrix}
\end{align*}$$
The equation on the left denotes that matrix $B$ is a row of $p$ column vectors $\mathbf{b}_k$, and $B^T$ is a column of $p$ row vectors $\left(\mathbf{b}^T\right)^k$. Since the $k^{\text{th}}$ column of $B$ is the $k^{\text{th}}$ row of $B^T$, $\left(\mathbf{b}_k\right)^T = \left(\mathbf{b}^T\right)^k$.
So the transpose of $C$ is
$$C^T = \left[ c_{ij} \right]^T = \left[ \mathbf{a}^i\mathbf{b}_j \right]^T = \left[ \left( \mathbf{a}^i\mathbf{b}_j \right)^T \right] = \left[ \left(\mathbf{b}_j\right)^T\left(\mathbf{a}^i\right)^T \right] = \left[ \left(\mathbf{b}^T\right)^j\left(\mathbf{a}^T\right)_i \right] = B^TA^T.$$
If $A^T = A$, then we say that matrix $A$ is symmetric. Since $A$ and $A^T$ must then have the same shape, a symmetric matrix necessarily has the same number of rows and columns; it is square. The following are symmetric matrices.
$$\begin{bmatrix} 3 & \ \ 4 & \ \ 5 \\ 4 & \ \ 4 & -3 \\ 5 & -3 & \ \ 0 \end{bmatrix},\qquad\begin{bmatrix} 10 & 6 \\ 6 & 2 \end{bmatrix},\qquad\begin{bmatrix} 8 & 2 & 6 \\ 2 & 9 & 0 \\ 6 & 0 & 1 \end{bmatrix}.
$$
2.2.7 The Matrix Algebra of Transpose¶
The transpose of an $m\times n$ matrix $A$ is the $n\times m$ matrix whose rows are the columns of $A$ and whose columns are the rows of $A$. We can express this algebraically as
$$A^T = [a_{ij}]^T = [a_{ji}].$$
Thus the transpose swaps rows for columns (and columns for rows). Every element $a_{ij}$ of matrix $A$ appears in position $(j,i)$ of matrix $A^T$. The algebraic rules for transposes of matrices $A$ and $B$ and a scalar $\alpha$, whenever the indicated operations are defined, are:
- $\left(A^T\right)^T = \left([a_{ij}]^T\right)^T = \left([a_{ji}]\right)^T = [a_{ji}]^T = [a_{ij}] = A$
- $\left(\alpha A\right)^T = \alpha A^T$
- $\left(A + B\right)^T = \left([a_{ij}] + [b_{ij}] \right)^T = [a_{ij} + b_{ij}]^T = [a_{ji} + b_{ji}] = [a_{ji}] + [b_{ji}] = A^T + B^T$
- $\left(AB\right)^T = B^TA^T$
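These rules can be spot-checked numerically. The sketch below assumes NumPy; a third matrix $C$ is introduced only so that the product in the last rule, $(AC)^T = C^TA^T$, is defined.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((2, 3))
C = rng.standard_normal((3, 4))                 # AC is a defined product
alpha = 2.5

print(np.array_equal((A.T).T, A))               # (A^T)^T = A
print(np.allclose((alpha * A).T, alpha * A.T))  # (alpha A)^T = alpha A^T
print(np.allclose((A + B).T, A.T + B.T))        # (A + B)^T = A^T + B^T
print(np.allclose((A @ C).T, C.T @ A.T))        # (AC)^T = C^T A^T
```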
2.2.8 Symmetric and Antisymmetric Matrices¶
Using the transpose operation and the above properties, in particular
- $ A = \left(A^T\right)^T $
- $ \left(AB\right)^T = B^T A^T $
we can show that a few useful matrices are symmetric $(A^T = A)$ or antisymmetric $(A^T = -A)$.
For instance, the matrix $A^T A$ is symmetric since
$$ \left(A^T A\right)^T = A^T\left(A^T\right)^T = A^T A. $$
Likewise, $AA^T$ is symmetric because
$$ \left(AA^T\right)^T = \left(A^T\right)^T A^T = AA^T. $$
Additionally, there are a pair of special matrices $A_+$ and $A_-$, defined as
$$ A_+ := \dfrac{A+A^T}{2} \qquad\qquad A_- := \dfrac{A-A^T}{2}. $$
We may use these definitions to write $A$ and $A^T$ as
$$ A = A_+ + A_- \qquad\qquad A^T = A_+ - A_-.$$
In addition, these matrices $A_+$ and $A_-$ are symmetric and antisymmetric, respectively, since $(A+B)^T = A^T + B^T$:
$$\begin{align*}
A_+^T &= \left(\dfrac{A+A^T}{2}\right)^T = \dfrac{A^T+A}{2} = \dfrac{A+A^T}{2} = A_+ \\
\\
A_-^T &= \left(\dfrac{A-A^T}{2}\right)^T = \dfrac{A^T-A}{2} = -\dfrac{A-A^T}{2} = -A_-
\end{align*}
$$
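Finally, a numerical spot-check of this decomposition (a sketch assuming NumPy, with a random $4\times 4$ matrix standing in for $A$):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))

A_plus  = (A + A.T) / 2                # symmetric part
A_minus = (A - A.T) / 2                # antisymmetric part

print(np.allclose(A, A_plus + A_minus))      # A = A_+ + A_-
print(np.allclose(A.T, A_plus - A_minus))    # A^T = A_+ - A_-
print(np.allclose(A_plus.T, A_plus))         # A_+ is symmetric
print(np.allclose(A_minus.T, -A_minus))      # A_- is antisymmetric
print(np.allclose((A.T @ A).T, A.T @ A))     # A^T A is symmetric
```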
Your use of this self-initiated mediated course material is subject to our Creative Commons License 4.0