11  Working with matrices

\[ \newcommand {\mat}[1] { \boldsymbol{#1} } \newcommand {\yields} { \; \Rightarrow \; } \newcommand {\rank}[1] { \text{rank}\left(#1\right) } \newcommand {\nul}[1] { \mathcal{N}\left(#1\right) } \newcommand {\csp}[1] { \mathcal{C}\left(#1\right) } \newcommand {\rsp}[1] { \mathcal{R}\left(#1\right) } \newcommand {\lsp}[1] { \mathcal{L}\left(#1\right) } \renewcommand {\vec}[1] {\mathbf{#1}} \] \[ \newcommand {\hair} { \hspace{0.5pt} } \newcommand {\dd}[2] { \frac{\mathrm{d} \hair #1}{ \mathrm{d} \hair #2} } \newcommand {\tdd}[2] { \tfrac{\mathrm{d} #1}{\mathrm{d} #2} } \newcommand {\pp}[2] { \frac{\partial #1}{\partial #2} } \newcommand {\indint}[2] { \int \! {#1} \, \mathrm{d}{#2} } \newcommand {\defint}[4] { \int_{#1}^{#2} \! {#3} \, \mathrm{d}{#4} } \]

\[ \newcommand{\unit}[1]{\,\mathrm{#1}} \newcommand{\degC}{^\circ{\mathrm C}} \newcommand{\tothe}[1]{\times 10^{#1}} \]

11.1 Multiplying matrices

Until now we have limited matrix multiplication to products of the form \(\mat{A}\vec{v}\), where \(\mat{A}\) was an \(m \times n\) (rows \(\times\) columns) matrix and \(\vec{v}\) a column vector of length \(n\). We saw that the rules for a matrix-vector multiplication like

\[ \begin{bmatrix} 2 & -3\\ 1 & 4 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix}= \begin{bmatrix} -5\\ 14 \\ 9 \end{bmatrix} \]

follow from the equivalence of such matrix equations with systems of linear equations. There is a logical extension of these rules to the situation where we multiply a matrix by a matrix. As an example we could ask ourselves how to define the following matrix multiplication:

\[ \begin{bmatrix} 2 & -3\\ 1 & 4 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 5 \\ 3 & -1 \end{bmatrix}=? \]

A way to approach the general question of defining matrix multiplication \(\mat{A} \mat{B} = \mat{C}\) is to take matrix-vector multiplication as a starting point and to require that \(\mat{C} \vec{v}\) yields the same result as \(\mat{A}(\mat{B}\vec{v})\), i.e. we want matrix-matrix multiplication to be associative with matrix-vector multiplication: \((\mat{A} \mat{B}) \vec{v} = \mat{A} (\mat{B} \vec{v})\). The questions then are: how do you construct \(\mat{C}\) when you know \(\mat{A}\) and \(\mat{B}\) and want the associative property to hold, and under which conditions is such a matrix product defined?

Suppose that \(\mat{B}\) is an \(n \times p\) matrix, and \(\vec{v}\) is a column vector of length \(p\). Then the matrix-vector product \(\vec{w} = \mat{B} \vec{v}\) is defined and yields a column vector \(\vec{w}\) of length \(n\). If we want to be able to multiply this vector by a matrix \(\mat{A}\), then this matrix must have \(n\) columns, otherwise the matrix-vector product is not defined! Suppose then that \(\mat{A}\) is an \(m \times n\) matrix. Then the matrix \(\mat{C} = \mat{A} \mat{B}\) must be an \(m \times p\) matrix that has the same effect on \(\vec{v}\) as the composite matrix-vector multiplication \(\mat{A}(\mat{B}\vec{v})\).

Let’s write \(\mat{B}\) as a matrix of column vectors (each of length \(n\)!):

\[ \mat{B} = \begin{bmatrix} \vec{b}_1 & \vec{b}_2 & \ldots & \vec{b}_p \end{bmatrix} \]

Then we can rewrite the product, using the column interpretation of \(\mat{B}\vec{v}\) and the fact that matrix-vector multiplication distributes over sums and scalar multiples:

\[ \mat{A}(\mat{B}\vec{v}) = \mat{A} \left ( \vec{b}_1 v_1 + \vec{b}_2 v_2 + \ldots + \vec{b}_p v_p \right ) = \mat{A}\vec{b}_1 v_1 + \mat{A}\vec{b}_2 v_2 + \ldots + \mat{A}\vec{b}_p v_p \] The latter sum can be re-written as the matrix-vector product

\[ \mat{A}\vec{b}_1 v_1 + \mat{A}\vec{b}_2 v_2 + \ldots + \mat{A}\vec{b}_p v_p = \begin{bmatrix} \mat{A}\vec{b}_1 & \mat{A}\vec{b}_2 & \ldots & \mat{A}\vec{b}_p \end{bmatrix} \vec{v} \] This shows that the matrix product \(\mat{A} \mat{B}\) should equal

\[ \mat{A}\mat{B} = \begin{bmatrix} \mat{A}\vec{b}_1 & \mat{A}\vec{b}_2 & \ldots & \mat{A}\vec{b}_p \end{bmatrix} \] i.e. the columns of \(\mat{A}\mat{B}\) should be equal to the product of the matrix \(\mat{A}\) with each of the corresponding columns of \(\mat{B}\). Furthermore, the number of columns in \(\mat{A}\) should equal the number of rows in \(\mat{B}\), otherwise the products \(\mat{A}\vec{b}_i\) are not defined. In our example, this multiplication rule for matrices works out as

\[ \begin{bmatrix} 2 & -3\\ 1 & 4 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 5 \\ 3 & -1 \end{bmatrix}= \begin{bmatrix} \begin{bmatrix} 2 & -3 \\ 1 & 4 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix} & \begin{bmatrix} 2 & -3\\ 1 & 4 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 5 \\ -1 \end{bmatrix} \end{bmatrix} = \begin{bmatrix} -5 & 13 \\ 14 & 1 \\ 9 & 14 \end{bmatrix} \]

Perhaps you have learned a different rule that gives the same result. The above derivation shows that such a rule does not come out of the blue: it was constructed to make matrix-matrix multiplication associative with matrix-vector multiplication, because that is a very useful property.
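
To see the rule in action you can reproduce it numerically. The following sketch assumes Python with NumPy (a choice made for this example, not something the text prescribes); it builds \(\mat{A}\mat{B}\) column by column as \(\mat{A}\vec{b}_i\) and checks the associativity property that motivated the definition.

```python
import numpy as np

A = np.array([[2, -3],
              [1,  4],
              [3,  1]])
B = np.array([[2,  5],
              [3, -1]])

# Build AB column by column: each column of AB is A times the
# corresponding column of B.
AB_by_columns = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])
print(AB_by_columns)                              # same as A @ B
print(np.array_equal(AB_by_columns, A @ B))       # True

# Associativity with matrix-vector multiplication: (AB)v == A(Bv)
v = np.array([1, 2])
print(np.array_equal((A @ B) @ v, A @ (B @ v)))   # True
```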

11.2 The transpose of a matrix

The transpose \(\mat{A}^T\) of an \(m \times n\) matrix \(\mat{A}\) is the \(n \times m\) matrix in which rows and columns have been exchanged. The transpose is needed in many matrix calculations, for example when calculating least squares solutions (see Chapter 12).

Example 11.1 (The transpose of a square matrix) We have a square matrix and its transpose:

\[ \mat{A} = \begin{bmatrix} 2 & -3\\ 1 & 4 \end{bmatrix} \qquad \mat{A}^T = \begin{bmatrix} 2 & 1\\ -3 & 4 \end{bmatrix} \]

In such a case you can always calculate the products \(\mat{A} \mat{A}^T\) and \(\mat{A}^T \mat{A}\), because \(\mat{A}\) and \(\mat{A}^T\) are compatible matrices. Furthermore, these products are symmetric.

\[ \begin{bmatrix} 2 & -3\\ 1 & 4 \end{bmatrix} \begin{bmatrix} 2 & 1\\ -3 & 4 \end{bmatrix} = \begin{bmatrix} 13 & -10\\ -10 & 17 \end{bmatrix} \]
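
These claims are easy to verify numerically; a minimal sketch, assuming Python with NumPy:

```python
import numpy as np

A = np.array([[2, -3],
              [1,  4]])

print(A.T)                      # the transpose: rows and columns exchanged
S = A @ A.T
print(S)                        # [[13, -10], [-10, 17]]
print(np.array_equal(S, S.T))   # True: A A^T is symmetric
```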

11.3 Identity and permutation matrices

An identity matrix \(\mat{I}_n\) is defined as a square (\(n \times n\)) matrix which has 1’s as its diagonal elements and 0’s as its off-diagonal elements, like:

\[ \mat{I}_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \]

The reason it is called an identity matrix is that, for any \(m \times n\) matrix \(\mat{A}\) (where \(n\) could be equal to 1), the multiplications \(\mat{A} \mat{I}_n\) and \(\mat{I}_m \mat{A}\) both yield \(\mat{A}\) as a result:

\[ \mat{A}\mat{I}_n = \mat{A} \qquad \text{and} \qquad \mat{I}_m \mat{A} = \mat{A} \]

It is easy to see why that is the case. Consider the matrix-vector multiplication:

\[ \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \\ 9 \end{bmatrix} \]

or

\[ \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 7 \\ 11 \end{bmatrix} \]

We see here that a column vector with 0 entries and just one 1-entry “selects” one of the columns in the matrix, namely the column corresponding to the position of the 1. In the first example the 1 is in the first position, so we select the first column of the matrix; in the second example the 1 is in the third position, and we select the third column. Using our definition of matrix multiplication \(\mat{A} \mat{B}\), a square matrix \(\mat{B}\) whose first column has a 1 in the first position, whose second column has a 1 in the second position, and so on (so \(\mat{B}\) is an identity matrix) selects every column from the matrix \(\mat{A}\) in turn. The matrix product reconstructs the matrix \(\mat{A}\), column by column:

\[ \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{bmatrix} \]

Similarly, multiplying on the left-hand side by a row vector with just one 1-entry:

\[ \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{bmatrix} = \begin{bmatrix} 9 & 10 & 11 & 12 \end{bmatrix} \]

selects the row in the right hand matrix corresponding to the position of the 1 in the left hand matrix. And the multiplication

\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{bmatrix} \]

reconstructs the right hand matrix row by row. Now, knowing that an identity matrix selects and pastes columns (in \(\mat{A} \mat{I}\)) or rows (in \(\mat{I} \mat{A}\)), we may ask: what happens if we swap rows or columns in \(\mat{I}\)? For example, swapping the first and third rows of \(\mat{I}\) in the example above gives \(\Bigl[\begin{smallmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{smallmatrix}\Bigr]\). If we left-multiply our matrix by this we obtain

\[ \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{bmatrix} = \begin{bmatrix} 9 & 10 & 11 & 12 \\ 5 & 6 & 7 & 8 \\ 1 & 2 & 3 & 4 \end{bmatrix} \]

which is identical to the original matrix with the first and third rows swapped! So, multiplying a matrix \(\mat{A}\) on the left side by an identity matrix \(\mat{I}'\) with swapped rows (\(\mat{I}' \mat{A}\)) yields a matrix \(\mat{A}'\) with the same rows swapped. Similarly, multiplication on the right side (\(\mat{A}\mat{I}'\)) with an identity matrix with columns swapped yields a matrix \(\mat{A}'\) with the same columns swapped.

Because of this property, identity matrices with swapped rows or columns (actually it does not matter whether you swap rows \(i\) and \(j\) or columns \(i\) and \(j\): for the identity matrix you get the same result) are called permutation matrices, because they permute the columns and rows of a matrix.
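
The selecting and swapping behaviour of identity and permutation matrices can be reproduced numerically. A short sketch, assuming Python with NumPy:

```python
import numpy as np

A = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])

I3 = np.eye(3, dtype=int)
print(np.array_equal(I3 @ A, A))   # True: left-multiplying by I leaves A unchanged

# Permutation matrix: the identity with its first and third rows swapped
P = I3[[2, 1, 0], :]
print(P @ A)                       # A with its first and third rows swapped
```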

11.4 Gaussian elimination as a matrix multiplication

We saw that left multiplication of a matrix by a permutation matrix swaps its rows. Swapping rows is one of the operations used in Gaussian or Gauss-Jordan elimination, so we can reproduce a swap of rows in matrix \(\mat{A}\) by left-multiplying it by a permutation matrix: \(\mat{I}' \mat{A}\).

We will now show that it is also possible to reproduce the other two operations in Gaussian or Gauss-Jordan elimination, multiplication of a row by a scalar \(a\) and addition of a row to another row, by left-multiplying with a modified identity matrix.

The first operation, multiplication of a row by a scalar \(a\), is actually quite simple if you followed the logic in the previous section. Suppose we want to multiply the first row of our example matrix by a factor 5 (the multiple we need when eliminating the 5 in the second row during Gaussian elimination). Instead of multiplying by \(\mat{I}\), we would multiply by

\[ \begin{bmatrix} 5 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{bmatrix} = \begin{bmatrix} 5 & 10 & 15 & 20 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{bmatrix} \]

But instead, we would like to keep the first row in its original form and modify the second row by subtracting 5 times the first row from it. This is reproduced by subtracting \(5 \times\) the first row from the second row of the identity matrix, and then left-multiplying the original matrix by the result. We call the resulting modified identity matrix \(\mat{J}_1\):

\[ \mat{J}_1 = \begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

We use this matrix to left-multiply the original matrix to obtain:

\[ \mat{J}_1 \mat{A} = \begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & -4 & -8 & -12 \\ 9 & 10 & 11 & 12 \end{bmatrix} \]

To continue Gaussian elimination, we would want to subtract \(9 \times\) the first row from the third row. This is accomplished by the modified identity matrix

\[ \mat{J}_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -9 & 0 & 1 \end{bmatrix} \] Left-multiplying the result of \(\mat{J}_1\mat{A}\) by that matrix we obtain

\[ \mat{J}_2 (\mat{J}_1 \mat{A}) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -9 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & -4 & -8 & -12 \\ 9 & 10 & 11 & 12 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & -4 & -8 & -12 \\ 0 & -8 & -16 & -24 \end{bmatrix} \] Finally, we want to subtract \(2 \times\) the second row from the third row, which is accomplished by

\[ \mat{J}_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} \] and obtain

\[ \mat{J}_3 (\mat{J}_2 (\mat{J}_1 \mat{A})) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & -4 & -8 & -12 \\ 0 & -8 & -16 & -24 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & -4 & -8 & -12 \\ 0 & 0 & 0 & 0 \end{bmatrix} \] The entire sequence of eliminations is reproduced by the matrix product

\[ \begin{split} \mat{J} = \mat{J}_3 ( \mat{J}_2 \mat{J}_1 ) &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} \left( \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -9 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right) \\ &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ -9 & 0 & 1 \end{bmatrix} \\ &= \begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ 1 & -2 & 1 \end{bmatrix} \end{split} \]

Check that \(\mat{J} \mat{A}\) yields the same result as \(\mat{J}_3 (\mat{J}_2 (\mat{J}_1 \mat{A}))\).
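
That check can also be done numerically; a short sketch, assuming Python with NumPy and using the matrices from this section:

```python
import numpy as np

A  = np.array([[1,  2,  3,  4],
               [5,  6,  7,  8],
               [9, 10, 11, 12]])
J1 = np.array([[1, 0, 0], [-5, 1, 0], [ 0,  0, 1]])
J2 = np.array([[1, 0, 0], [ 0, 1, 0], [-9,  0, 1]])
J3 = np.array([[1, 0, 0], [ 0, 1, 0], [ 0, -2, 1]])

J = J3 @ (J2 @ J1)
print(J)                                             # [[1, 0, 0], [-5, 1, 0], [1, -2, 1]]
print(np.array_equal(J @ A, J3 @ (J2 @ (J1 @ A))))   # True
print(J @ A)                                         # the eliminated matrix with a zero third row
```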

What these examples show is that every step in Gaussian elimination can be reproduced by starting with an identity matrix and performing the same row operations on that matrix. Subsequently, this modified identity matrix can be used to left-multiply the original matrix, to obtain the final result of Gaussian or Gauss-Jordan elimination. To make a complete example of this, we revisit Example 9.2, where we had the matrix

\[ \mat{A}= \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 0 & 1 & 2 \end{bmatrix} \]

To keep track of the effect of the elimination process on the identity matrix we use a trick. We glue the identity matrix to the right side of \(\mat{A}\) (the official term is that we augment \(\mat{A}\) with it, symbolically expressed as \(\bigl[\mat{A}\bigr\rvert\mat{I}\bigr]\)), and continue until we have the reduced row-echelon form (Gauss-Jordan elimination). This reproduces the effect of all elimination steps on the identity matrix.

\[ \begin{split} \left[\begin{array}{ccc|ccc} 1 & 1 & 1 & 1 & 0 & 0\\ 1 & 2 & 3 & 0 & 1 & 0\\ 0 & 1 & 2 & 0 & 0 & 1 \end{array}\right] \yields \left[\begin{array}{ccc|ccc} 1 & 1 & 1 & 1 & 0 & 0\\ 0 & 1 & 2 & -1 & 1 & 0\\ 0 & 1 & 2 & 0 & 0 & 1 \end{array}\right] \yields \\ \\ \left[\begin{array}{ccc|ccc} 1 & 1 & 1 & 1 & 0 & 0\\ 0 & 1 & 2 & -1 & 1 & 0\\ 0 & 0 & 0 & 1 & -1 & 1 \end{array}\right] \yields \left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 2 & -1 & 0 \\ 0 & 1 & 2 & -1 & 1 & 0 \\ 0 & 0 & 0 & 1 & -1 & 1 \end{array}\right] \end{split} \]

The modified identity matrix now is

\[ \mat{J} = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix} \]

Carrying out the multiplication indeed yields

\[ \mat{J} \mat{A} = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 0 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{bmatrix} \]

To understand why this works, you can regard left-multiplication of a matrix as a recipe for multiplying, adding and swapping rows in the right-hand matrix. The first row of \(\mat{J}\) above says that the new first row in the product matrix will be formed by taking \(2 \times\) the first row of \(\mat{A}\) and subtracting \(1 \times\) the second row from it. The second row of \(\mat{J}\) says that the second row in the product will be formed by taking \(-1 \times\) the first row of \(\mat{A}\) and adding the second row to it. Finally, the third row of \(\mat{J}\) says that the third row in the product will be formed by adding the first row of \(\mat{A}\) to the third row and subtracting the second row from it.

Of course we can make the same statements about how the right-hand matrix provides a recipe to multiply, add and swap columns in the left-hand matrix to make new columns in the product matrix. Try it out on the example above!

Note that \(\mat{J}\) allows us to easily construct a formulation for the column space \(\csp{\mat{A}}\). By multiplying both sides of \(\mat{A} \vec{x} = \vec{b}\) by \(\mat{J}\) we obtain:

\[ \begin{align*} \mat{J}\mat{A} \vec{x} & = \mat{J} \vec{b} \\ \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} & = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \end{align*} \]

The rows of interest are the ones that have only 0’s on the left hand side, i.e. the rows that end up without a pivot. Those are the rows that give us dependencies between the \(b\)’s only (the third row in the example). Here we obtain \(0 = b_1 - b_2 + b_3\), the same expression as in Equation 10.1, allowing us to construct the same formulation again.
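
The condition \(0 = b_1 - b_2 + b_3\) can also be checked numerically: any \(\vec{b}\) that lies in the column space of \(\mat{A}\) (i.e. \(\vec{b} = \mat{A}\vec{x}\) for some \(\vec{x}\)) must satisfy it. A sketch, assuming Python with NumPy; the particular \(\vec{x}\) below is just an arbitrary choice:

```python
import numpy as np

A = np.array([[1, 1, 1],
              [1, 2, 3],
              [0, 1, 2]])
J = np.array([[ 2, -1, 0],
              [-1,  1, 0],
              [ 1, -1, 1]])

# Take some b in the column space of A ...
x = np.array([2.0, -1.0, 3.0])
b = A @ x
# ... then the third row of J b must vanish: b1 - b2 + b3 = 0
print(J @ b)                 # third entry is 0
print(b[0] - b[1] + b[2])    # 0
```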

11.5 The inverse of a matrix

For a square matrix \(\mat{A}\), it is sometimes possible to find a matrix \(\mat{A}^{-1}\), called the inverse of \(\mat{A}\), for which it holds that \(\mat{A}^{-1} \mat{A} = \mat{A} \mat{A}^{-1} = \mat{I}\). Having such an inverse is actually a way to find the unique (single point) solution to a matrix equation like \(\mat{A} \vec{x} = \vec{b}\). If we multiply both sides of this equation by \(\mat{A}^{-1}\) we obtain

\[ \begin{align*} \mat{A}^{-1} \mat{A} \vec{x} & = \mat{A}^{-1} \vec{b} \\ \mat{I} \vec{x} & = \mat{A}^{-1} \vec{b} \\ \vec{x} & = \mat{A}^{-1} \vec{b} \end{align*} \]

where we have the column vector of unknown variables on the left hand side and the known matrix \(\mat{A}^{-1}\) and column vector \(\vec{b}\) on the right hand side. So, if solving a matrix equation is that easy, why don’t we always calculate the inverse if we want to solve a system of equations? There are two reasons:

  1. there are many cases where there is no unique solution and no inverse, but where we have infinite solution sets, and
  2. the algorithms for calculating inverse matrices are at least as costly as Gaussian elimination in terms of the number of elementary operations.

There are cases where calculating an inverse may pay off when solving systems of linear equations. For example, when you have many systems with the same matrix \(\mat{A}\) on the left hand side and different vectors \(\vec{b}\) on the right hand side. In such a case you can re-use \(\mat{A}^{-1}\) many times.
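
A sketch of that situation, assuming Python with NumPy; the small matrix and right-hand sides below are made up for illustration:

```python
import numpy as np

A = np.array([[2.0,  1.0],
              [1.0, -1.0]])       # a hypothetical, invertible matrix
A_inv = np.linalg.inv(A)          # computed once ...

# ... and re-used for several right-hand sides b
for b in [np.array([1.0, 0.0]),
          np.array([0.0, 1.0]),
          np.array([3.0, -2.0])]:
    x = A_inv @ b
    print(x, np.allclose(A @ x, b))   # each x solves A x = b
```

In numerical practice one would usually re-use a factorization of \(\mat{A}\) rather than an explicit inverse, but the idea of doing the expensive work only once is the same.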

There are a few ways to calculate an inverse, if it exists. One way is to use Gauss-Jordan elimination (i.e. elimination until a reduced row echelon form is obtained) with an augmented identity matrix. Suppose we have a square, \(3 \times 3\) matrix \[ \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 5 & 6 & 0 \end{bmatrix} \]

To calculate the inverse using Gaussian elimination we write down the table of coefficients in the usual way, augmented with a \(3 \times 3\) identity matrix, and proceed with Gauss-Jordan elimination until the reduced row echelon form is obtained. Note that the inverse exists only if the reduced row echelon form of the left block equals the identity matrix, and we don’t know yet whether that is the case. This is the procedure:

\[ \begin{split} \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 4 & 0 & 1 & 0 \\ 5 & 6 & 0 & 0 & 0 & 1 \end{array}\right] \yields \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 4 & 0 & 1 & 0 \\ 0 & -4 & -15 & -5 & 0 & 1 \end{array}\right] \yields \\ \\ \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 4 & 0 & 1 & 0 \\ 0 & 0 & 1 & -5 & 4 & 1 \end{array}\right] \end{split} \]

We now have a triangular form, and continue with Gauss-Jordan elimination until we have the reduced row echelon form

\[ \yields \left[\begin{array}{ccc|rrr} 1 & 2 & 0 & 16 & -12 & -3 \\ 0 & 1 & 0 & 20 & -15 & -4 \\ 0 & 0 & 1 & -5 & 4 & 1 \end{array}\right] \yields \left[\begin{array}{ccc|rrr} 1 & 0 & 0 & -24 & 18 & 5 \\ 0 & 1 & 0 & 20 & -15 & -4 \\ 0 & 0 & 1 & -5 & 4 & 1 \end{array}\right] \]

and we are finished. The left side now contains the identity matrix, as it should be when an inverse exists, and the right side is the inverse \(\mat{A}^{-1}\). Let’s check this:

\[ \mat{A}\mat{A}^{-1} = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 5 & 6 & 0 \end{bmatrix} \begin{bmatrix} -24 & 18 & 5 \\ 20 & -15 & -4 \\ -5 & 4 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

So indeed

\[ \begin{bmatrix} -24 & 18 & 5 \\ 20 & -15 & -4 \\ -5 & 4 & 1 \end{bmatrix}= \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 5 & 6 & 0 \end{bmatrix}^{-1} \]
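
The same augment-and-reduce procedure can be reproduced in code. A sketch, assuming Python with SymPy (whose rref() carries out Gauss-Jordan elimination in exact rational arithmetic):

```python
from sympy import Matrix, eye

A = Matrix([[1, 2, 3],
            [0, 1, 4],
            [5, 6, 0]])

# Reduce the augmented matrix [A | I] to reduced row echelon form
aug, pivots = A.row_join(eye(3)).rref()
print(aug[:, :3])   # the identity matrix: the inverse exists
print(aug[:, 3:])   # the inverse: [[-24, 18, 5], [20, -15, -4], [-5, 4, 1]]
print(A.inv())      # SymPy's own inverse gives the same result
```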

Why Gaussian elimination does not change the solution set

Suppose we have a matrix equation

\[ \mat{A}\vec{x} = \vec{b} \] which we want to solve for \(\vec{x}\). We do this using Gaussian elimination. For each of the operations \(i\) performed by Gaussian elimination we can write an equivalent matrix \(\mat{J}_i\) that, when used to left-multiply the matrix equation on both sides, yields the same result. In the end, we perform \(n\) elimination steps, or equivalently \(n\) matrix multiplications, to obtain a new matrix equation that is easier to solve.

\[ \mat{J} \mat{A} \vec{x} = \mat{J} \vec{b} \qquad \text{where} \; \mat{J} = \mat{J}_n \mat{J}_{n-1} \ldots \mat{J}_1 \]

A property of these matrices \(\mat{J}_i\) is that they all have an inverse \(\mat{J}^{-1}_i\). Therefore the matrix \(\mat{J}\) also has an inverse \(\mat{J}^{-1}\). To demonstrate the first claim, we give an example for each of the three row operations:

Rule 1: Exchanging two rows:

The row-exchanging matrix \[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \] has itself as its inverse (check this)

\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} = \ldots \]

Rule 2: Multiplying a row by a non-zero scalar

The matrix

\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \] has as its inverse

\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \frac{1}{3} & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \] Check this!

Rule 3: Adding one row to another row

The matrix

\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \] which adds the second row to the third row, has as its inverse

\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

Check this! Since each \(\mat{J}_i\) has an inverse, the product \(\mat{J} = \mat{J}_n \mat{J}_{n-1} \ldots \mat{J}_1\) also has an inverse, with the individual inverses applied in reverse order:

\[ \mat{J}^{-1} = \mat{J}_1^{-1} \ldots \mat{J}_{n-1}^{-1} \mat{J}_n^{-1} \]

Suppose that \(\vec{x}_1\) is a solution to the equation \(\mat{A}\vec{x} = \vec{b}\). Then it is also a solution to \(\mat{J}\mat{A}\vec{x} = \mat{J}\vec{b}\), because that equation is obtained simply by left-multiplying both sides of \(\mat{A}\vec{x}_1 = \vec{b}\) by \(\mat{J}\). Conversely, suppose that \(\vec{x}_1\) is a solution to \(\mat{J}\mat{A}\vec{x} = \mat{J}\vec{b}\). Left-multiplying both sides by \(\mat{J}^{-1}\) we obtain

\[ \begin{align*} \mat{J}^{-1}\mat{J}\mat{A}\vec{x}_1 &= \mat{J}^{-1}\mat{J}\vec{b} \\ \mat{I}\mat{A}\vec{x}_1 &= \mat{I} \vec{b} \\ \mat{A}\vec{x}_1 &= \vec{b} \end{align*} \] So every solution of \(\mat{J}\mat{A}\vec{x} = \mat{J}\vec{b}\) is also a solution of \(\mat{A}\vec{x} = \vec{b}\): the two equations have exactly the same solution set.

To conclude: Gaussian elimination does not change the solution set of a matrix equation, because all its operations correspond to invertible matrices.
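
A numerical illustration of this conclusion, assuming Python with NumPy and re-using the three elementary matrices shown above:

```python
import numpy as np

# The three example matrices from rules 1-3: swap rows 2 and 3,
# multiply row 2 by 3, add row 2 to row 3 (all 4x4).
P = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=float)
S = np.array([[1, 0, 0, 0], [0, 3, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
E = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]], dtype=float)

# Each elementary matrix is invertible ...
for M in (P, S, E):
    print(np.allclose(M @ np.linalg.inv(M), np.eye(4)))    # True

# ... and so is their product, with the inverses in reverse order
J = E @ S @ P
J_inv = np.linalg.inv(P) @ np.linalg.inv(S) @ np.linalg.inv(E)
print(np.allclose(J @ J_inv, np.eye(4)))                   # True
```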

11.6 Exercises

Multiplying matrices

Exercise 43.

Calculate the result of the following matrix multiplication \[ \begin{bmatrix} 2 & 3 \\ 4 & 0 \end{bmatrix} \begin{bmatrix} 1 & 2 & 0 \\ 5 & -1 & 0 \end{bmatrix} \]

Exercise 44.

Calculate the result of the following matrix multiplication \[ \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 3 & 2 & 4 \\ 7 & -1 & 1 \end{bmatrix} \]

Exercise 45.

Calculate the result of the following matrix multiplication \[ \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

Exercise 46.

Calculate the result of the following matrix multiplication \[ \begin{bmatrix} 1 & -2 & 7 \end{bmatrix} \begin{bmatrix} 1 \\ -2 \\ 7 \end{bmatrix} \] If this vector represented a line segment between the origin and \((1,-2,7)\), to what property of the line segment would this product correspond?

Exercise 47.

Calculate the result of the following matrix multiplication \[ \begin{bmatrix} 1 & -2 & 7 \end{bmatrix} \begin{bmatrix} 3 \\ 5 \\ 1 \end{bmatrix} \]

Exercise 48.

Calculate the result of the following matrix multiplication \[ \begin{bmatrix} 2 & 1 & 4 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} \]

Exercise 49.

Calculate the result of the following matrix multiplication \[ \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 1 & -1 \\ 4 & 1 \end{bmatrix} \]

Inverses of matrices

Exercise 50.

Calculate the inverse of the matrix

\[\begin{equation*} \mat{A} = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} \end{equation*}\]

Exercise 51.

Calculate the inverse of the matrix

\[\begin{equation*} \mat{B} = \begin{bmatrix} 4 & 2 & 1 \\ 0 & 1 & 2 \\ 1 & 3 & 5 \end{bmatrix} \end{equation*}\]

Exercise 52.

In the box “Why Gaussian elimination does not change the solution set” we silently assumed that we are allowed to multiply both sides of a matrix equation by the same matrix and expect that the modified equation still holds for solutions of the original equation. This fact can be deduced from the validity of this operation for individual equations. For example, when we multiply \(x^2 = 4\) by \(2\) on both sides, the original solutions \(x = 2, \, x = -2\) still hold, and are in fact the only solutions of the new equation as well. If instead we multiply both sides by \(0\), the original solutions are still valid, but the solution set is no longer just \(x = 2, \, x = -2\): every \(x \in \mathbb{R}\) is now a solution, so the solution set has changed.

Suppose we multiply both sides of a matrix equation by a compatible, square matrix \(\mat{0}\) with only \(0\)’s as entries.

a. Does the matrix \(\mat{0}\) have an inverse? If so, what is its inverse? If not, why not?

b. Does multiplying the matrix equation by \(\mat{0}\) in general leave the original solution set unchanged? If not, how does it change that solution set?

Applications

Exercise 53. Representing graphs as matrices

Undirected graphs consist of nodes and links or edges without a direction. Such graphs can be used to represent networks of interacting entities, for example protein-protein interactions. An undirected graph with \(n\) nodes can be represented as an \(n \times n\) matrix, called an adjacency matrix, where every entry \(a_{ij}=1\) if there is a link between nodes \(i\) and \(j\), and \(a_{ij}=0\) otherwise. The diagonal elements (\(a_{ii}\)) are also equal to 0. A small network of 4 nodes is shown below.

a. Write down the adjacency matrix for the network shown in the figure.

b. Argue why such a matrix in general must be symmetric about the diagonal \(a_{ii}\), i.e. the matrix must be equal to its transpose: \(\mat{A} = \mat{A}^T\)

c. Calculate the matrix product \(\mat{A} \cdot \mat{A} = \mat{A}^2\).

d. What do the numbers in the matrix \(\mat{A}^2\) represent?