
MATH1048 Linear Algebra I

Lecture notes

Autumn 2018-2019

Dr. Ashot Minasyan


Contents

Notation

Chapter 0. Introduction to complex numbers
0.1. Definition and operations
0.2. Graphical representation of complex numbers
0.3. Complex conjugation and reciprocals
0.4. Solving polynomial equations over complex numbers

Chapter 1. The real n-space
1.1. The real n-space
1.2. Scalar Product
1.3. Norm of a vector
1.4. Projections and Cauchy-Schwarz inequality
1.5. Equation of a line
1.6. Parametric equation of a plane
1.7. Cartesian equation of a plane in R3
1.8. Vector product
1.9. Intersections of planes and lines in R3
1.10. Distances in R3

Chapter 2. Matrix Algebra
2.1. Basic definitions and terminology
2.2. Operations with Matrices
2.3. The transpose of a matrix
2.4. The inverse of a matrix
2.5. Powers of a matrix

Chapter 3. Systems of Linear Equations
3.1. Systems of Equations and Matrices
3.2. Row operations and Gaussian elimination
3.3. Matrix inverse using row operations
3.4. Rank of a matrix

Chapter 4. Determinants
4.1. Axiomatic definition of determinant
4.2. Determinants and invertibility
4.3. Calculating determinants using cofactors
4.4. Determinant of a product
4.5. Inverting a matrix using cofactors

Chapter 5. Linear transformations
5.1. Basic definitions and properties
5.2. Linear transformations of R2
5.3. Composition of linear transformations
5.4. The inverse of a linear transformation

Chapter 6. Subspaces of Rn
6.1. Definition and basic examples
6.2. Null spaces
6.3. Linear span
6.4. Range and column space
6.5. Linear independence
6.6. Bases

Chapter 7. Eigenvalues, eigenvectors and applications
7.1. Eigenvalues and eigenvectors
7.2. More examples
7.3. Application of eigenvectors in Google's page ranking algorithm
7.4. Symmetric matrices

Chapter 8. Orthonormal sets and quadratic forms
8.1. Orthogonal and orthonormal sets
8.2. Gram-Schmidt orthonormalization process
8.3. Orthogonal diagonalization of symmetric matrices
8.4. Quadratic forms
8.5. Conic sections

Notation
Symbol Meaning
∅ the empty set, i.e., the set that contains no elements
N the set of natural numbers {1, 2, 3, . . .}
Z the set of integer numbers {. . . − 2, −1, 0, 1, 2, . . .}
Q the set of rational numbers {m/n | m ∈ Z, n ∈ N}
R the set of real numbers
C the set of complex numbers {a + bi | a, b ∈ R}
Rn the real n-space {(x1 , . . . , xn ) | x1 , . . . , xn ∈ R}
Mn (R) the set of all n × n matrices with real entries

∈ ‘belongs to’ or ‘in’. E.g., “m ∈ Z” means “m is an integer”



∉ ‘does not belong to’ or ‘is not contained in’. E.g., “π ∉ Q” means “π is not a rational number”
| ‘such that’ when describing a set. E.g., “A = {n ∈ N | n < 6}” means that the set A
consists of all natural numbers n such that n < 6, i.e., A = {1, 2, 3, 4, 5}
⊂ or ⊆ ‘is a subset of’ or ‘is contained in’. E.g., “N ⊂ Z” means “N is a subset of Z”
⊄ or ⊈ ‘is not a subset of’ or ‘is not contained in’. E.g., “R ⊈ Q” means “R is not contained in Q”
∪ ‘union’ (of sets). If A and B are sets then their union is the set defined by
A ∪ B = {x | x ∈ A or x ∈ B}
∩ ‘intersection’ (of sets). If A and B are sets then their intersection is the set defined by
A ∩ B = {x | x ∈ A and x ∈ B}

∀ ‘for all’. E.g., “∀ n ∈ N” means “for every natural number n”


∃ ‘there exists’. E.g., “∃ x ∈ R” means “there is a real number x”
∃! ‘there exists a unique’. E.g., “∃! x ∈ R such that x3 = 1” means
“there is a unique real number x such that x3 = 1”
=⇒ ‘therefore’. E.g., 2x = 4 =⇒ x = 2
⇐⇒ ‘if and only if’. E.g., x − 1 = 0 ⇐⇒ x = 1

∑_{j=m}^{n}  ‘the sum as j goes from m to n’. E.g., ∑_{j=1}^{4} a_j = a_1 + a_2 + a_3 + a_4
CHAPTER 0

Introduction to complex numbers

Historically, complex numbers were invented to solve polynomial equations which have no
real solutions, such as x^2 + 1 = 0 or 3x^8 + 10x^4 + 2 = 0. However, it turned out that it suffices to
add to R one “imaginary” number, i (such that i^2 = −1), together with all possible combinations
of it, to get the field of complex numbers C, over which every polynomial equation has a solution!
This makes complex numbers extremely useful, not only in mathematics but also in many other
sciences and engineering. The goal of this chapter is to give a quick introduction to the theory
of complex numbers.

0.1. Definition and operations


Definition 0.1 (Complex numbers). To any pair of real numbers a, b ∈ R we can associate the
complex number z = a + bi, where a is said to be the real part of z (notation: a = Re(z)), b is said
to be the imaginary part of z (notation: b = Im(z)), and i is called the imaginary unit. The set of
all complex numbers is called the complex plane and is denoted by C.
For example, 1 + i ∈ C, −√3 + (5/7)i ∈ C, 2 + (−6)i = 2 − 6i ∈ C, e + πi ∈ C, etc., are all complex
numbers.
The set of real numbers R can be thought of as a subset of C consisting of all complex numbers
with zero imaginary part: R = {z ∈ C | Im(z) = 0}. Thus we say that a complex number with
zero imaginary part is real. On the other hand, if Re(z) = 0, i.e., z = ib, for some b ∈ R, then z is
said to be purely imaginary.
Note that for a complex number z ∈ C,

z = 0 if and only if Re(z) = 0 and Im(z) = 0.

More generally, for two complex numbers z = a + bi and w = c + di we have

z = w ⇐⇒ Re(z) = Re(w) and Im(z) = Im(w) ⇐⇒ a = c and b = d.

Complex numbers can be added and subtracted according to the following rules: if z = a+bi ∈ C
and w = c + di ∈ C then
• z + w = (a + bi) + (c + di) = (a + c) + (b + d)i ∈ C;
• z − w = (a + bi) − (c + di) = (a − c) + (b − d)i ∈ C.
In other words, to add/subtract two complex numbers we add/subtract their real and imaginary
parts. E.g., (1 + 5i) + (−3 − i) = −2 + 4i.

Note 0.2. The addition of complex numbers satisfies the following standard properties:
• z + 0 = 0 + z = z, for all z ∈ C (Existence of zero);
• if z = a+bi ∈ C then we define −z = −a+(−b)i = −a−bi ∈ C so that z+(−z) = 0 = −z+z
(Existence of additive inverse);
• z + w = w + z, for all z, w ∈ C (Commutativity);
• (z + w) + u = z + (w + u), for any z, w, u ∈ C (Associativity).

All of these properties are easy to prove using the definitions. For example, to show that the
commutativity holds, assume that z = a + bi ∈ C and w = c + di ∈ C. Then
z + w = (a + c) + (b + d)i
= (c + a) + (d + b)i (by commutativity of the addition of real numbers)
= w + z.

To multiply complex numbers, we first define i^2 = i · i = −1, i.e., we proclaim that the imaginary
unit i is a square root of −1. Then, for z = a + bi ∈ C and w = c + di ∈ C, assuming that the
standard laws of distributivity, associativity and commutativity hold, we must have
z w = (a + bi)(c + di) = ac + adi + bic + bidi = ac + (ad + bc)i + bd i^2
= ac + (ad + bc)i + bd(−1) = (ac − bd) + (ad + bc)i ∈ C.
This justifies the following definition:
Definition 0.3 (Product of complex numbers). For two complex numbers z = a + bi ∈ C and
w = c + di ∈ C, we define their product by
z w = (a + bi)(c + di) = (ac − bd) + (ad + bc)i ∈ C.
For example, (5 − 3i)(−2 − i) = 5(−2) − (−3)(−1) + (5(−1) + (−3)(−2))i = −13 + i.
If z = a + 0i ∈ C is a real number (b = 0) and w = c + di ∈ C then z w = ac + adi, which means
that multiplying complex numbers by real numbers is easy. E.g., 7(4 − 3i) = 28 − 21i.
Note 0.4. The product of complex numbers satisfies the following standard properties:
• if 1 = 1 + 0i ∈ C and z ∈ C then 1 z = z 1 = z (Existence of multiplicative identity);
• z w = w z, for all z, w ∈ C (Commutativity);
• (z w) u = z (w u), for any z, w, u ∈ C (Associativity);
• z(w + u) = z w + z u and (z + w)u = z u + w u, for any z, w, u ∈ C (Distributivity).
Again all of these properties are easy to prove using the corresponding properties of real num-
bers.
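The rules above are easy to experiment with on a computer. As an illustration (not part of the original notes), here is a minimal Python sketch using the built-in complex type, in which j plays the role of the imaginary unit i:

# Python's complex numbers follow exactly the addition and multiplication rules above.
z = 1 + 5j
w = -3 - 1j
print(z + w)                 # (-2+4j), matching (1 + 5i) + (-3 - i) = -2 + 4i
print((5 - 3j) * (-2 - 1j))  # (-13+1j), matching the example after Definition 0.3
print(7 * (4 - 3j))          # (28-21j)
print(z.real, z.imag)        # 1.0 5.0, i.e. Re(z) and Im(z)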

0.2. Graphical representation of complex numbers


Complex numbers can be associated with vectors in the Cartesian plane R2 :
z = a + bi ∈ C corresponds to (a, b) ∈ R2 .
This is quite natural and respects the operations of addition and subtraction. I.e., the sum of two
complex numbers is associated to the sum of the vectors corresponding to them.

Thus we can think of C as the (complex) plane.


In this case the x-axis is often called the real axis
R and the y-axis is called the imaginary axis iR.

Definition 0.5 (Modulus of a complex number). The modulus of the complex number z =
a + bi ∈ C is the real number defined by |z| = √(a^2 + b^2). In other words, |z| is the norm (length)
of the corresponding vector (a, b) ∈ R^2.

Evidently for any z ∈ C, |z| is always a non-negative real number, and

|z| = 0 if and only if z = 0.


Examples: |i| = 1, |4 − 3i| = √(4^2 + 3^2) = 5, etc.
Observe that if z = a + 0i ∈ R is a real number, then |z| = √(a^2) = |a| is simply the standard
modulus of the real number. This means the notion of the complex modulus naturally extends the
notion of the real modulus.
It is easy to see that |z| = | − z|, for all z ∈ C. More generally, we have the following

Lemma 0.6. For any two complex numbers z, w ∈ C we have |z w| = |z| |w|.

Proof. Suppose that z = a + bi ∈ C and w = c + di ∈ C. Then

|z w| = |(ac − bd) + (ad + bc)i| = √((ac − bd)^2 + (ad + bc)^2)
= √(a^2 c^2 − 2abcd + b^2 d^2 + a^2 d^2 + 2abcd + b^2 c^2) = √((a^2 + b^2)(c^2 + d^2)) = |z| |w|.

Definition 0.7 (Argument of a complex number). The argument arg(z), of a non-zero complex
number z = a + bi ∈ C is the angle θ (in radians) measured anti-clockwise from the positive real
axis to the vector (a, b) ∈ R2 . For z = 0 we set arg(z) = 0.

Figure 0.1

To calculate θ = arg(z) for z = a + bi ∈ C, observe that a = r cos θ and b = r sin θ, where


r = |z| (see Figure 0.1). Thus

(0.1) z = r(cos θ + i sin θ), where r = |z| and θ = arg(z).

Note 0.8. Since the functions cos θ and sin θ are periodic with period 2π, adding a multiple
of 2π to θ does not change z. Therefore it is convenient to think that arg(z) is only defined up to
adding a multiple of 2π. E.g, arg(i) = π/2 + 2πk, k ∈ Z.

Definition 0.9 (Exponential notation). Given any θ ∈ R we will write eiθ to denote the
complex number cos θ + i sin θ ∈ C.

Since cos^2 θ + sin^2 θ = 1, the point e^{iθ} belongs to the unit circle on the complex plane
(centered at the origin, of radius 1).

Equation (0.1) gives a representation of the complex number z in polar coordinates: for any
z ∈ C, we have z = reiθ , where r = |z| and θ = arg(z).
Example 0.10. Find the argument and expression in polar coordinates for (a) z = −3i and
(b) w = −√3 + i.
(a) |−3i| = |−3| |i| = 3, so, in polar coordinates, −3i = 3(cos θ + i sin θ), thus cos θ = 0 and
sin θ = −1. This yields that arg(−3i) = θ = 3π/2 + 2πk, k ∈ Z. This can be confirmed by plotting
the vector (0, −3), corresponding to −3i, on the Cartesian plane R^2 and observing that the angle
from the positive x-axis to this vector is 3π/2. Thus, in polar coordinates, −3i = 3e^{i3π/2}.
(b) |−√3 + i| = √(3 + 1) = 2, so, in polar coordinates, −√3 + i = 2(cos θ + i sin θ), thus
cos θ = −√3/2 and sin θ = 1/2. Therefore arg(−√3 + i) = θ = 5π/6 + 2πk, k ∈ Z. Thus, in polar
coordinates, −√3 + i = 2e^{i5π/6}.
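These modulus and argument calculations can be checked numerically. A short Python sketch for illustration (cmath.phase returns an argument in (−π, π], so 3π/2 appears as the equivalent value −π/2):

import cmath, math

z = -3j
w = -math.sqrt(3) + 1j
print(abs(z), cmath.phase(z))  # 3.0  -1.5707...  (i.e. -pi/2, which equals 3*pi/2 modulo 2*pi)
print(abs(w), cmath.phase(w))  # 2.0   2.6179...  (i.e. 5*pi/6)

# Rebuild w from its polar form r e^{i theta}:
r, theta = cmath.polar(w)
print(r * cmath.exp(1j * theta))  # approximately (-1.732 + 1j)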
Note 0.11. In fact, eiθ is the value of the standard exponential function at iθ ∈ C, which can
be shown using the power series expansions of ez , cos x and sin x.
Like cos θ and sin θ, the function eiθ is periodic with period 2π: ei(θ+2πk) = eiθ , for all k ∈ Z and
θ ∈ R. In particular, the expression of a complex number in polar coordinates is never unique,
as the argument can be changed by adding an integer multiple of 2π. E.g., ei2π = e0 = 1.
Since cos π = −1 and sin π = 0, we obtain
(0.2) e^{iπ} = −1 (Euler’s identity).
Theorem 0.12. (Properties of eiθ )
(i) |eiθ | = 1 for any θ ∈ R;
(ii) if θ1 , θ2 ∈ R then eiθ1 eiθ2 = ei(θ1 +θ2 ) .
Proof. (i) Given any θ ∈ R, we have |e^{iθ}| = |cos θ + i sin θ| = √(cos^2 θ + sin^2 θ) = 1.
(ii) Let θ1 , θ2 ∈ R. Then, using Definition 0.3 and the standard formulas for cosine and sine of
θ1 + θ2 , we get:
eiθ1 eiθ2 = (cos θ1 + i sin θ1 )(cos θ2 + i sin θ2 ) = (cos θ1 cos θ2 − sin θ1 sin θ2 )
+ i(sin θ1 cos θ2 + cos θ1 sin θ2 ) = cos(θ1 + θ2 ) + i sin(θ1 + θ2 ) = ei(θ1 +θ2 ) .

Theorem 0.12 shows that polar coordinates may be much more convenient to use for computing
products or powers of complex numbers: if z = r1 eiθ1 and w = r2 eiθ2 , then
z w = r1 eiθ1 r2 eiθ2 = r1 r2 ei(θ1 +θ2 ) .
And for any natural number n ∈ N, we have z^n = (r_1 e^{iθ_1})^n = r_1^n e^{inθ_1}. For example,
(2e^{iπ/6})^4 = 2^4 e^{i4π/6} = 16e^{i2π/3} = 16(cos(2π/3) + i sin(2π/3)) = 16(−1/2 + (√3/2)i) = −8 + 8√3 i.

0.3. Complex conjugation and reciprocals


Definition 0.13 (Complex conjugate). If z = a+bi ∈ C is a complex number, then the complex
conjugate of z is the number z̄ ∈ C defined by z̄ = a − bi. In other words, Re(z̄) = Re(z) and
Im(z̄) = −Im(z).

Graphically, z̄ is obtained from z by applying a reflection in the real axis.
For example, the conjugate of 2 + 4i is 2 − 4i, the conjugate of −5 − 6i is −5 + 6i, ī = −i, 0̄ = 0, etc.
Observe that the conjugate of z̄ is z for all z ∈ C, i.e., taking the complex conjugate of a complex
conjugate returns the original complex number.
Main properties of the complex conjugate are listed below.
Note 0.14. Let z, w ∈ C. Then
(i) if z = re^{iθ} then z̄ = re^{−iθ} (in particular, |z| = |z̄|);
(ii) z = z̄ if and only if z ∈ R (i.e., Im(z) = 0);
(iii) z z̄ = |z|^2;
(iv) the conjugate of z + w equals z̄ + w̄;
(v) the conjugate of z w equals z̄ w̄.
Proof. We prove claims (i) and (iii), and leave the rest as exercises. For (i), recall that, by
definition, z = reiθ = r(cos θ + i sin θ). Therefore
z̄ = r(cos θ − i sin θ) = r(cos(−θ) + i sin(−θ)) = re−iθ ,
where we used the facts that cos(−θ) = cos θ and sin(−θ) = − sin θ for every θ ∈ R.
To prove (iii), let z = a + bi ∈ C. Then z̄ = a − bi, so z z̄ = (a + bi)(a − bi) = a^2 + b^2 + i(−ab + ab) =
a^2 + b^2 = |z|^2, as required.
Lemma 0.15. If z ∈ C is non-zero, then the complex number z^{−1} = (1/|z|^2) z̄ is the reciprocal of z.
In other words, z z^{−1} = 1 = z^{−1} z.
Proof. Since z ≠ 0, we have |z| > 0, so z^{−1} = (1/|z|^2) z̄ is indeed a complex number (with
Re(z^{−1}) = Re(z)/|z|^2 and Im(z^{−1}) = −Im(z)/|z|^2). Now we can use Note 0.14.(iii) to obtain:
z z^{−1} = z (1/|z|^2) z̄ = (z z̄)/|z|^2 = 1.
Finally, z^{−1} z = z z^{−1} = 1 by Note 0.4.
Note that if z = a + bi ≠ 0 then |z|^2 = a^2 + b^2, so z^{−1} = a/(a^2 + b^2) − i b/(a^2 + b^2) ∈ C.
Example 0.16. Find the reciprocal of z = 8 − 5i ∈ C.
Observe that z̄ = 8 + 5i and |z|^2 = 8^2 + 5^2 = 89. Hence, by Lemma 0.15, z^{−1} = (1/89)(8 + 5i) =
8/89 + (5/89)i.

Lemma 0.15 allows us to define the division of complex numbers.


Definition 0.17 (Ratio of complex numbers). If z, w ∈ C and w 6= 0 then we define the ratio
z/w by the formula z/w = z w−1 , where w−1 is the reciprocal of w.
The next example shows how to calculate complex fractions.
Example 0.18. Find the real and the imaginary parts of the complex fraction (3 + i)/(−2 − 2i).
Let us multiply both the numerator and the denominator by the conjugate of −2 − 2i, namely −2 + 2i:
(3 + i)/(−2 − 2i) = ((3 + i)(−2 + 2i))/((−2 − 2i)(−2 + 2i)) = (−6 − 2 + i(6 − 2))/|−2 − 2i|^2 = (−8 + 4i)/8 = −1 + (1/2)i.
Thus Re((3 + i)/(−2 − 2i)) = −1 and Im((3 + i)/(−2 − 2i)) = 1/2.
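Examples 0.16 and 0.18 can be verified with a few lines of Python (a sketch for illustration; conjugate() and abs() give z̄ and |z|):

z = 8 - 5j
z_inv = z.conjugate() / abs(z) ** 2   # reciprocal via Lemma 0.15
print(z_inv)                          # (0.0898...+0.0561...j), i.e. 8/89 + (5/89)i
print(z * z_inv)                      # (1+0j), up to rounding

q = (3 + 1j) / (-2 - 2j)              # the fraction from Example 0.18
print(q.real, q.imag)                 # -1.0  0.5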
In polar form, taking reciprocals and dividing complex numbers is easier.
Note 0.19. If z = re^{iθ} ∈ C is non-zero then z^{−1} = (1/r) e^{−iθ}. Thus |z^{−1}| = 1/|z| and
arg(z^{−1}) = −arg(z).
Proof. Indeed, if z = re^{iθ}, then |z| = r and z̄ = re^{−iθ} by Note 0.14. Therefore
z^{−1} = (1/|z|^2) z̄ = (1/r^2)(re^{−iθ}) = (1/r) e^{−iθ}.
It follows that for z, w ∈ C such that z = r_1 e^{iθ_1} and w = r_2 e^{iθ_2}, w ≠ 0, we have
z/w = (r_1 e^{iθ_1})/(r_2 e^{iθ_2}) = (r_1/r_2) e^{i(θ_1 − θ_2)};
in particular, |z/w| = |z|/|w|, just like for moduli of real numbers.
Example 0.20. Express the fraction −3i/(−√3 + i) in polar coordinates.
From Example 0.10 we know that −3i = 3e^{i3π/2} and −√3 + i = 2e^{i5π/6}. Hence
−3i/(−√3 + i) = (3e^{i3π/2})/(2e^{i5π/6}) = (3/2) e^{i(3π/2 − 5π/6)} = (3/2) e^{i4π/6} = (3/2) e^{i2π/3}.

0.4. Solving polynomial equations over complex numbers


As we already know, i^2 = −1, so z = i is a complex root of the equation z^2 + 1 = 0. This
equation also has the root z = −i, as (−i)^2 = i^2 = −1. In fact i and −i are the only complex roots
of this equation.
More generally, we can consider a polynomial equation of degree n with complex coefficients:
(0.3) a_n z^n + a_{n−1} z^{n−1} + · · · + a_1 z + a_0 = 0, where a_0, a_1, . . . , a_n ∈ C and a_n ≠ 0.
The complex number α ∈ C is said to be a root (or a solution) of this equation if a_n α^n + a_{n−1} α^{n−1} +
· · · + a_1 α + a_0 = 0. The multiplicity of the root α is the maximal natural number m ∈ N such that
(z − α)^m divides the polynomial p(z) = a_n z^n + a_{n−1} z^{n−1} + · · · + a_1 z + a_0.
The main importance of complex numbers comes from the following theorem:
Theorem 0.21 (Fundamental Theorem of Algebra). Any polynomial equation (0.3) has at least
one complex root, and the number of distinct complex roots does not exceed n. More precisely, this
equation has exactly n complex roots, counting multiplicities.

We have already found two complex roots ±i of the equation z^2 + 1 = 0, so by Theorem 0.21,
these are the only complex roots of this equation.
On the other hand, consider the equation z^3 − 3z − 2 = 0. After factorizing, we see that
z^3 − 3z − 2 = (z + 1)^2 (z − 2), so z = −1 is a root of multiplicity 2, z = 2 is a root of multiplicity 1,
and the equation has no other roots. Since 2 + 1 = 3, we can see that this is in line with the claim
of Theorem 0.21.
The following observation is an easy consequence of Note 0.14.
Note 0.22. If all coefficients of the equation (0.3) are real (that is, a0 , . . . , an ∈ R), then for
any complex root α ∈ C, its conjugate ᾱ ∈ C is also a root of that equation.
A quadratic equation
(0.4) az^2 + bz + c = 0, with a, b, c ∈ C and a ≠ 0,
can be solved using the standard formula:
z = (−b ± √D)/(2a), where D = b^2 − 4ac is the discriminant.
Now, D here could be any complex number, so we need to explain how √D is calculated. For
simplicity, we will restrict ourselves to the case when all coefficients are real numbers: a, b, c ∈ R.
Then D ∈ R, so if D ≥ 0, then √D ∈ R is the usual square root of D. However, if D < 0
(this is the case when the equation (0.4) has no real solutions), then −D > 0, so we can write
√D = √((−D)(−1)) = √(−D) i, where we set √(−1) = i (as i^2 = −1).
Example 0.23. Find the complex roots of the equation z^2 + 10z + 26 = 0.
In this equation a = 1, b = 10 and c = 26. So D = 100 − 104 = −4. Since D < 0, the given
equation has no real solutions. To find the complex roots, we calculate √D = √(−4) = √4 √(−1) = 2i.
Thus the equation has two complex roots z_1 = (−10 − 2i)/2 = −5 − i and z_2 = (−10 + 2i)/2 = −5 + i.
(Observe that the two roots z_1 and z_2 are complex conjugates of each other, which was to be
expected in view of Note 0.22.)
Example 0.24. Find all complex solutions of the equation z^5 = 1.
Here, in view of Theorem 0.21, we should expect to find 5 different complex roots (one of which
must, of course, be 1, as 1^5 = 1). It is convenient to use polar coordinates for z: z = re^{iθ}. Then
z^5 = (re^{iθ})^5 = r^5 e^{i5θ}. Thus
z^5 = 1 ⇐⇒ r^5 e^{i5θ} = 1.
Taking moduli of both sides we get |r^5| |e^{i5θ}| = |1| = 1, which means that |r^5| = 1, as |e^{i5θ}| = 1 by
Theorem 0.12.(i). On the other hand, |r^5| = |r|^5 = r^5, as r > 0 (r is a positive real number).
Hence r = ⁵√1 = 1, thus it remains to find all θ satisfying e^{i5θ} = 1.
Now, e^{i5θ} = cos(5θ) + i sin(5θ) = 1 if and only if cos(5θ) = 1 and sin(5θ) = 0, i.e., 5θ = 2πk,
k ∈ Z. So, θ = 2πk/5, k ∈ Z, and thus z = e^{i2πk/5}, k ∈ Z. However, the function e^{it} is 2π-periodic,
therefore the latter formula gives only 5 different values of z, for k = 0, 1, 2, 3, 4:
z = 1, e^{i2π/5}, e^{i4π/5}, e^{i6π/5}, e^{i8π/5}.

These are indeed 5 different complex numbers, forming the vertices of a regular pentagon on the
complex plane. By Theorem 0.21 the equation z^5 = 1 (⇔ z^5 − 1 = 0) has no other complex roots.
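The five roots found above can also be generated and checked numerically; a short illustrative sketch (a floating-point confirmation only, the exact values being e^{i2πk/5}):

import cmath

roots = [cmath.exp(2j * cmath.pi * k / 5) for k in range(5)]
for z in roots:
    # each of the five numbers satisfies z^5 = 1 up to rounding error
    print(z, abs(z ** 5 - 1) < 1e-12)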
CHAPTER 1

The real n-space

In this chapter we will recall the notion of the real n-space, Rn and will study its basic objects,
such as vectors, points, lines and planes. Throughout this chapter n ∈ N will denote some positive
integer, although the last four sections will focus on the case when n = 3.

1.1. The real n-space


Definition 1.1 (The real n-space). The real n-space Rn is defined as the set of all n-tuples
v = (x1 , x2 , . . . , xn ), called vectors, where the numbers x1 , x2 , . . . , xn ∈ R are said to be the
coordinates of the vector v.

For example, when n = 1, we get the real line R1 = R, when n = 2 we get the plane R2 and
when n = 3 we get the real 3-space R3 .
Alternative view: we can also think of Rn as a set of points P = (x1 , . . . , xn ), where
x1 , . . . , xn ∈ R. The vector v = (x1 , . . . , xn ), with the same coordinates as a point P ∈ Rn , is
called the position vector of the point P . You can imagine v as the arrow from the origin to P .
More generally, given any two points P = (x_1, . . . , x_n) and Q = (y_1, . . . , y_n) in R^n, we define
the vector from P to Q as \vec{PQ} = Q − P = (y_1 − x_1, . . . , y_n − x_n) ∈ R^n.

Example 1.2. (a) Let v = (1, 2, 3) ∈ R3 . Then v is the position vector of the point (1, 2, 3).
(b) If P = (5, −2, −4), Q = (2, −3, 1) are two points in R3 , then
# »
P Q = Q − P = (2, −3, 1) − (5, −2, −4) = (−3, −1, 5).

Definition 1.3 (Basic operations with vectors). Let a, b ∈ Rn be two vectors, with a =
(a1 , . . . , an ) and b = (b1 , . . . , bn ), and let λ ∈ R be a real number. Then we define
• the sum a + b ∈ Rn to be the vector in Rn given by
a + b = (a1 + b1 , . . . , an + bn );
• the difference a − b ∈ Rn to be the vector in Rn given by
a − b = (a1 − b1 , . . . , an − bn );
• the multiple λ a to be the vector in Rn given by
λ a = (λa1 , . . . , λan ).

For a vector v = (a1 , . . . , an ) ∈ Rn , we will write −v to denote the vector −v = (−a1 , . . . , −an ) =
(−1) v.
Figure 1.1 below describes some geometric rules for drawing sums and differences of vectors.

Definition 1.4 (Same/opposite direction). Two vectors a and b in Rn are said to have
i) the same direction if ∃ λ > 0 (“there exists λ > 0”) such that a = λ b;
ii) opposite directions if ∃ λ < 0 such that a = λ b.

For example, any non-zero vector a ∈ R^n has the same direction as the vector 7 a and the opposite
direction to the vector −3 a.

Figure 1.1. Rules for vector operations.



Note 1.5 (Properties of vector operations). Let a, b, c ∈ Rn . Then:


1) a + b = b + a (commutativity of addition);
2) a + (b + c) = (a + b) + c (associativity of addition);
3) let 0 = (0, 0, . . . , 0) ∈ Rn be the zero vector then a + 0 = 0 + a = a (existence of zero);
4) if a = (a1 , a2 , . . . , an ) and −a = (−a1 , −a2 , . . . , −an ) then a + (−a) = (−a) + a = 0
(existence of additive inverse);
5) (µλ) a = µ (λ a) for any λ, µ ∈ R;
6) λ (a + b) = λ a + λ b, for all λ ∈ R, and (λ + µ) a = λ a + µ a, for all λ, µ ∈ R
(two distributivity properties).

Proof. All of these properties can be easily verified by using the corresponding properties of
real numbers. For example, let us prove 5). Suppose that a = (a1 , . . . , an ) ∈ Rn and λ, µ ∈ R.
Then


(µλ) a = (µλ)a1 , . . . , (µλ)an (by Definition 1.3)

= µ(λa1 ), . . . , µ(λan ) (by associativity of the multiplication of real numbers)
= µ (λa1 , . . . , λan ) (by Definition 1.3)
= µ (λ a). (by Definition 1.3)

1.2. Scalar Product


Definition 1.6 (Scalar product). Given two vectors u = (u_1, . . . , u_n), v = (v_1, . . . , v_n) in R^n,
the scalar product of u and v is the real number u · v ∈ R defined by the formula
u · v = u_1 v_1 + u_2 v_2 + · · · + u_n v_n = ∑_{i=1}^{n} u_i v_i
(the symbol ∑_{i=1}^{n} c_i means that we take the sum of the c_i’s as i ranges from 1 to n).
In particular, in R^3, (x, y, z) · (a, b, c) = xa + yb + zc.

Warning: the scalar product of two vectors is a real number, it is not a vector! DO NOT
WRITE (u1 v1 , u2 v2 , . . . , un vn )!

Example 1.7. Let u = (1, −2, 3), v = (4, 6, −7) be two vectors in R3 .

u · v = 1 · 4 + (−2) · 6 + 3 · (−7)
= 4 − 12 − 21 = −29.

Theorem 1.8 (Properties of scalar product). For all u, v, w ∈ Rn the following properties hold.
SP1: u · v = v · u (commutativity);
SP2: u · (v + w) = u · v + u · w (distributivity);
SP3: If λ ∈ R then (λ u) · v = u · (λ v) = λ (u · v);
SP4: If v = 0 then v · v = 0, otherwise v · v > 0 (positive definite).

Proof.
SP1: u · v = u1 v1 + u2 v2 + . . . + un vn (definition of scalar product)
= v1 u1 + v2 u2 + . . . + vn un (commutativity property in R)
=v·u

SP2: v + w = (v1 + w1 , . . . , vn + wn ) (definition of vector sum)


=⇒ u · (v + w) = u1 (v1 + w1 ) + . . . + un (vn + wn ) (definition of scalar product)
= u1 v1 + u1 w1 + . . . + un vn + un wn (distributivity and associativity in R)
= (u1 v1 + . . . + un vn ) + (u1 w1 + . . . + un wn ) (commutativity in R)
=u·v+u·w (definition of scalar product)

SP4: v · v = v12 + v22 + . . . + vn2 (definition of scalar product)


If v = 0, then v1 = v2 = . . . = vn = 0, so v · v = 0 + · · · + 0 = 0.
If v 6= 0, there is at least one i ∈ {1, 2, . . . , n} such that vi 6= 0, hence vi2 > 0. As vj2 ≥ 0 for all
j = 1, . . . , n, the sum v12 + v22 + . . . + vn2 is strictly greater than 0, and hence v · v > 0.
The proof of SP3 is similar and is left as an exercise. 
Note that a · b = 0 is possible without either a = 0 or b = 0. For example, let a = (1, 2, 3)
and b = (2, 1, −4/3). Then a · b = 2 + 2 + 3(−4/3) = 0.
Definition 1.9 (Orthogonal vectors). We will say that two vectors a, b ∈ Rn are orthogonal
(perpendicular ) if a · b = 0.
Example 1.10. The standard basis vectors in R3 , sometimes denoted i = (1, 0, 0), j = (0, 1, 0),
and k = (0, 0, 1), are clearly pairwise orthogonal: i · j = j · k = i · k = 0.
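The scalar product is a one-line computation. The following plain-Python sketch (for illustration) recomputes Example 1.7 and checks that the standard basis vectors are pairwise orthogonal:

def dot(u, v):
    # scalar product of two vectors of the same length (Definition 1.6)
    return sum(ui * vi for ui, vi in zip(u, v))

print(dot((1, -2, 3), (4, 6, -7)))        # -29, as in Example 1.7

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(dot(i, j), dot(j, k), dot(i, k))    # 0 0 0: pairwise orthogonal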

1.3. Norm of a vector


Definition 1.11 (Norm). We define the norm (also called magnitude or length) of a vector
v ∈ R^n as
‖v‖ = √(v · v).
Example 1.12. (a) In R^2, if v = (a, b), then ‖v‖ = √(a^2 + b^2). E.g., for v = (1, 2), we have
‖v‖ = √(1 + 4) = √5.
(b) In R^3, if v = (v_1, v_2, v_3), then ‖v‖ = √(v_1^2 + v_2^2 + v_3^2). E.g., for v = (−1, 2, −3), we have
‖v‖ = √(1 + 4 + 9) = √14.
The concept of norm in Rn recovers the standard notions of length in R2 and R3 .
Note 1.13. Observe that ‖v‖ ≥ 0 for any vector v ∈ R^n. Moreover, if v ≠ 0 then ‖v‖ > 0.
Also: ‖v‖ = ‖−v‖, for every vector v = (v_1, . . . , v_n) ∈ R^n, as
‖v‖ = √(v_1^2 + v_2^2 + . . . + v_n^2) = √((−v_1)^2 + (−v_2)^2 + . . . + (−v_n)^2) = ‖−v‖.

Definition 1.14 (Distance). Let A, B be two points in R^n. We define the distance between A
and B as ‖\vec{AB}‖ = √(\vec{AB} · \vec{AB}) = √((B − A) · (B − A)).
Note 1.15. The distance between two points A and B in R^n equals the distance between B
and A, as ‖\vec{AB}‖ = ‖\vec{BA}‖ (since \vec{BA} = A − B = −(B − A) = −\vec{AB}).

If a, b are the position vectors of points A and B, then the distance between A and B equals
‖\vec{AB}‖ = ‖b − a‖ = ‖a − b‖ = ‖\vec{BA}‖.

Example 1.16. Let A = (2, 0, 1, 4) and B = (0, −1, 2, 3) be two points in R^4. The distance between
A and B in R^4 equals ‖\vec{AB}‖ = ‖(−2, −1, 1, −1)‖ = √(4 + 1 + 1 + 1) = √7.
Note 1.17. This definition of distance is consistent with the standard definitions of distances
in R2 and R3 .
Theorem 1.18. Let λ ∈ R and v ∈ Rn . Then kλ vk = |λ| kvk.
Proof. Exercise. (Hint: consider kλ vk2 = (λ v) · (λ v) = . . ..) 
Definition 1.19 (Unit vectors and normalization). A vector u ∈ R^n of norm 1 (‖u‖ = 1) is
said to be a unit vector. Given any non-zero vector a ∈ R^n, we define the normalization of a as
â = (1/‖a‖) a.
Note 1.20. If a ∈ R^n, a ≠ 0, then â is a unit vector, as Theorem 1.18 gives ‖â‖ = (1/‖a‖) ‖a‖ = 1
(here λ = 1/‖a‖ > 0 by Note 1.13).
Recall that, according to Definition 1.4, two vectors a, b have the same direction if there is
λ > 0 such that a = λ b. With this notation, â is the unit vector with the same direction as a.
Example 1.21. Let a = (1, 2, −3) ∈ R^3. Then ‖a‖ = √((1, 2, −3) · (1, 2, −3)) = √(1 + 4 + 9) = √14.
Hence â = (1/√14)(1, 2, −3) (the unit vector with the same direction as (1, 2, −3)).
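Norms, normalizations and distances all reduce to the scalar product. A small illustrative sketch (the dot helper is repeated so the snippet is self-contained):

import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))                # Definition 1.11

a = (1, 2, -3)
print(norm(a))                                 # sqrt(14), cf. Example 1.21
a_hat = tuple(x / norm(a) for x in a)          # the normalization of a
print(norm(a_hat))                             # 1.0, so a_hat is a unit vector

A, B = (2, 0, 1, 4), (0, -1, 2, 3)             # distance in R^4, cf. Example 1.16
print(norm(tuple(q - p for p, q in zip(A, B))))  # sqrt(7)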

Example 1.22. The standard basis vectors i, j, k in R3 are unit vectors. More generally, the
vectors e1 , . . . , en are unit vectors, where el = (0, . . . , 0, 1, 0, . . . , 0) and 1 appears in the l-th place,
l ∈ {1, . . . , n}. (In R3 , e1 = i, e2 = j, e3 = k.)
Given two vectors a, b, the condition
ka + bk = ka − bk
is special:
Proposition 1.23. Let a, b ∈ Rn . Then ka + bk = ka − bk is equivalent to a · b = 0.
Proof. Exercise. (Hint: begin with ka + bk = ka − bk, square both sides and manipulate the
resulting equation.) 
We can use Proposition 1.23 to justify our definition of perpendicularity: indeed the following
geometric argument shows that if the norm of a + b is equal to the norm of a − b then a is
# » # »
perpendicular to b (here ∠BAC denotes the angle between the vectors AB and AC at A).
If ka+bk = ka−bk then triangles ∆1 and
∆2 have equal side lengths, so they are
congruent to each other. Hence ∠BAC =
∠DAC, but ∠BAC + ∠DAC = 180◦ .
Therefore ∠BAC = 90◦ , i.e., a is perpen-
dicular to b.

Theorem 1.24 (Generalized Pythagoras’ Theorem). If a, b ∈ Rn are orthogonal then


ka + bk2 = kak2 + kbk2 .
Proof.
p 2
ka + bk2 = (a + b) · (a + b) (by definition of norm)
=a·a+b·a+a·b+b·b (by properties of scalar product)
=a·a+b·b (orthogonal: a · b = b · a = 0)
= kak2 + kbk2 (by definition of norm).


1.4. Projections and Cauchy-Schwarz inequality

Let a, b be the position vectors of points A, B in R^n, with B ≠ O. Let P be the point on the line
through O and B such that \vec{PA} is perpendicular to b. Clearly we can write \vec{OP} = λ b for some
λ ∈ R, and we would like to find λ in terms of a, b.
Given that \vec{PA} and \vec{OB} are perpendicular, we get:
(a − λ b) · b = 0
a · b − λ (b · b) = 0
=⇒ λ = (a · b)/(b · b) (as b ≠ 0).
Thus there is a unique real number λ ∈ R such that the vectors a − λ b and b are orthogonal.
Definition 1.25 (Components and projections). Let a and b be two vectors in R^n, with b ≠ 0.
i) We define the component of a along b as the number λ = (a · b)/(b · b).
ii) The projection of a along b is the vector λ b, where λ is given by i).
Example 1.26. (a) The component of (4, 5, −6) ∈ R^3 in the direction of the y-axis is 5: this
direction is given by the vector j = e_2 = (0, 1, 0), so
λ = ((4, 5, −6) · (0, 1, 0))/((0, 1, 0) · (0, 1, 0)) = 5/1 = 5.
The projection of (4, 5, −6) in the direction of the y-axis (along e_2) is 5 e_2 = (0, 5, 0).
(b) Let a = (1, 2, −3), b = (1, 1, 2) in R^3. Then the component of a along b is:
λ = ((1, 2, −3) · (1, 1, 2))/((1, 1, 2) · (1, 1, 2)) = (1 + 2 − 6)/(1 + 1 + 4) = −1/2.
(Think about this: why is λ negative here?)
The projection of a along b: λ b = −(1/2)(1, 1, 2) = (−1/2, −1/2, −1).
Check: a − λ b = (1, 2, −3) − (−1/2, −1/2, −1) = (3/2, 5/2, −2), so
(a − λ b) · b = (3/2, 5/2, −2) · (1, 1, 2) = 3/2 + 5/2 − 4 = 0,
as expected.
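The component and projection formulas translate directly into code. An illustrative sketch re-doing Example 1.26(b) and checking that a − λb is orthogonal to b:

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

a, b = (1, 2, -3), (1, 1, 2)
lam = dot(a, b) / dot(b, b)                  # component of a along b
proj = tuple(lam * bi for bi in b)           # projection of a along b
print(lam, proj)                             # -0.5 (-0.5, -0.5, -1.0)

residual = tuple(ai - pi for ai, pi in zip(a, proj))
print(dot(residual, b))                      # 0.0: a - lambda*b is orthogonal to b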
Theorem 1.27 (Cauchy-Schwarz Inequality). Let a, b be any two vectors in Rn , then
|a · b| ≤ kak kbk.
Proof. Let a, b ∈ R^n and let x ∈ R be any real number. Using properties of scalar products (see
Theorem 1.8) we get:
(1.1) ‖x a + b‖^2 = (x a + b) · (x a + b) = (x a) · (x a) + (x a) · b + b · (x a) + b · b (by Def. 1.11, SP2, SP1)
= x^2 (a · a) + 2(a · b)x + b · b = ‖a‖^2 x^2 + 2(a · b)x + ‖b‖^2 (by SP3, SP1 and Def. 1.11).
Denote α = ‖a‖^2 ∈ R, β = 2(a · b) ∈ R and γ = ‖b‖^2 ∈ R. Then, according to (1.1), we have
(1.2) αx^2 + βx + γ ≥ 0 for all x ∈ R,
because ‖x a + b‖^2 ≥ 0.
If α = 0, then ‖a‖ = 0, so a = 0 by Note 1.13, and |a · b| = |0| = 0 = ‖a‖ ‖b‖, i.e., the
inequality holds in this case.
Thus we can suppose that α ≠ 0, in which case αx^2 + βx + γ is a quadratic polynomial,
and inequality (1.2) implies that this polynomial has at most 1 real root. Therefore the
discriminant D = β^2 − 4αγ ≤ 0, which shows that (2 a · b)^2 − 4‖a‖^2 ‖b‖^2 ≤ 0. It follows that
(2 a · b)^2 ≤ 4‖a‖^2 ‖b‖^2, so (a · b)^2 ≤ ‖a‖^2 ‖b‖^2, hence |a · b| ≤ ‖a‖ ‖b‖.
Example 1.28. Let a = (1, 2, 5) and b = (4, 3, 7). Then
|a · b| = |4 + 6 + 35| = 45, and ‖a‖ ‖b‖ = √(1 + 4 + 25) √(16 + 9 + 49) = √2220 > √2025 = 45.
Thus |a · b| ≤ ‖a‖ ‖b‖, as expected from Theorem 1.27.
Corollary 1.29. Let a, u ∈ R^n, u ≠ 0. Then the norm of the projection of a along u does
not exceed the norm of a.
Proof. The projection of a along u is λu, where λ = (a · u)/(u · u). Hence
‖λu‖ = ‖((a · u)/(u · u)) u‖ = (|a · u|/‖u‖^2) ‖u‖ = |a · u|/‖u‖ ≤ (‖a‖ ‖u‖)/‖u‖ = ‖a‖,
where we used Theorem 1.18, the fact that ‖u‖ > 0, and Theorem 1.27.
Proof. (Alternative, geometric proof of Corollary 1.29.) Write b = a − λu, so that b is orthogonal
to λu. By Pythagoras’ theorem,
‖λu‖^2 + ‖b‖^2 = ‖a‖^2.
Since ‖b‖^2 ≥ 0, we see that ‖λu‖^2 ≤ ‖a‖^2, hence ‖λu‖ ≤ ‖a‖.

In view of Theorem 1.27, for any two non-zero vectors a, b in R^n we see that
|a · b|/(‖a‖ ‖b‖) ≤ 1, which is equivalent to
−1 ≤ (a · b)/(‖a‖ ‖b‖) ≤ 1.
Therefore there is a unique angle θ ∈ [0, π] such that
(1.3) cos θ = (a · b)/(‖a‖ ‖b‖).
Definition 1.30 (Angle). Given any two non-zero vectors a, b ∈ R^n, we define the angle θ ∈ [0, π]
between a and b by the equation cos θ = (a · b)/(‖a‖ ‖b‖) (in other words, θ = cos^{−1}((a · b)/(‖a‖ ‖b‖))).
Example 1.31. Compute the angle between the vectors (2, 0, 1, 4) and (−1, 3, −5, 7) in R^4.
By definition of the angle θ between these vectors, θ ∈ [0, π] and
cos(θ) = ((2, 0, 1, 4) · (−1, 3, −5, 7))/(‖(2, 0, 1, 4)‖ ‖(−1, 3, −5, 7)‖) = (−2 − 5 + 28)/(√21 √84) = 1/2.
Therefore θ = π/3.
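Formula (1.3) gives an immediate way to compute angles numerically; an illustrative sketch recomputing Example 1.31 (math.acos returns the unique angle in [0, π]):

import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def angle(a, b):
    # the angle between two non-zero vectors (Definition 1.30)
    return math.acos(dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))))

print(angle((2, 0, 1, 4), (-1, 3, -5, 7)), math.pi / 3)   # both approximately 1.0472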

Equation (1.3) is sometimes used to give the geometric formula for the scalar product:

if a, b ∈ Rn , then a · b = kak kbk cos θ,

where θ ∈ [0, π] is the angle between a and b.


Cauchy-Schwarz inequality can also be used to prove the following important fact:

Theorem 1.32 (Triangle Inequality). If a, b ∈ Rn then ka + bk ≤ kak + kbk.

Proof. Both sides are non-negative, so it is sufficient to show that ka + bk2 ≤ (kak + kbk)2 .
To this end, consider:
ka + bk2 = (a + b) · (a + b) (definition of norm)
= a · a + 2a · b + b · b (properties of scalar product)
≤ kak2 + 2kakkbk + kbk2 (by Cauchy-Schwarz inequality – see Thm. 1.27)
= (kak + kbk)2

Therefore ka + bk2 ≤ (kak + kbk)2 . After taking square roots we get the desired inequality. 

Geometric idea:
ka + bk ≤ kak + kbk is equivalent to the
statement:
“The length of a side of a triangle does not
exceed the sum of lengths of the two other
sides”.

1.5. Equation of a line


Given a point P = (p1 , p2 , . . . , pn ) ∈ Rn and a vector a = (a1 , . . . , an ) ∈ Rn we define their sum
as the point S = P + a = (p1 + a1 , p2 + a2 , . . . , pn + an ) ∈ Rn .

# »
S = P + a, thus a = P S.

Definition 1.33 (Line). Let P ∈ Rn be a point and let a ∈ Rn be a non-zero vector. We


define the line L passing through P and parallel to a as the set of all points R ∈ Rn of the form
R = P + λ a, for some parameter λ ∈ R. The vector a is called the direction vector of the line L.
In other words, L = {R ∈ R^n | ∃ λ ∈ R such that R = P + λ a}. We will write
L : R = P + λ a, λ ∈ R,

and we will call this a parametric equation of the line L.

If n = 3 and (x, y, z) are the standard coordinates in R3 , we can also write

L : (x, y, z) = P + λ a, λ ∈ R.

More generally, in Rn the standard coordinates can be denoted (x1 , x2 , . . . , xn ), and the equation
of L becomes
L : (x_1, x_2, . . . , x_n) = P + λ a, λ ∈ R,
where the tuple (x_1, x_2, . . . , x_n) plays the role of the point R.

Example 1.34. (a) In R^4, the line L, passing through the point P = (0, 1, 2, 3) and parallel to
the vector a = (4, 5, 6, 7), has the parametric equation
L : (x_1, x_2, x_3, x_4) = (0, 1, 2, 3) + λ (4, 5, 6, 7), λ ∈ R.
(b) Let us take P = (1, 2) and a = (2, −1) in R2 . The corresponding line L, passing through P
and parallel to a, will then have the parametric equation

L : (x, y) = (1, 2)+λ (2, −1), λ ∈ R.

Now, (x, y) ∈ L is equivalent to
x = 1 + 2λ and y = 2 − λ ⇐⇒ x − 1 = 2λ and y − 2 = −λ ⇐⇒ (1/2)x − 1/2 = λ and 2 − y = λ.
Joining up: (1/2)x − 1/2 = 2 − y, so y = −(1/2)x + 5/2 – this is the Cartesian equation of the line L in R^2.
(c) In R^3, let L be the line with parametric equation L : (x, y, z) = (1, −3, 2) + λ (−5, 6, 1),
λ ∈ R. Then (x, y, z) ∈ L if and only if
x = 1 − 5λ, y = −3 + 6λ, z = 2 + λ
⇐⇒ λ = −(1/5)(x − 1), λ = (1/6)(y + 3), λ = z − 2
⇐⇒ −(1/5)(x − 1) = z − 2 and (1/6)(y + 3) = z − 2.
This system of equations defines the line L in R^3 as the set of all solutions (x, y, z). Thus this
system corresponds to a Cartesian equation of the line L in R^3. (In R^4 we would need 3 equations.)
Note 1.35. Any two distinct points A, B ∈ Rn determine a unique line passing through them:
# » # »
L : R = A + λ AB, λ ∈ R, where AB = B − A.
Given two vectors a, b ∈ Rn we shall say that they are parallel if there exists α 6= 0, α ∈ R
such that b = α a. Two lines are parallel if and only if their direction vectors are parallel.
Example 1.36 (Intersection of two lines). In R2 two distinct lines are either parallel or intersect,
but in R3 there are 3 possibilities: parallel, intersecting or skew. (Skew lines are neither parallel
nor intersecting.)
Consider two lines L1 , L2 in R3 , given by their parametric equations:
L1 : (x, y, z) = (5, 1, 3) + λ(2, 1, 2), λ ∈ R
L2 : (x, y, z) = (2, −3, 0) + µ(−1, 2, −1), µ ∈ R
(here λ and µ are two different independent parameters).
It is easy to see that L1 is not parallel to L2 since their direction vectors (2, 1, 2) and (−1, 2, −1)
are not multiples of each other (i.e., there is no α ∈ R such that (2, 1, 2) = α(−1, 2, −1)). In
particular, these lines are distinct.
To check if the intersection exists we look for (x, y, z) ∈ R^3 such that
x = 5 + 2λ = 2 − µ, y = 1 + λ = −3 + 2µ, z = 3 + 2λ = −µ,
which gives the system
5 + 2λ = 2 − µ (1)
1 + λ = −3 + 2µ (2)
3 + 2λ = −µ (3).

Solve for λ, µ: (1) − 2×(2) gives
3 = 8 − 5µ, so −5 = −5µ, hence µ = 1.
Substitute µ = 1 into (1) or (2): 5 + 2λ = 2 − 1, so λ = −2.
We need to check whether these values of λ and µ satisfy equation (3):
3 + 2(−2) = −1 ✓
Therefore µ = 1 and λ = −2 satisfy all three equations (1)–(3), so the two lines do intersect.
To find the point of intersection, substitute λ = −2 into the equation of L1 or µ = 1 into the
equation of L2 (or both). E.g., taking µ = 1 in the equation of L2 we get (x, y, z) = (2, −3, 0) +
1(−1, 2, −1) = (1, −1, −1). Thus the given lines L1 , L2 intersect at the point (1, −1, −1).
(Check: letting λ = −2 in the equation of L1 , we get (x, y, z) = (5, 1, 3)−2(2, 1, 2) = (1, −1, −1)
– same result as above X.)
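The intersection in Example 1.36 amounts to solving two of the three coordinate equations for λ and µ and then checking the third. An illustrative sketch with the numbers of the example hard-coded (not a general solver):

# Equations (1) and (2) rearranged: 2*lam + mu = -3 and lam - 2*mu = -4.
# From (1): mu = -3 - 2*lam; substituting into (2) gives 5*lam = -10.
lam = -10 / 5                 # -2
mu = -3 - 2 * lam             # 1
print(lam, mu)

print(3 + 2 * lam == -mu)     # equation (3) holds, so the lines intersect

point = tuple(p + lam * d for p, d in zip((5, 1, 3), (2, 1, 2)))
print(point)                  # (1.0, -1.0, -1.0), the intersection point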
Example 1.37. Different parametric equations may still represent the same line in Rn . This
is because there are infinitely many possibilities for the choices of the point P and the vector a. For
instance, let
L1 : (x, y, z) = (8, −7, −4) + λ(−3, 3, 9), λ ∈ R
.
L2 : (x, y, z) = (2, −1, 14) + µ(2, −2, −6), µ ∈ R
In this case we immediately see that the direction vector of L2 is a multiple of the direction vector
of L1 :
(2, −2, −6) = −(2/3) (−3, 3, 9).
3
So, L1 and L2 are at least parallel.
Now L1 coincides with L2 if and only if (2, −1, 14) ∈ L1 . To check this, substitute (2, −1, 14)
in the parametric equation of L1 and look for a suitable λ:

 2 = 8 − 3λ
−1 = −7 + 3λ .
14 = −4 + 9λ

The first equation gives λ = 2, and we can see that this value of λ also works for the remaining
two equations (−7 + 3 · 2 = −1 X, and −4 + 9 · 2 = 14 X). Therefore
(2, −1, 14) = (8, −7, −4) + 2(−3, 3, 9) =⇒ (2, −1, 14) ∈ L1 =⇒ L1 = L2 .
Thus the lines L1 and L2 coincide.

1.6. Parametric equation of a plane


Definition 1.38 (Plane). Suppose P ∈ Rn is a point and a, b ∈ Rn are vectors, which are not
parallel and a 6= 0, b 6= 0. The plane Π passing through P and parallel to a, b is defined as the set
of all points R ∈ Rn of the form R = P + λ a + µ b, where λ, µ ∈ R are arbitrary real numbers.
In this case we will say that the plane Π has the parametric equation
Π : R = P + λ a + µ b, λ, µ ∈ R, or
Π : (x1 , . . . , xn ) = P + λ a + µ b, λ, µ ∈ R,
where R = (x1 , . . . , xn ) ∈ Rn .

Π = {R ∈ Rn | ∃ λ, µ ∈ R such that R = P + λ a + µ b}

Example 1.39. Given P = (0, 1, 2, 3), a = (4, 5, 6, 7) and b = (5, 6, 7, 8) in R^4, the plane Π
passing through P and parallel to the vectors a, b is given by the parametric equation
Π : (x_1, x_2, x_3, x_4) = (0, 1, 2, 3) + λ (4, 5, 6, 7) + µ (5, 6, 7, 8), λ, µ ∈ R.
Note 1.40. In R3 any plane can also be given by a Cartesian equation of the form ax+by +cz =
d, where a, b, c, d ∈ R and (a, b, c) 6= (0, 0, 0).
Example 1.41 (From a parametric equation to a Cartesian equation). Let P = (1, −2, 3),
a = (4, 2, 1), b = (5, 4, 0) in R^3, and let
(1.4) Π : (x, y, z) = (1, −2, 3) + λ (4, 2, 1) + µ (5, 4, 0), λ, µ ∈ R,
so
x = 1 + 4λ + 5µ (1)
y = −2 + 2λ + 4µ (2)
z = 3 + λ (3).
Let’s solve for λ, µ to obtain a Cartesian equation for Π:
(3) ⇒ λ = z − 3; (1) ⇒ x = 1 + 4(z − 3) + 5µ, so µ = (1/5)(x + 11 − 4z).
(2) ⇒ y = −2 + 2(z − 3) + (4/5)(x + 11 − 4z). Multiply both sides by 5 and simplify to get
−4x + 5y + 6z = 4 – a Cartesian equation of Π in R^3.


Example 1.42 (From a Cartesian equation to a parametric equation). Let Π be the plane in R3
given by the Cartesian equation −4x + 5y + 6z = 4 (this is the same plane as in Example 1.41). To
find a parametric equation of Π, find any 3 non-collinear points on this plane: e.g., A = (1, −2, 3),
B = (−1, 0, 0) and C = (2, 0, 2).
Now, use these points to construct two vectors parallel to the plane:
# » # »
AB = B − A = (−2, 2, −3) and AC = C − A = (1, 2, −1).
# » # »
Note that AB is not parallel to AC, so these vectors are sufficient for writing down a parametric
# » # » # » # »
equation of the plane Π. (We could equally use AB and BC or BC and AC.)
So, Π has the parametric equation
(1.5) Π : (x, y, z) = (1, −2, 3) + λ (−2, 2, −3) + µ (1, 2, −1), λ, µ ∈ R.
Note 1.43. Different parametric equations may still define the same plane (compare equations
(1.4) and (1.5) of the plane Π in the two examples above). This is because we have a lot of freedom
in choosing the non-collinear points A, B, C ∈ Π.

1.7. Cartesian equation of a plane in R3


In R3 , there is an alternative way to define a plane using normal vectors.
Suppose that P ∈ R3 is a point and a, b ∈ R3 are non-zero vectors that are not parallel to each
other. If n ∈ R3 is a vector, which is perpendicular to both a and b (i.e., n · a = 0 = n · b), then
n is perpendicular to the plane Π (i.e., it is perpendicular to every vector in this plane), given by
the parametric equation
Π : R = P + λa + µb, λ, µ ∈ R.
In such a case n is said to be a normal vector to the plane Π.
Why is orthogonality to a, b enough for being orthogonal to the plane Π?

\vec{PR} = R − P = λ a + µ b, hence
n · \vec{PR} = n · (λ a + µ b) = n · (λ a) + n · (µ b) (by SP2)
= λ (n · a) + µ (n · b) = 0 (by SP3).
It is also true that for any point R ∈ R^3, if \vec{PR} is orthogonal to n then R ∈ Π. Let r, p ∈ R^3
denote the position vectors of the points R and P respectively.
The above discussion shows that the plane Π consists of all points R ∈ R^3 such that
\vec{PR} · n = 0 ⇐⇒ (r − p) · n = 0 (as \vec{PR} = r − p) ⇐⇒ r · n − p · n = 0 ⇐⇒ r · n = p · n.

Therefore, in R3 we can make the following definition:


Definition 1.44 (Cartesian equation of a plane). Given a point P , with the position vector p,
and a vector n 6= 0 in R3 , the plane Π, passing through P and perpendicular to n, is defined as the
set of all points R ∈ R3 such that the position vector r, of R, satisfies
(1.6) r · n = p · n.
Expressing R = (x, y, z) in Cartesian coordinates in R3 , we can re-write equation (1.6) to obtain
the Cartesian equation of Π:
(x, y, z) · n = p · n.
Example 1.45. Let’s again look at the plane Π from Examples 1.41 and 1.42. Thus P =
(1, −2, 3), a = (4, 2, 1) and b = (5, 4, 0).
Take n = (−4, 5, 6). Since n · a = 0 = n · b, n is perpendicular to Π, so, according to
Definition 1.44, the Cartesian equation of Π is

Π : (x, y, z) · (−4, 5, 6) = (1, −2, 3) · (−4, 5, 6),


giving
−4x + 5y + 6z = −4 − 10 + 18 ⇐⇒ −4x + 5y + 6z = 4,
which is the same Cartesian equation as in Example 1.41.
What happens in Rn ? Given a point P and a vector n 6= 0 in Rn , the set of all points R
satisfying r · n = p · n is called a hyperplane. This is a subset of “dimension” n − 1, so if n = 3 we
get a plane, if n = 2, we get a line.
Note 1.46. If a plane Π in R3 is given by the equation
(1.7) c1 x + c2 y + c3 z = d,
where c1 , c2 , c3 , d ∈ R are some coefficients, then the vector n = (c1 , c2 , c3 ) is normal to Π.
Indeed, suppose that P = (p1 , p2 , p3 ) ∈ Π is some point. Then c1 p1 + c2 p2 + c3 p3 = d and (1.7)
becomes:
c1 x + c2 y + c3 z = c1 p1 + c2 p2 + c3 p3 ⇐⇒ (x, y, z) · n = p · n.
Example 1.47 (Cartesian equation of a plane from 3 points). Suppose that Π is a plane in
R^3 containing the points A = (1, 2, 3), B = (−1, −3, 2) and C = (3, −1, −2). To find a Cartesian
equation of Π we need to find a normal vector n = (a, b, c). So, we write n · \vec{AB} = 0 and
n · \vec{BC} = 0 (n · \vec{AC} = 0 will hold automatically, as \vec{AC} = \vec{AB} + \vec{BC}). This is equivalent to
(a, b, c) · (−2, −5, −1) = 0 and (a, b, c) · (4, 2, −4) = 0, i.e.,
−2a − 5b − c = 0 (1)
4a + 2b − 4c = 0 (2).
We have 2 equations and 3 variables, so 1 degree of freedom (because a plane only specifies the direction
of a normal vector, but not its norm).
2 × (1) + (2): −8b − 6c = 0, so b = −(3/4)c.
There exist infinitely many possibilities; choose one: c = 8, b = −6. Then from (1) we get that
a = 11, and so n = (11, −6, 8) (only unique up to a multiple).
Thus the Cartesian equation of Π is:
(11, −6, 8) · (x, y, z) = (11, −6, 8) · (1, 2, 3) ⇐⇒ 11x − 6y + 8z = 23.

1.8. Vector product


A good way of finding a normal to a plane which is parallel to two given vectors a, b in R^3
is to use vector products.
Definition 1.48 (Vector product). Let a = (a_1, a_2, a_3) and b = (b_1, b_2, b_3) be two vectors in R^3.
The vector product a × b is the vector in R^3 with coordinates
a × b = (a_2 b_3 − a_3 b_2, a_3 b_1 − a_1 b_3, a_1 b_2 − a_2 b_1).
Theorem 1.49. If a, b ∈ R3 then a × b is orthogonal to both a and b.
Proof. Exercise (it is a straightforward calculation to check that v · a = 0 and v · b = 0, where
v = a × b). 

The vector product of two vectors in R3 is orthogonal


to each of them:

Example 1.50. Find a normal vector to a plane Π which is parallel to the vectors a = (−1, 0, 2)
and b = (3, −4, 5) in R^3.
By Theorem 1.49 we can use a × b as a normal vector:
a × b = (−1, 0, 2) × (3, −4, 5) = (0 · 5 − (−4) · 2, 2 · 3 − (−1) · 5, (−1) · (−4) − 3 · 0) = (8, 11, 4).
So, we can take n = a × b = (8, 11, 4).

Let us check that n is indeed orthogonal to a and b:
a · n = (−1, 0, 2) · (8, 11, 4) = −8 + 0 + 8 = 0, b · n = (3, −4, 5) · (8, 11, 4) = 24 − 44 + 20 = 0 ✓.
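The coordinate formula of Definition 1.48 is short enough to implement directly. An illustrative sketch recomputing Example 1.50, and also the normal vector of Example 1.47 via a cross product instead of solving equations:

def cross(a, b):
    # vector product in R^3 (Definition 1.48)
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

print(cross((-1, 0, 2), (3, -4, 5)))     # (8, 11, 4), as in Example 1.50

# Example 1.47 revisited: AB x BC is normal to the plane through A, B, C.
A, B, C = (1, 2, 3), (-1, -3, 2), (3, -1, -2)
AB = tuple(q - p for p, q in zip(A, B))
BC = tuple(q - p for p, q in zip(B, C))
print(cross(AB, BC))                     # (22, -12, 16), i.e. 2*(11, -6, 8)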
Warning: the vector product is only defined for vectors in R3 (and not in Rn for n 6= 3)!
Proposition 1.51 (Some properties of the vector product). Let a, b be two vectors in R3 . Then
(1) If a = λ b for some λ ∈ R, then a × b = 0 in R3 .
(2) If a 6= 0, b 6= 0 and a is not parallel to b in R3 , then a × b 6= 0 (in fact, ka × bk equals
the area of a parallelogram with sides a, b).
(3) a × b = −b × a.
Proof. (1) and (3) are straightforward. See Theorem 5.7 in the book by Hirst and Singerman
(p. 165) for the proof of (2). 
Note 1.52. Property (3) from Proposition 1.51 shows that vector products are anti-commu-
tative. Moreover, vector product in R3 is not associative. This means that there are vectors
a, b, c ∈ R3 such that (a × b) × c 6= a × (b × c).

1.9. Intersections of planes and lines in R3


Problem: given two planes in R3 , find their intersection.
In this case there are three possibilities:
i) no intersection: planes are parallel (and distinct);
ii) infinitely many points of intersection: planes coincide (same plane);
iii) infinitely many points of intersection: a line in R3 .
In cases i) or ii) the normal vectors to the two planes will be parallel (i.e., they will be multiples
of each other), which makes them easy to distinguish from case iii).
Example 1.53 (Intersection of two planes in R3 ). Let
Π1 : x − 2y + 3z = 2 (1)
(1.8)
Π2 : −2x + 3y − z = 4 (2)
be two planes in R3 , given by their Cartesian equations. Find their intersection.
From Note 1.46 we know that n1 = (1, −2, 3) is a normal vector to Π1 and n2 = (−2, 3, −1) is
a normal vector to Π2 . Clearly these two vectors are not multiples of each other, so we are in case
iii) above.
To solve system (1.8), observe that 2 × (1) + (2) gives
−y + 5z = 8.
Since there are 3 variables but only 2 equations, we can choose z (or y) freely to be any real
parameter. Say, z = λ, λ ∈ R. Then y = 5z − 8 = 5λ − 8, and substituting this into (1) we get
x = 2 + 2y − 3z
= 2 + 2(5λ − 8) − 3λ
= −14 + 7λ.
Thus, in parametric form, the intersection of Π1 with Π2 has equation
(x, y, z) = (−14, −8, 0) + λ(7, 5, 1), λ ∈ R,
which defines a line L in R3 (as expected).
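A quick numerical sanity check of the line just found: every point of it should satisfy both Cartesian equations. A small illustrative sketch with the coefficients of Example 1.53 hard-coded:

def on_pi1(x, y, z):
    return abs(x - 2 * y + 3 * z - 2) < 1e-12     # Pi_1 : x - 2y + 3z = 2

def on_pi2(x, y, z):
    return abs(-2 * x + 3 * y - z - 4) < 1e-12    # Pi_2 : -2x + 3y - z = 4

def line_point(lam):
    # parametric form of the intersection found above
    return (-14 + 7 * lam, -8 + 5 * lam, lam)

print(all(on_pi1(*line_point(t)) and on_pi2(*line_point(t))
          for t in (-1.0, 0.0, 2.5, 10.0)))        # True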

Problem: given a plane Π and a line L in R3 , find their intersection.


Again there exist three possibilities:
i) no intersection: L is parallel to Π (and is not contained in Π);

ii) infinitely many points: L is contained in Π;


iii) one point: L intersects Π in a single point.
Cases i) and ii) can be easily distinguished from case iii), by checking whether the normal vector
to Π is orthogonal to the direction vector of L.
Example 1.54 (Intersection between a line and a plane). Find the intersection of the line L
and the plane Π in R3 , with equations
L : (x, y, z) = (−3, 2, 1) + λ(2, −1, −2), λ ∈ R, and Π : x − y + 3z = 4.
Observe: L has the direction vector a = (2, −1, −2) and n = (1, −1, 3) is normal to Π. Since
a · n = −3 6= 0, we must be in case iii).
Thus we need to find λ ∈ R such that the point
(x, y, z) = (−3 + 2λ, 2 − λ, 1 − 2λ)
satisfies the equation of Π. So, we substitute:
(−3 + 2λ) − (2 − λ) + 3(1 − 2λ) = 4, =⇒ −2 − 3λ = 4, =⇒ λ = −2.

Hence the intersection of L with Π is the single point −3 + 2(−2), 2 − (−2), 1 − 2(−2) = (−7, 4, 5).

Problem: find the intersection of three planes in R3 .


Here the possibilities are
• the intersection is a single point (this
is the generic case);
• the intersection is a line (“star” case);
• the intersection is a plane (when all
three planes coincide);
• the intersection is empty (either at
least two of the planes are parallel, or
they form a “prism”).

Example 1.55 (Intersection of 3 planes in R^3). Let Π_1, Π_2 and Π_3 be three planes in R^3 given
by their Cartesian equations
Π_1 : 2x + y + 2z = 2, Π_2 : 2x − y − z = 1, Π_3 : x + y + 2z = 1.
To find the intersection of these planes, we need to solve the following system of 3 equations:
2x + y + 2z = 2 (1)
2x − y − z = 1 (2)
x + y + 2z = 1 (3).
Now (1) − 2 × (3) gives −y − 2z = 0 and (2) − 2 × (3) gives −3y − 5z = −1, from which y = 2 and
z = −1. Then, using (3), we see that x = 1.
Thus the three planes intersect at the single point (1, 2, −1). (Don’t forget to check that this
point indeed satisfies all three equations (1), (2), (3).)
Example 1.56 (Intersection of 3 planes in R^3). (a) Consider the following three planes:
x + y + z = 3 (1)
x + 2y − z = 4 (2)
3x + 4y + z = 9 (3).
Note that the corresponding normal vectors are (1, 1, 1), (1, 2, −1) and (3, 4, 1). No two of them
are parallel, so no two planes are parallel.
Eliminate x:
(2) − (1) ⇒ y − 2z = 1 and (3) − 3 × (1) ⇒ y − 2z = 0,
so there is no solution, since 1 ≠ 0;

hence this is the “prism” case.


(b) Now, change equation (3) to (3′): 3x + 4y + z = 10. Then 2 × (1) + (2) gives 3x + 4y + z = 10,
which is the same as the new equation (3′). Hence (3′) is redundant, and the system becomes
x + y + z = 3 and x + 2y − z = 4, or, after replacing the second equation by (2) − (1),
x + y + z = 3 and y − 2z = 1.
The latter system has 2 equations and 3 variables, so we can choose z to be a free variable: z = λ,
λ ∈ R. Then y = 1 + 2z = 1 + 2λ, and from (1) we get x = 3 − y − z = 3 − (1 + 2λ) − λ = 2 − 3λ.
So the intersection of the three planes has the form
(x, y, z) = (2, 1, 0) + λ(−3, 2, 1), λ ∈ R,
which is a parametric equation of a line L in R3 (thus this is the “star” case).

1.10. Distances in R3
Problem: given a plane Π and a point P in R^3, find the distance between P and Π (in other
words, we need to find the minimal possible norm ‖\vec{PA}‖, where A runs over all points of Π).

Idea: the distance from P to Π is the norm ‖\vec{PN}‖, where N ∈ Π is the point such that \vec{PN}
is perpendicular to Π.
Here is how we can justify the above idea: for any A ∈ Π, the triangle PAN is right-angled, so
‖\vec{PA}‖^2 = ‖\vec{PN}‖^2 + ‖\vec{NA}‖^2 ≥ ‖\vec{PN}‖^2,
as ‖\vec{NA}‖^2 ≥ 0. Therefore ‖\vec{PA}‖ ≥ ‖\vec{PN}‖, so ‖\vec{PN}‖ is the minimal distance between P and
the points of Π.
Question: how do we find ‖\vec{PN}‖?
Method 1: if we already know some n ≠ 0 which is normal to Π (e.g., if Π is given by a
Cartesian equation c_1 x + c_2 y + c_3 z = d, then n = (c_1, c_2, c_3) is normal to Π), then \vec{PN} is the
projection of \vec{PA} (for any point A ∈ Π) along n:
\vec{PN} = ((\vec{PA} · n)/(n · n)) n, hence ‖\vec{PN}‖ = ‖((\vec{PA} · n)/(n · n)) n‖ = (|\vec{PA} · n|/‖n‖^2) ‖n‖ = |\vec{PA} · n|/‖n‖
(using Thm. 1.18).

Method 2: given a normal n to Π, we can find N by finding the intersection of the line
L : R = P + λ n, λ ∈ R, with Π. Then we can compute ‖\vec{PN}‖ = ‖N − P‖ = . . .
Method 3: if Π is given by a parametric equation Π : R = A + λ a + µ b, λ, µ ∈ R, for some
non-zero, non-parallel vectors a, b in R^3, then compute n = a × b and follow either Method 1 or
Method 2 above.
Example 1.57 (Distance from point to plane, using Method 1). Let Π be the plane in R^3 given
by the Cartesian equation 2x + 2y − z = 9, and let P = (3, 4, −4) be a point.
To find the distance from P to Π, choose any point A ∈ Π. For example, A = (0, 3, −3) (A ∈ Π
as 2 · 0 + 2 · 3 − (−3) = 9). Observe that n = (2, 2, −1) is normal to Π (cf. Note 1.46). Now,
\vec{PN} is the projection of \vec{PA} along n, so
\vec{PN} = λ n, where λ = (\vec{PA} · n)/(n · n).
We calculate:
\vec{PA} = A − P = (−3, −1, 1), hence λ = ((−3, −1, 1) · (2, 2, −1))/((2, 2, −1) · (2, 2, −1)) = (−6 − 2 − 1)/(4 + 4 + 1) = −1.
Therefore \vec{PN} = λ n = (−2, −2, 1), and the required distance is ‖\vec{PN}‖ = √(4 + 4 + 1) = 3.
Note that λ is negative here, which means that n and \vec{PN} have opposite directions.
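Method 1 is a short computation in code as well. An illustrative sketch re-doing Example 1.57 (dot is the scalar product helper used earlier):

import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

P = (3, 4, -4)            # the point
A = (0, 3, -3)            # any point on the plane 2x + 2y - z = 9
n = (2, 2, -1)            # normal vector read off the Cartesian equation

PA = tuple(q - p for p, q in zip(P, A))
lam = dot(PA, n) / dot(n, n)
PN = tuple(lam * ni for ni in n)          # projection of PA along n
print(lam, PN)                            # -1.0 (-2.0, -2.0, 1.0)
print(math.sqrt(dot(PN, PN)))             # 3.0, the distance from P to the plane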
Exercise 1.58. Use Method 2 above to find the distance from Example 1.57.

Problem: find the distance from a point P to a line L in R^3, where L is given by the parametric
equation L : R = A + λ a, λ ∈ R, so that A ∈ R^3 is a point and a ∈ R^3 is a non-zero vector.
Idea: using Pythagoras’ Theorem, it’s easy to show that the distance between P and L is equal
to ‖\vec{PN}‖, where N ∈ L is the point such that \vec{PN} is perpendicular to a.
Question: how do we calculate ‖\vec{PN}‖?
Method 1: clearly \vec{NA} is the projection of \vec{PA} along a, so \vec{NA} = ((\vec{PA} · a)/(a · a)) a and
\vec{PN} = \vec{PA} − \vec{NA}. Hence ‖\vec{PN}‖ = ‖\vec{PA} − \vec{NA}‖ = ‖\vec{PA} − ((\vec{PA} · a)/(a · a)) a‖.
Method 2: consider the plane Π passing through P and perpendicular to L. Then Π will have
the equation
(x, y, z) · a = p · a,
where p is the position vector of P. Clearly N is the intersection of Π with L. So, by finding this
intersection (cf. Example 1.54), we find N and ‖\vec{PN}‖ = ‖N − P‖.
Example 1.59. In R^3, find the distance from the point P = (2, 1, −3) to the line L, given by
the equation L : (x, y, z) = (1, 0, −4) + λ(2, 2, −1), λ ∈ R.
1) Using Method 1: A = (1, 0, −4), a = (2, 2, −1), so \vec{PA} = A − P = (−1, −1, −1), and
\vec{PA} · a = −2 − 2 + 1 = −3, a · a = 2^2 + 2^2 + 1 = 9, so
((\vec{PA} · a)/(a · a)) a = −(3/9) a = −(1/3)(2, 2, −1) = (−2/3, −2/3, 1/3).
Thus the distance between P and L is equal to
‖\vec{PA} − ((\vec{PA} · a)/(a · a)) a‖ = ‖(−1, −1, −1) − (−2/3, −2/3, 1/3)‖ = ‖(−1/3, −1/3, −4/3)‖
= √(1/9 + 1/9 + 16/9) = √2.

2) Using Method 2: the plane Π, perpendicular to a and passing through P, will have the equation
Π : (x, y, z) · a = p · a, where a = (2, 2, −1), p = (2, 1, −3).
Therefore Π : 2x + 2y − z = 9 (as p · a = 2 · 2 + 2 · 1 + (−1) · (−3) = 9).
Now let us find the intersection of L and Π as in Example 1.54. We can re-write the parametric
equation of L as L : (x, y, z) = (1 + 2λ, 2λ, −4 − λ), λ ∈ R. Substituting this into the Cartesian
equation of Π above we get:
2(1 + 2λ) + 2(2λ) − (−4 − λ) = 9 ⇐⇒ 9λ + 6 = 9 ⇐⇒ λ = 1/3.
Plugging λ = 1/3 back into the equation of L, we see that the point N of the intersection of L
and Π has coordinates N = (1 + 2/3, 2/3, −4 − 1/3) = (5/3, 2/3, −13/3).
Therefore \vec{PN} = N − P = (5/3 − 2, 2/3 − 1, −13/3 − (−3)) = (−1/3, −1/3, −4/3). Hence the distance
from P to L is equal to
‖\vec{PN}‖ = ‖(−1/3, −1/3, −4/3)‖ = √2.
Thus we see that Methods 1 and 2 produce the same answer, as expected.
CHAPTER 2

Matrix Algebra

This chapter will introduce matrices and basic operations with them. Matrices represent one
of the key concepts in the module and we will frequently use them further on.

2.1. Basic definitions and terminology


Definition 2.1 (Matrix). Suppose that m, n ∈ N are natural numbers. An m × n matrix A is
a rectangular array with m rows and n columns:
    A = [ a11  a12  ···  a1n ]
        [ a21  a22  ···  a2n ]
        [  ⋮    ⋮          ⋮ ]
        [ am1  am2  ···  amn ]
• The coefficients a11 , a12 , . . . , amn are called the entries of the matrix. Such entries are
usually numerical (e.g., integers, real numbers, or complex numbers).
• For any i ∈ {1, . . . , m} and j ∈ {1, . . . , n}, the entry aij is said to be the (i, j)-th entry (or
the (i, j)-th element) of A. We will also use (A)ij to denote aij . Thus aij is the entry of
the matrix at the intersection of the i-th row with the j-th column.
• We will write A = (aij ) to specify that aij are the entries of the matrix A.
• The zero matrix of size m × n is the m × n matrix in which all entries are 0.
• The entries a11 , a22 , a33 , . . . are said to be diagonal.
• If m = n then A is a square matrix of size n.
• A square matrix A is diagonal if aij = 0 whenever i ≠ j (i.e., all non-diagonal entries are
zero):
    A = [ a11   0   ···   0  ]
        [  0   a22  ···   0  ]
        [  ⋮         ⋱    ⋮  ]
        [  0    0   ···  ann ]
• An n × n diagonal matrix with 1's on the diagonal is called the identity matrix of size n:
    In = [ 1  0  ···  0 ]
         [ 0  1  ···  0 ]
         [ ⋮      ⋱   ⋮ ]
         [ 0  0  ···  1 ]   (n × n).
Thus In = (δij), where δij = 1 if i = j and δij = 0 if i ≠ j.


 
Example 2.2. A = [ 1  2  √sin(15) ]
                 [ x  y   z − w   ]
is a 2 × 3 matrix with a12 = 2, a22 = y, a23 = z − w, etc.
Definition 2.3 (Equal matrices). Two matrices A = (aij ) and B = (bij ) are said to be equal
if they have the same size (m × n) and aij = bij for all i, j.
In the remainder of this chapter and throughout these notes we will assume that the entries in
all the matrices are real numbers, unless specified otherwise.

2.2. Operations with Matrices


Definition 2.4 (Basic operations with matrices). Let A = (aij ) and B = (bij ) be two m × n
matrices (with real entries) and let λ ∈ R be any real number.
1) (Multiplication by a scalar) We define the matrix λA to be the m × n matrix of the form


(λaij ). I.e., (λA)ij = λaij , for all i = 1, . . . , m, j = 1, . . . , n.
2) (Addition and subtraction of matrices) The sum A + B is the m × n matrix C = (cij ),
defined by cij = aij + bij , for all i = 1, . . . , m, j = 1, . . . , n. The difference A − B is the
m × n matrix D = (dij ), defined by dij = aij − bij , for all i = 1, . . . , m, j = 1, . . . , n.
   
Example 2.5. Let A = [1 2 3; −4 5 −6] and B = [0 1 2; 2 1 0] be two matrices of size 2 × 3
(here and below a matrix is sometimes written row by row, with rows separated by semicolons).
Then 3A = [3 6 9; −12 15 −18], and
    A − 2B = A − [0 2 4; 4 2 0] = [1 2 3; −4 5 −6] − [0 2 4; 4 2 0] = [1 0 −1; −8 3 −6].
Note 2.6. The addition and subtraction of matrices is only defined for matrices of the same
size, and the resulting matrix is also of that size.
Theorem 2.7 (Properties of matrix operations). Let A, B, C be matrices of size m × n and let
r, s ∈ R. Then
(i) A + B = B + A (commutativity);
(ii) A + (B + C) = (A + B) + C (associativity);
(iii) if O is the m × n zero matrix then A + O = A = O + A (existence of the additive identity);
(iv) r(sA) = (rs)A;
(v) r(A + B) = rA + rB (distributivity with respect to the multiplication by scalars).
Proof. v) Let A = (aij ), B = (bij ) and D = r(A + B). Then D = (dij ) is an m × n matrix,
and so is rA + rB. Now, note that for all i, j we have
dij = r(aij + bij ) (by Definition 2.4 of multiplication by scalars and matrix addition)
= raij + rbij (by distributivity law in R)
= (rA)ij + (rB)ij (by Definition 2.4).
Hence r(A + B) = D = rA + rB.
The proofs of the remaining properties (i)-(iv) are similar. 
Definition 2.8 (Matrix multiplication). Let A = (aik ) be an m × n matrix and let B = (bkj )
be an n × p matrix, where m, n, p ∈ N are some natural numbers (note that the number n, of
columns in A, is equal to the number of rows in B). We define the product C = A B to be the
m × p matrix C = (cij) such that
    cij = Σ_{k=1}^{n} aik bkj = ai1 b1j + ai2 b2j + ai3 b3j + ... + ain bnj.
(For example, c11 = a11 b11 + a12 b21 + ... + a1n bn1, c25 = a21 b15 + a22 b25 + ... + a2n bn5, etc.)
What does this really mean? Think of A as consisting of row vectors a1, ..., am ∈ Rn (the i-th
row of A is the row vector ai).
E.g., if A = [1 5; −3 4; 0 −2] (a 3 × 2 matrix), then a1 = (1, 5), a2 = (−3, 4) and a3 = (0, −2).
And think of B as consisting of column vectors b1, ..., bp ∈ Rn (the j-th column of B is the
column vector bj).
E.g., if B = [7 8 9 10; 11 12 13 14] (a 2 × 4 matrix), then b1 = (7, 11), b2 = (8, 12), b3 = (9, 13)
and b4 = (10, 14).
Then
    C = A B = [ a1 · b1   a1 · b2   ...   a1 · bp ]
              [ a2 · b1   a2 · b2   ...   a2 · bp ]
              [    ⋮         ⋮               ⋮    ]
              [ am · b1   am · b2   ...   am · bp ]   (an m × p matrix).
So, cij = ai · bj (the (i, j)-th entry of A B is equal to the scalar product of the i-th row of A
with the j-th column of B).
Note 2.9. According to Definition 2.8, the product A B is only defined if the number of
columns in A equals the number of rows in B.
Example 2.10. Let A = [1 2 3; 2 1 3] and B = [1 2; 2 0; 1 1]. Note that A has 3 columns and
B has 3 rows, so the product A B is defined (and it must be a 2 × 2 matrix):
    A B = [ 1·1 + 2·2 + 3·1   1·2 + 2·0 + 3·1 ] = [ 8  5 ]
          [ 2·1 + 1·2 + 3·1   2·2 + 1·0 + 3·1 ]   [ 7  7 ].
The product B A is also defined (and has size 3 × 3):
    B A = [ 1+4  2+2  3+6 ]   [ 5  4  9 ]
          [ 2+0  4+0  6+0 ] = [ 2  4  6 ]
          [ 1+2  2+1  3+3 ]   [ 3  3  6 ].
Note 2.11. Example 2.10 shows that in general the product of matrices is not commutative,
as A B 6= B A (in fact, even the sizes of these two products are different!).
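The row-by-column description of the matrix product translates directly into code. Here is a minimal Python sketch (an illustration added to these notes, assuming the numpy package) which builds A B entry-by-entry as the scalar products ai · bj and compares the result with numpy's built-in product for the matrices of Example 2.10.

    import numpy as np

    def matmul_by_dot_products(A, B):
        m, n = A.shape
        n2, p = B.shape
        assert n == n2, "number of columns of A must equal number of rows of B"
        C = np.zeros((m, p))
        for i in range(m):
            for j in range(p):
                # c_ij = (i-th row of A) . (j-th column of B)
                C[i, j] = A[i, :].dot(B[:, j])
        return C

    A = np.array([[1, 2, 3],
                  [2, 1, 3]])
    B = np.array([[1, 2],
                  [2, 0],
                  [1, 1]])
    print(matmul_by_dot_products(A, B))   # [[8. 5.] [7. 7.]]
    print(A @ B)                          # same result, computed by numpy directly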
  
Example 2.12. a) The product [1 2 3; 2 1 3] [1 2 3; 2 1 3] is undefined, as the first factor has
3 columns while the second factor has 2 rows (3 ≠ 2).
b) Suppose A = (aik) and B = (bkj) are 7 × 7 matrices, the 3rd row of A is (2, 4, 3, 1, 6, 5, 7), the
4th column of B is (−1, 0, 2, 5, −3, 4, 6) (read top to bottom), and the remaining entries are
irrelevant. If C = A B, then
    (C)34 = (A B)34 = Σ_{k=1}^{7} a3k bk4 = a31 b14 + a32 b24 + a33 b34 + ... + a37 b74
          = −2 + 0 + 6 + 5 − 18 + 20 + 42 = 53.
Theorem 2.13 (Properties of matrix product). Let A be an m × n matrix and let B, C be


matrices of appropriate sizes so that the following expressions are defined. Then
(i) A (B + C) = A B + A C (distributivity);
(ii) (A + B) C = A C + B C (distributivity);
(iii) A (B C) = (A B) C (associativity of the product);
(iv) for any r ∈ R we have r(A B) = (r A) B = A (r B);
(v) Im A = A In = A (where Im denotes the m × m identity matrix);
(vi) if On×p denotes the n × p zero matrix, then A On×p = Om×p , and Ol×m A = Ol×n .

Proof. Parts (i), (ii) and (vi) are left as exercises.


Let us start with proving property (iv). Assume that A has size m × n, B has size n × p, and
denote D = A B. Then it is clear that all of the matrices r D = r (A B), (r A) B and A (r B) have
the same size m × p. So, if D = (dij), then dij = Σ_{k=1}^{n} aik bkj (by Definition 2.8), where
A = (aik) and B = (bkj).
The matrix rD has general element
    (r D)ij = r dij                                              (by Definition 2.4)
            = r Σ_{k=1}^{n} aik bkj                              (by the definition of D)
            = Σ_{k=1}^{n} r (aik bkj)                            (by distributivity in R)
            = Σ_{k=1}^{n} (r aik) bkj = Σ_{k=1}^{n} aik (r bkj)  (by associativity and commutativity in R).
By Definition 2.4, rA = (r aik) and rB = (r bkj), so, according to Definition 2.8,
((r A) B)ij = Σ_{k=1}^{n} (r aik) bkj and (A (r B))ij = Σ_{k=1}^{n} aik (r bkj). Thus the above
equation shows that (r D)ij = ((r A) B)ij = (A (r B))ij for all i = 1, ..., m and j = 1, ..., p. Since
these three matrices are all of the same size m × p, we can conclude that r (A B) = (r A) B = A (r B),
as required.
(v) Recall that Im = (δik) is an m × m matrix, where δik = 1 if i = k and δik = 0 if i ≠ k.
Therefore (Im A)ij = Σ_{k=1}^{m} δik akj = δii aij = aij, as δii = 1 is the only non-zero term among
the δik. Since this is true for all possible i and j, and Im A has the same size m × n as A, we can
conclude that Im A = A.
The equality A In = A can be proved similarly.
Now we are going to prove property (iii), which is extremely important. Let us first start with
the following general observation concerning the Σ-notation:

Note 2.14. Suppose that γkj are some real numbers, where k = 1, ..., r and j = 1, ..., n. Then
    Σ_{k=1}^{r} ( Σ_{j=1}^{n} γkj ) = Σ_{j=1}^{n} ( Σ_{k=1}^{r} γkj ),
i.e., permuting the order of the two summations does not change the result.
Proof of Note 2.14. Using standard properties of real numbers (commutativity and associativity), we get:
    Σ_{k=1}^{r} ( Σ_{j=1}^{n} γkj ) = Σ_{j=1}^{n} γ1j + Σ_{j=1}^{n} γ2j + ... + Σ_{j=1}^{n} γrj
        = (γ11 + γ12 + ... + γ1n) + (γ21 + γ22 + ... + γ2n) + ... + (γr1 + γr2 + ... + γrn)
        = (γ11 + γ21 + ... + γr1) + (γ12 + γ22 + ... + γr2) + ... + (γ1n + γ2n + ... + γrn)
        = Σ_{k=1}^{r} γk1 + Σ_{k=1}^{r} γk2 + ... + Σ_{k=1}^{r} γkn = Σ_{j=1}^{n} ( Σ_{k=1}^{r} γkj ). □

We are now ready to prove the associativity of matrix multiplication (claim (iii) in Theorem 2.13).
Let A = (aij) be an m × n matrix, B = (bjk) be an n × r matrix, C = (ckl) be an r × s matrix,
and D = A B, G = B C. Then D = (dik) is an m × r matrix, dik = Σ_{j=1}^{n} aij bjk, and G = (gjl)
is an n × s matrix, with gjl = Σ_{k=1}^{r} bjk ckl.
Let E = (eil) = (A B) C = D C and F = (fil) = A (B C) = A G. Then both E and F have the
same size m × s, and
    eil = Σ_{k=1}^{r} dik ckl = Σ_{k=1}^{r} ( Σ_{j=1}^{n} aij bjk ) ckl   (by definitions of E = D C and D = A B)
        = Σ_{k=1}^{r} ( Σ_{j=1}^{n} aij bjk ckl )                         (by distributivity in R; call the inner term γkj)
        = Σ_{j=1}^{n} ( Σ_{k=1}^{r} aij bjk ckl )                         (by Note 2.14)
        = Σ_{j=1}^{n} aij ( Σ_{k=1}^{r} bjk ckl )                         (by distributivity in R, as aij is independent of k)
        = Σ_{j=1}^{n} aij gjl = fil                                       (by definitions of G = B C and F = A G).
Hence, E = F, i.e. (A B) C = A (B C), and (iii) is proved. □


Example 2.15. Given any v = (v1, ..., vn) ∈ Rn, we can think of it as a row vector, corresponding
to the 1 × n matrix
    v = ( v1  v2  ...  vn ).
On the other hand, we can also think of v as a column vector, in which case it corresponds to the
n × 1 matrix with the entries v1, ..., vn listed in a single column, i.e., to (v1, ..., vn)T.
For example, if v = (1, 2, 3) then the associated row vector is the 1 × 3 matrix (1 2 3), and the
associated column vector is the 3 × 1 matrix (1, 2, 3)T.
This correspondence allows us to multiply matrices and vectors following the rules of matrix
multiplication. Thus an m × n matrix A can be multiplied with a column vector v ∈ Rn, so that
the result is an m × 1 matrix (corresponding to a column vector in Rm).
For instance, if v = (1, 2, 3)T and A = [−3 0 5; −2 1 7] (a 2 × 3 matrix), then the matrix product
A v makes sense and the result is a column vector in R2:
    A v = [ −3·1 + 0·2 + 5·3 ] = [ 12 ]
          [ −2·1 + 1·2 + 7·3 ]   [ 21 ].

2.3. The transpose of a matrix


Definition 2.16 (Transpose of a matrix). Let A = (aji ) be an m × n matrix. The transpose
of A is the n × m matrix AT defined by (AT )ij = (A)ji = aji , for all i, j, 1 ≤ i ≤ n, 1 ≤ j ≤ m. In
other words, rows of A become columns of AT , columns of A become rows of AT .
If A = AT , then the matrix A is said to be symmetric.
Thus, according to Definition 2.16, if
    A = [ a11  a12  ···  a1n ]            then    AT = [ a11  a21  ···  am1 ]
        [ a21  a22  ···  a2n ]                         [ a12  a22  ···  am2 ]
        [  ⋮    ⋮          ⋮ ]                         [  ⋮    ⋮          ⋮ ]
        [ am1  am2  ···  amn ]  (m × n)                [ a1n  a2n  ···  amn ]  (n × m).
Example 2.17. (a) [1 2 3; 4 5 6]T = [1 4; 2 5; 3 6] and [1 2 3; 4 5 6; 7 8 9]T = [1 4 7; 2 5 8; 3 6 9].
(b) [1 2; 2 3]T = [1 2; 2 3], so this matrix is symmetric.
Note 2.18. A symmetric matrix is necessarily a square matrix.
Theorem 2.19 (Properties of the transpose). Let A, B be matrices of appropriate sizes, so that
all of the following expressions are defined. Then
(i) (AT )T = A;
(ii) (A + B)T = AT + B T ;
(iii) (r A)T = r (AT );
(iv) (A B)T = B T AT ;
(v) (In )T = In .
Proof. (ii), (iii), and (v) are left as exercises. To prove (i), suppose that A = (aij ) is a matrix
of size m × n. Then AT has size n × m, so (AT )T has size m × n, same as A. Also (AT )ij = aji , so
((AT )T )ij = (AT )ji = (A)ij = aij . It follows that (AT )T = A.
(iv) Let A have size m × n, B – size n × p. Then (A B)T has size p × m, which is also the size
of BT AT. Now, according to the definitions of the transpose and matrix product, we have:
    ((A B)T)ij = (A B)ji = Σ_{k=1}^{n} ajk bki,   where A = (ajk), B = (bki).
On the other hand,
    (BT AT)ij = Σ_{k=1}^{n} (BT)ik (AT)kj = Σ_{k=1}^{n} bki ajk = Σ_{k=1}^{n} ajk bki   (by commutativity in R).

Hence the (i, j)-th element of (A B)T equals the (i, j)-th element of B T AT , for all i, j. Therefore
(A B)T = BT AT. □
Example 2.20. Let A = [1 2 3; 2 1 3] and B = [1 2; 2 0; 1 1] be the matrices from Example 2.10.
We have already seen that A B = [8 5; 7 7]. Now, AT = [1 2; 2 1; 3 3] and BT = [1 2 1; 2 0 1], so
    BT AT = [1 2 1; 2 0 1] [1 2; 2 1; 3 3] = [8 7; 5 7] = [8 5; 7 7]T = (A B)T.
Thus (A B)T = B T AT , as expected from Theorem 2.19.(iv).
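The identity (A B)T = BT AT can also be confirmed numerically; a quick Python check (illustrative only, assuming numpy) on randomly chosen matrices of compatible sizes:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 5, size=(2, 3))
    B = rng.integers(-5, 5, size=(3, 4))

    print(np.array_equal((A @ B).T, B.T @ A.T))   # True: (AB)^T = B^T A^T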

2.4. The inverse of a matrix


Definition 2.21 (Invertible matrix). An n × n matrix A is said to be invertible if there exists
an n × n matrix B such that A B = In and B A = In, where In is the n × n identity matrix. In this
case B is called the inverse of A, and is denoted B = A−1.
Example 2.22. Let A = [1 2; 3 4]; then the inverse of A is the matrix B = [−2 1; 3/2 −1/2].
Indeed:
    A B = [1 2; 3 4] [−2 1; 3/2 −1/2] = [1 0; 0 1] = I2   and
    B A = [−2 1; 3/2 −1/2] [1 2; 3 4] = [1 0; 0 1] = I2.
Note 2.23. Observe that
(1) only a square (n × n) matrix can be invertible, by definition.

(2) not all square matrices are invertible. E.g., A = [1 0; 0 0] has no inverse.
(Indeed: if B = A−1 = [a b; c d] then A B = I2, so
    [1 0; 0 0] [a b; c d] = [1 0; 0 1]  ⟹  [a b; 0 0] = [1 0; 0 1],
which is impossible as 0 ≠ 1.)
(3) if A is an invertible n × n matrix and B, C are some n × n matrices, then A B = A C (or
B A = C A) implies that B = C.
(Indeed: if A B = A C then A−1 (A B) = A−1 (A C), so, by associativity, (A−1 A) B =
(A−1 A) C, hence In B = In C, which, using the property of In from Theorem 2.13.(v),
yields B = C.)
(4) the identity matrix In is invertible and its inverse is In , since In In = In .
       
Example 2.24. If A = [1 0; 0 0], B = [0 0; 0 1], C = [0 0; 0 0], then A B = [0 0; 0 0] = A C, but
B ≠ C. Thus it is indeed necessary to require that A is invertible in Note 2.23.(3).
Theorem 2.25. Let A = [a b; c d] be a 2 × 2 matrix with ad − bc ≠ 0. Then A is invertible and
    A−1 = (1/(ad − bc)) [ d  −b ]
                        [ −c  a ].
Proof. Exercise (straightforward check). 
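The formula of Theorem 2.25 is easy to turn into a small function. The sketch below (an added illustration, not part of the notes; numpy assumed) implements it and verifies A A−1 = I2 for the matrix of Example 2.22.

    import numpy as np

    def inverse_2x2(A):
        a, b = A[0]
        c, d = A[1]
        det = a * d - b * c
        if det == 0:
            raise ValueError("matrix is not invertible (ad - bc = 0)")
        return (1.0 / det) * np.array([[d, -b],
                                       [-c, a]])

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    print(inverse_2x2(A))         # [[-2.   1. ] [ 1.5 -0.5]]
    print(A @ inverse_2x2(A))     # the identity matrix I2 (up to rounding)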
Theorem 2.26 (Properties of invertible matrices). Let A and B be invertible n × n matrices.
Then the following are true:
(a) A has a unique inverse;
(b) A−1 is invertible and (A−1 )−1 = A;
(c) A B is invertible and (A B)−1 = B −1 A−1 ;
(d) AT is invertible and (AT )−1 = (A−1 )T .
Proof. (a) Suppose that C and D are two inverse matrices to A, i.e., A C = In = C A and
A D = In = D A. Then, by properties of In (cf. Theorem 2.13.(v)), we have
(2.1) D (A C) = D In = D and (D A) C = In C = C.
From Theorem 2.13.(iii) (associativity of the matrix product), we know that D (A C) = (D A) C,
hence (2.1) implies that D = C, i.e., A has a unique inverse.
(b) A−1 A = In = A A−1 by definition of A−1 , but this also means that A is the inverse matrix
for A−1 , i.e., (A−1 )−1 = A.
(c) Exercise.
(d) Exercise (use properties of transpose from Theorem 2.19). 

2.5. Powers of a matrix


Definition 2.27 (Power of a matrix). Let A be a square (n × n) matrix and let s be any
integer. The s-th power of A, As, is the n × n matrix defined by the following formula:
    As = A A ... A (s times)                     if s > 0;
    As = In                                      if s = 0 and A is invertible;
    As = A−1 A−1 ... A−1 (−s times)              if s < 0 (so −s > 0) and A is invertible.
E.g., A3 = A A A, A−5 = A−1 A−1 A−1 A−1 A−1, A0 = In (if A is invertible, that is, if A−1 exists).
Note 2.28. If A is not invertible then As is undefined for s ≤ 0.
Proposition 2.29. Let A be a square matrix and let r, s be integers, so that Ar and As are
defined. Then
(i) Ar+s = Ar As ;
(ii) (Ar )s = Ars .
Proof. Exercise (need to consider different cases depending on the signs of r and s). 
 
Example 2.30. Let A = [1 2; 3 4]; calculate A4 and A−4.
    A2 = A A = [1 2; 3 4] [1 2; 3 4] = [7 10; 15 22].
    A4 = A2 A2 = [7 10; 15 22] [7 10; 15 22] = [199 290; 435 634]   (using Prop. 2.29.(i)).
Observe that A is invertible, by Theorem 2.25, as 1 · 4 − 2 · 3 = −2 ≠ 0. Thus A−4 is defined, and
    A−4 = (A4)−1 = (1/(199 · 634 − 290 · 435)) [634 −290; −435 199]   (using Prop. 2.29.(ii) and Thm. 2.25)
        = (1/16) [634 −290; −435 199] = [634/16 −290/16; −435/16 199/16].
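Powers of a matrix, including negative powers of an invertible matrix, can be checked with numpy's matrix_power; the quick sketch below (added for illustration, not part of the notes) reproduces the numbers of Example 2.30.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

    A4 = np.linalg.matrix_power(A, 4)         # [[199. 290.] [435. 634.]]
    A_minus4 = np.linalg.matrix_power(A, -4)  # negative powers require A to be invertible

    print(A4)
    print(A_minus4)      # (1/16)[[634 -290] [-435 199]] = [[39.625 -18.125] [-27.1875 12.4375]]
    print(A4 @ A_minus4) # the identity matrix (up to rounding)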
CHAPTER 3

Systems of Linear Equations

This chapter introduces Gaussian Elimination, as a method for solving systems of linear equa-
tions, and then discusses how one can use row operations to find the rank and the inverse of a
matrix.

3.1. Systems of Equations and Matrices


A system of m linear equations with n unknowns x1, x2, ..., xn is a system of the form:
(3.1)    a11 x1 + a12 x2 + ... + a1n xn = b1
         a21 x1 + a22 x2 + ... + a2n xn = b2
         ...
         am1 x1 + am2 x2 + ... + amn xn = bm.
For example,
         x + y + z = 5
         3x − 2y = 6
is a system with 2 equations and 3 variables (unknowns).
The numbers aij are called coefficients of the system. The matrix of coefficients of the system
(3.1) is the m × n matrix
    A = [ a11  a12  ···  a1n ]
        [ a21  a22  ···  a2n ]
        [  ⋮    ⋮          ⋮ ]
        [ am1  am2  ···  amn ].
Note 3.1. Multiplying A by the column vector x = (x1, ..., xn)T gives
    A x = [ a11 x1 + a12 x2 + ··· + a1n xn ]
          [ a21 x1 + a22 x2 + ··· + a2n xn ]
          [                ⋮               ]
          [ am1 x1 + am2 x2 + ··· + amn xn ],
which is exactly the column vector of left-hand sides of (3.1).
Therefore the system (3.1) is equivalent to the matrix equation
    A x = b,
where A is the matrix of coefficients, x = (x1, ..., xn)T is the column vector of variables, and
b = (b1, ..., bm)T ∈ Rm.
Definition 3.2 (Augmented matrix). The augmented matrix of system (3.1) is
    (A | b) = [ a11  ···  a1n | b1 ]
              [  ⋮         ⋮  |  ⋮ ]
              [ am1  ···  amn | bm ],
which can be thought of as an m × (n + 1) matrix.
Example 3.3. The augmented matrix for the system
    x1 + x2 + x3 = 3
    2x1 + x2 + x3 = 4
    x1 − x2 + 2x3 = 5
is
    [ 1   1  1 | 3 ]
    [ 2   1  1 | 4 ]
    [ 1  −1  2 | 5 ].

3.2. Row operations and Gaussian elimination


As we know, the solution of system (3.1) is preserved if we perform the following operations:

O1: multiply both sides of an equation by a non-zero real number;


O2: interchange 2 equations;
O3: add a multiple of one equation to another.

The effects of O1, O2, O3 on the augmented matrix correspond to row operations:

RO1: multiply a row by a non-zero number λ ∈ R (Row scaling, Ri → λRi );


RO2: interchange two rows (Row interchange, Ri ↔ Rj );
RO3: add a multiple of one row to another (Row replacement, Ri → Ri + µRj , i 6= j, µ ∈ R).

Definition 3.4 (Row equivalence). We will say that two matrices are row equivalent if one can
be obtained from the other by applying a finite sequence of row operations.
   
Example 3.5. The matrix A = [0 −3 3; 1 2 −1; −2 4 7] is row equivalent to the matrix
B = [1 2 −1; 0 1 −1; 0 8 5]. This is because B can be obtained from A by applying 3 row operations
as follows:
    A = [0 −3 3; 1 2 −1; −2 4 7]
      → [1 2 −1; 0 −3 3; −2 4 7]      (R1 ↔ R2: interchange rows 1 and 2)
      → [1 2 −1; 0 1 −1; −2 4 7]      (R2 → −(1/3) R2: multiply row 2 by −1/3)
      → [1 2 −1; 0 1 −1; 0 8 5] = B   (R3 → R3 + 2R1: add 2 × row 1 to row 3).

Note 3.6. Row operations are invertible and their inverses are also row operations.

For example, the inverse of the operation R3 → R3 + 2R1 is R3 → R3 − 2R1 .


Example 3.7. How do we solve the system from Example 3.3?

    x1 + x2 + x3 = 3        corresponds to      [ 1  1  1 | 3 ]
    2x1 + x2 + x3 = 4                           [ 2  1  1 | 4 ]
    x1 − x2 + 2x3 = 5                           [ 1 −1  2 | 5 ]

Subtract 2 × the first equation from the second one and subtract the first equation from the third
one (R2 → R2 − 2R1, R3 → R3 − R1):

    x1 + x2 + x3 = 3                            [ 1  1  1 |  3 ]
    −x2 − x3 = −2                               [ 0 −1 −1 | −2 ]
    −2x2 + x3 = 2                               [ 0 −2  1 |  2 ]

Subtract 2 × the second equation from the third one (R3 → R3 − 2R2):

    x1 + x2 + x3 = 3                            [ 1  1  1 |  3 ]
    −x2 − x3 = −2                               [ 0 −1 −1 | −2 ]
    3x3 = 6                                     [ 0  0  3 |  6 ]

Multiply the second equation by −1, divide the third equation by 3 (R2 → −R2, R3 → (1/3) R3):

    x1 + x2 + x3 = 3                            [ 1  1  1 | 3 ]
    x2 + x3 = 2                                 [ 0  1  1 | 2 ]
    x3 = 2                                      [ 0  0  1 | 2 ]

Subtract equation 3 from the first and the second equations (R1 → R1 − R3, R2 → R2 − R3):

    x1 + x2 = 1                                 [ 1  1  0 | 1 ]
    x2 = 0                                      [ 0  1  0 | 0 ]
    x3 = 2                                      [ 0  0  1 | 2 ]

Subtract equation 2 from equation 1 (R1 → R1 − R2):

    x1 = 1                                      [ 1  0  0 | 1 ]
    x2 = 0                                      [ 0  1  0 | 0 ]
    x3 = 2                                      [ 0  0  1 | 2 ]

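For comparison, the same system can be solved numerically: the sketch below (an added illustration, not part of the notes; numpy assumed) passes the matrix of coefficients and the right-hand side of Example 3.3 to numpy's linear solver.

    import numpy as np

    A = np.array([[1.0, 1.0, 1.0],
                  [2.0, 1.0, 1.0],
                  [1.0, -1.0, 2.0]])
    b = np.array([3.0, 4.0, 5.0])

    print(np.linalg.solve(A, b))   # [1. 0. 2.], i.e. x1 = 1, x2 = 0, x3 = 2, as found above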
Definition 3.8 (Row echelon form). The first non-zero entry of a row in a matrix is called a
pivot. A matrix is said to be in a (row) echelon form if it has the following two properties:

• if a row is non-zero, then its pivot is further to the right with respect to pivots of the rows
above it;
• zero rows are at the bottom of the matrix.
38 3. SYSTEMS OF LINEAR EQUATIONS

For example, the following matrices are in row echelon form:
    [ −1  ∗  ∗  ∗  ∗  ···  ∗  ∗ ]
    [  0  2  ∗  ∗  ∗  ···  ∗  ∗ ]          [ 2  ∗  ∗  ···  ∗ ]
    [  0  0  0  5  ∗  ···  ∗  ∗ ]    or    [ 0 −3  ∗  ···  ∗ ]
    [  ⋮  ⋮  ⋮  ⋮  ⋮       ⋮  ⋮ ]          [ ⋮       ⋱     ⋮ ]
    [  0  0  0  0  0  ···  0 −7 ]          [ 0  0  ···  8  ∗ ]
    [  0  0  0  0  0  ···  0  0 ]          [ 0  0  ···  0  0 ]
On the other hand, the matrix
    [ 1 −2  0   7 ]
    [ 0  0  9   6 ]
    [ 0  0  5  −2 ]
is not in row echelon form, as the pivot of the third row is in the same column as the pivot of the
second row.
Definition 3.9 (Reduced row echelon form). A matrix A is in a reduced (row) echelon form if
all of the following are satisfied:
• A is in a row echelon form;
• each non-zero row of A starts with 1 (i.e., every pivot is equal to 1);
• every column of A containing a pivot (of some row) has zeros elsewhere.
   
E.g., the matrices
    [ 1  0  0   5 ]         [ 1 −2  0  0 ]
    [ 0  1  0   3 ]   and   [ 0  0  1  0 ]
    [ 0  0  1  −2 ]         [ 0  0  0  1 ]
are in reduced echelon form. On the other hand, the matrix
    [ 1  0  3 ]
    [ 0  1  0 ]
    [ 0  0  1 ]
is not in reduced echelon form, as the third column contains the pivot of the third row and has
another non-zero entry above it.
As we see from Example 3.7, bringing the augmented matrix of a system of linear equations to
a reduced echelon form corresponds to solving this system. E.g., the augmented matrix
    [ 1  0  0  0 | α ]
    [ 0  1  0  0 | β ]
    [ 0  0  1  0 | γ ]
    [ 0  0  0  1 | δ ]
corresponds to the solution (x1, x2, x3, x4) = (α, β, γ, δ).
Definition 3.10 (Gaussian elimination). The method of solving a system of linear equations,
which starts with the augmented matrix, performs row operations with it to end up with a reduced
row echelon form, and then “reads off” the solution, is called Gaussian elimination.
Example 3.11. Use Gaussian elimination to solve the following system of linear equations:
    2x1 + 3x2 + x3 = 6
    2x1 + 4x2 + x3 = 5
    x1 + x2 + x3 = 6
Starting with the augmented matrix:
    [ 2 3 1 | 6 ]   R1 ↔ R3    [ 1 1 1 | 6 ]   R2 → R2 − 2R1    [ 1 1  1 |  6 ]
    [ 2 4 1 | 5 ]  −−−−−−→     [ 2 4 1 | 5 ]   R3 → R3 − 2R1    [ 0 2 −1 | −7 ]
    [ 1 1 1 | 6 ]              [ 2 3 1 | 6 ]  −−−−−−−−−−−→      [ 0 1 −1 | −6 ]

    R2 ↔ R3    [ 1 1  1 |  6 ]   R3 → R3 − 2R2    [ 1 1  1 |  6 ]   (row echelon form of
   −−−−−−→     [ 0 1 −1 | −6 ]  −−−−−−−−−−−→      [ 0 1 −1 | −6 ]    the augmented matrix)
               [ 0 2 −1 | −7 ]                    [ 0 0  1 |  5 ]

    R1 → R1 − R3    [ 1 1 0 |  1 ]   R1 → R1 − R2    [ 1 0 0 |  2 ]   (reduced row echelon form
    R2 → R2 + R3    [ 0 1 0 | −1 ]  −−−−−−−−−−−→     [ 0 1 0 | −1 ]    of the augmented matrix)
   −−−−−−−−−−−→     [ 0 0 1 |  5 ]                   [ 0 0 1 |  5 ]

Hence the solution of the system is (x1, x2, x3) = (2, −1, 5). Geometrically, this is a single point in R3.
We can easily check the answer by substituting this back into the system:
    2 · 2 + 3 · (−1) + 5 = 6 ✓,   2 · 2 + 4 · (−1) + 5 = 5 ✓,   2 − 1 + 5 = 6 ✓.

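For readers who like to experiment, Gaussian elimination to reduced row echelon form can be coded in a few lines. The following Python sketch (added for illustration only; written for clarity rather than numerical robustness) uses exactly the row operations RO1–RO3 and is checked on the augmented matrix of Example 3.11.

    import numpy as np

    def rref(M, tol=1e-12):
        # reduce M to reduced row echelon form using row operations RO1-RO3
        M = M.astype(float).copy()
        rows, cols = M.shape
        pivot_row = 0
        for col in range(cols):
            candidates = [r for r in range(pivot_row, rows) if abs(M[r, col]) > tol]
            if not candidates:
                continue                                     # no pivot in this column
            r = candidates[0]
            M[[pivot_row, r]] = M[[r, pivot_row]]            # RO2: row interchange
            M[pivot_row] = M[pivot_row] / M[pivot_row, col]  # RO1: scale the pivot to 1
            for r2 in range(rows):                           # RO3: clear the rest of the column
                if r2 != pivot_row:
                    M[r2] = M[r2] - M[r2, col] * M[pivot_row]
            pivot_row += 1
            if pivot_row == rows:
                break
        return M

    aug = np.array([[2, 3, 1, 6],
                    [2, 4, 1, 5],
                    [1, 1, 1, 6]])   # augmented matrix of Example 3.11
    print(rref(aug))                 # last column gives the solution (2, -1, 5)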
Example 3.12. Find the general solution of a system of linear equations whose augmented
matrix has been reduced to
    [ 1  6  2  −5  −2 | −4 ]   R1 → R1 + 2R3    [ 1  6  2  −5  0 | 10 ]   R2 → (1/2) R2
    [ 0  0  2  −8  −1 |  3 ]   R2 → R2 + R3     [ 0  0  2  −8  0 | 10 ]  −−−−−−−−−−−→
    [ 0  0  0   0   1 |  7 ]  −−−−−−−−−−−→      [ 0  0  0   0  1 |  7 ]

    [ 1  6  2  −5  0 | 10 ]   R1 → R1 − 2R2    [ 1  6  0   3  0 |  0 ]   (reduced row
    [ 0  0  1  −4  0 |  5 ]  −−−−−−−−−−−→      [ 0  0  1  −4  0 |  5 ]    echelon form)
    [ 0  0  0   0  1 |  7 ]                    [ 0  0  0   0  1 |  7 ]

which corresponds to the system
    x1 + 6x2 + 3x4 = 0
    x3 − 4x4 = 5
    x5 = 7.
We now have three equations and five variables, with x4 and x2 free variables: say, x4 = λ, x2 = µ,
where λ, µ ∈ R. It follows that x1 = −6µ − 3λ, x3 = 5 + 4λ and x5 = 7. Thus the general solution
of the system has the form
    (x1, x2, x3, x4, x5) = (−6µ − 3λ, µ, 5 + 4λ, λ, 7)
                         = (0, 0, 5, 0, 7) + µ (−6, 1, 0, 0, 0) + λ (−3, 0, 4, 1, 0),   λ, µ ∈ R.
So, there are infinitely many solutions. Geometrically, we see that the set of all solutions forms a
plane in R5, passing through the point P = (0, 0, 5, 0, 7) and parallel to the vectors a = (−6, 1, 0, 0, 0)
and b = (−3, 0, 4, 1, 0).
Example 3.13.
    x − y + z = 2
    2x + y + 3z = 2
    4x − y + 5z = 1

    [ 1 −1 1 | 2 ]   R2 → R2 − 2R1    [ 1 −1 1 |  2 ]   R3 → R3 − R2    [ 1 −1 1 |  2 ]
    [ 2  1 3 | 2 ]   R3 → R3 − 4R1    [ 0  3 1 | −2 ]  −−−−−−−−−−→      [ 0  3 1 | −2 ]
    [ 4 −1 5 | 1 ]  −−−−−−−−−−−→      [ 0  3 1 | −7 ]                   [ 0  0 0 | −5 ]

The last row is equivalent to the equation 0 · x + 0 · y + 0 · z = −5 ⟺ 0 = −5, which is
impossible. Therefore this system has no solution. This is an example of an inconsistent system.
Definition 3.14 (Consistent system). A system of linear equations is said to be consistent if
it has at least one solution. Otherwise the system is said to be inconsistent.
We can now formulate the following important theorem:
Theorem 3.15. (Theorem about the existence and uniqueness of solutions in systems of linear
equations.)
a) A system of linear equations is consistent if and only if the corresponding augmented
matrix in a row echelon form has no row of the form 0 0 . . . 0 | b, where b 6= 0.
b) If the system is consistent then either there is a unique solution (i.e., there are no free
variables), or there are infinitely many solutions (i.e., there are free variables). In the
latter case the number of free variables is equal to the difference between the number of
variables and the number of non-zero rows in an echelon form of the augmented matrix of
the system.

Proof. This can be proved by a straightforward analysis of the possible echelon forms of the
augmented matrix. (Details are omitted – check yourself!) 

The fact that the number of free variables is independent of the way we solve a system of linear
equations follows from

Theorem 3.16. Each matrix is row equivalent to a unique matrix in reduced row echelon form.
In particular, every row echelon form of a matrix has the same number of non-zero rows.

Proof. Omitted. 

3.3. Matrix inverse using row operations


Recall, from Section 2.4, that a square n × n matrix A is invertible if there is an n × n matrix
B such that A B = B A = In, where In is the identity matrix of size n × n. (Notation: B = A−1.)

Theorem 3.17. Suppose that we are given a system of n linear equations with n unknowns:
(3.2)    a11 x1 + a12 x2 + ... + a1n xn = b1
         ...
         an1 x1 + an2 x2 + ... + ann xn = bn.
Let A = (aij) be the corresponding n × n matrix of coefficients and let b = (b1, ..., bn)T. If A is
invertible then the system has a unique solution (x1, ..., xn)T = A−1 b.

Proof. From Note 3.1, we know that system (3.2) is equivalent to the matrix equation
(3.3)    A x = b,   where x = (x1, ..., xn)T.
Since A is invertible, there exists A−1, an n × n matrix.
First we check that x = A−1 b is indeed a solution of (3.3):
    A x = A (A−1 b) = (A A−1) b = In b = b ✓
(using associativity, the definition of A−1 and the property of In).
To show that this solution is unique, multiply both sides of the equation A x = b by A−1 on
the left:
A−1 (A x) = A−1 b
⇐⇒ (A−1 A) x = A−1 b (by associativity of matrix multiplication)
−1
⇐⇒ In x = A b (by definition of A−1 )
⇐⇒ x = A−1 b (as In x = x).
Thus the solution is indeed unique. 
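Numerically, the unique solution x = A−1 b can be computed either by forming A−1 explicitly or (preferably) with a solver. A short sketch, added here for illustration and assuming numpy, using the system of Example 3.11:

    import numpy as np

    A = np.array([[2.0, 3.0, 1.0],
                  [2.0, 4.0, 1.0],
                  [1.0, 1.0, 1.0]])
    b = np.array([6.0, 5.0, 6.0])

    x_via_inverse = np.linalg.inv(A) @ b    # x = A^{-1} b, as in Theorem 3.17
    x_via_solver = np.linalg.solve(A, b)    # same answer, without forming A^{-1}
    print(x_via_inverse, x_via_solver)      # both give (2, -1, 5)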
Definition 3.18 (Elementary matrix). An elementary matrix E of size n is an n × n matrix
obtained from the identity matrix In by applying exactly one row operation.
E.g.,
    [ 1  0  0  −5 ]
    [ 0  1  0   0 ]
    [ 0  0  1   0 ]
    [ 0  0  0   1 ]
is the elementary matrix of size 4 corresponding to the row operation R1 → R1 − 5R4,
    [ 0  1  0 ]
    [ 1  0  0 ]
    [ 0  0  1 ]
is the 3 × 3 elementary matrix corresponding to the row operation R1 ↔ R2, and
    [ 1  0   0   0 ]
    [ 0  1   0   0 ]
    [ 0  0  1/2  0 ]
    [ 0  0   0   1 ]
is the 4 × 4 elementary matrix corresponding to the row operation R3 → (1/2) R3.
On the other hand,
    [ 1  0  0 ]
    [ 0  1  0 ]
    [ 1  0  2 ]
is not an elementary matrix because in order to obtain it from I3 we need to perform at least two
row operations.
Note 3.19. If E is an elementary matrix obtained from In by doing some row operation, and
A is any n × n matrix, then the product E A can be obtained from A by doing the exact same row
operation.
Proof. Exercise (see Example 3.20 for the idea). 

Thus applying a row operation to a matrix A is equivalent to multiplying A by the corresponding


elementary matrix on the left.
 
Example 3.20. a) Let E = [0 0 0 1; 0 1 0 0; 0 0 1 0; 1 0 0 0] be the elementary matrix obtained
from I4 by doing R1 ↔ R4, and let A = [1 1 1 1; 2 2 2 2; 3 3 3 3; 4 4 4 4]. Then
    E A = [4 4 4 4; 2 2 2 2; 3 3 3 3; 1 1 1 1],
i.e., E A is obtained from A by applying R1 ↔ R4.
b) The elementary matrix E = [1 0 0; 2 1 0; 0 0 1] is obtained from I3 by doing R2 → R2 + 2R1.
Let A = [1 2 3; 4 5 6; 7 8 9]. Then E A = [1 2 3; 6 9 12; 7 8 9] is obtained from A by doing
R2 → R2 + 2R1.
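Note 3.19 is easy to test directly: multiplying A on the left by an elementary matrix performs the corresponding row operation. A quick Python check of part (b) of Example 3.20 (an added illustration, numpy assumed):

    import numpy as np

    E = np.array([[1, 0, 0],
                  [2, 1, 0],
                  [0, 0, 1]])   # elementary matrix for R2 -> R2 + 2*R1
    A = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

    print(E @ A)   # [[1 2 3] [6 9 12] [7 8 9]]: row 2 of A replaced by row2 + 2*row1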
Proposition 3.21. Every elementary matrix is invertible, and its inverse is an elementary
matrix (of a similar type).
Proof. Let E be an elementary matrix of size n. Then there are three possibilities.
Case 1: E is obtained from In by exchanging row i with row j (for some 1 ≤ i < j ≤ n). Then
E −1 = E as E E = In . Indeed, according to Note 3.19, E E is obtained from E by exchanging row
i with row j, thus we get In back.
Case 2: E is obtained from In by multiplying row i by a number λ 6= 0. Let F be the elemen-
tary matrix obtained from In by multiplying row i by 1/λ. The same argument as above (using
Note 3.19) shows that F E = In and E F = In , hence F = E −1 .
Case 3: E is obtained from In by adding λ × (the i-th row) to the j-th row, for some λ ∈ R and i ≠ j, 1 ≤ i, j ≤ n.
Let F be the elementary matrix obtained from In by applying Rj → Rj − λRi . Again, in view of
Note 3.19, F E can be obtained from E by applying the row operation Rj → Rj − λRi . However,
E was obtained from In by applying Rj → Rj + λRi . Thus the two row operations cancel each
other out, so, as a result, we get In back: F E = In . Similarly, E F = In . Therefore E is invertible
and E −1 = F . 
Theorem 3.22 (Characterization of invertible matrices). If A is an n × n matrix then the
following are equivalent:
(i) A is invertible;  
(ii) the equation A x = 0 has only the trivial solution x = 0, where x = (x1, ..., xn)T;
(iii) the reduced row echelon form of A is In ;
(iv) A is a product of n × n elementary matrices.
Proof. We will show that (i) ⇒ (ii) ⇒ (iii) ⇒ (iv) ⇒ (i).
“(i) ⇒ (ii)” By Theorem 3.17, the only solution of A x = 0 is x = A−1 0 = 0.
“(ii) ⇒ (iii)” Since the system A x = 0 has a unique solution, there are no free variables. Hence,
by Theorem 3.15, the reduced echelon form of the augmented matrix (A | 0) has no zero rows and
no rows of the form 0 . . . 0 | b, where b 6= 0. In particular, every row in the reduced echelon form
of (A | 0) starts with 1. It follows that the only possibility for the reduced echelon form of this
matrix is
    [ 1  0  ···  0 | ∗ ]
    [ 0  1  ···  0 | ∗ ]
    [ ⋮       ⋱  ⋮ | ⋮ ]
    [ 0  0  ···  1 | ∗ ]   (n × (n + 1)).
As we see, the left-hand side of this reduced echelon form is
the identity matrix In. So, matrix A can be brought to In via a finite sequence of row operations.
Therefore, by Theorem 3.16, In is the reduced echelon form of A, i.e., (iii) holds.
“(iii) ⇒ (iv)” (iii) implies that we can get from A to In by applying finitely many row operations.
By Note 3.19 this means that there exist elementary matrices E1 , . . . , Ek (all n × n) such that
Ek (Ek−1 . . . (E1 A) . . .) = In . In view of the associativity of matrix multiplication, the latter is
equivalent to (Ek Ek−1 . . . E1 ) A = In .
By Proposition 3.21, for each i = 1, 2, . . . , k, Ei is invertible and Ei−1 is an elementary ma-
trix. Arguing as in the proof of Theorem 2.26.(c), one can show that Ek . . . E1 is invertible and
(Ek . . . E1 )−1 = E1−1 . . . Ek−1 . After multiplying both sides of the equation (Ek Ek−1 . . . E1 ) A = In
by (Ek . . . E1 )−1 on the left and using the standard properties of matrices from Theorem 2.13, we
obtain A = (Ek . . . E1 )−1 = E1−1 . . . Ek−1 , i.e., (iv) holds.
“(iv) ⇒ (i)” Suppose that A = E1 . . . Ek for some elementary matrices E1 , . . . , Ek . Arguing as above
(basically by Theorem 2.26.(c) and Proposition 3.21), we conclude that E1 . . . Ek is invertible and
(E1 . . . Ek )−1 = Ek−1 . . . E1−1 . Hence A = E1 . . . Ek is invertible, i.e., (i) holds. 
Theorem 3.22 will be very useful further on. As the first application, let us show that for a
square matrix to be invertible, it is sufficient only to have a left (or right) inverse of the same size.
(Recall that according to our Definition 2.21, an n × n matrix B is invertible if there is a matrix A
such that B is both the left and the right inverse of A: A B = In and B A = In .)
Theorem 3.23. If A and B are n × n matrices such that A B = In , then both A and B are
invertible and B = A−1 , A = B −1 .
 
Proof. Let x = (x1, ..., xn)T and suppose that B x = 0. Since A B = In, we get A (B x) = A 0, so,
by associativity, In x = 0, yielding that x = 0. Hence the equation B x = 0 has only the trivial
solution x = 0. So, by Theorem 3.22, B is invertible.
Now, multiplying both sides of the equation A B = In by B −1 on the right, we get (A B) B −1 =
In B −1 . Using associativity and the fact that B B −1 = In , we obtain A In = B −1 , so A = B −1 (by
properties of In ). Therefore A is invertible and A−1 = (B −1 )−1 = B (see Theorem 2.26.(b)). 
Example 3.24. The claim of Theorem 3.23 does not hold for non-square matrices. E.g., take
B = [1 0; 0 1; 0 0] (a 3 × 2 matrix) and A = [1 0 0; 0 1 0] (a 2 × 3 matrix). Then
    A B = [1 0; 0 1] = I2   but   B A = [1 0 0; 0 1 0; 0 0 0] ≠ I3
(in any case, a non-square matrix cannot be invertible, by definition).
Exercise. Let A be an n × n matrix. Prove that the following are equivalent:
(i) A is not invertible;
(ii) AT is not invertible;
(iii) there is a non-zero n × n matrix B such that A B = O, where O is the n × n zero matrix;
(iv) there is a non-zero n × n matrix C such that C A = O, where O is the n × n zero matrix.
We can now describe an algorithm for finding the inverse of a matrix using row operations.
Algorithm 3.25 (Matrix inverse via row operations). Start with an n × n matrix A and form
the new extended matrix (A | In ). Then apply row operations to bring A (the left part of the
extended matrix) to In .
If the above succeeds, then A is invertible and A−1 can be read off the resulting extended matrix,
which will have the form (In | A−1 ). On the other hand, if it is not possible to bring A to In via
row operations (that is, if some echelon form of A has a row of zeros), then A is not invertible.
Let us justify why the above algorithm is valid.
Proposition 3.26. Starting with a square matrix A, Algorithm 3.25 will indeed terminate,
either by showing that A is not invertible or by outputting the matrix A−1 if A is invertible.
Proof. First, we know (from Theorem 3.22) that if A cannot be brought to In via row oper-
ations then A is not invertible.
So, suppose that the algorithm is successful, i.e., we managed to transform the left-hand side
of the extended matrix to In . Recall that, according to Note 3.19, each row operation corresponds
to multiplication of A by an elementary matrix on the left, ending up in In . So, at the end of the
algorithm, the left half of the extended matrix becomes Ek . . . E2 E1 A = In . Since we do the same
row operations with the right half, it becomes Ek . . . E1 In (= Ek . . . E1 ). But, (Ek . . . E1 ) A = In
implies that A is invertible and Ek . . . E1 = A−1 by Theorem 3.23. Hence the right half of the
extended matrix ends up being A−1 . 
 
Example 3.27. Find A−1 using row operations for A = [2 8 3; 1 3 2; 2 7 4].
First we form the extended matrix (A | I3), then use row operations to bring the left half to I3
column-by-column:
    [ 2 8 3 | 1 0 0 ]   R1 ↔ R2    [ 1 3 2 | 0 1 0 ]   R2 → R2 − 2R1    [ 1 3  2 | 0  1 0 ]
    [ 1 3 2 | 0 1 0 ]  −−−−−−→     [ 2 8 3 | 1 0 0 ]   R3 → R3 − 2R1    [ 0 2 −1 | 1 −2 0 ]
    [ 2 7 4 | 0 0 1 ]              [ 2 7 4 | 0 0 1 ]  −−−−−−−−−−−→      [ 0 1  0 | 0 −2 1 ]

    R2 ↔ R3    [ 1 3  2 | 0  1  0 ]   R3 → R3 − 2R2    [ 1 3  2 | 0  1  0 ]   R3 → −R3
   −−−−−−→     [ 0 1  0 | 0 −2  1 ]  −−−−−−−−−−−→      [ 0 1  0 | 0 −2  1 ]  −−−−−−−→
               [ 0 2 −1 | 1 −2  0 ]                    [ 0 0 −1 | 1  2 −2 ]

    [ 1 3 2 |  0  1  0 ]   R1 → R1 − 2R3    [ 1 3 0 |  2  5 −4 ]   R1 → R1 − 3R2    [ 1 0 0 |  2 11 −7 ]
    [ 0 1 0 |  0 −2  1 ]  −−−−−−−−−−−→      [ 0 1 0 |  0 −2  1 ]  −−−−−−−−−−−→      [ 0 1 0 |  0 −2  1 ]
    [ 0 0 1 | −1 −2  2 ]                    [ 0 0 1 | −1 −2  2 ]                    [ 0 0 1 | −1 −2  2 ]

Hence A is invertible and A−1 = [2 11 −7; 0 −2 1; −1 −2 2].
Check: A A−1 = [2 8 3; 1 3 2; 2 7 4] [2 11 −7; 0 −2 1; −1 −2 2] = [1 0 0; 0 1 0; 0 0 1] = I3 ✓.
(By Theorem 3.23 it is enough to check that A B = In to show that B = A−1 .)
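The inverse obtained by row-reducing (A | I3) can be cross-checked numerically; a quick sketch, assuming numpy (not part of the notes):

    import numpy as np

    A = np.array([[2.0, 8.0, 3.0],
                  [1.0, 3.0, 2.0],
                  [2.0, 7.0, 4.0]])

    print(np.linalg.inv(A))                               # [[2 11 -7] [0 -2 1] [-1 -2 2]], matching A^{-1} above
    print(np.allclose(A @ np.linalg.inv(A), np.eye(3)))   # True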

3.4. Rank of a matrix


Let A be a matrix of size m × n. By Theorem 3.16, any echelon form of A has the same number
r of non-zero rows. Therefore the following definition makes sense.
Definition 3.28 (Rank of a matrix). Given an m × n matrix A, the number r, of non-zero
rows in some (any) row echelon form of A is said to be the rank of A, denoted rank(A).
   
Consider the system A x = 0, where x = (x1, ..., xn)T and 0 = (0, ..., 0)T ∈ Rm. By Theorem 3.15,
the number of free variables in the solution of this system is equal to the difference between n and
the number of non-zero rows in an echelon form of the augmented matrix (A | 0) (clearly this
system is always consistent, as x = 0 ∈ Rn is a solution). Since the rightmost column in the
augmented matrix is zero, its row echelon form is (A′ | 0), where A′ is a row echelon form of A.
Thus rank((A | 0)) = rank(A), and, by Theorem 3.15,
(3.4)    (number of free variables in the solution of A x = 0) = n − rank((A | 0)) = n − rank(A).
Note 3.29. We have used Theorem 3.16 to explain why rank(A) is well-defined, however, we
have not given a proof of Theorem 3.16. This can be partially remedied by noticing that, in view
of (3.4), the statement “rank(A) = n” is equivalent to saying that the number of free variables in
the solution of the system A x = 0 is zero, i.e., this system has a unique solution x = 0 ∈ Rn .
Obviously the uniqueness of the solution only depends on the system itself, and does not depend
on the way we solve the system. Hence the statement “rank(A) = n” is well-defined, i.e., any row
echelon form of A has exactly n non-zero rows.
Example 3.30. Find the rank of A = [1 2 3; 4 5 6; 7 8 9].
First bring A to a row echelon form:
    [ 1 2 3 ]   R2 → R2 − 4R1    [ 1  2   3 ]   R3 → R3 − 2R2    [ 1  2  3 ]
    [ 4 5 6 ]   R3 → R3 − 7R1    [ 0 −3  −6 ]  −−−−−−−−−−−→      [ 0 −3 −6 ]   (row echelon form of A)
    [ 7 8 9 ]  −−−−−−−−−−−→      [ 0 −6 −12 ]                    [ 0  0  0 ]
As we see the number of non-zero rows in the above row echelon form of A is 2, so rank(A) = 2.
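The rank found above can be cross-checked numerically; a one-line test, assuming numpy (added for illustration):

    import numpy as np

    A = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
    print(np.linalg.matrix_rank(A))   # 2, matching Example 3.30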
The next theorem characterizes invertible matrices as the square matrices of maximal possible
rank.
Theorem 3.31 (Characterization of invertibility in terms of rank). If A is an n × n matrix,
then the following statements are equivalent to each other:
(a) A is invertible;
(b) the equation A x = 0 has only the trivial solution x = (0, ..., 0)T ∈ Rn;
(c) rank(A) = n.
Proof. Note that (a) ⇔ (b) by Theorem 3.22. Now, if (b) holds then the same theorem implies
that A has In as its reduced echelon form. Therefore rank(A) = n, i.e., (b) ⇒ (c).
On the other hand, (c) ⇒ (b) because if rank(A) = n, then the number of free variables in the
solution of A x = 0 is n − rank(A) = 0 (by Theorem 3.15), i.e., x = 0 is the unique solution and
(b) is true.
Thus (a) ⇔ (b) ⇔ (c). 
Example 3.32. Let A be the matrix from Example 3.30. Since rank(A) = 2 < 3 and A is a
3 × 3 matrix, using Theorem 3.31 we can conclude that A is not invertible.
Theorem 3.33 (Properties of matrix rank). Let A be an m × n matrix, let B be any matrix.
Then:
(i) if B is row equivalent to A (i.e., B can be obtained from A by applying a finite sequence
of row operations), then rank(A) = rank(B).
(ii) rank(λ A) = rank(A), for any λ ∈ R, λ 6= 0.
(iii) rank(A) ≤ min{m, n}.
(iv) rank(A + B) ≤ rank(A) + rank(B) (provided B is also an m × n-matrix).
(v) rank(A B) ≤ min{rank(A), rank(B)} (if B is an n × k matrix for some k).
Proof. (i) is obvious, as row equivalent matrices can be brought to the same row echelon form.
(ii) follows from (i), as λ A is row equivalent to A (one just needs to multiply each row of A by λ).
To prove (iii), note that a row echelon form A′ of A is an m × n matrix in which every non-zero
row starts with a pivot lying strictly to the right of the pivots of the rows above it, and all zero
rows are at the bottom. The rank of A is equal to the number of non-zero rows in A′, by definition,
which is also equal to the number of pivots in A′. But the number of rows in A′ is m, and the
number of pivots in A′ cannot exceed the number n of columns in it, as each column contains at
most one pivot (by the definition of a row echelon form). Hence rank(A) ≤ min{m, n}.
The proofs of (iv) and (v) are omitted, as they are beyond our present scope. 
Theorem 3.34. Let A be an m × n matrix and b ∈ Rm . The system A x = b is consistent


(i.e., it has at least one solution x ∈ Rn ) if and only if the rank of the augmented matrix (A | b) is
the same as the rank of A.
Proof. Exercise (use Theorem 3.15). 
CHAPTER 4

Determinants

For any natural number n ∈ N, let Mn (R) denote the set of all n × n matrices with real entries.
Given a square matrix A ∈ Mn (R), various real numbers can be associated to it. For example,
one useful quantity is the trace of A, defined as the sum of all diagonal entries. This chapter will
introduce another such quantity, called the determinant. Determinants are an important tool in
Mathematics, and they naturally occur in many different subjects (e.g., in Calculus they appear as
Jacobians).

4.1. Axiomatic definition of determinant


We start with giving an axiomatic definition of the determinant, which is essentially due to
German mathematician Karl Weierstrass (1815-1897). Many sources give an equivalent definition,
in terms of cofactor expansions, which appears as Theorem 4.18 below. However, in these notes we
opt for the former, as the axiomatic description is much more convenient for establishing various
properties and computing the determinants via row operations.
Definition 4.1 (Determinant). A function D : Mn (R) → R is said to be a determinant if it
satisfies the following 3 conditions.
(D1) For each i, 1 ≤ i ≤ n, D is a linear function with respect to the i-th row. This means that
(i) if a matrix B is obtained from a matrix A ∈ Mn (R) by multiplying the i-th row by
some λ ∈ R then D(B) = λD(A);
(ii) if A and A0 in Mn (R) are two matrices which agree everywhere except the i-th row,
and A00 is the matrix which agrees with A (and A0 ) everywhere except the i-th row and
the i-th row of A00 is the sum of i-th rows of A and A0 , then D(A00 ) = D(A) + D(A0 )
In other terms, suppose A, A′ and A″ agree in all rows except the i-th one, the i-th row of A is ai,
the i-th row of A′ is a′i, and the i-th row of A″ is ai + a′i (where a1, ..., an, a′i are row vectors
from Rn). Then D(A″) = D(A) + D(A′).
0 n 00

(D2) If A ∈ Mn (R) has two equal rows, then D(A) = 0.


(D3) D(In ) = 1, where In is the n × n identity matrix.
Example 4.2 (n = 3). By (D1).(i),
    D([a b c; λd λe λf; g h i]) = λ D([a b c; d e f; g h i]).
By (D1).(ii),
    D([a b c; d e f; x1+y1 x2+y2 x3+y3]) = D([a b c; d e f; x1 x2 x3]) + D([a b c; d e f; y1 y2 y3]).
Note 4.3. Let A be an n × n matrix with a zero row, then D(A) = 0.
Indeed, suppose the i-th row vector ai = 0. Then, since 0 = 0 · 0, property (D1).(i) gives
    D(A) = 0 · D(A) = 0.
Theorem 4.4 (Basic properties of determinant). Let A, B ∈ Mn (R) (i.e., A and B are n × n
matrices).
(a) If B is obtained from A by adding λ · j-th row to i-th row, for some λ ∈ R and i 6= j, then
D(A) = D(B).
(b) If B is obtained from A by interchanging two rows then D(A) = −D(B).
(c) If B is obtained from A by multiplying a row of A by some λ ∈ R, λ 6= 0, then D(B) =
λD(A) (and so D(A) = λ1 D(B)).
 
Proof. Let a1, ..., an ∈ Rn be the row vectors of A.
(a) Without loss of generality, assume that j > i, as the case when i > j is similar. The matrix B
agrees with A in all rows except the i-th one, and the i-th row of B is ai + λaj. Hence, by (D1).(ii),
    D(B) = D(A) + D(A′),
where A′ agrees with A everywhere except that its i-th row is λaj. By (D1).(i), D(A′) = λ D(A″),
where A″ agrees with A except that its i-th row is aj. But then the i-th and j-th rows of A″ are
both equal to aj, so D(A″) = 0 by (D2). Therefore D(B) = D(A) + λ · 0 = D(A).
(b) Observe that B can be obtained from A as follows:
    A −(Ri → Ri + Rj)→ A′ −(Rj → Rj + (−1) × Ri)→ A″ −(Ri → Ri + Rj)→ A‴ −(Rj → (−1) × Rj)→ B.
Indeed, tracking only the i-th and j-th rows, this chain reads
    (ai, aj) → (ai + aj, aj) → (ai + aj, −ai) → (aj, −ai) → (aj, ai).
By part (a) we know that D(A) = D(A′) = D(A″) = D(A‴), and, by (D1).(i),
D(B) = −D(A‴) = −D(A). Thus (b) holds.
Finally, (c) is true by (D1).(i). □
Theorem 4.4 tells us that we can keep track of the determinant while performing row operations:
Corollary 4.5. Assume that an n × n matrix B is obtained from another n × n matrix A by
performing a single row operation. Then

D(B) = α · D(A), where
    α = 1     if B is obtained from A by a row replacement Ri → Ri + λRj (for some λ ∈ R),
    α = −1    if B is obtained from A by a row interchange Ri ↔ Rj,
    α = µ     if B is obtained from A by a row scaling Ri → µRi (for some µ ∈ R, µ ≠ 0)
(in particular, α ≠ 0).


Recall that the operation Ri → Ri + λRj is called (row) replacement, the operation Ri ↔ Rj –
(row) interchange, and the operation Ri → µRi – (row) scaling.
Recall that a matrix A ∈ Mn (R) is upper triangular if all entries below the diagonal are zero,
i.e., aij = 0 if i > j, where A = (aij ). In other words, an upper triangular matrix has the form
    A = [ a11   ∗   ···   ∗  ]
        [  0   a22  ···   ∗  ]
        [  ⋮         ⋱    ∗  ]
        [  0    0   ···  ann ].
Theorem 4.6. Let A be an upper triangular n × n matrix, A = (aij ). Then the determinant of
A is equal to the product of the diagonal entries of A:
D(A) = a11 a22 · · · ann .
Proof. First, assume that aii 6= 0 for every i = 1, 2, . . . , n. Then
   
a11 a12 · · · a1n Ri → a1 Ri , 1 b12 · · · b1,n
ii
 0 a22
 (for each i = 1, 2, . . . , n) 0 1 · · · b2,n 
· · · a2n   
A =  .. ..  −−−−−−−−−−−−−−−−→  .. ..  = B

.. .. .. . .
 . . . .  . . . . 
0 0 · · · ann 0 0 ··· 1
where bij = aij /aii (as aii 6= 0) for all i = 1, . . . , n, and j > i.
Theorem 4.4.(c) implies that
1
D(B) = D(A), hence D(A) = (a11 a22 · · · ann ) D(B),
a11 a22 · · · ann
so it remains to show that D(B) = 1.
1 b12 . . . b1,n−1 0
   
1 b12 . . . b1,n−1 b1n
0 1 . . . b2,n−1 b2n   R i → R i − b in n R ,  0 1 . . . b2,n−1 0
 (for each i = 1, 2, . . . , n − 1)
. .. . . .. .. 
 .. .. . . .. ..
B = .  −−−−−−−−−−−−−−−−−−→  .. . . . .
  
. . . .

0 0 . . . 1 bn−1,n
  . ..

  0 0 1 0
0 0 ... 0 1 0 0 ... 0 1

1 0 ... 0 0
 
sequence of similar 0 1 . . . 0 0
row replacements
. . . . .
−−−−−−−−−−−−−→  .. . . . . .. ..  = In .
 
. . . 
 .. .. . . 1 0
0 0 ... 0 1
Since In is obtained from B by applying row replacement operations only, we know that D(B) =
D(In ) by Theorem 4.4. But D(In ) = 1 by (D3), hence D(B) = 1, so D(A) = a11 a22 · · · ann .
Now suppose that some diagonal entry of A is 0. Let us show that then D(A) = 0 (= a11 · · · ann
as one of aii = 0). Choose the maximal i, 1 ≤ i ≤ n, such that aii = 0. Then ajj 6= 0 if i < j ≤ n.
Now, if i = n, i.e., ann = 0, then the n-row of A is zero, so D(A) = 0 by Note 4.3. Otherwise, if
ann 6= 0, we can perform (n − 1) row replacements to make sure that all entries above ann become
0:
a11 . . . a1i a1,i+1 . . . a1n
 
 .. ..
. ..
. .. ..  a
 . . .  Rj → Rj − a jn Rn ,
nn
 0 ... 0 ai,i+1 . . . ain  . . . , n − 1)
  (for every j = 1, 2,
A=  −−−−−−−−−−−−−−−−−−−→
 0 . . . 0 ai+1,i+1 . . . ai+1,n 
 . .. .. .. .. 
 .. . . . . 
0 ... 0 0 . . . ann

a11 . . . a1i a1,i+1 . . . 0


 
 .. .. .
. ..
.. .. 
 . . . 
 0 ... 0 ai,i+1 . . . 0 
 
.
 0 . . . 0 ai+1,i+1 . . . 0 

 . .. .. .. .. 
 .. . . . . 
0 ... 0 0 . . . ann
Next, if i < n − 1, we can again apply a sequence of row replacements to make the (n − 1)-st
column of A zero everywhere except for the diagonal entry an−1,n−1 . Doing this (n − i) times we
obtain the matrix

a11 . . . a1i 0 ... 0


 
 .. ..
. .
.. .. .. 
 . . . 
 0 ... 0 0 ... 0 
 
C= .
 0 . . . 0 ai+1,i+1 ... 0 
 . .. .. .. .. 
 .. . . . . 
0 ... 0 0 ... ann
Clearly the i-th row of C is zero, hence D(C) = 0 by Note 4.3. But D(C) = D(A) by The-
orem 4.4, since C is obtained from A by performing finitely many row replacements. Therefore
D(A) = 0 = a11 · · · ann , as claimed. 

Now we can present an algorithm for computing the determinant of any square matrix:
Algorithm 4.7. Given any square matrix A, use row operations to bring A to an upper tri-
angular form, keeping track of how the determinant changes at each step (using Theorem 4.4).
Once this is done, the determinant of the upper triangular matrix can be calculated according to
Theorem 4.6 (by taking the product of all diagonal entries).
Example 4.8. (a) Let A = (a) be a 1 × 1 matrix. Then D(A) = a D(1) = a 1 = a, as (1) = I1
and D(I1 ) = 1 by(D3).
(b) Let A = [a b; c d] ∈ M2(R). Let us show that D(A) = ad − bc. To prove this we will
consider several cases.
Case 1: a ≠ 0. Applying the row replacement R2 → R2 − (c/a) R1 (which does not change D, by
Theorem 4.4) and then Theorem 4.6, we get
    D(A) = D([a b; c d]) = D([a b; 0 d − bc/a]) = a (d − bc/a) = ad − bc.
Case 2: a = 0, but c ≠ 0. In this case, by the row interchange R1 ↔ R2 (Theorem 4.4) and
Theorem 4.6,
    D(A) = D([0 b; c d]) = −D([c d; 0 b]) = −cb = ad − bc   (as a = 0).
Case 3: a = 0 and c = 0. Then the matrix is upper triangular, so, by Theorem 4.6,
    D([0 b; 0 d]) = 0 · d = ad − bc   (as a = 0, c = 0).
Notation. Further on, instead of D(A) we will write det(A) or |A|. Thus, by Example 4.8,
    det([a b; c d]) = |a b; c d| = ad − bc.
Example 4.9. Calculate the determinant of the given matrix using row operations (i.e., using
Algorithm 4.7). Below each matrix is written row by row, with rows separated by semicolons.
(a) A = [4 1 1; 1 2 3; −1 3 0].
    det(A) = |4 1 1; 1 2 3; −1 3 0|
           = −|1 2 3; 4 1 1; −1 3 0|              (R1 ↔ R2, Theorem 4.4.(b))
           = −|1 2 3; 0 −7 −11; 0 5 3|            (R2 → R2 − 4R1, R3 → R3 + R1, Theorem 4.4.(a))
           = −|1 2 3; 0 −7 −11; 0 0 −34/7|        (R3 → R3 + (5/7) R2, Theorem 4.4.(a))
           = −(1 · (−7) · (−34/7)) = −34          (Theorem 4.6).
So, det(A) = −34.
(b) B = [2 4 6; 3 3 2; 5 4 −1].
    det(B) = |2 4 6; 3 3 2; 5 4 −1|
           = 2 |1 2 3; 3 3 2; 5 4 −1|             (R1 → (1/2) R1, Theorem 4.4.(c))
           = 2 |1 2 3; 0 −3 −7; 0 −6 −16|         (R2 → R2 − 3R1, R3 → R3 − 5R1, Theorem 4.4.(a))
           = 2 |1 2 3; 0 −3 −7; 0 0 −2|           (R3 → R3 − 2R2, Theorem 4.4.(a))
           = 2 (1 · (−3) · (−2)) = 12             (Theorem 4.6).
So, det(B) = 12.


 
5 3 1 −2
4 3 1 1
(c) C = 
0
.
3 6 0
0 −4 −4 5

5
3 1 −2

1
0 0 −3

1
0 0 −3

4 3 1 1 R1 →R1 −R2 4
3 1 1 R2 →R2 −4R1 0
3 1 13
det(C) =
========= ==========
0 3 6 0 0 3 6 0 0 3 6 0
0 −4 −4 5 0 −4 −4 5 0 −4 −4 5
1
0 0 −3

1
0 0 −3 R3 → R3 − 3R2

1
0 0 −3
1
R3 → 3 R3 0 3 1 13 R2 ↔R3
0 1 2 0 R3 → R4 + 4R2
0 1 2 0
======= 3 ==== == −3 == = ==== === === −3
0 1 2 0 0 3 1 13 0 0 −5 13

0 −4 −4 ↑
0 −4 −4

Thm. 4.4.(c) 5
Thm. 4.4.(b) 5 0 0 4 5

1 0
0 −3
R4 →R4 + 45 R3 0 1 2 0 77
========== −3 = −3 · 1 · 1 · (−5) · = 231.
0 0 −5 13 5

0 77

0 0
5
So, det(C) = 231.
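Algorithm 4.7 can be turned into code almost verbatim: reduce the matrix to upper triangular form using row replacements and interchanges, keep track of sign changes, and multiply the diagonal entries. Below is a minimal Python sketch (an added illustration, not part of the notes), checked against the matrices A and B of Example 4.9.

    import numpy as np

    def det_by_row_reduction(A, tol=1e-12):
        # determinant via reduction to upper triangular form (Algorithm 4.7)
        M = A.astype(float).copy()
        n = M.shape[0]
        sign = 1.0
        for col in range(n):
            pivot = next((r for r in range(col, n) if abs(M[r, col]) > tol), None)
            if pivot is None:
                return 0.0                       # zero column below the diagonal => det = 0
            if pivot != col:
                M[[col, pivot]] = M[[pivot, col]]
                sign = -sign                     # row interchange flips the sign (Theorem 4.4.(b))
            for r in range(col + 1, n):          # row replacements do not change det (Theorem 4.4.(a))
                M[r] -= (M[r, col] / M[col, col]) * M[col]
        return sign * np.prod(np.diag(M))        # product of diagonal entries (Theorem 4.6)

    A = np.array([[4, 1, 1], [1, 2, 3], [-1, 3, 0]])
    B = np.array([[2, 4, 6], [3, 3, 2], [5, 4, -1]])
    print(det_by_row_reduction(A))   # -34.0 (up to rounding)
    print(det_by_row_reduction(B))   # 12.0 (up to rounding)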
As we see from the above examples, each time we get an integer answer. The reason for this
may not be immediately clear from Algorithm 4.7, so let us make the following general remark:
Note 4.10. The determinant of a square matrix with integer entries is also an integer (even
though non-integer numbers may appear in intermediate steps during the calculation of the deter-
minant).
Proof. For 2 × 2 matrices this follows from Example 4.8. In general, this statement is an easy
consequence of Theorem 4.18 below. 
The following note may be helpful for concluding that the determinant of a matrix is zero
without actually computing it.
Note 4.11. If a row (a column) of a square matrix A is a multiple of another row (respectively
column) of A, then det(A) = 0.
Proof. If the Ri , the i-th row of a matrix A is equal to λRj , where Rj is the j-th row (i 6= j)
of A, then by applying the row operation Ri → Ri − λRj we get a matrix B, which has the same
determinant as A (by Theorem 4.4.(a)), and whose i-th row is zero. Hence, according to Note 4.3,
det(A) = det(B) = 0.
The statement for columns can be obtained from the statement for rows by taking the transpose
of the matrix (so that the columns become rows) and using Theorem 4.20 below. 
Example 4.12.
    | 1   −3  −2   1 |
    | 2    6  −1  −2 |  = 0
    | 4    9   3  −3 |
    | 5  −12   4   4 |
by Note 4.11, as the second column equals (−3) × the 4th column.
Let us now briefly discuss the issue of the existence of a determinant. Even though we have
already computed several determinants, we have not actually shown that the determinant function
exists in every dimension.
For 2 × 2 matrices we can define D : M2(R) → R by setting D([a b; c d]) = ad − bc. It is
straightforward to check that this determinant enjoys all the required properties (D1)-(D3). For
example, D(I2) = 1, so (D3) holds, and D([a b; a b]) = ab − ba = 0, so (D2) also holds. Moreover,
this is the only way to define the determinant of a 2 × 2 matrix, by Example 4.8.(b).
The fact that determinant exists in higher dimensions as well is not so elementary:
Theorem 4.13 (Existence and uniqueness of determinant). For every natural number n ∈ N
there exists a unique determinant D : Mn (R) → R satisfying axioms (D1), (D2), (D3) from
Definition 4.1.
Proof. The existence will be proved in Linear Algebra II (using a different formula for the
determinant). Uniqueness follows from Algorithm 4.7, as it tells us that the value of the determinant
can be calculated for any square matrix, using only the properties (D1)-(D3). 
4.2. Determinants and invertibility


There is an important connection between the determinant of a square matrix A and the
existence of A−1 :
Theorem 4.14 (Determinant criterion for invertibility of a matrix). An n × n matrix A is
invertible if and only if det(A) 6= 0.
Proof. Let’s first prove that if A is invertible then det(A) 6= 0 (direction “⇒”).
So, suppose that A is invertible. Then by Theorem 3.22, A can be brought to In via a sequence
of row operations. Now, by Corollary 4.5, det(In ) = β · det(A) for some non-zero number β ∈ R.
Therefore det(A) = β1 det(In ) = β1 6= 0.
“⇐” Assume that det(A) 6= 0. Let us prove that A is invertible by contrapositive: supposing that
A is not invertible, we will show that det(A) = 0.
Indeed, if A−1 does not exist then rank A < n by Theorem 3.31. This means that some (any)
row echelon form B of A has a row of zeros (cf. Definition 3.28). But A can be brought to this
echelon form B via a finite sequence of row operations, so det(B) = β · det(A) for some β 6= 0 (by
Corollary 4.5). It follows that det(A) = β1 det(B) = 0, as B has a zero row (cf. Note 4.3). 

Example 4.15. The matrices A, B and C from Example 4.9 are all invertible by Theorem 4.14,
as their determinants are non-zero.

4.3. Calculating determinants using cofactors


Definition 4.16 (Minors and cofactors). Let A = (aij ) be an n × n matrix. The (i, j)-th minor
of A, Mij , is the determinant of the (n − 1) × (n − 1) matrix, obtained from A by removing the
i-th row and the j-th column.
The (i, j)-th cofactor of A, cij , is the real number defined by cij = (−1)i+j · Mij .
In other words, if
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ a_{i1} & \cdots & a_{ij} & \cdots & a_{in} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} \end{pmatrix}_{n \times n}$$
then its (i, j)-th minor is
$$M_{ij} = \det \begin{pmatrix} a_{11} & \cdots & a_{1,j-1} & a_{1,j+1} & \cdots & a_{1n} \\ \vdots & & \vdots & \vdots & & \vdots \\ a_{i-1,1} & \cdots & a_{i-1,j-1} & a_{i-1,j+1} & \cdots & a_{i-1,n} \\ a_{i+1,1} & \cdots & a_{i+1,j-1} & a_{i+1,j+1} & \cdots & a_{i+1,n} \\ \vdots & & \vdots & \vdots & & \vdots \\ a_{n1} & \cdots & a_{n,j-1} & a_{n,j+1} & \cdots & a_{nn} \end{pmatrix}_{(n-1)\times(n-1)}.$$
 
Example 4.17. Let $A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}$. Then $M_{11} = \begin{vmatrix} 5 & 6 \\ 8 & 9 \end{vmatrix}$, so $c_{11} = (-1)^{1+1} M_{11} = 5 \cdot 9 - 6 \cdot 8 = -3$. Similarly, $M_{23} = \begin{vmatrix} 1 & 2 \\ 7 & 8 \end{vmatrix}$, so $c_{23} = (-1)^{2+3} M_{23} = -(1 \cdot 8 - 2 \cdot 7) = 6$.
Theorem 4.18. If A = (aij ) is an n × n matrix then for any i, j, 1 ≤ i, j ≤ n, the following
two equalities hold:
1) det(A) = ai1 ci1 + ai2 ci2 + . . . + ain cin (cofactor expansion by row i);
2) det(A) = a1j c1j + a2j c2j + . . . + anj cnj (cofactor expansion by column j).
Proof. Omitted. 

Theorem 4.18 allows us to calculate determinants inductively, as each c_{kl} = (−1)^{k+l} · M_{kl} and M_{kl} is the determinant of a matrix of smaller size. A computation using cofactor expansions is especially effective when the matrix contains many zeros.
Example 4.19. Calculate det(A) using cofactor expansion for
$$A = \begin{pmatrix} 0 & 4 & 0 & -2 \\ 2 & -4 & 2 & 1 \\ 0 & 3 & -1 & 0 \\ 0 & 7 & 0 & 6 \end{pmatrix}_{4\times 4}.$$
Since there are many zeros in the 1st column, we will use expansion by the 1st column:
$$\begin{vmatrix} 0 & 4 & 0 & -2 \\ 2 & -4 & 2 & 1 \\ 0 & 3 & -1 & 0 \\ 0 & 7 & 0 & 6 \end{vmatrix} = 0 \cdot (-1)^{1+1} M_{11} + 2 \cdot (-1)^{2+1} M_{21} + 0 \cdot (-1)^{3+1} M_{31} + 0 \cdot (-1)^{4+1} M_{41} = 2 \cdot (-1)^3 \begin{vmatrix} 4 & 0 & -2 \\ 3 & -1 & 0 \\ 7 & 0 & 6 \end{vmatrix}.$$
Expanding the remaining 3 × 3 determinant by column 2:
$$= -2 \left( 0 \cdot (-1)^{1+2} \begin{vmatrix} 3 & 0 \\ 7 & 6 \end{vmatrix} + (-1) \cdot (-1)^{2+2} \begin{vmatrix} 4 & -2 \\ 7 & 6 \end{vmatrix} + 0 \cdot (-1)^{3+2} \begin{vmatrix} 4 & -2 \\ 3 & 0 \end{vmatrix} \right) = -2 \left( - \begin{vmatrix} 4 & -2 \\ 7 & 6 \end{vmatrix} \right) = 2 (4 \cdot 6 - (-2) \cdot 7) = 2 \cdot 38 = 76,$$
where the last 2 × 2 determinant was evaluated as in Example 4.8.(b).
So, det(A) = 76.
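As an illustration, the cofactor expansion of Theorem 4.18 is easy to program. The following short Python sketch (the helper name `det_cofactor` is chosen just for this illustration) computes a determinant recursively by expanding along the first row; it is a minimal sketch rather than an efficient algorithm.

```python
def det_cofactor(A):
    """Determinant by cofactor expansion along the first row (Theorem 4.18)."""
    n = len(A)
    if n == 1:                      # base case: 1x1 matrix
        return A[0][0]
    total = 0
    for j in range(n):
        # minor M_{1,j+1}: delete the first row and the (j+1)-th column
        minor = [row[:j] + row[j+1:] for row in A[1:]]
        cofactor = (-1) ** j * det_cofactor(minor)   # c_{1,j+1} = (-1)^{1+(j+1)} M_{1,j+1}
        total += A[0][j] * cofactor
    return total

A = [[0, 4, 0, -2],
     [2, -4, 2, 1],
     [0, 3, -1, 0],
     [0, 7, 0, 6]]
print(det_cofactor(A))  # expected: 76, as in Example 4.19
```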


Theorem 4.20 (Determinant of the transpose). If A is an n×n matrix then det(A) = det(AT ).
Proof. Let A = (aij ). We will prove the statement by induction on n.
Base case: n = 1, i.e., A is a 1 × 1 matrix (a11 ). Then AT = (a11 ) = A, so det(A) = det(AT ) (= a11
by Example 4.8.(a)).
Inductive hypothesis: suppose that the statement has already been proved for n = k (where k is
some natural number). I.e., det(B) = det(B T ) for any k × k matrix B.
Step of induction: prove that the statement holds for n = k + 1. So, suppose A is a (k + 1) × (k + 1)
matrix. Let M_{ij} be the (i, j)-th minor of A, and let M′_{ij} be the (i, j)-th minor of A^T.
According to the definition, M′_{1j} is the determinant of the k × k matrix F′_{1j} obtained from A^T by removing the 1st row and the j-th column.
Observe that F′_{1j} = (F_{j1})^T, where F_{j1} is the matrix obtained from A by removing the j-th row and the 1st column. Thus M′_{1j} = det(F′_{1j}) and M_{j1} = det(F_{j1}), for all j = 1, 2, . . . , k + 1.
Now, by the induction hypothesis, we know that det(F′_{1j}) = det((F_{j1})^T) = det(F_{j1}). Hence M′_{1j} = M_{j1} for all j = 1, 2, . . . , k + 1.
Using the cofactor expansion by row 1 for det(A^T) we obtain:
$$\det(A^T) = (A^T)_{11} (-1)^{1+1} M'_{11} + (A^T)_{12} (-1)^{1+2} M'_{12} + \dots + (A^T)_{1,k+1} (-1)^{1+(k+1)} M'_{1,k+1}$$
$$= a_{11} (-1)^{1+1} M_{11} + a_{21} (-1)^{2+1} M_{21} + \dots + a_{k+1,1} (-1)^{(k+1)+1} M_{k+1,1} = \det(A),$$
where the last equality follows from the expansion of det(A) by the 1st column.
Hence we have proved that det(A^T) = det(A) for n = k + 1.
Conclusion: by induction, det(A^T) = det(A) holds for all natural numbers n. 

4.4. Determinant of a product


The goal of this section is to prove the following fundamental property of determinants:
Theorem 4.21 (Determinant of a product is the product of determinants). If A and B are
n × n matrices then det(A B) = det(A) det(B).
To prove this theorem we will need an auxiliary lemma (recall that an n×n elementary matrix is
a matrix obtained from the identity matrix In by using a single row operation – see Definition 3.18).
Lemma 4.22. Let E be an n × n elementary matrix and let B be any n × n matrix. Then
det(E B) = det(E) det(B).
Proof. As we know, by definition, E is obtained from In by performing a single row operation
RO. So, we have 3 cases.
Case 1: RO is a row replacement. Then, by Theorem 4.4.(a), det(E) = det(In ) = 1 and det(E B) =
det(B) because E B is obtained from B using row replacement (see Note 3.19). Hence det(E B) =
det(E) det(B) in this case.
Case 2: RO is a row interchange. Then, by Theorem 4.4.(b), det(E) = − det(In ) = −1 and
det(E B) = − det(B) (again, because E B is obtained from B using row interchange by Note 3.19).
So, det(E B) = − det(B) = det(E) det(B) in this case.
Case 3: RO is row scaling by some λ ∈ R, λ 6= 0. Then Theorem 4.4.(c) implies that det(E) =
λ det(In ) = λ and det(E B) = λ det(B) (by Note 3.19). Therefore det(E B) = det(E) det(B) in
this case as well. 
Now we are ready to prove Theorem 4.21, stated in the beginning of the section.
Proof of Theorem 4.21. Let A, B be two square n × n matrices.
Case 1: det(A) = 0. Then A is not invertible by Theorem 4.14. Let us show that in this case A B
cannot be invertible either. Indeed, if (AB)−1 existed, then (A B) (A B)−1 = In , so A C = In , where
C = B (A B)−1 (by associativity of matrix product), i.e., A would be invertible (by Theorem 3.23),
contradicting our assumption. Hence, A B is not invertible, so det(AB) = 0 by Theorem 4.14.
Thus det(AB) = 0 = 0 · det(B) = det(A) det(B), and the statement holds in this case.
Case 2: det(A) 6= 0. Then A is invertible by Theorem 4.14. So, according to Theorem 3.22.(iv),
we can find elementary matrices E1 , . . . , Ek ∈ Mn (R) such that A = E1 E2 . . . Ek . Therefore
A B = (E1 E2 . . . Ek ) B = E1 (E2 (. . . (Ek B))), by associativity of matrix multiplication. Now,
applying Lemma 4.22 k times, we get:
det(A B) = det(E1 (E2 (. . . (Ek B)))) = det(E1 ) det(E2 (. . . (Ek B))) (by Lemma 4.22)
(4.1) = det(E1 ) det(E2 ) det(E3 . . . Ek B) = . . . (by Lemma 4.22)
= det(E1 ) det(E2 ) · · · det(Ek ) det(B) (by Lemma 4.22).
On the other hand, A = (E1 . . . Ek ) In , so, by doing the same for det(A), we get

(4.2) det(A) = det(E1) det(E2) · · · det(Ek) det(In) = det(E1) · · · det(Ek), since det(In) = 1 by (D3).

Combining equations (4.1) and (4.2) together, we see that det(A B) = det(A) det(B) holds in this
case as well. 
Exercise. Prove Theorem 4.20 using elementary matrices, similarly to the proof of Theo-
rem 4.21 above.
Corollary 4.23. Let A be an n × n matrix. Then for any s ∈ N = {1, 2, 3, . . .}, det(As ) =
(det(A))s . If, in addition, A is invertible (i.e., det(A) 6= 0), then det(A−1 ) = (det(A))−1 and
det(As ) = (det(A))s for all s ∈ Z, where Z = {0, ±1, ±2, . . .} denotes the set of integers.

Proof. Exercise (use Theorem 4.21 and Proposition 2.29). 


Example 4.24. Let
$$A = \begin{pmatrix} 0 & 4 & 0 & -2 \\ 2 & -4 & 2 & 1 \\ 0 & 3 & -1 & 0 \\ 0 & 7 & 0 & 6 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1/3 & 2 & 0 & 0 \\ \sqrt{3} & 7 & -4 & 0 \\ \sqrt{5} & -4 & -10 & -2 \end{pmatrix}.$$
Calculate det(A² B⁻²).
First, recall that det(A) = 76 (see Example 4.19). On the other hand, observe that B is a lower triangular matrix, so B^T is upper triangular. So, according to Theorem 4.20 and Theorem 4.6,
$$\det(B) = \det(B^T) = \begin{vmatrix} 1 & 1/3 & \sqrt{3} & \sqrt{5} \\ 0 & 2 & 7 & -4 \\ 0 & 0 & -4 & -10 \\ 0 & 0 & 0 & -2 \end{vmatrix} = 1 \cdot 2 \cdot (-4) \cdot (-2) = 16.$$

Now, det(A2 · B −2 ) = det(A2 ) det(B −2 ) = det(A)2 · (det(B))−2 , where we used Theorem 4.21
and Corollary 4.23 (note that B is invertible by Theorem 4.14, as det(B) = 16 6= 0).
Hence det(A² · B⁻²) = (76)² · (16)⁻² = 5776/256 = 361/16.
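A quick numerical sanity check of this calculation (a short NumPy sketch; the entries of B are the ones displayed above):

```python
import numpy as np

A = np.array([[0, 4, 0, -2],
              [2, -4, 2, 1],
              [0, 3, -1, 0],
              [0, 7, 0, 6]], dtype=float)
B = np.array([[1, 0, 0, 0],
              [1/3, 2, 0, 0],
              [np.sqrt(3), 7, -4, 0],
              [np.sqrt(5), -4, -10, -2]], dtype=float)

lhs = np.linalg.det(A @ A @ np.linalg.inv(B) @ np.linalg.inv(B))   # det(A^2 B^{-2}) directly
rhs = np.linalg.det(A) ** 2 / np.linalg.det(B) ** 2                # det(A)^2 det(B)^{-2}
print(lhs, rhs)   # both approximately 361/16 = 22.5625
```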

4.5. Inverting a matrix using cofactors


Let A = (aij ) be an n × n matrix, and let cij denote the (i, j)-th cofactor of A (see Defini-
tion 4.16), 1 ≤ i, j ≤ n.
Definition 4.25. (Matrix of cofactors and adjugate matrix.)
(1) The matrix of cofactors of A is the matrix C = (cij ), where cij is the (i, j)-th cofactor of
A.
(2) The adjugate (or adjunct) matrix of A, adj(A), is the transpose of the matrix of cofactors,
i.e., (adj(A))ij = cji for all i, j, 1 ≤ i, j ≤ n.
Theorem 4.26. Suppose that an n × n matrix A has det(A) ≠ 0. Then A is invertible and
$$A^{-1} = \frac{1}{\det(A)} \, \mathrm{adj}(A).$$

Note 4.27. The fact that det(A) ≠ 0 implies that A is invertible has already been proved in
Theorem 4.14. So, Theorem 4.26 only provides an explicit formula for the inverse matrix A−1 ,
which can be used as an alternative to Algorithm 3.25 (for finding the inverse matrix).
 
Proof of Theorem 4.26. By Theorem 3.23, it is enough to show that $A \cdot \frac{1}{|A|}\,\mathrm{adj}(A) = I_n$, where |A| = det(A). This is obviously equivalent to
$$A\,\mathrm{adj}(A) = |A|\, I_n = \begin{pmatrix} |A| & 0 & \dots & 0 \\ 0 & |A| & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & |A| \end{pmatrix}.$$
Take any pair of indices i, j, 1 ≤ i, j ≤ n.
Case 1: i = j. Then
$$(A\,\mathrm{adj}(A))_{ij} = \sum_{k=1}^{n} a_{ik} (\mathrm{adj}(A))_{kj} = \sum_{k=1}^{n} a_{ik} c_{jk} \overset{i=j}{=} \sum_{k=1}^{n} a_{ik} c_{ik} = |A|,$$
using Definition 2.8, Definition 4.25 and Theorem 4.18 (the last sum is the expansion of |A| by row i).

Case 2: i ≠ j. As we have just seen, $(A \cdot \mathrm{adj}(A))_{ij} = \sum_{k=1}^{n} a_{ik} c_{jk}$. Consider an auxiliary matrix B, which is obtained from A by replacing the j-th row of A by the i-th row (here the diagram assumes that i < j):
$$B = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{i1} & \dots & a_{in} \\ \vdots & & \vdots \\ a_{i1} & \dots & a_{in} \\ \vdots & & \vdots \\ a_{n1} & \dots & a_{nn} \end{pmatrix}_{n \times n} \quad \text{(row } i \text{ and row } j \text{ are both equal to the } i\text{-th row of } A\text{).}$$
Since B differs from A only in its j-th row, the (j, k)-th minor of B coincides with the (j, k)-th minor of A for any k, so $\det(B) = \sum_{k=1}^{n} a_{ik} c_{jk}$ by Theorem 4.18 (expansion by row j). But det(B) = 0 by (D2) from Definition 4.1, hence $(A\,\mathrm{adj}(A))_{ij} = \sum_{k=1}^{n} a_{ik} c_{jk} = 0$ if i ≠ j.
Cases 1 & 2 together yield that A adj(A) = |A| In ,
thus
$$A \cdot \frac{1}{|A|}\,\mathrm{adj}(A) = \frac{1}{|A|}\,(A \cdot \mathrm{adj}(A)) = I_n,$$
as required. 
 
Example 4.28. Let $A = \begin{pmatrix} 0 & 2 & 0 \\ 1 & -1 & 1 \\ 0 & 0 & -1 \end{pmatrix}$. Calculate A⁻¹ using the adjugate matrix.
First, let's find the matrix of cofactors.
$$c_{11} = (-1)^{1+1} \begin{vmatrix} -1 & 1 \\ 0 & -1 \end{vmatrix} = 1, \quad c_{12} = (-1)^{1+2} \begin{vmatrix} 1 & 1 \\ 0 & -1 \end{vmatrix} = 1, \quad c_{13} = (-1)^{1+3} \begin{vmatrix} 1 & -1 \\ 0 & 0 \end{vmatrix} = 0,$$
$$c_{21} = (-1)^{2+1} \begin{vmatrix} 2 & 0 \\ 0 & -1 \end{vmatrix} = 2, \quad c_{22} = (-1)^{2+2} \begin{vmatrix} 0 & 0 \\ 0 & -1 \end{vmatrix} = 0, \quad c_{23} = (-1)^{2+3} \begin{vmatrix} 0 & 2 \\ 0 & 0 \end{vmatrix} = 0,$$
$$c_{31} = (-1)^{3+1} \begin{vmatrix} 2 & 0 \\ -1 & 1 \end{vmatrix} = 2, \quad c_{32} = (-1)^{3+2} \begin{vmatrix} 0 & 0 \\ 1 & 1 \end{vmatrix} = 0, \quad c_{33} = (-1)^{3+3} \begin{vmatrix} 0 & 2 \\ 1 & -1 \end{vmatrix} = -2.$$
So, $C = \begin{pmatrix} 1 & 1 & 0 \\ 2 & 0 & 0 \\ 2 & 0 & -2 \end{pmatrix}$ is the matrix of cofactors. Expanding det(A) by row 1, we get det(A) = 0 · c₁₁ + 2 · c₁₂ + 0 · c₁₃ = 2. Since det(A) ≠ 0 we know that A⁻¹ exists. Therefore
$$\mathrm{adj}(A) = C^T = \begin{pmatrix} 1 & 2 & 2 \\ 1 & 0 & 0 \\ 0 & 0 & -2 \end{pmatrix} \quad \text{and} \quad A^{-1} = \frac{1}{|A|}\,\mathrm{adj}(A) = \begin{pmatrix} 1/2 & 1 & 1 \\ 1/2 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}.$$
Check:
$$A\,A^{-1} = \begin{pmatrix} 0 & 2 & 0 \\ 1 & -1 & 1 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} 1/2 & 1 & 1 \\ 1/2 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = I_3. ✓$$
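The adjugate formula translates directly into code. Below is a minimal Python sketch that builds the matrix of cofactors, transposes it, and divides by the determinant; it assumes the `det_cofactor` helper sketched after Example 4.19 is available, and `inverse_by_adjugate` is just an illustrative name.

```python
def inverse_by_adjugate(A):
    """Compute A^{-1} = adj(A) / det(A)  (Theorem 4.26)."""
    n = len(A)
    d = det_cofactor(A)
    if d == 0:
        raise ValueError("matrix is not invertible (det = 0)")
    # matrix of cofactors C = (c_ij)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            minor = [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]
            C[i][j] = (-1) ** (i + j) * det_cofactor(minor)
    # adj(A) is the transpose of C; divide every entry by det(A)
    return [[C[j][i] / d for j in range(n)] for i in range(n)]

A = [[0, 2, 0], [1, -1, 1], [0, 0, -1]]
print(inverse_by_adjugate(A))  # expected [[0.5, 1, 1], [0.5, 0, 0], [0, 0, -1]]
```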
Note 4.29. While this method does allow us to compute the inverse of a matrix, it usually takes
much longer than the method described in Algorithm 3.25. It is only efficient if sufficiently many
entries of the matrix are zero.
CHAPTER 5

Linear transformations

Linear maps are particularly nice and amenable functions, which can be studied using matrices.
In this chapter all of the material we have learned so far comes together in developing the theory
of linear transformations.

5.1. Basic definitions and properties


Definition 5.1 (Linear transformation). Let n, m ∈ N and let T : Rn → Rm be a function.
Then T is a linear transformation (or, simply, T is linear ) if for all u, v ∈ Rn and for all λ ∈ R the
following two conditions are satisfied:
(1) T (u + v) = T (u) + T (v);
(2) T (λ u) = λ T (u).
Example 5.2. (a) Suppose that n = m = 1, so T : R → R. Let T (x) = −2x. Let us check that
T is a linear transformation:
(1) if x1 , x2 ∈ R, then T (x1 + x2 ) = −2(x1 + x2 ) = −2x1 + (−2x2 ) = T (x1 ) + T (x2 ).
(2) if x ∈ R and λ ∈ R then T (λ x) = −2(λ x) = λ(−2 x) = λT (x) (by associativity and
commutativity in R).
Thus T satisfies both conditions from Definition 5.1, so it is a linear transformation.
(b) Now try F : R → R, F (x) = x + 1. This is not linear: F (1 + 1) = (1 + 1) + 1 = 3, but
F (1) + F (1) = 2 + 2 = 4, so condition (1) from Definition 5.1 does not hold.
(c) Of course, G : R → R given by G(x) = x2 is not linear either. (Check yourself!)
Main examples of linear transformations come from multiplication by matrices. Think of Rn
as the set of column vectors (n × 1 matrices). If A is an m × n matrix, then we have a function T_A : Rⁿ → Rᵐ, defined by T_A(x) = A x for all x = (x₁, . . . , xₙ)^T ∈ Rⁿ. Indeed:
A has size m × n, x has size n × 1, so T_A(x) = A x is an m × 1 column vector from Rᵐ.
Lemma 5.3. The function TA : Rn → Rm described above is a linear transformation.
Proof. Let u, v ∈ Rn and λ ∈ R. Then
TA (u + v) = A (u + v) (by definition of TA )
= Au + Av (by distributivity of matrix product)
= TA (u) + TA (v) (by definition of TA ).

TA (λ u) = A (λu) (by definition of TA )


= λAu (by property of matrix multiplication – see Theorem 2.13.(iv))
= λ TA (u) (by definition of TA ).
Hence TA is a linear transformation. 
The above lemma gives rise to many examples of linear transformations.
 
Example 5.4. (a) Let $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}_{2\times 2}$, so T_A : R² → R² and
$$T_A(\mathbf{x}) = T_A\!\begin{pmatrix} x \\ y \end{pmatrix} = A \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + 2y \\ 3x + 4y \end{pmatrix}.$$
(b) Let $B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}_{2\times 3}$, so T_B : R³ → R² and
$$T_B(\mathbf{x}) = B\,\mathbf{x} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}.$$

In other words, TB is the orthogonal projection of R3 onto the xOy plane.


(c) The zero transformation T0 : Rn → Rm is defined by T0 (x) = 0 for all x ∈ Rn . The matrix of
T0 is the m × n zero matrix.
(d) The identity transformation Id = IdRn : Rn → Rn is defined by Id(x) = x for all x ∈ Rn . The
matrix of Id is In (the n × n identity matrix).
Lemma 5.5. Let A = (aij ) be an m × n matrix, and let c1 , . . . , cn bethe 
column vectors of A
a1i x1
(so ci =  ...  ∈ Rm , i = 1, 2, . . . , n). Then for any vector x =  ...  ∈ Rn we have
   

ami m×1 xn n×1


A x = x1 c1 + · · · + xn cn .
Proof.
     
a11 ... a1n x1 a11 x1 + . . . + a1n xn
 .. ..   ..  ..
Ax =  . =
 
.   .  . 
am1 . . . amn m×n xn n×1 am1 x1 + . . . + amn xn m×1
       
a11 x1 a1n xn a11 a1n
=  ...  + · · · +  ...  = x1  ...  + · · · + xn  ... 
       

am1 x1 m×1 amn xn m×1 am1 m×1 amn m×1


= x1 c1 + · · · + xn cn .

Recall that the standard basis of Rⁿ consists of the vectors e₁, . . . , eₙ, where eᵢ = (0, . . . , 0, 1, 0, . . . , 0)^T, with the 1 in the i-th entry.

Lemma 5.3 tells us that every matrix gives rise to a linear transformation. We will now formulate
and prove the converse statement:
Theorem 5.6. Suppose that T : Rn → Rm is a linear transformation. Let {e1 , . . . , en } be the
standard basis of Rn and let A be the m × n matrix with column vectors T (e1 ), . . . , T (en ), i.e.,
A = (T (e1 ) . . . T (en ))m×n .
Then T (x) = A x for all x ∈ Rn . Moreover, A is the unique matrix with this property (that
T (x) = A x for all x ∈ Rn ).
  
Proof. Let $\mathbf{x} = (x_1, \dots, x_n)^T$ be any vector in Rⁿ. Then
$$\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + x_2 \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix} + \dots + x_n \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} = x_1 e_1 + \dots + x_n e_n.$$
Since T is a linear transformation, we have
$$T(\mathbf{x}) = T(x_1 e_1 + \dots + x_n e_n) = T(x_1 e_1) + \dots + T(x_n e_n) = x_1 T(e_1) + \dots + x_n T(e_n) = \big(T(e_1)\ \dots\ T(e_n)\big)\,\mathbf{x} = A\,\mathbf{x},$$
where the penultimate equality is Lemma 5.5 applied to the m × n matrix A = (T(e₁) . . . T(eₙ)).
Now let's prove the uniqueness of A. Suppose that A′ is another m × n matrix such that T(x) = A′ x for all x ∈ Rⁿ. Then T(eᵢ) = A′ eᵢ = c′ᵢ, where c′ᵢ is the i-th column vector of A′, i = 1, 2, . . . , n. Hence A′ has the same column vectors as A, yielding that A′ = A. 


   
Example 5.7. Let T : R³ → R³ be the linear transformation given by
$$T\!\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3x + 2y \\ x - z \\ x - 4y + z \end{pmatrix}.$$
Find the matrix A corresponding to T.
To do this, let's evaluate T(e₁), T(e₂) and T(e₃):
$$T(e_1) = T\!\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 1 \\ 1 \end{pmatrix}, \quad T(e_2) = T\!\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \\ -4 \end{pmatrix}, \quad T(e_3) = T\!\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix}.$$
Hence, by Theorem 5.6,
$$A = \begin{pmatrix} 3 & 2 & 0 \\ 1 & 0 & -1 \\ 1 & -4 & 1 \end{pmatrix}_{3\times 3}.$$
Check: $A\,\mathbf{x} = A \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3x + 2y \\ x - z \\ x - 4y + z \end{pmatrix} = T(\mathbf{x})$ for all x ∈ R³. ✓
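Theorem 5.6 gives a direct recipe for finding the matrix of a linear map: apply T to the standard basis vectors and use the results as the columns. Here is a short NumPy sketch of that recipe (the function name `matrix_of` is chosen just for this illustration):

```python
import numpy as np

def matrix_of(T, n):
    """Matrix of a linear map T: R^n -> R^m, built column by column (Theorem 5.6)."""
    columns = [T(e) for e in np.eye(n)]      # images of the standard basis vectors
    return np.column_stack(columns)

# the linear map from Example 5.7
def T(v):
    x, y, z = v
    return np.array([3*x + 2*y, x - z, x - 4*y + z])

A = matrix_of(T, 3)
print(A)   # columns are T(e1), T(e2), T(e3): the 3x3 matrix found in Example 5.7
x = np.array([1.0, 2.0, 3.0])
print(np.allclose(A @ x, T(x)))   # True: A x agrees with T(x)
```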
Definition 5.8 (Matrix of a linear transformation). For a linear transformation T : Rn → Rm ,
the matrix A, given by Theorem 5.6, is said to be the matrix of the linear transformation T .
Proposition 5.9. Let T : Rn → Rm be a linear transformation with matrix A (of size m × n).
Then
(i) T(0) = 0 (i.e., T maps the zero vector of Rⁿ to the zero vector of Rᵐ);
(ii) if m = n and det(A) 6= 0 then T maps straight lines to straight lines;
(iii) if m = n and det(A) 6= 0 then T maps parallel lines to parallel lines.
Proof. (i) Since 0 = 0 0 in Rn , by linearity of T we have T (0) = T (0 0) = 0 T (0) = 0 ∈ Rm .
(ii) Exercise.
(iii) Let L1 , L2 be two lines in Rn parallel to vectors a1 , a2 respectively (a1 , a2 ∈ Rn , a1 , a2 6= 0).
If L1 is parallel to L2 , then a1 is parallel to a2 , i.e., there exists µ 6= 0 such that a2 = µ a1 . From
the argument for claim (ii) we know that for each i = 1, 2, T (Li ) is a line parallel to the vector
T (ai ). Since T (a2 ) = T (µ a1 ) = µ T (a1 ), we can conclude that T (L2 ) is parallel to T (L1 ). 
Proposition 5.9 implies that under linear transformations parallelograms are sent to parallelo-
grams (possibly degenerate if det(A) = 0).

5.2. Linear transformations of R2


Example 5.10. Theorem 5.6 tells us that any linear transformation is completely determined by the images of the vectors from the standard basis: $e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, $e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ in the case when T : R² → R². These basic vectors e₁, e₂ form the unit square in R². The image of the unit square under a linear transformation T : R² → R² is a parallelogram, and by sketching this image we can sketch the effect of T on R².
For example, suppose that T is given by the matrix $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ (i.e., $T\!\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -y \\ x \end{pmatrix}$); then $T(e_1) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ and $T(e_2) = \begin{pmatrix} -1 \\ 0 \end{pmatrix}$. Hence we can sketch the effect of T on R² as on Figure 5.1.

Figure 5.1

Clearly T is a rotation about the origin by π/2 (90°) anti-clockwise.
Other kinds of linear transformations of R2 : rotations about the origin, reflections in lines
through the origin, stretching/shrinking in various directions, shears, etc.
Example 5.11 (Rotation about the origin by angle θ anti-clockwise). Let T : R2 → R2 be the
rotation about the origin by angle θ anti-clockwise (see Figure 5.2).

Figure 5.2

Then $T(e_1) = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}$ and $T(e_2) = \begin{pmatrix} \cos(\theta + \pi/2) \\ \sin(\theta + \pi/2) \end{pmatrix} = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}$. Hence, by Theorem 5.6, the matrix of T is
$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$
Example 5.12 (Reflection in a line through the origin). Now, suppose that T : R² → R² is the reflection in the line L making angle θ ∈ R with the positive x-axis (so L : y = (tan θ) x, provided cos θ ≠ 0). Figure 5.3 assumes that L passes through the first quadrant and 0 < θ < π/4.
Then the angle between T(e₁) and the positive x-axis is 2θ, hence $T(e_1) = \begin{pmatrix} \cos(2\theta) \\ \sin(2\theta) \end{pmatrix}$. On the other hand, the angle between the positive x-axis and T(e₂) is −(π/2 − 2θ), as we have to measure angles anti-clockwise starting from the positive x-axis.
Hence $T(e_2) = \begin{pmatrix} \cos(2\theta - \pi/2) \\ \sin(2\theta - \pi/2) \end{pmatrix} = \begin{pmatrix} \sin(2\theta) \\ -\cos(2\theta) \end{pmatrix}$. Thus the matrix of T is
$$\begin{pmatrix} \cos(2\theta) & \sin(2\theta) \\ \sin(2\theta) & -\cos(2\theta) \end{pmatrix}.$$

Figure 5.3

 
Note that $\det\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} = \cos^2\theta + \sin^2\theta = 1$, i.e., the determinant of the matrix of a rotation is always equal to 1. On the other hand, $\det\begin{pmatrix} \cos(2\theta) & \sin(2\theta) \\ \sin(2\theta) & -\cos(2\theta) \end{pmatrix} = -\cos^2(2\theta) - \sin^2(2\theta) = -1$, i.e., the determinant of the matrix corresponding to a reflection is −1.
Example 5.13. Let T be the reflection in the line L, passing through the origin and the second quadrant, making angle 5π/6 with the x-axis (anti-clockwise). Thus L has equation y = tan(5π/6) x, i.e., y = −x/√3.
According to Example 5.12, T is given by the matrix
$$A = \begin{pmatrix} \cos(5\pi/3) & \sin(5\pi/3) \\ \sin(5\pi/3) & -\cos(5\pi/3) \end{pmatrix} = \begin{pmatrix} 1/2 & -\sqrt{3}/2 \\ -\sqrt{3}/2 & -1/2 \end{pmatrix}.$$
So, the image of any point $\begin{pmatrix} x \\ y \end{pmatrix} \in R^2$ under T can be computed by
$$T\!\begin{pmatrix} x \\ y \end{pmatrix} = A \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1/2 & -\sqrt{3}/2 \\ -\sqrt{3}/2 & -1/2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x/2 - \sqrt{3}\,y/2 \\ -\sqrt{3}\,x/2 - y/2 \end{pmatrix} \in R^2.$$
E.g., $T\!\begin{pmatrix} 2 \\ -1 \end{pmatrix} = \begin{pmatrix} (2 + \sqrt{3})/2 \\ (1 - 2\sqrt{3})/2 \end{pmatrix}$.
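The rotation and reflection matrices of Examples 5.11–5.13 are easy to generate and test numerically. The following sketch (`rotation` and `reflection` are illustrative helper names) builds both matrices and checks the determinant observations made above:

```python
import numpy as np

def rotation(theta):
    """Matrix of the anti-clockwise rotation about the origin by angle theta (Example 5.11)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def reflection(theta):
    """Matrix of the reflection in the line through the origin at angle theta (Example 5.12)."""
    return np.array([[np.cos(2*theta),  np.sin(2*theta)],
                     [np.sin(2*theta), -np.cos(2*theta)]])

A = reflection(5*np.pi/6)                 # the matrix from Example 5.13
print(A @ np.array([2, -1]))              # approx [(2+sqrt(3))/2, (1-2*sqrt(3))/2]
print(np.linalg.det(rotation(0.7)))       # approx  1: rotations have determinant 1
print(np.linalg.det(reflection(0.7)))     # approx -1: reflections have determinant -1
```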
 
Example 5.14. If T : R² → R² has matrix $\begin{pmatrix} \lambda & 0 \\ 0 & \mu \end{pmatrix}$, for some λ, µ > 0, then T stretches/shrinks the plane by λ in the direction of the x-axis and by µ in the direction of the y-axis.
For example, a diagram can demonstrate the effect of $A = \begin{pmatrix} 3 & 0 \\ 0 & 1/2 \end{pmatrix}$ on the unit square.
   
Example 5.15 (Shear). If $A = \begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}$ or $A = \begin{pmatrix} 1 & 0 \\ \mu & 1 \end{pmatrix}$ then T = T_A is called a shear. It leaves one of the axes fixed and moves all other points in the direction parallel to this axis. Indeed,
$$\begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + \lambda y \\ y \end{pmatrix},$$
so all points on the x-axis are fixed (when y = 0). Or,
$$\begin{pmatrix} 1 & 0 \\ \mu & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ \mu x + y \end{pmatrix},$$
i.e., this transformation fixes the y-axis (when x = 0).
For example, let $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$; then $T_A(e_1) = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, $T_A(e_2) = \begin{pmatrix} 1 \\ 1 \end{pmatrix} = e_1 + e_2$ and $T_A\!\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$.
This allows us to sketch the effect of this transformation on the unit square: see Figure 5.4.

Figure 5.4

5.3. Composition of linear transformations


Definition 5.16 (Composition). Given two transformations T : Rn → Rm and S : Rm → Rk ,
the composition Q = S ◦ T is the transformation from Rn to Rk (Q : Rn → Rk ) defined by the
formula Q(x) = (S ◦ T )(x) = S(T (x)) for all x ∈ Rn .

Diagram: T maps Rⁿ to Rᵐ, S maps Rᵐ to Rᵏ, and Q = S ◦ T maps Rⁿ directly to Rᵏ.
Lemma 5.17. The composition of linear transformations is again a linear transformation.
Proof. Exercise. (Hint: one needs to check that if Q = S ◦ T , then for all u, v ∈ Rn , λ ∈ R,
Q(u + v) = Q(u) + Q(v) and Q(λu) = λQ(u)). 
Theorem 5.18 (Matrix of the composition). Let T : Rn → Rm and S : Rm → Rk be linear
transformations with matrices A and B respectively. Then S ◦ T is a linear transformation with
matrix B A (product of B and A).
Proof. The fact that S ◦ T : Rn → Rk is a linear transformation is given by Lemma 5.17, thus
we only need to prove the claim about its matrix.
Note that according to the assumptions, T (x) = A x for all x ∈ Rn and S(y) = B y for all
y ∈ Rm . Therefore, for all x ∈ Rn we have

(S ◦ T )(x) = S T (x) (by definition of S ◦ T )
= S(A x) (as A is the matrix of T )
= B (A x) (as B is the matrix of S and y = A x ∈ Rm )
= (B A) x (by associativity of matrix multiplication).
Hence (S ◦ T )(x) = (B A) x for all x ∈ Rn , and we can apply Theorem 5.6 to conclude that B A is
the matrix of S ◦ T , as claimed. 
Note 5.19. Theorem 5.18 is actually the reason why the multiplication of matrices is defined
as in Definition 2.8.
Example 5.20. Let S, T : R2 → R2 be reflections in lines L1 , L2 in R2 , making angles φ and θ
with the positive x-axis respectively. What is S ◦ T ?

Of course S ◦ T : R² → R² is a linear transformation by Lemma 5.17, so it is completely determined by its matrix. According to Example 5.12, the matrix A, of T, is $A = \begin{pmatrix} \cos(2\theta) & \sin(2\theta) \\ \sin(2\theta) & -\cos(2\theta) \end{pmatrix}$, and the matrix B, of S, is $B = \begin{pmatrix} \cos(2\varphi) & \sin(2\varphi) \\ \sin(2\varphi) & -\cos(2\varphi) \end{pmatrix}$. Hence, by Theorem 5.18, Q = S ◦ T has the matrix
$$C = B\,A = \begin{pmatrix} \cos(2\varphi)\cos(2\theta) + \sin(2\varphi)\sin(2\theta) & \cos(2\varphi)\sin(2\theta) - \sin(2\varphi)\cos(2\theta) \\ \sin(2\varphi)\cos(2\theta) - \cos(2\varphi)\sin(2\theta) & \sin(2\varphi)\sin(2\theta) + \cos(2\varphi)\cos(2\theta) \end{pmatrix} = \begin{pmatrix} \cos(2\varphi - 2\theta) & -\sin(2\varphi - 2\theta) \\ \sin(2\varphi - 2\theta) & \cos(2\varphi - 2\theta) \end{pmatrix},$$
using standard trigonometric identities.
This is the matrix of the rotation by the angle 2φ − 2θ about the origin. So, S ◦ T is this rotation.
Similarly to the above example one can show that the composition of two rotations of R2 about
the origin is again a rotation about the origin. On the other hand, the composition of a rotation
with a reflection (in any order) is a reflection.
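This composition rule is easy to check numerically. The sketch below defines small rotation/reflection helpers (as in the earlier sketch; the names are illustrative) and verifies that composing two reflections gives the rotation by 2φ − 2θ:

```python
import numpy as np

theta, phi = 0.3, 1.1                      # arbitrary test angles

def rotation(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

def reflection(t):
    return np.array([[np.cos(2*t), np.sin(2*t)], [np.sin(2*t), -np.cos(2*t)]])

A = reflection(theta)                      # matrix of T
B = reflection(phi)                        # matrix of S
# the matrix of S o T is B A (Theorem 5.18); it should equal the rotation by 2*phi - 2*theta
print(np.allclose(B @ A, rotation(2*phi - 2*theta)))   # True
```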

5.4. The inverse of a linear transformation


Definition 5.21 (Inverse of a linear transformation). Let T : Rn → Rn be a linear transfor-
mation. A linear transformation S : Rn → Rn is said to be the inverse of T if S ◦ T = Id and
T ◦ S = Id, where Id : Rn → Rn is the identity map. In other words, S is the inverse of T if
(S ◦ T )(x) = x and (T ◦ S)(x) = x, for all x ∈ Rn .
If such S exists, the transformation T is said to be invertible and we write S = T −1 .
Actually, a linear transformation is invertible if and only if it is invertible as a function:
Note 5.22. Suppose that T : Rn → Rn is a linear transformation. Then T is invertible if and
only if T is bijective (as a function from Rn to Rn ).
Proof. Of course, if T is invertible and S = T −1 then S : Rn → Rn is the inverse function to
T , by definition. Hence T must be a bijection.
Conversely, assume that T is a bijection. Then there is an inverse function S : Rn → Rn such
that S ◦ T = Id and T ◦ S = Id. So, we only need to show that this function S is also a linear
transformation. To this end, consider any u, v ∈ Rn and any λ ∈ R.
Let a = S(u) ∈ Rn and b = S(v) ∈ Rn . Since T ◦ S = Id, we see that T (a) = T (S(u)) = u
and T (b) = T (S(v)) = v. Therefore
S(u + v) = S(T(a) + T(b)) = S(T(a + b)) (by linearity of T) = a + b (since S ◦ T = Id) = S(u) + S(v) (by the definition of a and b).
And
S(λ u) = S(λ T(a)) = S(T(λ a)) (by linearity of T) = λ a (since S ◦ T = Id) = λ S(u) (by the definition of a).
Thus we have shown that S satisfies the definition of a linear transformation (see Definition 5.1),
and the proof is complete. 
If T has matrix A (n × n), then its inverse must have matrix A−1 by Theorem 5.18.
Not all linear transformations from Rn to Rn are invertible. In fact, T is invertible if and only
if its matrix A is invertible (⇔ det(A) 6= 0) – see Proposition 5.24 below.
Example 5.23. (a) The inverse of the rotation by angle θ anti-clockwise about the origin, is
the rotation by angle −θ. In terms of matrices:
     
$$\begin{pmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} \cos^2\theta + \sin^2\theta & 0 \\ 0 & \sin^2\theta + \cos^2\theta \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2$$
(the first matrix is the matrix of the rotation by −θ, the second is the matrix of the rotation by θ).

(b) The inverse of the reflection in a line through the origin is itself (Exercise: check this).
   
(c) The inverse of the shear T with matrix $\begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}$ is the shear S with matrix $\begin{pmatrix} 1 & -\lambda \\ 0 & 1 \end{pmatrix}$, because
$$\begin{pmatrix} 1 & -\lambda \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2 \quad \text{and} \quad \begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -\lambda \\ 0 & 1 \end{pmatrix} = I_2.$$
(d) If $A = \begin{pmatrix} 2 & 1 \\ 0 & 0 \end{pmatrix}$ and T = T_A : R² → R², then T is not invertible by Proposition 5.24 below, as A is not invertible (because det(A) = 0).
Proposition 5.24. Let T : Rn → Rn be a linear transformation and let A ∈ Mn (R) be its
matrix. Then T is invertible if and only if A is invertible.
And if T is invertible then A−1 is the matrix of T −1 . (In particular, T −1 is unique!)
Proof. “⇒” Suppose that T is invertible, i.e., there is a linear transformation S : Rn → Rn
such that S ◦ T = Id = T ◦ S, where Id is the identity map from Rn to Rn . Let B be the matrix
of S (it exists by Theorem 5.6). Then Theorem 5.18 implies that B A = In and A B = In , as In is
the matrix of Id : Rn → Rn . Hence A is invertible and B = A−1 .
“⇐” Suppose that A is invertible. Then we can let S : Rn → Rn be the linear transformation
given by A−1 : S(x) = A−1 x for all x ∈ Rn (S is linear by Lemma 5.3). It follows that
assoc.
(S ◦ T )(x) = S(T (x)) = S(A x) = A−1 (A x) ===== (A−1 A) x = In x = x, for all x ∈ Rn .
Similarly, (T ◦ S)(x) = (A A−1 ) x = In x = x for all x ∈ Rn . Thus S ◦ T = Id and T ◦ S = Id. So,
T is invertible and S is its inverse. 
Note 5.25. Let T : Rn → Rn and S : Rn → Rn be linear transformations such that S ◦ T = Id
where Id is the identity transformation from Rn to itself. Then both S and T are invertible and
S = T −1 , T = S −1 .
Proof. Exercise. (Hint: see Theorem 3.23.) 
Example 5.26. Let T : R2 → R2 be the linear transformation given by the formula
   
$$T\!\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -2x + y \\ 5x - 3y \end{pmatrix}.$$
Determine whether T is invertible and find the formula for its inverse (if it exists).
$$T(e_1) = T\!\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -2 \\ 5 \end{pmatrix}, \quad T(e_2) = T\!\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ -3 \end{pmatrix}.$$
So the matrix of T is $A = \begin{pmatrix} -2 & 1 \\ 5 & -3 \end{pmatrix}_{2\times 2}$. Observe that det(A) = (−2) · (−3) − 1 · 5 = 1 ≠ 0, so, by Theorem 2.25, A is invertible and
$$A^{-1} = \frac{1}{1} \begin{pmatrix} -3 & -1 \\ -5 & -2 \end{pmatrix} = \begin{pmatrix} -3 & -1 \\ -5 & -2 \end{pmatrix}.$$
Hence, by Proposition 5.24, T is invertible and the matrix of T⁻¹ is A⁻¹. Therefore
$$T^{-1}\!\begin{pmatrix} x \\ y \end{pmatrix} = A^{-1} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -3 & -1 \\ -5 & -2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -3x - y \\ -5x - 2y \end{pmatrix}.$$
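A quick numerical check of this example (a sketch using NumPy): the matrix of T⁻¹ should be the matrix inverse of A, and applying one map after the other should return the original vector.

```python
import numpy as np

A = np.array([[-2, 1], [5, -3]])          # matrix of T from Example 5.26
A_inv = np.linalg.inv(A)                  # matrix of T^{-1}
print(A_inv)                              # [[-3. -1.] [-5. -2.]]
v = np.array([4.0, 7.0])                  # an arbitrary test vector
print(np.allclose(A_inv @ (A @ v), v))    # True: (T^{-1} o T)(v) = v
```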
CHAPTER 6

Subspaces of Rn

Subspaces of Rn are the non-empty subsets that are closed under vector addition and scaling.
Basic examples of subspaces are lines and planes passing through the origin. Each subspace can
be thought of as a copy of Rm inside Rn , for some m ≤ n. Subspaces occur naturally in Linear
Algebra, as null spaces and ranges of linear transformations.
The goal of this chapter is to introduce the concepts of subspaces, spans and linear independence
in the case of Rn , preparing the reader for a more general and abstract discussion of these notions
in Linear Algebra II.

6.1. Definition and basic examples


Definition 6.1 (Subspace). A subset V of Rn is said to be a subspace if the following three
conditions hold:
(i) V 6= ∅ (V is non-empty);
(ii) for all u, v ∈ V , u + v ∈ V (V is closed under addition);
(iii) for all u ∈ V and all λ ∈ R, λ u ∈ V (V is closed under multiplication by scalars).

Example 6.2. (Basic examples of subspaces)


(1) Rn itself is a subspace.
(2) The zero subspace V0 = {0} ⊂ Rn . Let us check that it satisfies the 3 conditions from
Definition 6.1:
(i) V0 6= ∅ as 0 ∈ V0 X.
(ii) If u, v ∈ V0 then u = v = 0, so u + v = 0 + 0 = 0 ∈ V0 X.
(iii) If u ∈ V0 and λ ∈ R, then u = 0, so λ u = λ 0 = 0 ∈ V0 X.
(3) Let a ∈ Rn be a non-zero vector. The line L, passing through the origin and parallel
to a, can be identified with the subset V = {µ a | µ ∈ R} ⊂ Rn (think that each µ a is the
position vector of a point of L). Let us check the 3 defining conditions of a subspace:
(i) Take µ = 0, then 0 = 0 a ∈ V , hence V 6= ∅.
(ii) If u, v ∈ V then there exist µ₁, µ₂ ∈ R such that u = µ₁ a and v = µ₂ a. Therefore
u + v = µ₁ a + µ₂ a = (µ₁ + µ₂) a ∈ V, as µ₁ + µ₂ ∈ R.
(iii) Suppose that u ∈ V and λ ∈ R. Then there exists µ ∈ R such that u = µ a, so
λ u = λ (µ a) = (λµ) a ∈ V , as λµ ∈ R.
Hence the conditions (i), (ii) and (iii) all hold, so V is a subspace.
(4) Let Π be a plane passing through the origin in R³. Then Π has a Cartesian equation ax + by + cz = 0, for some a, b, c ∈ R (not all zero). Thus Π can be identified with the set of (position) vectors V = {(x, y, z)^T ∈ R³ | ax + by + cz = 0}.
Again, let us check that V is a subspace of R³.
(i) 0 = (0, 0, 0)^T ∈ V as a · 0 + b · 0 + c · 0 = 0, so V ≠ ∅.
Again,letus check that V is a subspace of R3 .
0
(i) 0 = 0 ∈ V as a · 0 + b · 0 + c · 0 = 0, so V 6= ∅.
0
   
(ii) If u, v ∈ V then u = (x₁, y₁, z₁)^T and v = (x₂, y₂, z₂)^T are such that axᵢ + byᵢ + czᵢ = 0 for i = 1, 2. Therefore u + v = (x₁ + x₂, y₁ + y₂, z₁ + z₂)^T, and, using standard properties of real numbers, we have
a(x₁ + x₂) + b(y₁ + y₂) + c(z₁ + z₂) = (ax₁ + by₁ + cz₁) + (ax₂ + by₂ + cz₂) = 0.
Thus u + v ∈ V, i.e., V is closed under addition.
(iii) Let u = (x₁, y₁, z₁)^T ∈ V and λ ∈ R. Then ax₁ + by₁ + cz₁ = 0 and λ u = (λx₁, λy₁, λz₁)^T. So, using standard properties of real numbers, we get
a(λx₁) + b(λy₁) + c(λz₁) = λ(ax₁ + by₁ + cz₁) = λ · 0 = 0.
It follows that λ u ∈ V , i.e., V is closed under multiplication by scalars.
Since the conditions (i)-(iii) all hold, we can conclude that V is a subspace of R3 .
Observe that any subspace must contain the zero vector:
Note 6.3. If V is a subspace of Rn then 0 ∈ V .
Proof. Indeed, by condition (i), V is non-empty, so there is some vector a ∈ V . Now, by
condition (iii), 0 a ∈ V , but 0 a = 0, hence 0 ∈ V . 
Note 6.3 implies that only a line (plane) passing through the origin gives rise to a subspace. E.g., the line
$$U = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} \in R^2 \;\middle|\; \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \mu \begin{pmatrix} 3 \\ -2 \end{pmatrix}, \; \mu \in R \right\}$$
is not a subspace, because $0 = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \notin U$. (Indeed, otherwise $\begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 + 3\mu \\ 1 - 2\mu \end{pmatrix}$ for some µ ∈ R, so µ = −1/3 and µ = 1/2, which is a contradiction; hence no such µ ∈ R exists.)
Similarly, the plane Π : −3x + 5y − 6z = 7 is not a subspace by Note 6.3, as −3 · 0 + 5 · 0 − 6 · 0 = 0 ≠ 7.

6.2. Null spaces


Definition 6.4 (Null space). (a) Let T : Rn → Rm be a linear transformation. Then the null
space (kernel ) of T is the subset of Rn defined by

N(T ) = {x ∈ Rn | T (x) = 0 ∈ Rm }.
(b) Let A be an m × n matrix. The null space of A is the subset of Rn given by
N(A) = {x ∈ Rn | A x = 0 ∈ Rm }.
Note 6.5. Clearly, if T : Rn → Rm is a linear transformation and A is its matrix (of size m×n),
then N(T ) = N(A) ⊆ Rn .
Proposition 6.6. The null space of a linear transformation T : Rn → Rm (or of an m × n
matrix A) is a subspace of Rn .
Proof. Let us check the 3 conditions from Definition 6.1.
(i) 0 ∈ N(T ), as T (0) = 0 ∈ Rm by Proposition 5.9.(i). Therefore N(T ) 6= ∅.
(ii) If u, v ∈ N(T ), then
linearity of T as u, v ∈ N(T )
T (u + v) ========== T (u) + T (v) ========== 0 + 0 = 0.
Hence u + v ∈ N(T ).

(iii) If u ∈ N(T ) and λ ∈ R, then


linearity of T as u ∈ N(T )
T (λ u) ========== λ T (u) ========= λ 0 = 0,
so λ u ∈ N(T ).
Therefore N(T ) is a subspace of Rn . The proof that N(A) is a subspace is left as an exercise. 
Example 6.7. Find the null space of the linear transformation T : R⁴ → R³ given by the formula
$$T\!\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} x_1 + x_2 + x_3 \\ 2x_1 + x_2 + x_4 \\ -x_1 + x_2 + 3x_3 - 2x_4 \end{pmatrix}.$$
The matrix of T is
$$A = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 2 & 1 & 0 & 1 \\ -1 & 1 & 3 & -2 \end{pmatrix}_{3\times 4}.$$
Thus N(T) = {x ∈ R⁴ | T(x) = 0} = {x ∈ R⁴ | A x = 0}, and we need to find the solution of the equation A x = 0. Let us apply Gaussian elimination:
$$(A \mid 0) = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ 2 & 1 & 0 & 1 & 0 \\ -1 & 1 & 3 & -2 & 0 \end{pmatrix} \xrightarrow[R_3 \to R_3 + R_1]{R_2 \to R_2 - 2R_1} \begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & -1 & -2 & 1 & 0 \\ 0 & 2 & 4 & -2 & 0 \end{pmatrix} \xrightarrow{R_3 \to R_3 + 2R_2} \begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & -1 & -2 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$
$$\xrightarrow{R_2 \to -R_2} \begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 1 & 2 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \xrightarrow{R_1 \to R_1 - R_2} \begin{pmatrix} 1 & 0 & -1 & 1 & 0 \\ 0 & 1 & 2 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$
Thus the solution satisfies x₁ − x₃ + x₄ = 0 and x₂ + 2x₃ − x₄ = 0. Let x₃ = α, x₄ = β, where α, β ∈ R (x₃, x₄ are free variables). Then x₁ = x₃ − x₄ = α − β and x₂ = −2x₃ + x₄ = −2α + β, so
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} \alpha - \beta \\ -2\alpha + \beta \\ \alpha \\ \beta \end{pmatrix} = \alpha \begin{pmatrix} 1 \\ -2 \\ 1 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} -1 \\ 1 \\ 0 \\ 1 \end{pmatrix}, \quad \alpha, \beta \in R.$$
We see that, geometrically, N(T) = N(A) is a plane passing through the origin in R⁴. Formally, the answer is
$$N(T) = \left\{ \alpha \begin{pmatrix} 1 \\ -2 \\ 1 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} -1 \\ 1 \\ 0 \\ 1 \end{pmatrix} \;\middle|\; \alpha, \beta \in R \right\}.$$
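Computer algebra systems carry out exactly this computation. A short sketch using SymPy's `nullspace` method, which returns a basis of N(A):

```python
from sympy import Matrix

A = Matrix([[1, 1, 1, 0],
            [2, 1, 0, 1],
            [-1, 1, 3, -2]])

for v in A.nullspace():        # a basis of N(A) = N(T)
    print(v.T)
# prints Matrix([[1, -2, 1, 0]]) and Matrix([[-1, 1, 0, 1]]), matching Example 6.7
```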
 

Note 6.8. Let A be an n × n matrix. Then A is invertible if and only if N(A) = {0} (i.e., the
null space of A consists only of the zero vector).
Proof.
A is invertible
⇐⇒ x = 0 is the only vector in Rn satisfying A x = 0 (by Theorem 3.22)
⇐⇒ N (A) = {0} (by definition of N(A)).


6.3. Linear span


If we look at the above examples of subspaces, all of them have a description of the form
{c1 v1 + c2 v2 + . . . + ck vk | c1 , . . . , ck ∈ R},
where v1 , . . . , vk are some vectors. This motivates the following definition.
Definition 6.9 (Linear combinations and span). Let v1 , v2 , . . . , vk be some vectors in Rn .
(a) Any vector u = c1 v1 + c2 v2 + . . . + ck vk ∈ Rn , for some c1 , . . . , ck ∈ R, will be called a linear
combination of vectors v1 , v2 , . . . , vk .
(b) The (linear ) span of {v1 , . . . , vk }, denoted span{v1 , . . . , vk }, is the subset of Rn consisting of
all possible linear combinations of v1 , . . . , vk . In other words,
span{v1 , . . . , vk } = {c1 v1 + . . . + ck vk | c1 , . . . , ck ∈ R} ⊆ Rn .
Convention. span{∅} = {0}, i.e., the span of the empty set of vectors consists only of the
zero vector.
Note 6.10. Observe that vi ∈ span{v1 , . . . , vk }, for each i = 1, 2, . . . , k, as we can take ci = 1
and cj = 0 for j 6= i.
     
Example 6.11. (a) If $v_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$ and $v_2 = \begin{pmatrix} 3 \\ 4 \end{pmatrix}$ in R², then $u = \begin{pmatrix} -1 \\ 0 \end{pmatrix}$ is a linear combination of v₁, v₂ because u = 2 v₁ − v₂ (c₁ = 2, c₂ = −1). Thus u ∈ span{v₁, v₂}.
(b) The null space N(T) from Example 6.7 is the span of the vectors (1, −2, 1, 0)^T and (−1, 1, 0, 1)^T in R⁴.
(c) A line L, passing through the origin and parallel to a non-zero vector a ∈ Rⁿ, can be identified with {α a | α ∈ R} = span{a}.
Proposition 6.12. For any k ∈ N and arbitrary vectors v1 , . . . vk ∈ Rn , the linear span
V = span{v1 , . . . , vk } is a subspace of Rn .
Proof. As before, we must show that the 3 defining conditions of a subspace are satisfied.
(i) Take c1 = c2 = . . . = ck = 0. Then 0 v1 + . . . + 0 vk = 0 ∈ span{v1 , . . . , vk } = V . Hence
V 6= ∅.
(ii) Let u, v ∈ V . Then there exist c1 , . . . ck , d1 , . . . , dk ∈ R such that u = c1 v1 + . . . + ck vk and
v = d1 v1 + . . . + dk vk . Therefore, using standard properties of vectors, we have

u + v = (c1 v1 + . . . + ck vk ) + (d1 v1 + . . . + dk vk )
= (c1 + d1 ) v1 + . . . + (ck + dk ) vk ∈ span{v1 , . . . , vk } = V, since c1 + d1 , . . . , ck + dk ∈ R.
(iii) Exercise.

Example 6.13. (a) Let e1 , . . . , en be the standard basis of Rn . Then for all x = (x1 , . . . , xn )T ∈
Rn , we have x = x1 e1 + . . . + xn en . It follows that span{e1 , . . . , en } = Rn .
(b) Let a, b ∈ Rⁿ be non-zero vectors that are not parallel to each other. Then span{a, b} = {α a + β b | α, β ∈ R} gives rise to a plane in Rⁿ (passing through the origin and parallel to a, b).
Note 6.14. Let v1 , . . . , vk be arbitrary vectors in Rn . If V is any subspace of Rn such that
vi ∈ V for each i = 1, . . . , k, then span{v1 , . . . , vk } ⊆ V .
It follows that span{v1 , . . . , vk } is the smallest subspace of Rn containing these vectors.
Proof. Exercise. 

6.4. Range and column space


Definition 6.15 (Column space of a matrix). Suppose that A is an m × n matrix with column
vectors v1 , . . . , vn ∈ Rm . The column space (range) of A, denoted col(A), is the set of all linear
combinations of v1 , . . . , vn in Rm . In other words, col(A) = span{v1 , . . . , vn } ⊆ Rm .
Note 6.16. By Proposition 6.12, col(A) is a subspace of Rm .
Definition 6.17 (Range of a linear transformation). Let T : Rn → Rm be a linear transforma-
tion. The range (image) of T , denoted R(T ), is the subset of Rm defined by R(T ) = {T (x) | x ∈
Rn }.
Example 6.18. If T : R → R, is given by T (x) = −2x for all x ∈ R, then R(T ) = {−2x | x ∈
R} = R. I.e., the range of T is all of R.
Proposition 6.19. If T : Rn → Rm is a linear transformation then R(T ) is a subspace of Rm .
Proof. Let us check that the 3 conditions from Definition 6.1 are satisfied.
(i) Since T (0) = 0 ∈ Rm , by Proposition 5.9.(i), we see that 0 ∈ R(T ) 6= ∅.
(ii) Let u, v ∈ R(T ). Then there exist x1 , x2 ∈ Rn such that u = T (x1 ) and v = T (x2 ). Therefore,
in view of linearity of T , we get u + v = T (x1 ) + T (x2 ) = T (x1 + x2 ) ∈ R(T ), as x1 + x2 ∈ Rn .
(iii) Let u ∈ R(T ) and λ ∈ R. Then there exists x1 ∈ Rn such that u = T (x1 ). Once again, by
linearity of T , λ u = λ T (x1 ) = T (λ x1 ) ∈ R(T ), as λ x1 ∈ Rn .
Thus properties (i)-(iii) from Definition 6.1 all hold, so R(T ) is indeed a subspace of Rm . 
Note 6.20. If T : Rn → Rm is linear and A is its matrix (of size m × n), then R(T ) = col(A).
Proof. For every x = (x1 , . . . , xn )T ∈ Rn , we know that x = x1 e1 + . . . + xn en . So, by
linearity of T , T (x) = x1 T (e1 ) + . . . + xn T (en ) ∈ span{T (e1 ), . . . , T (en )} = col(A), because
T (e1 ), . . . , T (en ) are the column vectors of A (see Theorem 5.6). Hence R(T ) ⊆ col(A).
The proof of the reverse inclusion col(A) ⊆ R(T ) is an exercise. The two inclusions together
show that col(A) = R(T ). 
 
Example 6.21. Let $A = \begin{pmatrix} 0 & -3 & 7 \\ 1 & 4 & 2 \end{pmatrix}_{2\times 3}$. Then
$$\mathrm{col}(A) = \mathrm{span}\left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -3 \\ 4 \end{pmatrix}, \begin{pmatrix} 7 \\ 2 \end{pmatrix} \right\} = \left\{ \alpha \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \beta \begin{pmatrix} -3 \\ 4 \end{pmatrix} + \gamma \begin{pmatrix} 7 \\ 2 \end{pmatrix} \;\middle|\; \alpha, \beta, \gamma \in R \right\} \subseteq R^2.$$
However, note that $\begin{pmatrix} 7 \\ 2 \end{pmatrix} = -\frac{7}{3} \begin{pmatrix} -3 \\ 4 \end{pmatrix} + \frac{34}{3} \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, i.e., $\begin{pmatrix} 7 \\ 2 \end{pmatrix}$ is a linear combination of $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} -3 \\ 4 \end{pmatrix}$. So, for any α, β, γ ∈ R,
$$\alpha \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \beta \begin{pmatrix} -3 \\ 4 \end{pmatrix} + \gamma \begin{pmatrix} 7 \\ 2 \end{pmatrix} = \alpha' \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \beta' \begin{pmatrix} -3 \\ 4 \end{pmatrix},$$
where $\alpha' = \alpha + \frac{34}{3}\gamma \in R$ and $\beta' = \beta - \frac{7}{3}\gamma \in R$. Therefore
$$\mathrm{span}\left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -3 \\ 4 \end{pmatrix}, \begin{pmatrix} 7 \\ 2 \end{pmatrix} \right\} \subseteq \mathrm{span}\left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -3 \\ 4 \end{pmatrix} \right\}.$$
However, evidently, $\mathrm{span}\left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -3 \\ 4 \end{pmatrix} \right\} \subseteq \mathrm{span}\left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -3 \\ 4 \end{pmatrix}, \begin{pmatrix} 7 \\ 2 \end{pmatrix} \right\}$ (as we can always take γ = 0), hence
$$\mathrm{col}(A) = \mathrm{span}\left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -3 \\ 4 \end{pmatrix}, \begin{pmatrix} 7 \\ 2 \end{pmatrix} \right\} = \mathrm{span}\left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -3 \\ 4 \end{pmatrix} \right\}.$$
In fact, it is not hard to see that $e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \in \mathrm{span}\left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -3 \\ 4 \end{pmatrix} \right\}$, which implies that $\mathrm{span}\left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix} \right\} \subseteq \mathrm{span}\left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -3 \\ 4 \end{pmatrix} \right\}$.
On the other hand, we know that $\mathrm{span}\left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix} \right\} = R^2$ (see Example 6.13.(a)). Thus $\mathrm{col}(A) = \mathrm{span}\left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -3 \\ 4 \end{pmatrix} \right\} = R^2$.
Note 6.22. Let w1 , . . . , wl , u ∈ Rn . Then the following are equivalent:
(1) span{w1 , . . . , wl , u} = span{w1 , . . . , wl }
(2) u is a linear combination of w1 , . . . , wl .
Proof. Exercise. [Hint: the inclusion span{w1 , . . . , wl } ⊆ span{w1 , . . . , wl , u} is easy because
{w1 , . . . , wl } ⊆ {w1 , . . . , wl , u}. See Example 6.21 for the idea how to show the reverse inclusion.]

Definition 6.23 (Minimal spanning set). Suppose that V is a subspace of Rn .
(a) If V = span{v1 , . . . , vk } for some v1 , . . . , vk ∈ Rn , then we will say that {v1 , . . . , vk } is a
spanning set of V .
(b) {v1 , . . . , vk } is a minimal spanning set of V if V = span{v1 , . . . , vk } and for any i, 1 ≤ i ≤ k,
{v1 , . . . , vi−1 , vi+1 , . . . , vk } is no longer a spanning set of V (i.e., none of these vectors is
redundant).
Example 6.24. Let A be the matrix from Example 6.21.
(1) Then $\left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -3 \\ 4 \end{pmatrix}, \begin{pmatrix} 7 \\ 2 \end{pmatrix} \right\}$ is a spanning set of col(A), but not a minimal one, as $\begin{pmatrix} 7 \\ 2 \end{pmatrix}$ can be removed: $\mathrm{col}(A) = \mathrm{span}\left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -3 \\ 4 \end{pmatrix} \right\}$.
(2) $\left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -3 \\ 4 \end{pmatrix} \right\}$ is a minimal spanning set of col(A), as $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} -3 \\ 4 \end{pmatrix}$ each span only a line in R², and not all of R² (but col(A) = R², as we saw in Example 6.21).
(3) The (minimal) spanning set is (almost) never unique. E.g.,
$$\mathrm{col}(A) = \mathrm{span}\left\{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -3 \\ 4 \end{pmatrix} \right\} = \mathrm{span}\left\{ \begin{pmatrix} 0 \\ 2 \end{pmatrix}, \begin{pmatrix} -3 \\ 4 \end{pmatrix} \right\} = \mathrm{span}\left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\} = \dots$$

6.5. Linear independence


Given a set of vectors v1 , . . . , vk ∈ Rn , how can we tell if one of these vectors is a linear
combination of the others? This is equivalent to the following question:
Question 6.25. Given that V = span{v1 , . . . , vk }, is {v1 , . . . , vk } a minimal spanning set of
V?
To tackle these questions we need the notion of linear independence.
Definition 6.26 (Linear independence). Let v1 , . . . , vk be any vectors in Rn .
(a) We will say that these vectors are linearly independent if the equality
c1 v1 + . . . + ck vk = 0, with c1 , . . . ck ∈ R,
implies that c1 = c2 = . . . = ck = 0.
(b) The vectors v1 , . . . , vk are linearly dependent if there exist c1 , . . . , ck ∈ R, not all zero, such
that c1 v1 + . . . + ck vk = 0. This equation is called a (linear ) dependence relation among
v1 , . . . , vk .
Example 6.27. (a) If vi = 0 for some i, 1 ≤ i ≤ k, then v1 , . . . , vk are linearly dependent.
Indeed: just take c1 = . . . = ci−1 = ci+1 = . . . = ck = 0 and ci = 1(6= 0). Then
c1 v1 + . . . + ci−1 vi−1 + ci vi + ci+1 vi+1 + . . . + ck vk = ci vi = 0.
Hence v1 , . . . , vk are linearly dependent.

(b) If v ∈ Rⁿ is a non-zero vector, then v is linearly independent. Indeed: c v = 0 implies that c = 0, because v ≠ 0.
(c) Two vectors v₁, v₂ ∈ Rⁿ are linearly dependent if and only if they are parallel, i.e., if one of them is a multiple of the other.
(d) The vectors $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, $\begin{pmatrix} -3 \\ 4 \end{pmatrix}$ and $\begin{pmatrix} 7 \\ 2 \end{pmatrix}$ are linearly dependent in R², because, as we saw in Example 6.21,
$$\frac{7}{3} \begin{pmatrix} -3 \\ 4 \end{pmatrix} - \frac{34}{3} \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 7 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
Here the coefficients are −34/3, 7/3 and 1 respectively (in the order in which the vectors were listed).

Theorem 6.28 (Characterization of linear dependence/independence). Vectors v1 , . . . , vk ∈ Rn


are linearly dependent if and only if the equation A x = 0 has a non-zero solution x = c ∈ Rk ,
where A is the matrix with column vectors v1 , . . . , vk (i.e., A = (v1 . . . vk )n×k ). Equivalently,
v1 , . . . , vk are linearly independent if and only if the equation A x = 0 has only the trivial solution
x = 0 ∈ Rk .

Proof.
v₁, . . . , vₖ ∈ Rⁿ are linearly dependent
⟺ ∃ c₁, . . . , cₖ ∈ R, not all zero, such that c₁ v₁ + . . . + cₖ vₖ = 0 (Definition 6.26)
⟺ ∃ c₁, . . . , cₖ ∈ R, not all zero, such that A (c₁, . . . , cₖ)^T = 0 (Lemma 5.5)
⟺ ∃ c = (c₁, . . . , cₖ)^T ∈ Rᵏ, c ≠ 0, such that A c = 0.

Corollary 6.29 (Linear independence of n vectors in Rn ). Let v1 , . . . , vn ∈ Rn , and let


A = (v1 . . . vn )n×n be the square matrix with column vectors v1 , . . . , vn . Then v1 , . . . , vn are
linearly independent if and only if det(A) 6= 0.

Proof.

v1 , . . . , vn ∈ Rn are linearly independent


⇐⇒ the equation A x = 0 has only the trivial solution x = 0 ∈ Rn (Theorem 6.28)
⇐⇒ the matrix A is invertible (Theorem 3.22)
⇐⇒ det(A) 6= 0 (Theorem 4.14).


     
Example 6.30. (a) To check whether the vectors $\begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}$, $\begin{pmatrix} 2 \\ 4 \\ 2 \end{pmatrix}$ and $\begin{pmatrix} -1 \\ 4 \\ 3 \end{pmatrix}$ are linearly dependent or independent, we can form the matrix $A = \begin{pmatrix} -1 & 2 & -1 \\ 1 & 4 & 4 \\ 1 & 2 & 3 \end{pmatrix}$ and calculate its determinant.
$$\det(A) = \begin{vmatrix} -1 & 2 & -1 \\ 1 & 4 & 4 \\ 1 & 2 & 3 \end{vmatrix} = \begin{vmatrix} -1 & 2 & -1 \\ 0 & 6 & 3 \\ 0 & 4 & 2 \end{vmatrix} \;(R_2 \to R_2 + R_1,\; R_3 \to R_3 + R_1) = \begin{vmatrix} -1 & 2 & -1 \\ 0 & 6 & 3 \\ 0 & 0 & 0 \end{vmatrix} \;(R_3 \to R_3 - \tfrac{2}{3} R_2) = 0,$$
by Theorem 4.6. Since det(A) = 0, we can apply Corollary 6.29 to conclude that the vectors $\begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}$, $\begin{pmatrix} 2 \\ 4 \\ 2 \end{pmatrix}$ and $\begin{pmatrix} -1 \\ 4 \\ 3 \end{pmatrix}$ are linearly dependent.
(b) The standard basis e1 , . . . , en of Rn consists of linearly independent vectors. Indeed: A =
(e1 . . . en ) = In is the n × n identity matrix. Since det(In ) = 1 6= 0, the vectors e1 , . . . , en are
linearly independent (by Corollary 6.29).
More generally, when k 6= n we have the following:
Theorem 6.31 (Characterization of linear independence in terms of rank). Let v1 , . . . , vk be
some vectors in Rn , and let A = (v1 . . . vk )n×k be the matrix with column vectors v1 , . . . , vk .
Then v1 , . . . , vk are linearly independent if and only if rank(A) = k.
Proof.
v1 , . . . , vk ∈ Rn are linearly independent
⇐⇒ the equation A x = 0 has a unique solution x = 0 ∈ Rk (Theorem 6.28)
⇐⇒ the system A x = 0 is consistent and has no free variables
⇐⇒ the number of non-zero rows in a row echelon form of (A | 0) is k (Theorem 3.15)
⇐⇒ the number of non-zero rows in a row echelon form of A is k
⇐⇒ rank(A) = k (Definition 3.28).

Corollary 6.32. If v1 , . . . , vk ∈ Rn and k > n, then these vectors are linearly dependent.
Proof. Indeed, if k > n then the matrix A = (v1 . . . vk ) has size n × k, so rank(A) ≤ n by
the definition of the rank (see also Theorem 3.33.(iii)). Hence, k > n implies that rank(A) ≤ n < k,
so v1 , . . . , vk must be linearly dependent by Theorem 6.31. 
Note 6.33. Corollary 6.32 can be re-formulated by saying that the size of a set of linearly
independent vectors in Rn cannot exceed n.
       
1 3 −1 2
For example, the vectors 1 , 2, −4 and 5 are linearly dependent in R3 , because
1 0 5 6
4 > 3 (by Corollary 6.32).
     
Example 6.34. Let $v_1 = \begin{pmatrix} 1 \\ 2 \\ -1 \\ 5 \end{pmatrix}$, $v_2 = \begin{pmatrix} 2 \\ -1 \\ 3 \\ 0 \end{pmatrix}$, $v_3 = \begin{pmatrix} 4 \\ 1 \\ 3 \\ 6 \end{pmatrix}$ in R⁴.
(a) In order to check whether v₁, v₂, v₃ are linearly independent we need to calculate rank(A), where
$$A = \begin{pmatrix} 1 & 2 & 4 \\ 2 & -1 & 1 \\ -1 & 3 & 3 \\ 5 & 0 & 6 \end{pmatrix}_{4\times 3}.$$
To this end, we need to find a row echelon form of this matrix:
$$\begin{pmatrix} 1 & 2 & 4 \\ 2 & -1 & 1 \\ -1 & 3 & 3 \\ 5 & 0 & 6 \end{pmatrix} \xrightarrow[\substack{R_3 \to R_3 + R_1 \\ R_4 \to R_4 - 5R_1}]{R_2 \to R_2 - 2R_1} \begin{pmatrix} 1 & 2 & 4 \\ 0 & -5 & -7 \\ 0 & 5 & 7 \\ 0 & -10 & -14 \end{pmatrix} \xrightarrow[R_4 \to R_4 - 2R_2]{R_3 \to R_3 + R_2} \begin{pmatrix} 1 & 2 & 4 \\ 0 & -5 & -7 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
which is a row echelon form of A.

So, rank(A) = 2 < 3, hence v1 , v2 , v3 are linearly dependent by Theorem 6.31.


(b) Now, suppose that we need to find a linear dependence relation among v1 , v2 , v3 (i.e., find
c1 , c2 , c3 ∈ R, not all zero, such that c1 v1 + c2 v2 + c3 v3 = 0 ∈ R4 ). This is equivalent to finding
a non-zero solution of the system A x = 0, where A is the matrix with column vectors v1 , v2 , v3 .
We can use Gaussian elimination for this:
$$(A \mid 0) = \begin{pmatrix} 1 & 2 & 4 & 0 \\ 2 & -1 & 1 & 0 \\ -1 & 3 & 3 & 0 \\ 5 & 0 & 6 & 0 \end{pmatrix} \xrightarrow{\text{same row ops as in (a)}} \begin{pmatrix} 1 & 2 & 4 & 0 \\ 0 & -5 & -7 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \xrightarrow{R_2 \to -\frac{1}{5} R_2} \begin{pmatrix} 1 & 2 & 4 & 0 \\ 0 & 1 & 7/5 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \xrightarrow{R_1 \to R_1 - 2R_2} \begin{pmatrix} 1 & 0 & 6/5 & 0 \\ 0 & 1 & 7/5 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},$$
which is the reduced row echelon form of (A | 0).
Thus the matrix equation A x = 0, where x = (x₁, x₂, x₃)^T, is equivalent to x₁ + (6/5)x₃ = 0 and x₂ + (7/5)x₃ = 0. Since x₃ is a free variable, we can set it to be any real number. If we choose x₃ = 5 then x₁ = −(6/5)x₃ = −6 and x₂ = −(7/5)x₃ = −7. So, the equation x₁ v₁ + x₂ v₂ + x₃ v₃ = 0 becomes
(6.1)    −6 v₁ − 7 v₂ + 5 v₃ = 0,
which is a linear dependence relation among v₁, v₂, v₃.
Check: $-6 v_1 - 7 v_2 + 5 v_3 = \begin{pmatrix} -6 \\ -12 \\ 6 \\ -30 \end{pmatrix} + \begin{pmatrix} -14 \\ 7 \\ -21 \\ 0 \end{pmatrix} + \begin{pmatrix} 20 \\ 5 \\ 15 \\ 30 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.$ ✓
Of course, if we chose a different value for x₃, we would get a different linear dependence relation.
Finally, note that the dependence relation (6.1) can be used to express any of the vectors involved in it as a linear combination of the others. For example, v₁ = −(7/6) v₂ + (5/6) v₃ and v₂ = −(6/7) v₁ + (5/7) v₃.
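Algorithm 6.36 is exactly what a nullspace computation does: any non-zero vector in N(A) gives a dependence relation among the columns of A. A short SymPy sketch applied to Example 6.34:

```python
from sympy import Matrix

v1, v2, v3 = Matrix([1, 2, -1, 5]), Matrix([2, -1, 3, 0]), Matrix([4, 1, 3, 6])
A = Matrix.hstack(v1, v2, v3)

basis = A.nullspace()          # an empty list would mean the columns are independent
print(basis[0].T)              # Matrix([[-6/5, -7/5, 1]])
c = 5 * basis[0]               # scale to clear denominators: coefficients (-6, -7, 5)
print((c[0]*v1 + c[1]*v2 + c[2]*v3).T)   # Matrix([[0, 0, 0, 0]]), i.e. -6 v1 - 7 v2 + 5 v3 = 0
```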
Proposition 6.35. Let v1 , . . . , vk ∈ Rn be any vectors. Then v1 , . . . , vk are linearly dependent
if and only if one of these vectors is a linear combination of the others.
Proof. Exercise. 
Algorithm 6.36 (Finding a linear dependence relation). Suppose you need to check whether
some vectors v1 , . . . , vk ∈ Rn are linearly independent, and find a linear dependence relation among
them in the case if they are linearly dependent. Then proceed as in Example 6.34.(b). I.e., form
the augmented matrix (A | 0), where A = (v1 . . . vk )n×k , bring it to a row echelon form. If
this row echelon form has exactly k non-zero rows, then v1 , . . . , vk are linearly independent (by
Theorem 6.31).
Otherwise a non-zero solution of A x = 0 exists, say x1 = c1 , . . . , xk = ck (you can find this
solution by bringing A to a reduced row echelon form). Then the vectors v1 , . . . , vk are linearly
dependent and c1 v1 + . . . + ck vk is a dependence relation among them.
Algorithm 6.37 (Checking linear independence). If you are given a collection of vectors
v1 , . . . , vk ∈ Rn and are only asked to determine whether they are linearly independent, then the
following approach is most efficient:
(1) if k > n then the vectors are linearly dependent by Corollary 6.32.
(2) if k ≤ n, form the matrix A = (v1 . . . vk )n×k , and calculate its rank. Then, by Theo-
rem 6.31,
• if rank(A) = k then the vectors are linearly independent;
• if rank(A) < k then the vectors are linearly dependent.

Let us finish this section by giving two more examples.


Example 6.38. Determine whether the given vectors are linearly dependent or independent. If they are linearly dependent, express one of the vectors as a linear combination of the others:
$$v_1 = \begin{pmatrix} -1 \\ 3 \\ -2 \\ 1 \end{pmatrix}, \quad v_2 = \begin{pmatrix} -1 \\ 3 \\ -1 \\ 4 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 2 \\ -6 \\ 5 \\ 1 \end{pmatrix}, \quad v_4 = \begin{pmatrix} 3 \\ -7 \\ 0 \\ 2 \end{pmatrix}.$$
Following Algorithm 6.36, we need to solve the system A x = 0, where A is the 4 × 4 matrix
formed from the given vectors.
$$(A \mid 0) = \begin{pmatrix} -1 & -1 & 2 & 3 & 0 \\ 3 & 3 & -6 & -7 & 0 \\ -2 & -1 & 5 & 0 & 0 \\ 1 & 4 & 1 & 2 & 0 \end{pmatrix} \xrightarrow[\substack{R_3 \to R_3 - 2R_1 \\ R_4 \to R_4 + R_1}]{R_2 \to R_2 + 3R_1} \begin{pmatrix} -1 & -1 & 2 & 3 & 0 \\ 0 & 0 & 0 & 2 & 0 \\ 0 & 1 & 1 & -6 & 0 \\ 0 & 3 & 3 & 5 & 0 \end{pmatrix} \xrightarrow{R_2 \leftrightarrow R_3} \begin{pmatrix} -1 & -1 & 2 & 3 & 0 \\ 0 & 1 & 1 & -6 & 0 \\ 0 & 0 & 0 & 2 & 0 \\ 0 & 3 & 3 & 5 & 0 \end{pmatrix}$$
$$\xrightarrow{R_4 \to R_4 - 3R_2} \begin{pmatrix} -1 & -1 & 2 & 3 & 0 \\ 0 & 1 & 1 & -6 & 0 \\ 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 23 & 0 \end{pmatrix} \xrightarrow{R_4 \to R_4 - \frac{23}{2} R_3} \begin{pmatrix} -1 & -1 & 2 & 3 & 0 \\ 0 & 1 & 1 & -6 & 0 \\ 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \xrightarrow[R_3 \to \frac{1}{2} R_3]{R_1 \to -R_1} \begin{pmatrix} 1 & 1 & -2 & -3 & 0 \\ 0 & 1 & 1 & -6 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$
$$\xrightarrow[R_2 \to R_2 + 6R_3]{R_1 \to R_1 + 3R_3} \begin{pmatrix} 1 & 1 & -2 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \xrightarrow{R_1 \to R_1 - R_2} \begin{pmatrix} 1 & 0 & -3 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \iff \begin{cases} x_1 - 3x_3 = 0 \\ x_2 + x_3 = 0 \\ x_4 = 0. \end{cases}$$
So, taking x₃ = 1 we obtain x₁ = 3, x₂ = −1, x₄ = 0, i.e., the given vectors v₁, v₂, v₃, v₄ are linearly dependent, with the dependence relation
$$3 \begin{pmatrix} -1 \\ 3 \\ -2 \\ 1 \end{pmatrix} - \begin{pmatrix} -1 \\ 3 \\ -1 \\ 4 \end{pmatrix} + \begin{pmatrix} 2 \\ -6 \\ 5 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad \text{that is,} \quad 3v_1 - v_2 + v_3 = 0.$$
Thus any of the vectors v1 , v2 or v3 can be expressed as a linear combination of the other vectors.
For example, v3 = −3 v1 + v2 .
Example 6.39. Check whether the vectors v1 , v2 , v4 from Example 6.38 are linearly indepen-
dent or dependent. In the latter case express one of these vectors as a linear combination of the
others.
We proceed as before, following Algorithm 6.36.
$$\begin{pmatrix} -1 & -1 & 3 & 0 \\ 3 & 3 & -7 & 0 \\ -2 & -1 & 0 & 0 \\ 1 & 4 & 2 & 0 \end{pmatrix} \xrightarrow[\substack{R_3 \to R_3 - 2R_1 \\ R_4 \to R_4 + R_1}]{R_2 \to R_2 + 3R_1} \begin{pmatrix} -1 & -1 & 3 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 1 & -6 & 0 \\ 0 & 3 & 5 & 0 \end{pmatrix} \xrightarrow{R_2 \leftrightarrow R_3} \begin{pmatrix} -1 & -1 & 3 & 0 \\ 0 & 1 & -6 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 3 & 5 & 0 \end{pmatrix}$$
$$\xrightarrow{R_4 \to R_4 - 3R_2} \begin{pmatrix} -1 & -1 & 3 & 0 \\ 0 & 1 & -6 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 23 & 0 \end{pmatrix} \xrightarrow{R_4 \to R_4 - \frac{23}{2} R_3} \begin{pmatrix} -1 & -1 & 3 & 0 \\ 0 & 1 & -6 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
Thus we found a row echelon form of the matrix with 3 non-zero rows, which is precisely the
number of vectors v1 , v2 , v4 . Hence these vectors are linearly independent by Algorithm 6.36.
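Algorithm 6.37 reduces the whole question to a rank computation, which is a one-liner in SymPy (a sketch applied to the vectors of Example 6.39):

```python
from sympy import Matrix

v1, v2, v4 = Matrix([-1, 3, -2, 1]), Matrix([-1, 3, -1, 4]), Matrix([3, -7, 0, 2])
A = Matrix.hstack(v1, v2, v4)
print(A.rank() == A.cols)   # True: rank 3 equals the number of vectors, so they are independent
```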

6.6. Bases
Definition 6.40 (Basis). Let V be a subspace of Rn and let v1 , . . . , vk ∈ V be some vectors
in V . The set {v1 , . . . , vk } is called a basis of V if the following two conditions are satisfied:
(i) v1 , . . . , vk are linearly independent;
(ii) V = span{v1 , . . . , vk } (i.e., v1 , . . . , vk span V ).
Example 6.41. (a) The standard basis e1 , . . . , en of Rn is a basis: e1 , . . . , en are linearly
independent by Example 6.30.(b) and Rn = span{e1 , . . . , en } by Example 6.13.(a).
   
(b) Let $v_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$, $v_2 = \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix}$ and V = span{v₁, v₂}. Then (ii) is automatically satisfied.
To check if v₁, v₂ are linearly independent, we use Algorithm 6.37:
$$\begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix} \xrightarrow[R_3 \to R_3 - 3R_1]{R_2 \to R_2 - 2R_1} \begin{pmatrix} 1 & 4 \\ 0 & -3 \\ 0 & -6 \end{pmatrix} \xrightarrow{R_3 \to R_3 - 2R_2} \begin{pmatrix} 1 & 4 \\ 0 & -3 \\ 0 & 0 \end{pmatrix}.$$
Thus the matrix $\begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}$ has rank 2, as it has two non-zero rows in a row echelon form. So, by Theorem 6.31, the vectors v₁, v₂ are linearly independent. (Alternatively, we could notice that v₁ is not parallel to v₂, so they are linearly independent by Example 6.27.(c).) Therefore the set {v₁, v₂} is a basis of V = span{v₁, v₂}.
The following observation immediately follows from Definition 6.40:
Note 6.42. Suppose that v1 , . . . , vk are linearly independent. Then {v1 , . . . , vk } is a basis of
V = span{v1 , . . . , vk }.
Theorem 6.43 (Any n linearly independent vectors form a basis of Rn ). Let v1 , . . . , vn be n
vectors in Rn . Then {v1 , . . . , vn } is a basis of Rn if and only if these vectors are linearly independent
Proof. “⇒” If {v1 , . . . , vn } is a basis then v1 , . . . , vn must be linearly independent by the
definition of a basis.
“⇐” Suppose that the vectors v1 , . . . , vn are linearly independent. Let A ∈ Mn (R) be the n × n
matrix whose column vectors are v1 , . . . , vn . Then, according to Corollary 6.29, det(A) 6= 0, so A
is invertible (by Theorem 4.14).
Now, to show that {v1 , . . . , vn } is a basis of Rn we need to prove that these vectors span Rn ,
i.e., for any b ∈ Rn there exist x1 , . . . , xn ∈ R such that
Lemma 5.5
b = x1 v1 + · · · + xn vn ======== A x, where x = (x1 , . . . , xn )T .
Thus we see that Rn = span{v1 , . . . , vn } if and only if the equation Ax = b has a solution for every
b ∈ Rn . But the latter is true since A is invertible, by Theorem 3.17 (we can take x = A−1 b).
Therefore the vectors v1 , . . . , vn span Rn . Since they are linearly independent by the assump-
tion, we can conclude that {v1 , . . . , vn } is a basis of Rn . 
The notion of a basis will play an important role in Linear Algebra II and will be studied in
much more detail in that module. In particular, the following two important statements will be
proved.

Theorem 6.44. If V is a subspace of Rn then V has a basis and any two bases of V have the
same number of vectors (i.e., if {v1 , . . . , vk } and {u1 , . . . , ul } are bases of V then k = l).
Theorem 6.44 tells us that the number of vectors in any basis of a subspace V is equal to the
same integer k. This k is called the dimension of V .
Theorem 6.45. Any basis of Rn has exactly n vectors. In fact, if v1 , . . . , vk ∈ Rn and k < n,
then v1 , . . . , vk do not span Rn (i.e., span{v1 , . . . , vk } is a proper subspace of Rn ).
The above theorems can be used to determine whether or not a given collection of vectors forms
a basis.
Example 6.46. Determine whether the following vectors form a basis of Rⁿ.
(a) $\begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 5 \\ 6 \\ 0 \end{pmatrix}$, $\begin{pmatrix} -3 \\ 7 \\ 8 \end{pmatrix}$ in R³.
Let us start with checking linear independence: $A = \begin{pmatrix} 2 & 5 & -3 \\ 0 & 6 & 7 \\ 0 & 0 & 8 \end{pmatrix}$, so det(A) = 2 · 6 · 8 = 96 ≠ 0 (cf. Theorem 4.6). Therefore these vectors are linearly independent by Corollary 6.29. Now, applying Theorem 6.43, we see that $\begin{pmatrix} 2 \\ 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 5 \\ 6 \\ 0 \end{pmatrix}$, $\begin{pmatrix} -3 \\ 7 \\ 8 \end{pmatrix}$ form a basis of R³.
(b) $\begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}$, $\begin{pmatrix} 2 \\ 4 \\ 2 \end{pmatrix}$ and $\begin{pmatrix} -1 \\ 4 \\ 3 \end{pmatrix}$ in R³.
These vectors are linearly dependent by Example 6.30.(a), so they cannot form a basis of R³ (by Definition 6.40).
(c) $\begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}$, $\begin{pmatrix} 2 \\ 4 \\ 2 \end{pmatrix}$ in R³.
These two vectors are clearly linearly independent, as they are not multiples of each other. However, they do not form a basis of R³ because any basis of R³ must consist of 3 vectors (see Theorem 6.45).
CHAPTER 7

Eigenvalues, eigenvectors and applications

Eigenvalues and eigenvectors are important tools for studying matrices and linear transfor-
mations. They play a crucial role in differential equations, stochastic analysis, in the study of
networks, etc. The aim of this chapter is to introduce the first methods for computing eigenvalues
and eigenvectors of a matrix, and to describe an application of these notions in Google’s page
ranking algorithm.

7.1. Eigenvalues and eigenvectors


Definition 7.1 (Eigenvalues and eigenvectors). Let A be an n × n matrix.
(a) A number λ ∈ R is said to be an eigenvalue of A if there is a non-zero vector v ∈ Rn such
that A v = λ v.
(b) Any non-zero vector u ∈ Rn such that A u = µ u, for some µ ∈ R, is called an eigenvector of
A with eigenvalue µ.

Geometric meaning: v ∈ Rn is an eigenvector if A stretches (or shrinks) it, but preserves the
line L passing through v (see Figure 7.1).

Figure 7.1

   
Example 7.2. Let T : R2 → R2 be a shear (so its matrix A is of the form [1  α; 0  1] or [1  0; β  1]).
Then T actually fixes one of the coordinate vectors. Figure 7.2 below assumes that α > 0. E.g.,

Figure 7.2

        
[1  α; 0  1] (−1, 0)^T = (−1, 0)^T, so (−1, 0)^T is an eigenvector of [1  α; 0  1] with eigenvalue λ = 1.

   
7 −15 3
Example 7.3. Let A = . Then is an eigenvector of A corresponding to eigen-
2 −4 1
value λ = 2. Indeed, this can be easily checked:
      
7 −15 3 6 3
= =2 .
2 −4 1 2 1
Question 7.4. Given a square matrix, how can we find its eigenvalues and eigenvectors?
Proposition 7.5 (Determinant criterion for eigenvalues). Let A be an n × n matrix. Then
λ ∈ R is an eigenvalue of A if and only if det(A − λIn ) = 0.
Proof. First observe that for any vector v ∈ Rn and λ ∈ R,
(7.1) A v = λ v ⇐⇒ (A − λ In ) v = 0.
Indeed, using standard properties of matrices, we have
(A − λ In ) v = A v − (λ In ) v = A v − λ (In v) = A v − λ v,
and the equation A v − λ v = 0 is obviously equivalent to A v = λ v.
We can now prove the claim of the proposition:
λ is an eigenvalue of A
⇐⇒ ∃ v ∈ Rn , v 6= 0, such that A v = λ v (by Definition 7.1.(a))
n
⇐⇒ ∃ v ∈ R , v 6= 0, such that (A − λ In ) v = 0 (by (7.1))
n
⇐⇒ ∃ v ∈ R which is a non-zero solution of (A − λ In ) v = 0
⇐⇒ the matrix B = A − λ In is not invertible (by Theorem 3.22)
⇐⇒ det(B) = det(A − λ In ) = 0 (by Theorem 4.14).

Proposition 7.5 suggests a method for finding eigenvalues of a matrix.
 
2 1
Example 7.6. Find the eigenvalues of A = .
1 2
     
2 1 1 0 2−λ 1
We start with forming the matrix A − λ I2 = −λ = . Next, we
1 2 0 1 1 2−λ
calculate the determinant of this matrix, set it equal zero and solve the resulting equation:
det(A − λ I2 ) = (2 − λ) (2 − λ) − 1 = λ2 − 4λ + 3.
Hence
det(A − λI2 ) = 0 ⇐⇒ λ2 − 4λ + 3 = 0 ⇐⇒ λ = 1 or λ = 3.
Thus, by Proposition 7.5, A has two eigenvalues λ1 = 1 and λ2 = 3.
(We can check that det(A − λI2 ) = 0 for λ = 1, 3:
 
1 1
λ1 = 1, A − λ1 I2 = A − I2 = , so det(A − λ1 I2 ) = 0 X.
1 1
 
−1 1
λ2 = 3, A − λ2 I2 = A − 3 I2 = , so det(A − λ2 I2 ) = 0 X.)
1 −1
As we see from Example 7.6, det(A−λI2 ) = λ2 −4λ+3 turned out to be a quadratic polynomial
in λ. In fact, for any n × n matrix A, det(A − λ In ) is always a polynomial in λ, of degree n.
Definition 7.7 (Characteristic polynomial). Let A be an n × n matrix. The characteristic
polynomial of A is defined as pA (λ) = det(A − λ In ).
The equation pA (λ) = 0 (⇔ det(A − λ In ) = 0) is called the characteristic equation of A.
So, to find the eigenvalues we first compute the characteristic polynomial, and then we find its
roots.

Note 7.8. If A ∈ Mn (R), the characteristic polynomial of A will have degree n. So it cannot
have more than n distinct roots, hence A cannot have more than n eigenvalues.

Algorithm 7.9 (Finding eigenvalues). To find the eigenvalues of an n × n matrix A, form


the new matrix A − λIn (treat λ as an unknown), calculate the characteristic polynomial pA (λ) as
det(A − λ In ) and find the roots of the equation pA (λ) = 0. These roots will be the eigenvalues of
A (by Proposition 7.5).

How do we find the eigenvectors corresponding to these eigenvalues?

Algorithm 7.10 (Finding eigenvectors). If λ is an eigenvalue of an n × n matrix A, to find


an eigenvector v corresponding to this λ, one needs to find any non-zero solution of the equation
(A − λ In ) v = 0.
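Algorithms 7.9 and 7.10 can also be carried out numerically. The short sketch below is only an illustration (it assumes Python with the numpy library, which is not part of this module); it checks the eigenvalues and eigenvectors of the matrix A = [2  1; 1  2] from Example 7.6 (see also Example 7.11 below).

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

    # np.linalg.eig returns the eigenvalues of A together with a matrix whose
    # columns are corresponding (unit-length) eigenvectors.
    eigenvalues, eigenvectors = np.linalg.eig(A)
    print(eigenvalues)        # the eigenvalues 3 and 1 (in some order)
    print(eigenvectors)       # columns proportional to (1, 1)^T and (1, -1)^T

    # Verify A v = lambda v for each eigenvalue/eigenvector pair:
    for lam, v in zip(eigenvalues, eigenvectors.T):
        assert np.allclose(A @ v, lam * v)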
 
Example 7.11. Let A = [2  1; 1  2] be the matrix from Example 7.6. Find the eigenvectors of A corresponding to λ1 = 1 and λ2 = 3.
λ1 = 1. A − λ1 I2 = [1  1; 1  1]. So, we solve [1  1; 1  1] (x, y)^T = (0, 0)^T. It is easy to guess that (x, y)^T = (1, −1)^T is a non-zero solution, so (1, −1)^T is an eigenvector of A with eigenvalue λ1 = 1.
To justify our guess, let us check: A (1, −1)^T = [2  1; 1  2] (1, −1)^T = (1, −1)^T = λ1 (1, −1)^T ✓.
λ2 = 3. A − λ2 I2 = [−1  1; 1  −1]. Thus we look for a non-zero solution of [−1  1; 1  −1] (x, y)^T = (0, 0)^T. Again, we can easily guess that (x, y)^T = (1, 1)^T is a solution, so (1, 1)^T is an eigenvector of A corresponding to the eigenvalue λ2 = 3.
To justify our guess, we check: A (1, 1)^T = [2  1; 1  2] (1, 1)^T = (3, 3)^T = λ2 (1, 1)^T ✓.
1 1 2 1 3 1
 
Example 7.12. Let A = [−3  −1  −2; 1  −1  1; 1  1  0]. Find the eigenvalues and the corresponding eigenvectors of A.
We start with computing the characteristic polynomial of A. Expanding det(A − λI3) along row 3:
pA(λ) = det(A − λI3) = det[−3−λ  −1  −2; 1  −1−λ  1; 1  1  −λ]
= (−1)^(3+1) · 1 · det[−1  −2; −1−λ  1] + (−1)^(3+2) · 1 · det[−3−λ  −2; 1  1] + (−1)^(3+3) · (−λ) · det[−3−λ  −1; 1  −1−λ]
= (−1 − 2 − 2λ) − (−3 − λ + 2) − λ((−3 − λ)(−1 − λ) + 1)
= −3 − 2λ + 1 + λ − λ(λ² + 4λ + 4) = −(λ³ + 4λ² + 5λ + 2).

To find the eigenvalues, we solve the equation pA (λ) = 0:

−(λ3 + 4λ2 + 5λ + 2) = 0 ⇐⇒ λ3 + 4λ2 + 5λ + 2 = 0.

Observe that λ = −1 is a root (usually in the case of a 3 × 3 matrix you will be given one of the
eigenvalues, so you will know one of the roots. Also, for integer roots you can test possible divisors
of the last term of the polynomial: 2 in this case. Its divisors are ±1, ±2).

To find the remaining roots, we factorize the polynomial, for example by using long division: λ = −1 is a root, so λ³ + 4λ² + 5λ + 2 must be divisible by λ + 1. Dividing λ³ + 4λ² + 5λ + 2 by λ + 1 (the quotient is λ² + 3λ + 2 and the remainder is 0), we obtain that
λ³ + 4λ² + 5λ + 2 = (λ + 1)(λ² + 3λ + 2).
Now, we factorize λ2 +3λ+2. Since λ = −1, −2 are its roots, we have λ2 +3λ+2 = (λ+1)(λ+2).
Therefore pA (λ) = −(λ + 1)2 (λ + 2).
Hence λ1 = −1 and λ2 = −2 are the eigenvalues of A. Note that there are only two of them,
but λ1 = −1 is of multiplicity 2.
Let us now find thecorrespondingeigenvectors.    
−2 −1 −2 x 0
λ1 = −1. A − λ1 I3 =  1 0 1. Thus we need to solve (A − λ1 I3 )  y  = 0.
1 1 1 z 0
    R2 → R2 + 2R1
−2 −1 −2 0 R1 ↔ R2
1 0 1 0 R3 → R3 − R1
 1 0 1 0  −−−−−−−→  −2 −1 −2 0  −−−−−−−−−−−→
1 1 1 0 1 1 1 0
   
1 0 1 0 R → R + R2
1 0 1 0 
 0 −1 0 0  −−−3−−−−3−−−− x+z =0
→  0 −1 0 0  ⇐⇒
−y = 0
0 1 0 0 0 0 0 0
   
x 1
Taking z = −1, we get that y  =  0 is an eigenvector corresponding to the eigenvalue
z −1
λ1 = −1.    
−1 −1 −2 x
λ2 = −2. A − λ2 I3 =  1 1 1 , and we look for a non-zero solution of (A − λ2 I3 ) y  =

1 1 2 z
     
0 x 1
0. We can guess that y  = −1 works, but, to be rigorous, we should check it:
0 z 0
      
−3 −1 −2 1 −2 1
 1 −1 1 −1 =  2 = −2 −1 X.
1 1 0 0 0 0
 
1
So −1 is an eigenvector corresponding to the eigenvalue λ2 = −2.
0
Example 7.13. Instead of long division you can use other ways to factorize the polynomial
λ3 + 4λ2 + 5λ + 2 (knowing that λ = −1 is a root):
(1) Work backwards (keeping in mind that the polynomial is divisible by λ + 1):
λ3 + 4λ2 + 5λ + 2 = λ3 + 4λ2 + 3λ + 2(λ + 1)
= λ3 + λ2 + 3(λ2 + λ) + 2(λ + 1)
= λ2 (λ + 1) + 3λ(λ + 1) + 2(λ + 1)
= (λ + 1)(λ2 + 3λ + 2) = (λ + 1)(λ + 1)(λ + 2).

(2) Write λ³ + 4λ² + 5λ + 2 = (λ + 1)(aλ² + bλ + c), for some unknown a, b, c ∈ R. Expand the right-hand side and compare coefficients at the same powers of λ to find a, b, c:
λ³ + 4λ² + 5λ + 2 = aλ³ + bλ² + cλ + aλ² + bλ + c = aλ³ + (a + b)λ² + (b + c)λ + c,
so a = 1, a + b = 4, b + c = 5 and c = 2, which gives a = 1, b = 3, c = 2. Hence λ³ + 4λ² + 5λ + 2 = (λ + 1)(λ² + 3λ + 2).
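The roots of the characteristic polynomial can also be found numerically. A minimal sketch (assuming Python with the numpy library, purely as an illustration) for the polynomial λ³ + 4λ² + 5λ + 2 of Example 7.12:

    import numpy as np

    # Coefficients of lambda^3 + 4 lambda^2 + 5 lambda + 2, highest power first.
    print(np.roots([1, 4, 5, 2]))      # approximately [-2., -1., -1.]

    # Alternatively, compute the eigenvalues directly from the matrix A of
    # Example 7.12 and compare with the roots found above.
    A = np.array([[-3.0, -1.0, -2.0],
                  [ 1.0, -1.0,  1.0],
                  [ 1.0,  1.0,  0.0]])
    print(np.linalg.eigvals(A))        # approximately [-2., -1., -1.] (in some order)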

7.2. More examples


Let’s look at a few more examples of calculating eigenvalues and eigenvectors.
 
a11 a12 · · · a1n
 0 a22 · · · a2n 
Proposition 7.14. Let A =  .. ..  be an n × n upper triangular matrix. Then
 
.. ..
 . . . . 
0 · · · 0 ann
a11 , . . . , ann is the list of all eigenvalues of A.
Proof. We need to show that a number λ is an eigenvalue of A if and only if λ ∈ {a11 , . . . , ann }.
λ is an eigenvalue of A
⇐⇒ det(A − λIn ) = 0 (by Proposition 7.5)
 
a11 − λ a12 ··· a1n
 0 a22 − λ · · · a2n 
⇐⇒ det  .. =0
 
. . . . ..
 . . . . 
0 ··· 0 ann − λ

⇐⇒ (a11 − λ)(a22 − λ) · · · (ann − λ) = 0 (by Theorem 4.6)


⇐⇒ λ = aii for some i ∈ {1, . . . , n}.

Note 7.15. Let A be a square matrix. Then A and AT have the same eigenvalues.
Proof. Exercise (see the proof of Lemma 7.23 below). 
 
2 6 −2
Example 7.16. (a) Let A = 0 3 0. Then, according to Proposition 7.14, λ = 2, 3, 5
0 0 5
are the eigenvalues of A.
   
1 0 1 3
(b) If B = then B T = is upper triangular, so λ = 1 is the only eigenvalue of B.
3 1 0 1
 
0 −1
Example 7.17 (Eigenvalues may not be real numbers). Let A = . Then pA (λ) =
1 0
−λ −1
det(A − I2 ) = = λ2 + 1, so the equation pA (λ) = 0 ⇔ λ2 + 1 = 0 has no real roots. The
1 −λ
(complex) solutions of this equation are λ = ±i, where i ∈ C is the imaginary unit.
It is not hard to see that all of the theory of eigenvectors and eigenvalues developed above still
applies and we can find
 the eigenvectors
 corresponding to these eigenvalues using Algorithm
   7.10.

−i −1 −i −1 x 0
λ1 = i. A − λ1 I2 = . So, we need to find a non-zero solution of = .
1 −i 1 −i y 0
     
x 1 1
Clearly = works, so is an eigenvector of A with eigenvalue λ1 = i. (Check:
y −i −i
        
1 0 −1 1 i 1
A = = =i X.)
−i 1 0 −i 1 −i
      
i −1 i −1 x 0
λ2 = −i. A − λ2 I2 = . So, we need to find a non-zero solution of = .
1 i 1 i y 0
     
x 1 1
Evidently = works, so is an eigenvector of A corresponding to the eigenvalue
y i i
        
1 0 −1 1 −i 1
λ2 = −i. (Check: A = = = (−i) X.)
i 1 0 i 1 i
 
4 0 4
Example 7.18 (Eigenvalues can be zero). Let A = 0 4 4.
4 4 8

Expanding the determinant along row 1:
pA(λ) = det(A − λI3) = det[4−λ  0  4; 0  4−λ  4; 4  4  8−λ]
= (−1)^(1+1) · (4 − λ) · det[4−λ  4; 4  8−λ] + (−1)^(1+3) · 4 · det[0  4−λ; 4  4]
= (4 − λ)((4 − λ)(8 − λ) − 16) + 4(0 − 4(4 − λ))
= (4 − λ)((4 − λ)(8 − λ) − 16 − 16) = (4 − λ)(32 − 12λ + λ² − 32) = (4 − λ)(λ² − 12λ) = −λ(λ − 4)(λ − 12).
So, the characteristic polynomial of A is pA (λ) = −λ(λ − 4)(λ − 12), and thus the eigenvalues of A
are λ1 = 4, λ2 = 0 and λ3 = 12.
Now let us find eigenvector corresponding to λ2 = 0: (A − λ2 I3 ) v = 0 ⇐⇒ A v = 0 ⇐⇒
   
4 0 4 0 R → R − R1
4 0 4 0 R3 → R3 − R2
 0 4 4 0  −−−3−−−−3−−−− →  0 4 4 0  −−−−−−−−−−−→
4 4 8 0 0 4 4 0
  R1 → 1 R1  
4 0 4 0 4
R2 → 14 R2
1 0 1 0
 0 4 4 0  −−−−−−−−→  0 1 1 0  .
0 0 0 0 0 0 0 0
   
1 1
Therefore v =  1 is a non-zero solution. Thus  1 is an eigenvector of A corresponding to
−1 −1
the eigenvalue λ
 2 = 0.      
4 0 4 1 0 1
(Check:  0 4 4   1 = 0 = 0 ·  1 X.)
4 4 8 −1 0 −1
 
Example 7.19 (Eigenvalues can be irrational). Consider the matrix A = [3  2; 2  −3]. The characteristic polynomial of A is pA(λ) = det[3−λ  2; 2  −3−λ] = λ² − 13. So the eigenvalues of A, λ1 = √13 and λ2 = −√13, are not rational (even though all entries of A are integers). Let us find the eigenvectors of A.
λ1 = √13. Then
(A − √13 I2) v = 0 ⟺ [3−√13  2; 2  −3−√13] (x, y)^T = (0, 0)^T ⟺ (3 − √13)x + 2y = 0 and 2x − (3 + √13)y = 0.
Since we know that this system has a non-zero solution v = (x, y)^T (as λ1 is an eigenvalue), the two equations will be proportional to each other (this is only true because we are working with a 2 × 2 example here), so any solution of the first equation is also a solution of the second one. Clearly, (x, y)^T = (−2, 3 − √13)^T is a solution (as expected, it satisfies the second equation as well: 2(−2) − (3 + √13)(3 − √13) = −4 − (9 − 13) = 0 ✓). Therefore v1 = (−2, 3 − √13)^T is an eigenvector of A corresponding to the eigenvalue λ1 = √13.
λ2 = −√13. Then we have the system [3+√13  2; 2  −3+√13] (x, y)^T = (0, 0)^T. It is easy to see that v2 = (−2, 3 + √13)^T is the resulting eigenvector, corresponding to λ2 = −√13.

7.3. Application of eigenvectors in Google’s page ranking algorithm


This section is based on the article “The $25,000,000,000 eigenvector: the linear algebra behind Google”, by Kurt Bryan and Tanya Leise (SIAM Rev. 48 (2006), no. 3, pp. 569–581).
In this section we will discuss how Google's page ranking works: why is it that, whenever we search for something on Google, the desired web-page usually appears among the top few results? This is due to a clever algorithm which ranks the web-pages.
The score of a page is a quantitative rating of the page's importance: we want to assign to each web-page a score (a non-negative real number), so that we can then compare the relevance/importance of pages by looking at these scores (the higher the score, the more important the page is).
For example, let us suppose that we are given a web of 4 pages:

An arrow from page 1 to page 2 indicates that there is a link from page 1 to page 2, etc.

Let x1 , x2 , x3 , x4 denote the scores of pages 1, 2, 3, 4 respectively, xi ≥ 0 for every i = 1, 2, 3, 4.


Currently these are unknowns, and we want to find them. An inequality xi > xj would indicate
that page i is more important than page j.
We expect that the score of a page would not just count the number of links to this page, but
would also take into account the importance (i.e., the score) of the pages that link to it. In other
words, the score of page 1, x1 , should be determined by x3 and x4 , etc.
Since we do not want a single page to gain influence by merely linking to many other pages, we make sure that each page's total influence adds up to 1. Thus we want that
(7.2) xk = Σ_{j ∈ Lk} xj / nj ,
where nj is the number of outgoing links from page j and Lk is the set of pages which link to page k. In our example, if k = 1, equation (7.2) becomes x1 = x3/n3 + x4/n4 = x3/1 + x4/2. Similarly, x2 = x1/3, x3 = x1/3 + x2/2 + x4/2 and x4 = x1/3 + x2/2. This is equivalent to the following matrix equation:
(7.3) A x = x, where A = [0  0  1  1/2; 1/3  0  0  0; 1/3  1/2  0  1/2; 1/3  1/2  0  0] and x = (x1, x2, x3, x4)^T.
The matrix A = (aij) constructed this way is called the link matrix of the web. Thus, by definition, aij = 0 if page j does not link to page i, and aij = 1/nj if page j links to page i, where nj is the total number of links from page j.


The vector x is the vector of scores, which we want to find. Obviously we want it to be non-zero and to have all entries non-negative. Equation (7.3) shows that x is an eigenvector of the matrix A
corresponding to the eigenvalue λ = 1. It is now natural to ask
Question 7.20. Why does A always have such an eigenvector?
Observe that the sum of entries in each column of A is 1.
Definition 7.21 (Column-stochastic matrix). A square matrix is called column-stochastic if
all of its entries are non-negative and the sum of entries in each column is 1.
Note 7.22. The link matrix of any web is column-stochastic as long as there are no ‘dead-end’
web-pages (that have no outgoing links).
Lemma 7.23. Each column-stochastic matrix A has 1 as an eigenvalue.
Proof. Let A be an n × n column-stochastic matrix. Note that A^T is row-stochastic, i.e., the sum of entries in each row is 1. So, if A^T = (cij)n×n and v = (1, 1, . . . , 1)^T ∈ Rn, then
A^T v = (c11 + · · · + c1n, c21 + · · · + c2n, . . . , cn1 + · · · + cnn)^T = (1, 1, . . . , 1)^T = v,
since A^T is row-stochastic. Thus v is an eigenvector of A^T corresponding to the eigenvalue 1. Therefore det(A^T − In) = 0 by Proposition 7.5. Observe that A^T − In = (A − In)^T, because (A − In)^T = A^T − In^T = A^T − In, by Theorem 2.8. Hence, recalling Theorem 4.20, we get
det(A − In) = det((A − In)^T) = det(A^T − In) = 0,
which implies that λ = 1 is an eigenvalue of A (by Proposition 7.5). □


 
Thus, we know that the matrix A = [0  0  1  1/2; 1/3  0  0  0; 1/3  1/2  0  1/2; 1/3  1/2  0  0] from our example has an eigenvector x corresponding to λ = 1. We can find x by solving (A − I4) x = 0. E.g., x = (12, 4, 9, 6)^T ∈ R4 works.
We scale this vector so that its coordinates add up to 1, to obtain the vector (12/31, 4/31, 9/31, 6/31)^T. So, the importance score of page 1 is the largest (12/31) and that of page 2 is the smallest (4/31).
Generally Google works with webs containing millions of pages, so the size of the link matrix A
is huge. In such situations the usual method for finding eigenvectors of A is not efficient or practical
(from the computational viewpoint).
They use several tools to make the computation more efficient. First, they modify the link matrix A slightly, to make sure that all of its entries are strictly positive. Then they take the vector x0 = (1/n, 1/n, . . . , 1/n) ∈ Rn (n is very large!) and calculate A x0, A(A x0) = A² x0, A(A(A x0)) = A³ x0, etc. It turns out that as k → ∞, the vectors A^k x0 converge to some vector q ∈ Rn, such that A q = q, and the sum of coordinates of q is 1.
Thus q is the vector of importance score sought, and Ak x0 , for a sufficiently large k (e.g.,
k = 100 would usually be enough), is a good approximation of q.
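The power iteration described above is easy to simulate on a small example. The sketch below is an illustration only (it assumes Python with the numpy library); it applies the iteration to the 4-page link matrix from this section, and for such a small web it already converges without the modification of A mentioned above.

    import numpy as np

    A = np.array([[0,   0,   1, 1/2],
                  [1/3, 0,   0, 0  ],
                  [1/3, 1/2, 0, 1/2],
                  [1/3, 1/2, 0, 0  ]])

    n = A.shape[0]
    x = np.full(n, 1 / n)        # x0 = (1/n, ..., 1/n)

    for _ in range(100):         # compute A^k x0 with k = 100
        x = A @ x

    print(x)                     # approximately (12/31, 4/31, 9/31, 6/31)
    print(np.allclose(A @ x, x)) # True: x is (numerically) an eigenvector for lambda = 1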

7.4. Symmetric matrices


is symmetricif AT = A (i.e., (A)ij = (A)ji for all i, j). For
Recall that a square matrix A 
  0 0 1  
1 2 1 1
example, the matrices and 0 2 −3 are symmetric, while the matrix is not.
2 3 0 1
1 −3 5

The following is an important property of symmetric matrices.


Theorem 7.24. Let A ∈ Mn (R) be an n × n symmetric matrix. Then A has n linearly inde-
pendent eigenvectors, and all eigenvalues of A are real numbers.
Proof. Omitted. 
 
1 ... 0
 .. . . .. 
Example 7.25. (a) Let A = In =  . . . . Observe that In is symmetric and the only
0 ... 1
eigenvalue of In is λ = 1 (e.g., by Proposition 7.14). Evidently any non-zero vector v ∈ Rn is an
eigenvector of In corresponding to this eigenvalue, as In v = v. In particular, the standard basis
vectors e1 , . . . , en are eigenvectors of In .
 
1 1 1 1 − λ 1 1

(b) A = 1 1 1. A is symmetric, pA (λ) = det(A−λ I3 ) = 1 1−λ 1 = −λ3 +3λ2 =
1 1 1 1 1 1 − λ
−λ2 (λ − 3). So the eigenvalues of A are λ1 = 0 and λ2 = 3.
λ1 = 0. (A − 0 · I3 ) v = 0 ⇐⇒
  R2 → R2 − R1  
1 1 1 0 R → R − R1
1 1 1 0
 1 1 1 0  −−−3−−−−3−−−−
→  0 0 0 0  ⇐⇒ x + y + z = 0.
1 1 1 0 0 0 0 0
Thus y, z are freevariables,
 and we can get two linearly independent eigenvectors: let y = 1, z = 0,
−1
then x = −1, so  1 is an eigenvector corresponding to λ1 = 0.
0
 
−1
Now, let y = 0, z = 1, then x = −1, so  0 is also an eigenvector corresponding to λ1 = 0.
1
Clearly these two eigenvectors (−1, 1, 0) and (−1, 0, 1)T are linearly independent (as they are not
T

multiples of each other).


λ2 = 3. (A − 3 I3 ) v = 0 ⇐⇒
    R2 → R2 + 2R1
−2 1 1 0 R1 ↔ R2
1 −2 1 0 R3 → R3 − R1
 1 −2 1 0  −−−−−−−→  −2 1 1 0  −−−−−−−−−−−→
1 1 −2 0 1 1 −2 0
   
1 −2 1 0 R3 → R3 + R2
1 −2 1 0 R2 → − 1 R2
 0 −3 3 0  −−−−−−−−−−−→  0 −3 3 0  −−−−−−−3−−→
0 3 −3 0 0 0 0 0
   
1 −2 1 0 R1 → R1 + 2R2
1 0 −1 0
 0 1 −1 0  −−−−−−−−−−−→  0 1 −1 0  .
0 0 0 0 0 0 0 0
 
1
Thus only z is a free variable, so there is one linearly independent eigenvector v = 1
1
corresponding to eigenvalue λ2 = 3.
Note 7.26. If λ is an eigenvalue of an n × n matrix A, then the number of linearly independent
eigenvectors of A corresponding to λ is equal to the number of free variables in the solution of the
system (A − λ In ) v = 0.
 
As we saw above, the matrix A = [1  1  1; 1  1  1; 1  1  1] from Example 7.25.(b) has 3 linearly independent eigenvectors (−1, 1, 0)^T, (−1, 0, 1)^T (corresponding to λ1 = 0) and (1, 1, 1)^T (corresponding to λ2 = 3). Observe that the scalar products of the vectors corresponding to λ1 and λ2 are zero: (−1, 1, 0)^T · (1, 1, 1)^T = 0 and (−1, 0, 1)^T · (1, 1, 1)^T = 0, i.e., these vectors are orthogonal. In fact, this holds in general:
Theorem 7.27 (Eigenvectors of a symmetric matrix are pairwise orthogonal). If A is a symmet-
ric matrix then any two eigenvectors of A corresponding to different eigenvalues are orthogonal.
Proof. Suppose that A ∈ Mn (R), A = AT , and u, v ∈ Rn are eigenvectors of A: A u = µ u
and A v = λ v for some µ, λ ∈ R, where µ 6= λ.
Using properties of scalar product from Theorem 1.8, we have
(7.4) (A u) · v = (µ u) · v = µ (u · v).
On the other hand, note that u · v = uT v (i.e., the scalar product of u and v is equal to the
product of the row vector uT with column vector v, considered as matrices). Therefore
(7.5) (A u) · v = (A u)^T v = (u^T A^T) v = u^T (A^T v) = u^T (A v) = u^T (λ v) = λ (u^T v) = λ (u · v),
where we used Theorem 2.9.(iv), associativity of the matrix product, A = A^T, A v = λ v and standard properties of matrices.
Combining equations (7.4) and (7.5) together, we obtain µ (u · v) = (A u) · v = λ · (u · v). Hence
(µ − λ) (u · v) = 0, so u · v = 0 as µ 6= λ by assumptions. Thus u is orthogonal to v, as claimed. 
Lemma 7.28. Suppose that v1 , . . . , vk are pairwise orthogonal non-zero vectors in Rn (i.e., vi · vj = 0 if i ≠ j, 1 ≤ i, j ≤ k). Then these vectors are linearly independent.
Proof. Assume that c1 v1 + . . . + ck vk = 0 for some c1 , . . . , ck ∈ R. We need to show that c1 = c2 = . . . = ck = 0. Take any i, 1 ≤ i ≤ k, and note that (c1 v1 + · · · + ck vk ) · vi = 0 · vi = 0. Thus, using standard properties of the scalar product, we have (c1 v1 ) · vi + · · · + (ck vk ) · vi = 0, i.e., c1 (v1 · vi ) + · · · + ck (vk · vi ) = 0. But vj · vi = 0 unless j = i by the assumptions, so ci (vi · vi ) = 0 for each i ∈ {1, . . . , k}.
Recall that vi · vi = kvi k2 > 0 as vi 6= 0 (see property SP4 in Theorem 1.8). It follows that
ci = 0. Since the latter is true for each i = 1, 2, . . . , k, we can conclude that c1 = · · · = ck = 0.
Thus v1 , . . . , vk are linearly independent. 
Corollary 7.29. If v1 , . . . , vk are eigenvectors corresponding to distinct eigenvalues of a sym-
metric matrix A, then v1 , . . . , vk are linearly independent.
Proof. This claim follows immediately from Theorem 7.27 and Lemma 7.28. 
The next theorem states that any n × n matrix which has n linearly independent eigenvectors
is diagonalizable:
Theorem 7.30. Let A be an n×n matrix. Suppose that v1 , . . . , vn ∈ Rn are linearly independent
eigenvectors of A, corresponding to eigenvalues λ1 , . . . , λn (not necessarily distinct) respectively. Let
P = (v1 . . . vn )n×n be the matrix with column vectors v1 , . . . , vn . Then P is invertible and
 
λ1 0 · · · 0
 0 λ2 · · · 0 
P −1 A P =  ..
 
.. . . .. 
. . . .
0 0 · · · λn
is an n × n diagonal matrix with diagonal entries λ1 , . . . , λn .

Proof. Note that, by definition of matrix multiplication,


A P = A (v1 . . . vn ) = (A v1 . . . A vn )n×n .
 
a1
 .. 
Indeed, if A =  .  , where a1 , . . . , an are row vectors of A, then
an n×n
 
  a1 · v1 . . . a1 · vn
a1 a · v . . . a2 · vn 
A P =  ...  (v1 . . . vn ) =  ..
 2 1
..  = (A v1 . . . A vn ).
  
..
 . . . 
an
an · v1 . . . an · vn
Recalling that A vi = λi vi , we can conclude that A P = (A v1 . . . A vn ) = (λ1 v1 . . . λn vn ).
Now, since the column vectors of P are linearly independent, det(P ) 6= 0 by Corollary  6.29,
u1
−1 −1  .. 
hence P is invertible by Theorem 4.14. Let u1 , . . . , un be row vectors of P : P = .  .
un n×n
Since P⁻¹ P = In we know that
(7.6) ui · vj = 1 if i = j, and ui · vj = 0 if i ≠ j.
It follows that
 
u1
P −1 A P = P −1 (λ1 v1 . . . λn vn ) =  ...  (λ1 v1 . . . λn vn ) =
 

un
 
u1 · (λ1 v1 ) u1 · (λ2 v2 ) . . . u1 · (λn vn )
 u · (λ1 v ) u · (λ2 v ) . . . u · (λn v ) 
 2 1 2 2 2 n  SP3
 .. .. ..  ====
 . . . 
un · (λ1 v1 ) un · (λ2 v2 ) . . . un · (λn vn )
   
λ 1 u1 · v 1 λ 2 u1 · v 2 · · · λ n u1 · v n λ1 0 · · · 0
 λ1 u · v λ2 u · v · · · λn u · v  (7.6)  0 λ2 · · · 0 
2 1 2 2 2 n
 ====  .. ..  ,
  
 .. .. .. .. . .
 . . .   . . . . 
λ 1 un · v 1 λ 2 un · v 2 · · · λ n un · v n 0 0 · · · λn
as claimed. 
 
Example 7.31. (a) Let A = [1  1  1; 1  1  1; 1  1  1] be as in Example 7.25.(b). Then (−1, 1, 0)^T, (−1, 0, 1)^T (corresponding to λ1 = 0) and (1, 1, 1)^T (corresponding to λ2 = 3) are eigenvectors of A. It is easy to check that these vectors are linearly independent, so we set
P = [−1  −1  1; 1  0  1; 0  1  1].
Calculating P⁻¹ (using Algorithm 3.25), we get
P⁻¹ = [−1/3  2/3  −1/3; −1/3  −1/3  2/3; 1/3  1/3  1/3].
A simple computation now shows that P⁻¹ A P = [0  0  0; 0  0  0; 0  0  3], which is in line with the claim of Theorem 7.30.
 
(b) Let A = [4  0  4; 0  4  4; 4  4  8] be the symmetric matrix from Example 7.18. The eigenvectors of A are v1 = (1, −1, 0)^T (corresponding to λ1 = 4), v2 = (1, 1, −1)^T (corresponding to λ2 = 0) and v3 = (1, 1, 2)^T (corresponding to λ3 = 12). The vectors v1, v2, v3 are linearly independent by Corollary 7.29, so, for P = (v1 v2 v3) = [1  1  1; −1  1  1; 0  −1  2], Theorem 7.30 claims that P is invertible and P⁻¹ A P = [λ1  0  0; 0  λ2  0; 0  0  λ3] = [4  0  0; 0  0  0; 0  0  12].
Note that vi · vj = 0 if i ≠ j, by Theorem 7.27. This fact can simplify finding P⁻¹, because it immediately tells us that
P^T P = [1  −1  0; 1  1  −1; 1  1  2] [1  1  1; −1  1  1; 0  −1  2] = [2  0  0; 0  3  0; 0  0  6],
where 2 = v1 · v1 = ‖v1‖², 3 = v2 · v2 = ‖v2‖² and 6 = v3 · v3 = ‖v3‖². This shows that P^T is very close to being the inverse of P. Namely, we can get P⁻¹ by dividing each row vector of P^T by the square of its norm. Indeed, P⁻¹ = [1/2  −1/2  0; 1/3  1/3  −1/3; 1/6  1/6  1/3] by Theorem 3.23, as
[1/2  −1/2  0; 1/3  1/3  −1/3; 1/6  1/6  1/3] [1  1  1; −1  1  1; 0  −1  2] = [1  0  0; 0  1  0; 0  0  1] = I3.
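Both parts of Example 7.31 can be verified numerically. A short sketch for part (b), assuming Python with the numpy library (an illustration only):

    import numpy as np

    A = np.array([[4.0, 0.0, 4.0],
                  [0.0, 4.0, 4.0],
                  [4.0, 4.0, 8.0]])
    P = np.array([[ 1.0, 1.0, 1.0],      # columns are the eigenvectors v1, v2, v3
                  [-1.0, 1.0, 1.0],
                  [ 0.0,-1.0, 2.0]])

    D = np.linalg.inv(P) @ A @ P
    print(np.round(D, 10))               # diag(4, 0, 12), as Theorem 7.30 predicts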
CHAPTER 8

Orthonormal sets and quadratic forms

This chapter will introduce the notions of orthogonal and orthonormal sets of vectors and
will discuss diagonalization of symmetric matrices in orthonormal bases. It will also introduce
the Gram-Schmidt process for obtaining an orthonormal set from any linearly independent set of
vectors. The chapter will conclude with discussions of applications of Linear Algebra in the study
of quadratic forms and in the theory of conic sections.

8.1. Orthogonal and orthonormal sets


Definition 8.1 (Orthogonal and orthonormal sets). Let v1 , . . . , vk be some vectors in Rn .
(a) We will say that {v1 , . . . , vk } is an orthogonal set if vi · vj = 0 for all i 6= j, 1 ≤ i, j ≤ k.

0 if i 6= j
(b) {v1 , . . . , vk } is said to be an orthonormal set if vi · vj = , for all i, j, 1 ≤ i, j ≤ k.
1 if i = j
In other words {v1 , . . . , vk } is an orthonormal set if it is orthogonal and kvi k = 1 for each
i = 1, 2, . . . , k.
Example 8.2. (a) The standard basis {e1 , . . . , en } is an orthonormal set in Rn .
   
1 1
(b) , is an orthogonal set in R2 (because (1, 1)T · (1, −1)T = 0), but not an orthonor-
1 −1

mal one, as k(1, 1)T k = 2 6= 1.
(c) If v1 , . . . , vk are eigenvectors of a symmetric n×n matrix A, corresponding to distinct eigenvalues
of A, then {v1 , . . . , vk } is an orthogonal set in Rn by Theorem 7.27.
Lemma 7.28 from Chapter 7 can be reformulated as follows:
Note 8.3. If {v1 , . . . , vk } is an orthogonal set of non-zero vectors in Rn , then v1 , . . . , vk are
linearly independent.
Proposition 8.4. Let {v1 , . . . , vn } be an orthogonal set of non-zero vectors in Rn , and let
P = (v1 . . . vn ) be the n × n matrix with these vectors as its column vectors. Then P is invertible
and P −1 can be calculated by the formula
 1
vT

kv1 k2 1
P −1 = 
 .. 
.
. 
1 T
kv k2 n
v
n n×n

In other words, the row vectors of P −1 are 1


kv1 k2
vT1 , kv1 k2 vT2 , . . . , kv1 k2 vTn .
2 n

Proof. Observe that


   
1 T 1 1 1
2 v 1 2 (v1 · v1 ) kv1 k2
(v1 · v2 ) . . . kv1 k2
(v1 · vn )
 kv11 k T   kv11 k 1 1

 kv2 k2 v2 
  
 kv2 k2 (v2 · v1 ) kv2 k2
(v2 · v2 ) . . . kv2 k2
(v2 · vn ) 
(v v . . . v ) =

 .
..
 1 2 n  .. .. .. 





 . . . 

1 T 1 1 1
kv k2 n
v kv k2
(vn · v1 ) kvn k2
(vn · v2 ) . . . kvn k2
(vn · vn )
n n

 
1 0 ··· 0
0 1 · · · 0
=  .. .. . . ..  = In ,
 
. . . .
0 0 ··· 1
1
vT1
 
 kv1 k2
0 if i 6= j ..
because vi ·vj = . By Theorem 3.23, P is invertible and P −1 =  . 
 
2
kvi k if i = j .
1
kvn k2
vTn

Corollary 8.5. If {v1 , . . . , vn } is an orthonormal set of vectors and P = (v1 . . . vn )n×n , then
P is invertible and P −1 = P T .
 T
v1
−1  .. 
Proof. By Proposition 8.4, P =  .  = P T (as kvi k2 = 1 and, in particular, vi 6= 0, for
vTn
each i = 1, 2, . . . , n). 
Definition 8.6 (Orthogonal matrix). An n × n matrix P is called orthogonal if P T = P −1
(i.e., if P T P = In ).
Note 8.7. An n×n matrix P is orthogonal if and only if its column vectors form an orthonormal
set.
Proof. If v1 , . . . , vn is an orthonormal set of vectors in Rn then the n×n matrix P = (v1 . . . vn )
is orthogonal by Corollary 8.5. The proof of the opposite implication is an exercise. 

8.2. Gram-Schmidt orthonormalization process


The Gram-Schmidt process starts with a collection of linearly independent vectors v1 , . . . , vk in
Rn and produces an orthonormal set of non-zero vectors w1 , . . . , wk ∈ Rn such that span{w1 , . . . , wk } =
span{v1 , . . . , vk }. This process works in two stages.
Algorithm 8.8 (Gram-Schmidt process). Let v1 , . . . vk ∈ Rn be linearly independent vectors.
Stage 1 (produce an orthogonal set of non-zero vectors u1 , . . . , uk ∈ Rn such that span{u1 , . . . , uk } =
span{v1 , . . . , vk }.) Let
     
u1 = v1 ,  u2 = v2 − ((v2 · u1)/(u1 · u1)) u1 ,  u3 = v3 − ((v3 · u1)/(u1 · u1)) u1 − ((v3 · u2)/(u2 · u2)) u2 , . . . ,
uk = vk − ((vk · u1)/(u1 · u1)) u1 − . . . − ((vk · uk−1)/(uk−1 · uk−1)) uk−1 .
(Recall that ((v · u)/(u · u)) u is the projection of a vector v ∈ Rn along a vector u ∈ Rn – see Definition 1.25.)
Stage 2 (normalize the vectors u1 , . . . , uk ). Set w1 = û1 = (1/‖u1‖) u1 , . . . , wk = ûk = (1/‖uk‖) uk .
Definition 8.9 (Orthonormal basis). Suppose that V is a subspace of Rn and v1 , . . . vk ∈ V
are some vectors. We will say that {v1 , . . . , vk } is an orthonormal basis of V if {v1 , . . . , vk } is an
orthonormal set which also is a basis of V (in the sense of Definition 6.40).
For example, the standard basis e1 , . . . , en is an orthonormal basis of Rn . Orthonormal bases
are much easier to work with than arbitrary bases, so given any subspace V , of Rn , it is often useful
to find an orthonormal basis of V .
Note 8.10. Since the vectors forming an orthonormal set are automatically linearly independent
(by Note 8.3), an orthonormal set {v1 , . . . , vk } is an orthonormal basis of a subspace V if and only
if these vectors span V , i.e., V = span{v1 , . . . , vk }.

Theorem 8.11. Starting with any collection of linearly independent vectors v1 , . . . , vk ∈ Rn ,


the Gram-Schmidt process will produce an orthonormal set of vectors {w1 , . . . , wk } in Rn , such
that span{v1 , . . . , vk } = span{w1 , . . . , wk }. Thus {w1 , . . . , wk } will be an orthonormal basis of
V = span{v1 , . . . , vk }.

Proof. (Sketch) By definition, u1 = v1 and

 
v 2 · u1
u2 = v 2 − u1 = v2 −(projection of v2 along u1 )
u1 · u1

We need to check that u1 is orthogonal to u2 , these vectors are non-zero and span{v1 , v2 } =
span{u1 , u2 }.
1) Using standard properties of scalar product (see Theorem 1.8), we obtain
     
v2 · u1 v2 · u1
u2 · u1 = v2 − u1 u1 = v 2 · u1 − (u1 · u1 ) = v2 · u1 − v2 · u1 = 0.
u1 · u1 u1 · u1

Hence u1 · u2 = 0
2) Observe that u1 6= 0 as u1 = v1 6= 0 (because otherwise v1 , . . . , vk would be linearly
dependent by Example 6.27.(i)).  
v2 · u1
Arguing by contradiction, suppose that u2 = 0. Since u2 = v2 − u1 = 0, we
  u1 · u1
v ·u
see that c v1 + 1 v2 + 0 v3 + · · · + 0 vk = 0, where c = − u2 ·u1 ∈ R. Thus v1 , . . . , vk are
1 1
linearly dependent, contradicting our assumption. Therefore u2 6= 0.
3) Note that, by construction, u1 = v1 and u2 is a linear combination of v2 and u1 = v1
so, span{u1 , u2 } ⊆ span{v1 , v2 } (by Note 6.14, as span{v1 , v2 } is a subspace of Rn by
Proposition 6.12).  
v2 · u1
On the other hand, v1 = u1 ∈ span{u1 , u2 } and v2 = u2 + u1 ∈ span{u1 , u2 },
u1 · u1
so span{v1 , v2 } ⊆ span{u1 , u2 } by Note 6.14 (as span{u1 , u2 } is a subspace of Rn by
Proposition 6.12). Therefore span{v1 , v2 } = span{u1 , u2 }, as claimed.
Thus we have shown that {u1 , u2 } is an orthogonal set of non-zero vectors in Rn satisfying
span{u1 , u2 } = span{v1 , v2 }, so the theorem is proved in the case k = 2. The general case (for any
k ∈ N) can be proved by induction on k.
Now, let us proceed to stage 2. From stage 1 we know that ui · uj = 0 if i 6= j. So, when
we set wi = ûi , i = 1, . . . , k, we have kwi k = 1 (wi · wi = kwi k2 = 1) and wi · wj = ûi · ûj =
1 n
ku k ku k (ui · uj ) = 0 if i 6= j, 1 ≤ i, j ≤ k. Thus {w1 , . . . , wk } is an orthonormal set in R .
i j
Clearly span{w1 , . . . , wk } = span{u1 , . . . , uk }, so span{w1 , . . . , wk } = span{v1 , . . . , vk } = V .
Therefore {w1 , . . . , wk } is an orthonormal basis of V (by Note 8.10). 
     
 1 3 2 
Example 8.12. Let V = span 0 , −1 , 2 in R3 . Use the Gram-Schmidt process
2 1 1
 
to find an orthonormal basis of V .      
1 3 2
We start with the vectors v1 = 0 , v2 = −1 and v3 = 2. First we check that
    
2 1 1
these vectors are linearly independent (exercise). Next, we apply the Gram-Schmidt process (Al-
gorithm 8.8).
8.3. ORTHOGONAL DIAGONALIZATION OF SYMMETRIC MATRICES 93
       
1   3 1 2
v2 · u1 5
Stage 1. u1 = v1 = 0, u2 = v2 − u1 = −1 − 0 = −1 (check: u2 · u1 =
2 u1 · u1 1 5 2 −1
2 − 2 = 0 X).
       13   
    2 1 2 15 2
v 3 · u1 v3 · u2   4   1    13  13  
u3 = v 3 − u1 − u2 = 2 − 0 − −1 =  6  =  5 .
u1 · u1 u2 · u2 5 6 30
1 2 −1 − 13
30 −1
 
2
To simplify calculations, let us take u3 =  5 (since it is still orthogonal to u1 , u2 : u1 · u3 =
−1
0 = u2 · u3 ).    
1 2
1   1  
Stage 2. Now we normalize: w1 = û1 = √ 0 , w2 = û2 = √ −1 and w3 = û3 =
5 2 6 −1
        
2 1 2 2
1    1   1   1  
√ 5 . Thus √ 0 ,√ −1 , √ 5 is an orthonormal basis of V .
30 −1  5
2 6 −1 30 −1 
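The Gram-Schmidt process is also easy to implement on a computer. The following sketch (assuming Python with the numpy library; the function name gram_schmidt is just an illustrative choice) reproduces the computation of Example 8.12, up to scaling of the intermediate vectors.

    import numpy as np

    def gram_schmidt(vectors):
        """Return an orthonormal list w1, ..., wk spanning the same subspace
        as the given linearly independent vectors v1, ..., vk."""
        orthogonal = []
        for v in vectors:
            u = v.astype(float)
            for w in orthogonal:                  # Stage 1: subtract the projection
                u = u - ((v @ w) / (w @ w)) * w   # of v along each previous u
            orthogonal.append(u)
        return [u / np.linalg.norm(u) for u in orthogonal]   # Stage 2: normalize

    basis = gram_schmidt([np.array([1, 0, 2]),
                          np.array([3, -1, 1]),
                          np.array([2, 2, 1])])
    for w in basis:
        print(w)   # multiples of (1,0,2)/sqrt(5), (2,-1,-1)/sqrt(6), (2,5,-1)/sqrt(30)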

Corollary 8.13. Let V be a subspace of Rn . Then V has an orthonormal basis.


Proof. By Theorem 6.44, V has some basis {v1 , . . . , vk }; in particular, V = span{v1 , . . . , vk }.
Now, according to Theorem 8.11, we can apply the Gram-Schmidt process to construct an orthonor-
mal basis {w1 , . . . , wk } of V . 

8.3. Orthogonal diagonalization of symmetric matrices


Theorem 8.14 (Symmetric matrices are orthogonally diagonalizable). Let A ∈ Mn (R) be a
symmetric matrix.
(a) Then A has an orthonormal set of eigenvectors {w1 , . . . , wn } which forms a basis of Rn .
(b) Let P = (w1 . . . wn )n×n . Then P is an orthogonal matrix (i.e., P T = P −1 ) and
 
λ1 0 . . . 0
 0 λ2 . . . 0 
P T A P =  .. ,
 
.. . . .. 
. . . .
0 0 . . . λn n×n
where λi is the eigenvalue of A corresponding to the eigenvector wi , for every i = 1, 2, . . . , n.
Proof. (In the special case when all eigenvalues λ1 , . . . , λn are distinct). Recall that by
Theorem 7.24, A has n linearly independent eigenvectors v1 , . . . , vn ∈ Rn , so that Avi = λi vi ,
i = 1, . . . , n. Note that in this special case {v1 , . . . , vn } is an orthogonal set in Rn , by Theo-
rem 7.27.
For each i = 1, 2, . . . , n, set wi = v̂i = (1/‖vi‖) vi and observe that
A wi = A ((1/‖vi‖) vi) = (1/‖vi‖) (A vi) = (1/‖vi‖) (λi vi) = λi (1/‖vi‖) vi = λi wi
(we have used properties of matrix multiplication and the fact that A vi = λi vi ). Thus each wi is an eigenvector of A, corresponding to the eigenvalue λi , i = 1, . . . , n. Now, if i ≠ j then wi · wj = ((1/‖vi‖) vi) · ((1/‖vj‖) vj) = (1/(‖vi‖ ‖vj‖)) (vi · vj) = 0, because {v1 , . . . , vn } is an orthogonal set. Therefore
{w1 , . . . , wn } is an orthonormal set in Rn . It is a basis of Rn by Note 8.3 and Theorem 6.43. Thus
(a) is proved.

To prove(b), let P =(w1 . . . wn )n×n . Then P is an orthogonal matrix by Corollary 8.5 and
λ1 ... 0
−1  .. .. ..  by Theorem 7.30. Thus (b) follows, as P −1 = P T by definition of an
P AP =  . . . 
0 . . . λn
orthogonal matrix. 
 
5 −3
Example 8.15. (a) Let A = . Find an orthonormal basis of R2 formed of eigenvec-
−3 5
tors of A and find an orthogonal 2 × 2 matrix P such that P T A P is a diagonal matrix.
First we need to find the eigenvectors of A.

5 − λ −3
PA (λ) = det(A − λ I2 ) =
= (5 − λ)2 − 9 = λ2 − 10λ + 16 = (λ − 8)(λ − 2).
−3 5 − λ
So λ1 = 2, λ2 = 8 are the eigenvalues of A, and we proceed by finding the corresponding eigenvec-
tors.         
3 −3 x 0 x 1
λ1 = 2. (A − 2 I2 ) v = 0 ⇐⇒ = . Evidently we can take = . So
−3 3 y 0 y 1
    
1 5 −3 1
v1 = is an eigenvector of A corresponding to λ1 = 2. (Check: A v1 = =
1 −3 5 1
   
2 1
=2 X.)
2 1
        
−3 −3 x 0 x 1
λ2 = 8. (A − 8 I2 ) v = 0 ⇐⇒ = . Clearly we can take = . Thus
−3 −3 y 0 y −1
    
1 5 −3 1
v2 = is an eigenvector of A corresponding to λ2 = 8. (Check: A v2 = =
−1 −3 5 −1
   
8 1
=8 X.)
−8 1
 
1 1
Observe that v1 · v2 = 0 (as expected from Theorem 7.27), so we let w1 = v̂1 = √2
1
 
1
and w2 = v̂2 = √12 . Then {w1 , w2 } will be an orthonormal basis of R2 formed out of the
−1
eigenvectors of A. !
√1 √1
Finally, from Theorem 8.14 we know that P = 2 2 is an orthogonal matrix such that
√1 − √12
  2
2 0
PT AP = .
0 8
 
2 1 1
(b) B = 1 2 1.
1 1 2
The standard calculations show that pB (λ) = −(λ − 1)2 (λ − 4), so λ1 = 1 and λ2 = 4 are the
eigenvalues (λ1 is of multiplicity 2).
The following fact (which we present without proof) will be useful:
Fact 8.16. If A is a symmetric matrix and λ is a root of the characteristic polynomial pA (λ)
of multiplicity k, then there are precisely k linearly independent eigenvectors of A corresponding to
λ.
Let us find 3 eigenvectors of
 B.    
1 1 1 x 0
λ1 = 1. (B − I3 ) v = 0 ⇐⇒ 1 1 1  y  = 0 ⇐⇒ x + y + z = 0. So the solution of
1 1 1 z 0
this system has 2 free variables y and z, which means that we can find two linearly independent
 
eigenvectors (as expected from Fact 8.16). If y = −1 and z = 0 we get x = 1, so v1 = (1, −1, 0)^T. If y = 0 and z = −1, we have x = 1, so v2 = (1, 0, −1)^T. Evidently v1 , v2 are linearly independent,
and both of them are eigenvectors of B corresponding to the eigenvalue λ1 = 1.
Problem: v1 · v2 = 1 6= 0, i.e., these vectors are not orthogonal. We need to apply the Gram-
Schmidt process (Algorithm 8.8) to them (it is a fact that applying this process will result in w1 , w2 ,
which are still eigenvectors of B for the same eigenvalue λ1 = 1).
     1
1 1
 
1   2
v2 · u1   1  
Thus we set u1 = v1 = −1, u2 = v2 − u1 =  0 − · −1 =  12 .
 
0 u1 · u1 2
−1 0 −1
(Check: u1 · u2 = 0 X)
Normalizing u1 and u2 , we obtain
   1  
1 2 1
1   2  1 1  
w1 = û1 = √ −1 and w2 = û2 = √  2  = √  1 .
2 6 6
0 −1 −2
(It is not hard to check that w1 and w2 are linearly independent eigenvectors of B corresponding
to the eigenvalue λ1 = 1.)
  row  
−2 1 1 0 operations
1 0 −1 0
λ2 = 4. (B − 4 I3 ) v = 0 ⇐⇒  1 −2 1 0  −−−−−−−−→  0 1 −1 0 . As expected
1 1 −2 0 0 0 0 0
from Fact 8.16, the solution has 1 free variable (as the multiplicity
  of λ 2 is 1), so there is only one
1
linearly independent eigenvector for λ2 = 4. E.g., v3 = 1. From Theorem 7.27 we know that
1
v3 is orthogonal to eigenvectors corresponding to λ1 = 1, hence v3  must
 be orthogonal to w1 , w2
1
1
(v3 · w1 = 0, v3 · w2 = 0 X). So, it is enough to set w3 = v̂3 = √ 1.
3 1
      
 1 1 1 1 
1 1
Thus √ −1 , √  1 , √ 1 is an orthonormal basis of R4 , made out of eigen-
 2 6 −2 3 1 
0
 1 1 1

√ √ √
2 6 3
vectors of B. Finally, we take P = (w1 w2 w3 ) = − √12 √1 √1 . From Theorem 8.14 we

6 3
0 − √26 √1
3
know that P an orthogonal matrix such that
 
1 0 0
P T A P = 0 1 0  .
0 0 4
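Numerically, the orthogonal diagonalization of a symmetric matrix is provided by numpy's eigh routine. A sketch (assuming Python with numpy; an illustration only) for the matrix B of Example 8.15.(b) — for the repeated eigenvalue λ1 = 1 the computed eigenvectors may differ from w1, w2 above, but they still form an orthonormal set:

    import numpy as np

    B = np.array([[2.0, 1.0, 1.0],
                  [1.0, 2.0, 1.0],
                  [1.0, 1.0, 2.0]])

    eigenvalues, P = np.linalg.eigh(B)       # columns of P are orthonormal eigenvectors
    print(eigenvalues)                       # [1. 1. 4.]
    print(np.allclose(P.T @ P, np.eye(3)))   # True: P is orthogonal
    print(np.round(P.T @ B @ P, 10))         # the diagonal matrix diag(1, 1, 4)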

8.4. Quadratic forms


Definition 8.17 (Quadratic form). Let A be an n × n matrix. The quadratic form QA is a
function QA : Rn → R satisfying QA (x) = xT A x for all x ∈ Rn .
Example 8.18. (a) Let A = In . Then QIn (x) = xT In x = xT x = x · x = kxk2 , for all x ∈ Rn .
I.e., QIn is the function which associates to each vector x ∈ Rn the square of its norm.
        
(b) A = [3  0; 0  −5], x = (x, y)^T ∈ R2. Then QA(x) = (x  y) [3  0; 0  −5] (x, y)^T = (x  y) (3x, −5y)^T = 3x² − 5y².
 
(c) Observe that if A = [a  b; c  d] is any 2 × 2 matrix, then
QA(x) = (x  y) [a  b; c  d] (x, y)^T = ax² + bxy + cxy + dy² = ax² + 2·((b+c)/2)xy + dy² = (x  y) [a  (b+c)/2; (b+c)/2  d] (x, y)^T,
where [a  (b+c)/2; (b+c)/2  d] is a symmetric 2 × 2 matrix.
It can be similarly shown that in fact any quadratic form Q : Rn → R corresponds to some
symmetric n × n matrix. Therefore further on we will focus on the case when A is a symmetric
matrix.
Thus, if n = 2, then A = [a  b; b  d] and QA(x) = QA((x, y)^T) = ax² + 2bxy + dy², for all x = (x, y)^T ∈ R2.
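In computations it is often convenient to evaluate QA directly and to replace an arbitrary square matrix A by the symmetric matrix (A + A^T)/2 defining the same quadratic form, as in Example 8.18.(c). A short sketch, assuming Python with the numpy library (an illustration only):

    import numpy as np

    def Q(A, x):
        """Evaluate the quadratic form Q_A(x) = x^T A x."""
        return x @ A @ x

    A = np.array([[1.0, 4.0],
                  [2.0, 1.0]])      # not symmetric
    S = (A + A.T) / 2               # the symmetric matrix [[1, 3], [3, 1]]

    x = np.array([1.0, -1.0])
    print(Q(A, x), Q(S, x))         # both equal -4: A and S define the same form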
Definition 8.19 (Definiteness type). Let Q : Rn → R be a quadratic form associated to some
symmetric matrix A. Then Q and A are said to be
(a) positive definite if Q(x) > 0 for all non-zero x ∈ Rn ;
(b) positive semi-definite if Q(x) ≥ 0 for all x ∈ Rn ;
(c) negative definite if Q(x) < 0 for all non-zero x ∈ Rn ;
(d) negative semi-definite if Q(x) ≤ 0 for all x ∈ Rn ;
(e) indefinite if none of (a)–(d) is satisfied, i.e., if there exist x1 , x2 ∈ Rn such that Q(x1 ) > 0 and
Q(x2 ) < 0.
Note 8.20. Every positive definite quadratic form/matrix is also positive semi-definite. And
every negative definite quadratic form/matrix is also negative semi-definite.
    
Example 8.21. (a) If A = [2  0; 0  5], then QA(x) = x^T A x = (x  y) [2  0; 0  5] (x, y)^T = 2x² + 5y². Clearly QA(x) > 0 if x = (x, y)^T ≠ (0, 0)^T, so A and QA are positive definite.
   
(b) If A = [−3  0  0; 0  0  0; 0  0  −1], then QA(x) = −3x² − z², for all x = (x, y, z)^T ∈ R3. Clearly QA(x) ≤ 0 for all x ∈ R3 and QA((0, 1, 0)^T) = 0, so A and QA are negative semi-definite but not negative definite.
      
(c) If A = [1  3; 3  1], then QA(x) = (x  y) [1  3; 3  1] (x, y)^T = x² + 6xy + y². Notice that QA((1, 0)^T) = 1 > 0 and QA((1, −1)^T) = −4 < 0. So A and QA are indefinite.
Let us formulate a useful criterion to determine the definiteness type of a given quadratic form.
Theorem 8.22. Let A ∈ Mn (R) be a symmetric matrix, and let QA be the corresponding
quadratic form (i.e., QA (x) = xT A x for all x ∈ Rn ). Then
(i) A and QA are positive definite if and only if all eigenvalues of A are strictly positive;
(ii) A and QA are positive semi-definite if and only if all eigenvalues of A are non-negative;
(iii) A and QA are negative definite if and only if all eigenvalues of A are strictly negative;
(iv) A and QA are negative semi-definite if and only if all eigenvalues of A are non-positive;

(v) A and QA are indefinite if and only if A has both positive and negative eigenvalues.
Proof. Recall that, by Theorem 7.24, all eigenvalues of the symmetric matrix A are real
numbers, so (i)–(v) cover all possibilities. We will only prove claim (i), as proofs of the other
statements are similar.
(i) “⇒” Suppose that QA is positive definite. Let λ ∈ R be an eigenvalue of A. This means
that there is some v ∈ Rn , v 6= 0, such that A v = λ v. Then, using standard properties of matrix
multiplication, we have
1
QA (v) = vT A v = vT (λ v) = λ (vT v) = λ kvk2 , so λ = QA (v)
kvk2
as kvk2 > 0 because v 6= 0 (see property SP4 of scalar product). Now, QA (v) > 0 as QA is positive
definite and v 6= 0, hence λ > 0, as claimed.
“⇐” Suppose that each eigenvalue of A is strictly positive. By Theorem 8.14, there exists an
orthogonal matrix P ∈ Mn (R) such that
 
λ1 . . . 0
(8.1) P T A P =  ... . . . ...  ,
 

0 . . . λn n×n
where λ1 , . . . , λn are eigenvalues of A (so λi > 0 for each i, 1 ≤ i ≤ n, by our assumption). Let us
multiply both sides of equation (8.1) by P on the left and by P T on the right:
 
λ1 . . . 0
P (P T A P ) P T = P  ... . . . ...  P T .
 

0 . . . λn
Recall that P is orthogonal, so P T P = P P T = In . Consequently, by associativity  of matrixmul-
λ1 . . . 0
T T T T  .. . . .  T
tiplication, P (P A P ) P = (P P ) A (P P ) = In A In = A. Thus A = P  . . ..  P .
0 . . . λn
Now, for every x ∈ R we have
     
λ1 . . . 0 λ1 . . . 0
QA (x) = xT A x = xT P  ... . . . ...  P T  x ===== (xT P )  ... . . . ...  (P T x)
    assoc.  

0 . . . λn 0 . . . λn
   
λ1 . . . 0 λ1 . . . 0
============= (P T x)T  ... . . . ...  (P T x) = yT  ... . . . ...  y,
props. of transpose    

0 . . . λn 0 . . . λn
 
y1
T n  .. 
where y = P x ∈ R . So, if y =  . , then
yn
  
λ1 . . . 0 y1
 .. . . . . 2 2
QA (x) = (y1 . . . yn )  . . ..   ..  = λ1 y1 + . . . + λn yn > 0,
 

0 . . . λn yn
 
0
 .. 
provided y 6=  . , because λi > 0 for all i by the assumption.
0
Now, it remains to check that if x ≠ 0, then y ≠ 0. Indeed: y = P^T x, and P^T is invertible (P^T = P⁻¹, so (P^T)⁻¹ = (P⁻¹)⁻¹ = P). Hence the only solution of P^T x = 0 is x = 0 (see Theorem 3.22). Therefore if x ≠ 0 then y = P^T x ≠ 0, so QA(x) = λ1 y1² + . . . + λn yn² > 0, as claimed. □
Example
 8.23.
 Determine the definiteness type of the following matrices/quadratic forms:
1 3
(a) A = (thus QA (x) = x2 + 6xy + 2y 2 ). We will find eigenvalues of A and then we will
3 2
apply Theorem 8.22. The characteristic polynomial of A is

1 − λ 3 = λ2 − 3λ − 7.

PA (λ) = det(A − λI2 ) =
3 2 − λ
So, PA(λ) = 0 is equivalent to λ² − 3λ − 7 = 0, i.e., λ = (3 ± √37)/2. Thus λ1 = (3 + √37)/2 > 0, λ2 = (3 − √37)/2 < 0. Hence A and QA are indefinite by Theorem 8.22.(v).
 
x
(b) Q : R3 → R, Q(x) = Q(y ) = −2x2 − 3y 2 − 3z 2 + 2yz.
z
 
−2 0 0
First, note that Q corresponds to the symmetric matrix A =  0 −3 1. (Check:
0 1 −3
      
−2 0 0 x x x
(x y z)  0 −3 1 y  = −2x2 − 3y 2 − 3z 2 + 2yz = Q(y ), for all y  ∈ R3 X.)
0 1 −3 z z z
Now, we need to find the eigenvalues of A:

−2 − λ 0 0

PA (λ) = det(A − λI3 ) = 0 −3 − λ 1
0 1 −3 − λ

expand by row 1 1+1
−3 − λ 1
============ (−1) (−2 − λ)
= −(2 + λ)((3 + λ)2 − 1)
1 −3 − λ
= −(λ + 2)(λ2 + 6λ + 8) = −(λ + 2)(λ + 2)(λ + 4) = −(λ + 2)2 (λ + 4).
So, the eigenvalues of A are λ1 = −2 and λ2 = −4. Hence A and Q are negative definite (by
Theorem 8.22.(iii)).
 
1 2 1 − λ 2
(c) A = (thus QA (x) = x2 +4xy +4y 2 ). Then PA (λ) = = λ2 −5λ = λ(λ−5).
2 4 2 4 − λ
So λ1 = 0 and λ2 = 5 are the eigenvalues of A. Thus A is positive semi-definite, but not positive
definite (by Theorem 8.22).
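The eigenvalue test of Theorem 8.22 is straightforward to automate. The sketch below (assuming Python with numpy; the function definiteness and the tolerance tol are illustrative choices, not part of the notes) classifies the matrices of Example 8.23:

    import numpy as np

    def definiteness(A, tol=1e-12):
        """Classify a symmetric matrix by the signs of its eigenvalues."""
        ev = np.linalg.eigvalsh(A)       # eigenvalues of a symmetric matrix
        if np.all(ev > tol):
            return "positive definite"
        if np.all(ev >= -tol):
            return "positive semi-definite"
        if np.all(ev < -tol):
            return "negative definite"
        if np.all(ev <= tol):
            return "negative semi-definite"
        return "indefinite"

    print(definiteness(np.array([[1.0, 3.0], [3.0, 2.0]])))     # indefinite
    print(definiteness(np.array([[-2.0, 0.0, 0.0],
                                 [ 0.0,-3.0, 1.0],
                                 [ 0.0, 1.0,-3.0]])))            # negative definite
    print(definiteness(np.array([[1.0, 2.0], [2.0, 4.0]])))     # positive semi-definite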

8.5. Conic sections


Consider a double cone C in R3 :

For example, this cone can be given by the equation z² = x² + y².

Intersections of the cone C with a plane in R3 are curves, called conic sections. They can be of several types: ellipses, parabolas, hyperbolas, two lines, one line, one point.
The three main types are:
• Parabolas, given by an equation y = ax² + bx + c or x = ay² + by + c, where a, b, c ∈ R, a ≠ 0. (E.g., y = (x − 2)² + 1 = x² − 4x + 5 or x = −(1/5)(y + 3)² + 2 = −(1/5)y² − (6/5)y + 1/5.) The standard form of the equation of a parabola is y = ax² or x = ay².
• Hyperbolas, given by the equations x²/a² − y²/b² = 1 or −x²/a² + y²/b² = 1 (in standard form), for some a, b ∈ R, a > 0, b > 0.
• Ellipses: x²/a² + y²/b² = 1, for some a, b > 0 (in standard form); the sketch shows the case when a > b.

For every quadratic form Q : R2 → R and every α ∈ R the level set {x ∈ R2 | Q(x) = α}
represents some conic section (or the empty set) in R2 .
For instance, if Q(x) = x² − 5y² and α = 2, then
Q(x) = α ⟺ x² − 5y² = 2 ⟺ x²/(√2)² − y²/(√(2/5))² = 1.
Thus the level set {x ∈ R2 | Q(x) = 2} is a hyperbola in R2.


Sometimes we are given such an equation Q(x) = α, and we need to find the standard form of
the corresponding conic section.

Algorithm 8.24 (Finding the standard form of a conic section). Given a quadratic form
Q(x) = ax2 + 2bxy + dy 2 , we want to find the standard form of the conic section Q(x) = α
(⇔ ax2 + 2bxy + dy 2 = α), for some α ∈ R.
 
a b
Find the symmetric matrix A = , corresponding to Q. Find an orthogonal 2 × 2 matrix
b d
 
λ 0
P such that P T A P = , where λ, µ are the eigenvalues of A (as in Example 8.15).
0 µ
         
Define a new set of coordinates x̃ = (x̃, ỹ)^T, by setting (x̃, ỹ)^T = P^T (x, y)^T (so that (x, y)^T = P (x̃, ỹ)^T, as P^T = P⁻¹ since P is orthogonal).
Then, using properties of the transpose and associativity,
Q(x) = x^T A x = (P x̃)^T A (P x̃) = (x̃^T P^T) A (P x̃) = x̃^T (P^T A P) x̃ = x̃^T [λ  0; 0  µ] x̃ = (x̃  ỹ) [λ  0; 0  µ] (x̃, ỹ)^T = λx̃² + µỹ².
So Q(x) = α ⇔ λ x̃² + µ ỹ² = α ⇔ (λ/α) x̃² + (µ/α) ỹ² = 1 (if α ≠ 0). This gives a standard form of the corresponding conic section in the new coordinates x̃.
 
p q
Proposition 8.25. If P = is an orthogonal 2 × 2 matrix, then P is either the matrix
r s
of a rotation about the origin (if det(P ) = 1) or P is the matrix of a reflection in a line through
the origin (if det(P ) = −1).

Proof. Since P is orthogonal, we know that its column vectors form an orthonormal set in R2
(see Note 8.7). In particular p2 + r2 = 1. Hence there exists θ ∈ [0, 2π) such that p = cos(θ) and
r = sin(θ).  
T −1 −1 1 s −q
Now, recall that P = P , as P is orthogonal and P = det(P ) by Theorem 2.25.
−r p
Since det(P ) = ±1 (exercise), we have
 two cases:
  
−1 s −q T p r
Case 1: det(P ) = 1. Then P = = P = , which implies that p = s and
−r p q s
 
cos(θ) − sin(θ)
q = −r. Thus s = p = cos(θ) and q = −r = − sin(θ), so P = is the matrix of
sin(θ) cos(θ)
the anti-clockwise rotation by θ about the origin.
   
−s q p r
Case 2: det(P ) = −1. Then P −1
= =P =T . Hence p = −s and q = r. So,
r −p q s
 
cos(θ) sin(θ)
s = −p = − cos(θ) and q = r = sin(θ). It follows that P = is the matrix of the
sin(θ) − cos(θ)
reflection in the line, making angle θ/2 with the positive x-axis. 

Algorithm 8.24 and Proposition 8.25 together imply that the new coordinate axes for x̃ and ỹ are obtained from the old ones by applying either a rotation or a reflection. Indeed, suppose that P = [cos(θ)  −sin(θ); sin(θ)  cos(θ)] represents the rotation by θ. Since (x, y)^T = P (x̃, ỹ)^T, the direction vector ẽ1 of the x̃-axis will be equal to P ẽ1 = P (1, 0)^T = (cos(θ), sin(θ))^T in the original coordinates (x, y)^T. Thus the x̃-axis is obtained by rotating the x-axis about the origin by angle θ anti-clockwise. And P ẽ2 = (−sin(θ), cos(θ))^T = (cos(θ + π/2), sin(θ + π/2))^T, showing that the ỹ-axis is obtained by rotating the y-axis about the origin by angle θ anti-clockwise.
about the origin by angle θ anti-clockwise.

This observation will allow us to sketch the new coordinate axes in the original system of coordinates.

Example 8.26. Find the standard form of the conic section given by 3x2 + 2xy + 3y 2 = 8, and
sketch its graph.
   
x 2 2 3 1
Here Q( ) = 3x + 2xy + 3y , so it corresponds to the symmetric matrix A = . We
y 1 3
start with “diagonalizing” A (as in Example 8.15).  
1
The eigenvalues of A are λ1 = 2, λ2 = 4, and the corresponding eigenvectors are v1 = ,
−1
 
1
v2 = . These vectors are clearly orthogonal, so we only need to normalize them: w1 =
1
   
1 1 1 1
√ , w2 = √ , to obtain an orthonormal basis of R2 consisting of the eigenvectors of
2 −1 2 1 !
1
√ 1√
A. Then P = (w1 w2 ) = 2 2 is the matrix of rotation by − π4 .
− √12 √1
2
     
x̃ x x̃
Now we define new coordinates by =P , so that the new axes x̃, ỹ are obtained
ỹ y ỹ
from the old axes x, y by applying the rotation
 by − π4 about the origin.

2 0
Theorem 8.14 tells us that P T A P = , thus the original equation 3x2 + 2xy + 3y 2 = 8
0 4
becomes 2x̃² + 4ỹ² = 8 in the new coordinates (see Algorithm 8.24). The standard form of the latter equation is x̃²/2² + ỹ²/(√2)² = 1, which represents the ellipse from Figure 8.1.

Figure 8.1
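The same computation can be checked numerically: the eigenvalues of A give the coefficients λ and µ of the equation in the new coordinates. A sketch for Example 8.26, assuming Python with the numpy library (an illustration only):

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [1.0, 3.0]])          # 3x^2 + 2xy + 3y^2 = x^T A x
    alpha = 8.0

    eigenvalues, P = np.linalg.eigh(A)  # P is orthogonal and P^T A P = diag(2, 4)
    lam, mu = eigenvalues
    print(lam, mu)                      # 2.0 4.0

    # In the coordinates (x~, y~)^T = P^T (x, y)^T the equation becomes
    # lam*x~^2 + mu*y~^2 = alpha, i.e. x~^2/4 + y~^2/2 = 1 (an ellipse).
    print(alpha / lam, alpha / mu)      # 4.0 2.0 : the squares of the semi-axes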

 
Example 8.27. Consider the equation x² + 8xy − 5y² = 1. Then Q((x, y)^T) = x² + 8xy − 5y², so A = [1  4; 4  −5]. Using the usual methods, we find that λ1 = 3, λ2 = −7 are the eigenvalues of A, and v1 = (2, 1)^T, v2 = (−1, 2)^T are the corresponding eigenvectors of A (A v1 = 3 v1, A v2 = −7 v2). Again these vectors are orthogonal, so we only need to normalize them: w1 = v̂1 = (1/√5)(2, 1)^T, w2 = v̂2 = (1/√5)(−1, 2)^T. Hence P = (w1 w2) = [2/√5  −1/√5; 1/√5  2/√5] is the matrix of rotation about the origin by angle θ = cos⁻¹(2/√5) ≈ 26.57°.
Now, in the new coordinates (x̃, ỹ)^T = P^T (x, y)^T, the equation Q((x, y)^T) = 1 becomes
3x̃² − 7ỹ² = 1 ⟺ x̃²/(1/√3)² − ỹ²/(1/√7)² = 1.
Thus we get a hyperbola (see Figure 8.2).

Figure 8.2
