Math1048 - Linear Algebra I
Lecture notes
Autumn 2018-2019
Notation iii
Chapter 4. Determinants 47
4.1. Axiomatic definition of determinant 47
4.2. Determinants and invertibility 53
4.3. Calculating determinants using cofactors 53
4.4. Determinant of a product 55
4.5. Inverting a matrix using cofactors 56
Chapter 6. Subspaces of Rn 66
6.1. Definition and basic examples 66
6.2. Null spaces 67
6.3. Linear span 69
6.4. Range and column space 70
6.5. Linear independence 71
6.6. Bases 76
Chapter 7. Eigenvalues, eigenvectors and applications 78
7.1. Eigenvalues and eigenvectors 78
7.2. More examples 82
7.3. Application of eigenvectors in Google’s page ranking algorithm 84
7.4. Symmetric matrices 85
Chapter 8. Orthonormal sets and quadratic forms 90
8.1. Orthogonal and orthonormal sets 90
8.2. Gram-Schmidt orthonormalization process 91
8.3. Orthogonal diagonalization of symmetric matrices 93
8.4. Quadratic forms 95
8.5. Conic sections 98
Notation
Symbol Meaning
∅ the empty set, i.e., the set that contains no elements
N the set of natural numbers {1, 2, 3, . . .}
Z the set of integer numbers {. . . − 2, −1, 0, 1, 2, . . .}
Q the set of rational numbers {m/n | m ∈ Z, n ∈ N}
R the set of real numbers
C the set of complex numbers {a + bi | a, b ∈ R}
Rn the real n-space {(x1 , . . . , xn ) | x1 , . . . , xn ∈ R}
Mn (R) the set of all n × n matrices with real entries
Σ_{j=m}^{n} a_j   'the sum of the a_j as j goes from m to n'. E.g., Σ_{j=1}^{4} a_j = a1 + a2 + a3 + a4
CHAPTER 0
Introduction to complex numbers
Historically, complex numbers were invented to solve polynomial equations which have no real solutions, such as x² + 1 = 0 or 3x⁸ + 10x⁴ + 2 = 0. However, it turns out that it suffices to add to R a single "imaginary" number i (such that i² = −1), together with all possible combinations of it, to get the field of complex numbers C, over which every polynomial equation has a solution! This makes complex numbers extremely useful, not only in mathematics but also in many other sciences and in engineering. The goal of this chapter is to give a quick introduction to the theory of complex numbers.
Complex numbers can be added and subtracted according to the following rules: if z = a+bi ∈ C
and w = c + di ∈ C then
• z + w = (a + bi) + (c + di) = (a + c) + (b + d)i ∈ C;
• z − w = (a + bi) − (c + di) = (a − c) + (b − d)i ∈ C.
In other words, to add/subtract two complex numbers we add/subtract their real and imaginary
parts. E.g., (1 + 5i) + (−3 − i) = −2 + 4i.
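For readers who like to experiment, these rules are exactly how Python's built-in complex type behaves; the following small sketch (not part of the module material) reproduces the example above.

    # Complex addition and subtraction with Python's built-in complex type.
    z = 1 + 5j          # z = 1 + 5i
    w = -3 - 1j         # w = -3 - i
    print(z + w)        # (-2+4j), i.e. -2 + 4i, as in the example
    print(z - w)        # (4+6j),  i.e. 4 + 6i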
Note 0.2. The addition of complex numbers satisfies the following standard properties:
• z + 0 = 0 + z = z, for all z ∈ C (Existence of zero);
• if z = a+bi ∈ C then we define −z = −a+(−b)i = −a−bi ∈ C so that z+(−z) = 0 = −z+z
(Existence of additive inverse);
• z + w = w + z, for all z, w ∈ C (Commutativity);
• (z + w) + u = z + (w + u), for any z, w, u ∈ C (Associativity).
All of these properties are easy to prove using the definitions. For example, to show that the
commutativity holds, assume that z = a + bi ∈ C and w = c + di ∈ C. Then
z + w = (a + c) + (b + d)i
= (c + a) + (d + b)i (by commutativity of the addition of real numbers)
= w + z.
To multiply complex numbers, we first define i² = i · i = −1, i.e., we proclaim that the imaginary unit i is a square root of −1. Then, for z = a + bi ∈ C and w = c + di ∈ C, assuming that the standard laws of distributivity, associativity and commutativity hold, we must have
z w = (a + bi)(c + di) = ac + adi + bci + bd i² = (ac − bd) + (ad + bc)i ∈ C.
The modulus of z = a + bi ∈ C is |z| = √(a² + b²). Examples: |i| = 1, |4 − 3i| = √(4² + 3²) = 5, etc.
Observe that if z = a + 0i ∈ R is a real number, then |z| = √(a²) = |a| is simply the standard modulus of the real number. This means the notion of the complex modulus naturally extends the notion of the real modulus.
It is easy to see that |z| = | − z|, for all z ∈ C. More generally, we have the following
Lemma 0.6. For any two complex numbers z, w ∈ C we have |z w| = |z| |w|.
Proof. Writing z = a + bi and w = c + di, we have
|z w| = |(ac − bd) + (ad + bc)i| = √((ac − bd)² + (ad + bc)²) = √(a²c² − 2acbd + b²d² + a²d² + 2adbc + b²c²) = √((a² + b²)(c² + d²)) = |z| |w|.
Definition 0.7 (Argument of a complex number). The argument arg(z), of a non-zero complex
number z = a + bi ∈ C is the angle θ (in radians) measured anti-clockwise from the positive real
axis to the vector (a, b) ∈ R2 . For z = 0 we set arg(z) = 0.
Figure 0.1
Note 0.8. Since the functions cos θ and sin θ are periodic with period 2π, adding a multiple of 2π to θ does not change z. Therefore it is convenient to think of arg(z) as only defined up to adding a multiple of 2π. E.g., arg(i) = π/2 + 2πk, k ∈ Z.
Definition 0.9 (Exponential notation). Given any θ ∈ R we will write eiθ to denote the
complex number cos θ + i sin θ ∈ C.
Equation (0.1) gives a representation of the complex number z in polar coordinates: for any
z ∈ C, we have z = reiθ , where r = |z| and θ = arg(z).
Example 0.10. Find the argument and the expression in polar coordinates for (a) z = −3i and (b) w = −√3 + i.
(a) |−3i| = |−3| |i| = 3, so, in polar coordinates, −3i = 3(cos θ + i sin θ), thus cos θ = 0 and sin θ = −1. This yields that arg(−3i) = θ = 3π/2 + 2πk, k ∈ Z. This can be confirmed by plotting the vector (0, −3), corresponding to −3i, on the Cartesian plane R² and observing that the angle from the positive x-axis to this vector is 3π/2. Thus, in polar coordinates, −3i = 3e^{i3π/2}.
(b) |−√3 + i| = √(3 + 1) = 2, so, in polar coordinates, −√3 + i = 2(cos θ + i sin θ), thus cos θ = −√3/2 and sin θ = 1/2. Therefore arg(−√3 + i) = θ = 5π/6 + 2πk, k ∈ Z. Thus, in polar coordinates, −√3 + i = 2e^{i5π/6}.
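These polar forms can be checked numerically with Python's cmath module (an optional sketch; note that cmath.polar returns the argument in (−π, π], so for −3i it gives −π/2, which equals 3π/2 up to a multiple of 2π).

    import cmath
    print(cmath.polar(-3j))               # (3.0, -1.5707...), i.e. r = 3, theta = -pi/2
    print(cmath.polar(-(3 ** 0.5) + 1j))  # (2.0, 2.6179...),  i.e. r = 2, theta = 5*pi/6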
Note 0.11. In fact, eiθ is the value of the standard exponential function at iθ ∈ C, which can
be shown using the power series expansions of ez , cos x and sin x.
Like cos θ and sin θ, the function eiθ is periodic with period 2π: ei(θ+2πk) = eiθ , for all k ∈ Z and
θ ∈ R. In particular, the expression of a complex number in polar coordinates is never unique,
as the argument can be changed by adding an integer multiple of 2π. E.g., ei2π = e0 = 1.
Since cos π = −1 and sin π = 0, we obtain
(0.2) e^{iπ} = −1 (Euler's identity).
Theorem 0.12. (Properties of eiθ )
(i) |eiθ | = 1 for any θ ∈ R;
(ii) if θ1 , θ2 ∈ R then eiθ1 eiθ2 = ei(θ1 +θ2 ) .
Proof. (i) Given any θ ∈ R, we have |e^{iθ}| = |cos θ + i sin θ| = √(cos²θ + sin²θ) = 1.
(ii) Let θ1 , θ2 ∈ R. Then, using Definition 0.3 and the standard formulas for cosine and sine of
θ1 + θ2 , we get:
eiθ1 eiθ2 = (cos θ1 + i sin θ1 )(cos θ2 + i sin θ2 ) = (cos θ1 cos θ2 − sin θ1 sin θ2 )
+ i(sin θ1 cos θ2 + cos θ1 sin θ2 ) = cos(θ1 + θ2 ) + i sin(θ1 + θ2 ) = ei(θ1 +θ2 ) .
Theorem 0.12 shows that polar coordinates may be much more convenient for computing products or powers of complex numbers: if z = r1 e^{iθ1} and w = r2 e^{iθ2}, then
z w = r1 e^{iθ1} r2 e^{iθ2} = r1 r2 e^{i(θ1 + θ2)}.
And for any natural number n ∈ N, we have zⁿ = (r1 e^{iθ1})ⁿ = r1ⁿ e^{inθ1}. For example,
(2 e^{iπ/6})⁴ = 2⁴ e^{i4π/6} = 16 e^{i2π/3} = 16 (cos(2π/3) + i sin(2π/3)) = 16 (−1/2 + (√3/2) i) = −8 + 8√3 i.
For example, the conjugate of 2 + 4i is 2 − 4i, the conjugate of −5 − 6i is −5 + 6i, ī = −i, 0̄ = 0, etc. Observe that the conjugate of z̄ is z for all z ∈ C, i.e., taking the complex conjugate of a complex conjugate returns the original complex number.
Main properties of the complex conjugate are listed below.
Note 0.14. Let z, w ∈ C. Then
(i) if z = re^{iθ} then z̄ = re^{−iθ} (in particular, |z| = |z̄|);
(ii) z = z̄ if and only if z ∈ R (i.e., Im(z) = 0);
(iii) z z̄ = |z|²;
(iv) the conjugate of z + w equals z̄ + w̄;
(v) the conjugate of z w equals z̄ w̄.
Proof. We prove claims (i) and (iii), and leave the rest as exercises. For (i), recall that, by
definition, z = reiθ = r(cos θ + i sin θ). Therefore
z̄ = r(cos θ − i sin θ) = r(cos(−θ) + i sin(−θ)) = re−iθ ,
where we used the facts that cos(−θ) = cos θ and sin(−θ) = − sin θ for every θ ∈ R.
To prove (iii), let z = a + bi ∈ C. Then z̄ = a − bi, so z z̄ = (a + bi)(a − bi) = a² + b² + i(−ab + ab) = a² + b² = |z|², as required.
Lemma 0.15. If z ∈ C is non-zero, then the complex number z⁻¹ = (1/|z|²) z̄ is the reciprocal of z. In other words, z z⁻¹ = 1 = z⁻¹ z.
Proof. Since z ≠ 0, we have |z| > 0, so z⁻¹ = (1/|z|²) z̄ is indeed a complex number (with Re(z⁻¹) = Re(z)/|z|² and Im(z⁻¹) = −Im(z)/|z|²). Now we can use Note 0.14.(iii) to obtain:
z z⁻¹ = z · (1/|z|²) z̄ = (z z̄)/|z|² = 1.
Finally, z⁻¹ z = z z⁻¹ = 1 by Note 0.4.
Note that if z = a + bi ≠ 0 then |z|² = a² + b², so z⁻¹ = a/(a² + b²) − i b/(a² + b²) ∈ C.
Example 0.16. Find the reciprocal of z = 8 − 5i ∈ C.
Observe that z̄ = 8 + 5i and |z|² = 8² + 5² = 89. Hence, by Lemma 0.15, z⁻¹ = (1/89)(8 + 5i) = 8/89 + (5/89) i.
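A quick numerical check of this reciprocal (optional, using Python):

    z = 8 - 5j
    z_inv = z.conjugate() / abs(z) ** 2   # Lemma 0.15: z^{-1} = conj(z) / |z|^2
    print(z_inv)                          # (0.0898...+0.0561...j) = 8/89 + (5/89)i
    print(z * z_inv)                      # (1+0j)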
Proof. Indeed, if z = re^{iθ}, then |z| = r and z̄ = re^{−iθ} by Note 0.14. Therefore
z⁻¹ = (1/|z|²) z̄ = (1/r²)(re^{−iθ}) = (1/r) e^{−iθ}.
It follows that for z, w ∈ C such that z = r1 e^{iθ1} and w = r2 e^{iθ2}, w ≠ 0, we have
z/w = (r1 e^{iθ1})/(r2 e^{iθ2}) = (r1/r2) e^{i(θ1 − θ2)};
in particular, |z/w| = |z|/|w|, just like for moduli of real numbers.
Example 0.20. Express the fraction −3i/(−√3 + i) in polar coordinates.
From Example 0.10 we know that −3i = 3 e^{i3π/2} and −√3 + i = 2 e^{i5π/6}. Hence
−3i/(−√3 + i) = (3 e^{i3π/2})/(2 e^{i5π/6}) = (3/2) e^{i(3π/2 − 5π/6)} = (3/2) e^{i4π/6} = (3/2) e^{i2π/3}.
We have already found two complex roots ±i of the equation z 2 + 1 = 0, so by Theorem 0.21,
these are the only complex roots of this equation.
On the other hand, consider the equation z 3 − 3z − 2 = 0. After factorizing, we see that
z 3 − 3z − 2 = (z + 1)2 (z − 2), so z = −1 is a root of multiplicity 2, z = 2 is a root of multiplicity 1,
and the equation has no other roots. Since 2 + 1 = 3, we can see that this is in line with the claim
of Theorem 0.21.
The following observation is an easy consequence of Note 0.14.
Note 0.22. If all coefficients of the equation (0.3) are real (that is, a0 , . . . , an ∈ R), then for
any complex root α ∈ C, its conjugate ᾱ ∈ C is also a root of that equation.
A quadratic equation
(0.4) az² + bz + c = 0, with a, b, c ∈ C and a ≠ 0,
can be solved using the standard formula:
z = (−b ± √D)/(2a), where D = b² − 4ac is the discriminant.
Now, D here could be any complex number, so we need to explain how √D is calculated. For simplicity, we will restrict ourselves to the case when all coefficients are real numbers: a, b, c ∈ R. Then D ∈ R, so if D ≥ 0, then √D ∈ R is the usual square root of D. However, if D < 0 (this is the case when the equation (0.4) has no real solutions), then −D > 0, so we can write √D = √((−D)(−1)) = √(−D) i, where we set √(−1) = i (as i² = −1).
Example 0.23. Find the complex roots of the equation z 2 + 10z + 26 = 0.
In this equation a = 1, b = 10 and c = 26. So D = 100 − 104 = −4. Since D < 0, the given equation has no real solutions. To find the complex roots, we calculate √D = √(−4) = √4 √(−1) = 2i. Thus the equation has two complex roots z1 = (−10 − 2i)/2 = −5 − i and z2 = (−10 + 2i)/2 = −5 + i.
(Observe that the two roots z1 and z2 are complex conjugates of each other, which was to be
expected in view of Note 0.22.)
Example 0.24. Find all complex solutions of the equation z 5 = 1.
Here, in view of Theorem 0.21, we should expect to find 5 different complex roots (one of which
must, of course, be 1, as 15 = 1). It is convenient to use polar coordinates for z: z = reiθ . Then
z 5 = (reiθ )5 = r5 ei5θ . Thus
z 5 = 1 ⇐⇒ r5 ei5θ = 1.
Taking moduli of both sides we get |r⁵| |e^{i5θ}| = |1| = 1, which means that |r⁵| = 1, as |e^{i5θ}| = 1 by Theorem 0.12.(i). On the other hand, |r⁵| = |r|⁵ = r⁵, as r > 0 (r is a positive real number). Hence r = ⁵√1 = 1, thus it remains to find all θ satisfying e^{i5θ} = 1.
Now, e^{i5θ} = cos(5θ) + i sin(5θ) = 1 if and only if cos(5θ) = 1 and sin(5θ) = 0, i.e., 5θ = 2πk, k ∈ Z. So θ = 2πk/5, k ∈ Z, and thus z = e^{i2πk/5}, k ∈ Z. However, the function e^{it} is 2π-periodic, therefore the latter formula gives only 5 different values of z, for k = 0, 1, 2, 3, 4: z = 1, e^{i2π/5}, e^{i4π/5}, e^{i6π/5}, e^{i8π/5}.
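These five roots are easy to generate numerically; the optional Python sketch below confirms that each of them satisfies z⁵ = 1 up to rounding.

    import cmath
    roots = [cmath.exp(2j * cmath.pi * k / 5) for k in range(5)]
    for z in roots:
        print(z, abs(z ** 5 - 1) < 1e-12)   # True for each of the 5 roots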
CHAPTER 1
The real n-space

In this chapter we will recall the notion of the real n-space Rⁿ and study its basic objects, such as vectors, points, lines and planes. Throughout this chapter n ∈ N will denote some positive integer, although the last four sections will focus on the case n = 3.
For example, when n = 1, we get the real line R1 = R, when n = 2 we get the plane R2 and
when n = 3 we get the real 3-space R3 .
Alternative view: we can also think of Rn as a set of points P = (x1 , . . . , xn ), where
x1 , . . . , xn ∈ R. The vector v = (x1 , . . . , xn ), with the same coordinates as a point P ∈ Rn , is
called the position vector of the point P . You can imagine v as the arrow from the origin to P .
More generally, given any two points P = (x1, . . . , xn) and Q = (y1, . . . , yn) in Rⁿ, we define the vector from P to Q as PQ = Q − P = (y1 − x1, . . . , yn − xn) ∈ Rⁿ.
Example 1.2. (a) Let v = (1, 2, 3) ∈ R3 . Then v is the position vector of the point (1, 2, 3).
(b) If P = (5, −2, −4), Q = (2, −3, 1) are two points in R³, then
PQ = Q − P = (2, −3, 1) − (5, −2, −4) = (−3, −1, 5).
Definition 1.3 (Basic operations with vectors). Let a, b ∈ Rn be two vectors, with a =
(a1 , . . . , an ) and b = (b1 , . . . , bn ), and let λ ∈ R be a real number. Then we define
• the sum a + b ∈ Rn to be the vector in Rn given by
a + b = (a1 + b1 , . . . , an + bn );
• the difference a − b ∈ Rn to be the vector in Rn given by
a − b = (a1 − b1 , . . . , an − bn );
• the multiple λ a to be the vector in Rn given by
λ a = (λa1 , . . . , λan ).
For a vector v = (a1 , . . . , an ) ∈ Rn , we will write −v to denote the vector −v = (−a1 , . . . , −an ) =
(−1) v.
Figure 1.1 below describes some geometric rules for drawing sums and differences of vectors.
Definition 1.4 (Same/opposite direction). Two vectors a and b in Rn are said to have
i) the same direction if ∃ λ > 0 (“there exists λ > 0”) such that a = λ b;
ii) opposite directions if ∃ λ < 0 such that a = λ b.
For example, any non-zero vector a ∈ Rⁿ has the same direction as the vector 7a and the opposite direction to the vector −3a.
Proof. All of these properties can be easily verified by using the corresponding properties of
real numbers. For example, let us prove 5). Suppose that a = (a1 , . . . , an ) ∈ Rn and λ, µ ∈ R.
Then
(µλ) a = ((µλ)a1, . . . , (µλ)an)   (by Definition 1.3)
= (µ(λa1), . . . , µ(λan))   (by associativity of the multiplication of real numbers)
= µ (λa1, . . . , λan)   (by Definition 1.3)
= µ (λ a).   (by Definition 1.3)
The scalar product of u = (u1, . . . , un) and v = (v1, . . . , vn) in Rⁿ is defined by
u · v = u1 v1 + u2 v2 + · · · + un vn = Σ_{i=1}^{n} u_i v_i
(the symbol Σ_{i=1}^{n} c_i means that we take the sum of the c_i's as i ranges from 1 to n).
Warning: the scalar product of two vectors is a real number, it is not a vector! DO NOT
WRITE (u1 v1 , u2 v2 , . . . , un vn )!
Example 1.7. Let u = (1, −2, 3), v = (4, 6, −7) be two vectors in R3 .
u · v = 1 · 4 + (−2) · 6 + 3 · (−7)
= 4 − 12 − 21 = −29.
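The same computation in numpy (an optional illustration; note that the result is a single number, not a vector):

    import numpy as np
    u = np.array([1, -2, 3])
    v = np.array([4, 6, -7])
    print(np.dot(u, v))   # -29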
Theorem 1.8 (Properties of scalar product). For all u, v, w ∈ Rn the following properties hold.
SP1: u · v = v · u (commutativity);
SP2: u · (v + w) = u · v + u · w (distributivity);
SP3: If λ ∈ R then (λ u) · v = u · (λ v) = λ (u · v);
SP4: If v = 0 then v · v = 0, otherwise v · v > 0 (positive definite).
Proof.
SP1: u · v = u1 v1 + u2 v2 + . . . + un vn (definition of scalar product)
= v1 u1 + v2 u2 + . . . + vn un (commutativity property in R)
=v·u
Definition 1.14 (Distance). Let A, B be two points in Rⁿ. We define the distance between A and B as ‖AB‖ = √(AB · AB) = √((B − A) · (B − A)).
Note 1.15. The distance between two points A and B in Rⁿ equals the distance between B and A, as ‖AB‖ = ‖BA‖ (since BA = A − B = −(B − A) = −AB).
Example 1.22. The standard basis vectors i, j, k in R3 are unit vectors. More generally, the
vectors e1 , . . . , en are unit vectors, where el = (0, . . . , 0, 1, 0, . . . , 0) and 1 appears in the l-th place,
l ∈ {1, . . . , n}. (In R3 , e1 = i, e2 = j, e3 = k.)
Given two vectors a, b, the condition ‖a + b‖ = ‖a − b‖ is special:
Proposition 1.23. Let a, b ∈ Rⁿ. Then ‖a + b‖ = ‖a − b‖ is equivalent to a · b = 0.
Proof. Exercise. (Hint: begin with ‖a + b‖ = ‖a − b‖, square both sides and manipulate the resulting equation.)
We can use Proposition 1.23 to justify our definition of perpendicularity: indeed, the following geometric argument shows that if the norm of a + b is equal to the norm of a − b then a is perpendicular to b (here ∠BAC denotes the angle between the vectors AB and AC at A).
If ‖a + b‖ = ‖a − b‖ then triangles ∆1 and ∆2 have equal side lengths, so they are congruent to each other. Hence ∠BAC = ∠DAC, but ∠BAC + ∠DAC = 180°. Therefore ∠BAC = 90°, i.e., a is perpendicular to b.
(1.1) ‖x a + b‖² = (x a + b) · (x a + b) = (x a) · (x a) + (x a) · b + b · (x a) + b · b   (by Def. 1.11 and SP2, SP1)
Proof. (Alternative, geometric proof of Corollary 1.29.)
By Pythagoras' theorem, ‖λu‖² + ‖b‖² = ‖a‖². Since ‖b‖² ≥ 0, we see that ‖λu‖² ≤ ‖a‖², hence ‖λu‖ ≤ ‖a‖.
In view of Theorem 1.27, for any two non-zero vectors a, b in Rⁿ we see that |a · b|/(‖a‖ ‖b‖) ≤ 1, which is equivalent to
−1 ≤ (a · b)/(‖a‖ ‖b‖) ≤ 1.
Therefore there is a unique angle θ ∈ [0, π] such that
(1.3) cos θ = (a · b)/(‖a‖ ‖b‖).
Definition 1.30 (Angle). Given any two non-zero vectors a, b ∈ Rⁿ, we define the angle θ ∈ [0, π] between a and b by the equation cos θ = (a · b)/(‖a‖ ‖b‖) (in other words, θ = cos⁻¹((a · b)/(‖a‖ ‖b‖))).
Example 1.31. Compute the angle between the vectors (2, 0, 1, 4) and (−1, 3, −5, 7) in R4 .
By definition of the angle θ between these vectors, θ ∈ [0, π] and
cos(θ) = ((2, 0, 1, 4) · (−1, 3, −5, 7))/(‖(2, 0, 1, 4)‖ ‖(−1, 3, −5, 7)‖) = (−2 + 0 − 5 + 28)/(√21 √84) = 21/42 = 1/2.
Therefore θ = π/3.
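The angle formula (1.3) translates directly into code; here is an optional numpy sketch repeating Example 1.31.

    import numpy as np
    a = np.array([2, 0, 1, 4])
    b = np.array([-1, 3, -5, 7])
    cos_theta = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
    print(cos_theta, np.arccos(cos_theta))   # 0.5, 1.0471... = pi/3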
Equation (1.3) is sometimes used to give the geometric formula for the scalar product: a · b = ‖a‖ ‖b‖ cos θ, where θ is the angle between a and b.
Proof (of the triangle inequality ‖a + b‖ ≤ ‖a‖ + ‖b‖). Both sides are non-negative, so it is sufficient to show that ‖a + b‖² ≤ (‖a‖ + ‖b‖)². To this end, consider:
‖a + b‖² = (a + b) · (a + b)   (definition of norm)
= a · a + 2 a · b + b · b   (properties of scalar product)
≤ ‖a‖² + 2 ‖a‖ ‖b‖ + ‖b‖²   (by the Cauchy-Schwarz inequality – see Thm. 1.27)
= (‖a‖ + ‖b‖)².
Therefore ‖a + b‖² ≤ (‖a‖ + ‖b‖)². After taking square roots we get the desired inequality.
Geometric idea: ‖a + b‖ ≤ ‖a‖ + ‖b‖ is equivalent to the statement: "The length of a side of a triangle does not exceed the sum of the lengths of the two other sides".
S = P + a, thus a = PS. The line L through P with direction vector a has the parametric equation
L : R = P + λ a, λ ∈ R,
or, writing R = (x, y, z) in coordinates (in R³),
L : (x, y, z) = P + λ a, λ ∈ R.
More generally, in Rⁿ the standard coordinates can be denoted (x1, x2, . . . , xn), and the equation of L becomes
L : (x1, x2, . . . , xn) = P + λ a, λ ∈ R,
where the left-hand side is the point R.
Example 1.34. (a) In R4 , the line L, passing through the point P = (0, 1, 2, 3) and parallel to
the vector a = (4, 5, 6, 7), has the parametric equation
L : (x1, x2, x3, x4) = (0, 1, 2, 3) + λ (4, 5, 6, 7), λ ∈ R.
(b) Let us take P = (1, 2) and a = (2, −1) in R². The corresponding line L, passing through P and parallel to a, will then have the parametric equation L : (x, y) = (1, 2) + λ (2, −1), λ ∈ R.
Now consider a line L in R³ given parametrically by x = 1 − 5λ, y = −3 + 6λ, z = 2 + λ (λ ∈ R). Then
(x, y, z) ∈ L ⇐⇒ { x = 1 − 5λ, y = −3 + 6λ, z = 2 + λ } ⇐⇒ { λ = −(1/5)(x − 1), λ = (1/6)(y + 3), λ = z − 2 } ⇐⇒ { −(1/5)(x − 1) = z − 2, (1/6)(y + 3) = z − 2 }.
This system of equations defines the line L in R3 as the set of all solutions (x, y, z). Thus this
system corresponds to a Cartesian equation of the line L in R3 . (In R4 we would need 3 equations.)
Note 1.35. Any two distinct points A, B ∈ Rⁿ determine a unique line passing through them:
L : R = A + λ AB, λ ∈ R, where AB = B − A.
Given two vectors a, b ∈ Rⁿ we shall say that they are parallel if there exists α ≠ 0, α ∈ R, such that b = α a. Two lines are parallel if and only if their direction vectors are parallel.
Example 1.36 (Intersection of two lines). In R2 two distinct lines are either parallel or intersect,
but in R3 there are 3 possibilities: parallel, intersecting or skew. (Skew lines are neither parallel
nor intersecting.)
Consider two lines L1 , L2 in R3 , given by their parametric equations:
L1 : (x, y, z) = (5, 1, 3) + λ(2, 1, 2), λ ∈ R
L2 : (x, y, z) = (2, −3, 0) + µ(−1, 2, −1), µ ∈ R
(here λ and µ are two different independent parameters).
It is easy to see that L1 is not parallel to L2 since their direction vectors (2, 1, 2) and (−1, 2, −1)
are not multiples of each other (i.e., there is no α ∈ R such that (2, 1, 2) = α(−1, 2, −1)). In
particular, these lines are distinct.
To check if the intersection exists we look for (x, y, z) ∈ R3 such that
x = 5 + 2λ = 2 − µ, y = 1 + λ = −3 + 2µ, z = 3 + 2λ = −µ, which leads to the system
5 + 2λ = 2 − µ (1), 1 + λ = −3 + 2µ (2), 3 + 2λ = −µ (3).
Π = {R ∈ Rn | ∃ λ, µ ∈ R such that R = P + λ a + µ b}
Example 1.39. Given P = (0, 1, 2, 3), a = (4, 5, 6, 7) and b = (5, 6, 7, 8) in R4 , the plane Π
passing through P and parallel to the vectors a, b is given by the parametric equation
Π : (x1, x2, x3, x4) = (0, 1, 2, 3) + λ (4, 5, 6, 7) + µ (5, 6, 7, 8), λ, µ ∈ R.
Note 1.40. In R³ any plane can also be given by a Cartesian equation of the form ax + by + cz = d, where a, b, c, d ∈ R and (a, b, c) ≠ (0, 0, 0).
Example 1.41 (From a parametric equation to a Cartesian equation). Let P = (1, −2, 3),
a = (4, 2, 1), b = (5, 4, 0) in R3 .
(1.4) Π : (x, y, z) = (1, −2, 3) + λ (4, 2, 1) + µ (5, 4, 0), λ, µ ∈ R,
so
x = 1 + 4λ + 5µ (1), y = −2 + 2λ + 4µ (2), z = 3 + λ (3).
Let’s solve for λ, µ to obtain a Cartesian equation for Π:
(3) ⇒ λ = z − 3; (1) ⇒ x = 1 + 4(z − 3) + 5µ, so µ = (1/5)(x + 11 − 4z); (2) ⇒ y = −2 + 2(z − 3) + (4/5)(x + 11 − 4z). Multiplying both sides by 5 and simplifying, we get 4x − 5y − 6z = −4, which is a Cartesian equation of Π.
PR = R − P = λ a + µ b, hence
n · PR = n · (λ a + µ b) = n · (λ a) + n · (µ b)   (by SP2)
= λ (n · a) + µ (n · b) = 0.   (by SP3)
It is also true that for any point R ∈ R³, if PR is orthogonal to n then R ∈ Π. Let r, p ∈ R³ denote the position vectors of the points R and P respectively.
To find a Cartesian equation of Π we need to find a normal vector n = (a, b, c). So, we write n · AB = 0 and n · BC = 0 (n · AC = 0 will hold automatically, as AC = AB + BC). This is equivalent to
(a, b, c) · (−2, −5, −1) = 0 and (a, b, c) · (4, 2, −4) = 0, i.e., −2a − 5b − c = 0 (1) and 4a + 2b − 4c = 0 (2).
2 equations, 3 variables, so 1 degree of freedom (because a plane only specifies the direction of a
normal vector, but not its norm).
2 × (1) + (2): −8b − 6c = 0, so b = −(3/4) c.
There exist infinitely many possibilities; choose one: c = 8, b = −6. Then from (1) we get that
a = 11, and so n = (11, −6, 8) (only unique up to a multiple).
Thus the Cartesian equation of Π is:
(11, −6, 8) · (x, y, z) = (11, −6, 8) · (1, 2, 3) ⇐⇒ 11x − 6y + 8z = 23.
Example 1.50. Find a normal vector to a plane Π which is parallel to the vectors a = (−1, 0, 2)
and b = (3, −4, 5) in R3 .
By Theorem 1.49 we can use a × b as a normal vector.
a × b = (−1, 0, 2) × (3, −4, 5) = (0 · 5 − (−4) · 2, 2 · 3 − (−1) · 5, (−1) · (−4) − 3 · 0) = (8, 11, 4).
So, we can take n = a × b = (8, 11, 4).
Note that the corresponding normal vectors are (1, 1, 1), (1, 2, −1) and (3, 4, 1). No two of them
are parallel, so no two planes are parallel.
Eliminate x: (2) − (1) ⇒ y − 2z = 1, while (3) − 3 × (1) ⇒ y − 2z = 0. Hence there is no solution, since 1 ≠ 0.
1.10. Distances in R3
Problem: given a plane Π and a point P in R³, find the distance between P and Π (in other words, we need to find the minimal possible norm ‖PA‖, where A runs over all points of Π).
Idea: the distance from P to Π is the norm ‖PN‖, where N ∈ Π is the point such that PN is perpendicular to Π.
Here is how we can justify the above idea: for any A ∈ Π, the triangle PAN is right-angled, so
‖PA‖² = ‖PN‖² + ‖NA‖² ≥ ‖PN‖²,
as ‖NA‖² ≥ 0. Therefore ‖PA‖ ≥ ‖PN‖, so ‖PN‖ is the minimal distance between P and the points of Π.
Question: how do we find ‖PN‖?
Method 1: if we already know some n ≠ 0 which is normal to Π (e.g., if Π is given by a Cartesian equation c1 x + c2 y + c3 z = d, then n = (c1, c2, c3) is normal to Π), then PN is the projection of PA (for any point A ∈ Π) along n:
PN = ((PA · n)/(n · n)) n, hence ‖PN‖ = |(PA · n)/(n · n)| · ‖n‖ = (|PA · n|/‖n‖²) · ‖n‖ = |PA · n|/‖n‖   (by Thm. 1.18).
Method 2: given a normal n to Π, we can find N by finding the intersection of the line L : R = P + λn, λ ∈ R, with Π. Then we can compute ‖PN‖ = ‖N − P‖ = . . .
Method 3: if Π is given by a parametric equation Π : R = A + λ a + µ b, λ, µ ∈ R, for some non-zero, non-parallel vectors a, b in R³, then compute n = a × b and follow either Method 1 or Method 2 above.
Example 1.57 (Distance from point to plane, using Method 1). Let Π be the plane in R³ given by the Cartesian equation 2x + 2y − z = 9, and let P = (3, 4, −4) be a point.
To find the distance from P to Π, choose any point A ∈ Π. For example, A = (0, 3, −3) (A ∈ Π as 2 · 0 + 2 · 3 − (−3) = 9). Observe that n = (2, 2, −1) is normal to Π (cf. Note 1.46). Now, PN is the projection of PA along n, so
PN = λ n, where λ = (PA · n)/(n · n).
We calculate: PA = A − P = (−3, −1, 1), hence λ = ((−3, −1, 1) · (2, 2, −1))/((2, 2, −1) · (2, 2, −1)) = (−6 − 2 − 1)/(4 + 4 + 1) = −1.
Therefore PN = λ n = (−2, −2, 1), and the required distance is ‖PN‖ = √(4 + 4 + 1) = 3.
Note that λ is negative here, which means that n and PN have opposite directions.
Exercise 1.58. Use Method 2 above to find the distance from Example 1.57.
Problem: find the distance from a point P to a line L in R3 , where L is given by the parametric
equation L : R = A + λ a, λ ∈ R, so that A ∈ R3 is a point and a ∈ R3 is a non-zero vector.
Question: how do we calculate ‖PN‖?
Method 1: clearly NA is the projection of PA along a, so NA = ((PA · a)/(a · a)) a and PN = PA − NA. Hence
‖PN‖ = ‖PA − NA‖ = ‖PA − ((PA · a)/(a · a)) a‖.
Method 2: consider the plane Π passing through P and perpendicular to L. Then Π will have the equation
(x, y, z) · a = p · a,
where p is the position vector of P. Clearly N is the intersection of Π with L. So, by finding this intersection (cf. Example 1.54), we find N and ‖PN‖ = ‖N − P‖.
Example 1.59. In R³, find the distance from the point P = (2, 1, −3) to the line L given by the equation L : (x, y, z) = (1, 0, −4) + λ(2, 2, −1), λ ∈ R.
1) Using Method 1: A = (1, 0, −4), a = (2, 2, −1), so PA = A − P = (−1, −1, −1), and PA · a = −2 − 2 + 1 = −3, a · a = 2² + 2² + 1 = 9, so
((PA · a)/(a · a)) a = −(3/9) a = −(1/3)(2, 2, −1) = (−2/3, −2/3, 1/3).
Thus the distance between P and L is equal to
‖PA − ((PA · a)/(a · a)) a‖ = ‖(−1, −1, −1) − (−2/3, −2/3, 1/3)‖ = ‖(−1/3, −1/3, −4/3)‖ = √(1/9 + 1/9 + 16/9) = √2.
2) Using Method 2: the plane Π, perpendicular to a, passing through P , will have the equation
Π : (x, y, z) · a = p · a, where a = (2, 2, −1), p = (2, 1, −3).
Therefore Π : 2x + 2y − z = 9 (as p · a = 2 · 2 + 2 · 1 + (−1) · (−3) = 9).
Now let us find the intersection of L and Π as in Example 1.54. We can re-write the parametric equation of L as L : (x, y, z) = (1 + 2λ, 2λ, −4 − λ), λ ∈ R. Substituting this in the Cartesian equation of Π above we get:
2(1 + 2λ) + 2(2λ) − (−4 − λ) = 9 ⇐⇒ 9λ + 6 = 9 ⇐⇒ λ = 1/3.
Plugging λ = 1/3 back into the equation of L, we see that the point N of the intersection of L and Π has coordinates N = (1 + 2/3, 2/3, −4 − 1/3) = (5/3, 2/3, −13/3).
Therefore PN = N − P = (5/3 − 2, 2/3 − 1, −13/3 − (−3)) = (−1/3, −1/3, −4/3). Hence the distance from P to L is equal to
‖PN‖ = ‖(−1/3, −1/3, −4/3)‖ = √2.
Thus we see that Methods 1 and 2 produce the same answer, as expected.
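Method 1 is short enough to script; the following optional numpy sketch recomputes the distance of Example 1.59.

    import numpy as np
    P = np.array([2.0, 1.0, -3.0])
    A = np.array([1.0, 0.0, -4.0])
    a = np.array([2.0, 2.0, -1.0])
    PA = A - P
    proj = (PA.dot(a) / a.dot(a)) * a      # projection of PA along a
    print(np.linalg.norm(PA - proj))       # 1.4142... = sqrt(2)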
CHAPTER 2
Matrix Algebra
This chapter will introduce matrices and basic operations with them. Matrices represent one
of the key concepts in the module and we will frequently use them further on.
Example 2.2.
A = [ 1  √2  sin(15)
      x   y   z − w ]
is a 2 × 3 matrix with a12 = √2, a22 = y, a23 = z − w, etc.
Definition 2.3 (Equal matrices). Two matrices A = (aij ) and B = (bij ) are said to be equal
if they have the same size (m × n) and aij = bij for all i, j.
In the remainder of this chapter and throughout these notes we will assume that the entries in
all the matrices are real numbers, unless specified otherwise.
(For example, c11 = a11 b11 + a12 b21 + . . . + a1n bn1 , c25 = a21 b15 + a22 b25 + . . . + a2n bn5 , etc.)
What does this really mean? Think of A as consisting of row vectors a1, . . . , am ∈ Rⁿ, i.e., the i-th row of A is the row vector ai. E.g., if
A = [  1   5
      −3   4
       0  −2 ]   (a 3 × 2 matrix),
then a1 = (1, 5), a2 = (−3, 4) and a3 = (0, −2).
Similarly, think of B as consisting of column vectors b1, . . . , bp, where bj is the j-th column of B. E.g., if
B = [  7   8   9  10
      11  12  13  14 ]   (a 2 × 4 matrix),
then b1 = (7, 11), b2 = (8, 12), b3 = (9, 13) and b4 = (10, 14).
Then
C = A B = [ a1 · b1   a1 · b2   . . .   a1 · bp
            a2 · b1   a2 · b2   . . .   a2 · bp
            ...
            am · b1   am · b2   . . .   am · bp ]   (an m × p matrix).
So, cij = ai · bj (the (i, j)-th entry of A B is equal to the scalar product of the i-th row of A with the j-th column of B).
Note 2.9. According to Definition 2.8, the product A B is only defined if the number of
columns in A equals the number of rows in B.
Example 2.10. Let
A = [ 1  2  3          and    B = [ 1  2
      2  1  3 ]                     2  0
                                    1  1 ].
Note that A has 3 columns and B has 3 rows, so the product A B is defined (and it must be a 2 × 2 matrix):
A B = [ 1·1 + 2·2 + 3·1   1·2 + 2·0 + 3·1        = [ 8  5
        2·1 + 1·2 + 3·1   2·2 + 1·0 + 3·1 ]          7  7 ].
The product B A is also defined (and has size 3 × 3):
1 2 1+4 2+2 3+6 5 4 9
1 2 3
B A = 2 0 = 2 + 0 4 + 0 6 + 0 = 2 4 6 .
2 1 3
1 1 1+2 2+1 3+3 3 3 6
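Both products can be reproduced with numpy's matrix multiplication operator @ (optional sketch):

    import numpy as np
    A = np.array([[1, 2, 3],
                  [2, 1, 3]])
    B = np.array([[1, 2],
                  [2, 0],
                  [1, 1]])
    print(A @ B)   # [[8 5] [7 7]]
    print(B @ A)   # [[5 4 9] [2 4 6] [3 3 6]], so AB and BA differ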
Note 2.11. Example 2.10 shows that, in general, the product of matrices is not commutative, as A B ≠ B A (in fact, even the sizes of these two products are different!).
Example 2.12. a) The product
[ 1  2  3    [ 1  2  3
  2  1  3 ]    2  1  3 ]
is undefined, as 3 ≠ 2 (the number of columns of the first matrix does not equal the number of rows of the second).
b) Suppose A = (a_{ik}) and B = (b_{kj}) are 7 × 7 matrices whose entries are mostly unspecified (∗), except that the 3rd row of A is (2, 4, 3, 1, 6, 5, 7) and the 4th column of B is (−1, 0, 2, 5, −3, 4, 6). If C = A B, then
(C)34 = (A B)34 = −2 + 0 + 6 + 5 − 18 + 20 + 42 = 53
(because (A B)34 = Σ_{k=1}^{7} a_{3k} b_{k4} = a31 b14 + a32 b24 + a33 b34 + . . . + a37 b74).
By Definition 2.4, r A = (r a_{ik}) and r B = (r b_{kj}), so, according to Definition 2.8, ((r A) B)_{ij} = Σ_{k=1}^{n} (r a_{ik}) b_{kj} and (A (r B))_{ij} = Σ_{k=1}^{n} a_{ik} (r b_{kj}). Thus the above equation shows that (r D)_{ij} = ((r A) B)_{ij} = (A (r B))_{ij} for all i = 1, . . . , m and j = 1, . . . , p. Since these three matrices are all of the same size m × p, we can conclude that r (A B) = (r A) B = A (r B), as required.
(v) Recall that Im = (δ_{ik}) is an m × m matrix, where δ_{ik} = 1 if i = k and δ_{ik} = 0 if i ≠ k. Therefore (Im A)_{ij} = Σ_{k=1}^{m} δ_{ik} a_{kj} = δ_{ii} a_{ij} = a_{ij}, as δ_{ii} = 1 is the only non-zero term among the δ_{ik}. Since this is true for all possible i and j, and Im A has the same size m × n as A, we can conclude that Im A = A.
The equality A In = A can be proved similarly.
Now we are going to prove property (iii), which is extremely important. Let us first start with
the following general observation concerning the Σ-notation:
Note 2.14. Suppose that γ_{kj} are some real numbers, where k = 1, . . . , r and j = 1, . . . , n. Then
Σ_{k=1}^{r} ( Σ_{j=1}^{n} γ_{kj} ) = Σ_{j=1}^{n} ( Σ_{k=1}^{r} γ_{kj} ),
i.e., permuting the order of the two summations does not change the result.
Proof of Note 2.14. Using standard properties of real numbers (commutativity and associativity), we get:
Σ_{k=1}^{r} Σ_{j=1}^{n} γ_{kj} = Σ_{j=1}^{n} γ_{1j} + Σ_{j=1}^{n} γ_{2j} + . . . + Σ_{j=1}^{n} γ_{rj}
= (γ11 + γ12 + . . . + γ1n) + (γ21 + γ22 + . . . + γ2n) + . . . + (γr1 + γr2 + . . . + γrn)
= (γ11 + γ21 + . . . + γr1) + (γ12 + γ22 + . . . + γr2) + . . . + (γ1n + γ2n + . . . + γrn)
= Σ_{k=1}^{r} γ_{k1} + Σ_{k=1}^{r} γ_{k2} + . . . + Σ_{k=1}^{r} γ_{kn} = Σ_{j=1}^{n} ( Σ_{k=1}^{r} γ_{kj} ).
We are now ready to prove the associativity of matrix multiplication (claim (iii) in Theorem 2.13). Let A = (a_{ij}) be an m × n matrix, B = (b_{jk}) be an n × r matrix, C = (c_{kl}) be an r × s matrix, and D = A B, G = B C. Then D = (d_{ik}) is an m × r matrix, with d_{ik} = Σ_{j=1}^{n} a_{ij} b_{jk}, and G = (g_{jl}) is an n × s matrix, with g_{jl} = Σ_{k=1}^{r} b_{jk} c_{kl}.
Let E = (e_{il}) = (A B) C = D C and F = (f_{il}) = A (B C) = A G; then both E and F have the same size m × s, and
e_{il} = Σ_{k=1}^{r} d_{ik} c_{kl} = Σ_{k=1}^{r} ( Σ_{j=1}^{n} a_{ij} b_{jk} ) c_{kl}   (by definitions of E = D C and D = A B)
= Σ_{k=1}^{r} Σ_{j=1}^{n} a_{ij} b_{jk} c_{kl} = Σ_{k=1}^{r} Σ_{j=1}^{n} γ_{kj}   (by distributivity in R; call a_{ij} b_{jk} c_{kl} = γ_{kj})
= Σ_{j=1}^{n} ( Σ_{k=1}^{r} γ_{kj} ) = Σ_{j=1}^{n} ( Σ_{k=1}^{r} a_{ij} b_{jk} c_{kl} )   (by Note 2.14)
= Σ_{j=1}^{n} a_{ij} ( Σ_{k=1}^{r} b_{jk} c_{kl} )   (by distributivity in R, as the a_{ij} are independent of k)
= Σ_{j=1}^{n} a_{ij} g_{jl} = f_{il}   (by definitions of G = B C and F = A G).
For example, if v = (1, 2, 3) then the associated row vector is the 1 × 3 matrix ( 1  2  3 ), and the associated column vector is the 3 × 1 matrix
[ 1
  2
  3 ].
This correspondence allows us to multiply matrices and vectors following the rules of matrix multiplication. Thus an m × n matrix A can be multiplied with a column vector v ∈ Rⁿ, so that the result is an m × 1 matrix (corresponding to a column vector in Rᵐ).
For instance, if v = (1, 2, 3)ᵀ and
A = [ −3  0  5
      −2  1  7 ]   (a 2 × 3 matrix),
then the matrix product A v makes sense and the result is a column vector in R²:
A v = (−3·1 + 0·2 + 5·3, −2·1 + 1·2 + 7·3)ᵀ = (12, 21)ᵀ.
Hence the (i, j)-th element of (A B)T equals the (i, j)-th element of B T AT , for all i, j. Therefore
(A B)T = B T AT .
Example 2.20. Let A and B be the matrices from Example 2.10. We have already seen that
A B = [ 8  5
        7  7 ].
Now,
Aᵀ = [ 1  2         Bᵀ = [ 1  2  1
       2  1                2  0  1 ],
       3  3 ]
so
Bᵀ Aᵀ = [ 1·1 + 2·2 + 1·3   1·2 + 2·1 + 1·3        = [ 8  7      = (A B)ᵀ.
          2·1 + 0·2 + 1·3   2·2 + 0·1 + 1·3 ]          5  7 ]
Thus (A B)ᵀ = Bᵀ Aᵀ, as expected from Theorem 2.19.(iv).
E.g., A³ = A A A, A⁻⁵ = A⁻¹ A⁻¹ A⁻¹ A⁻¹ A⁻¹, A⁰ = In (if A is invertible, that is, if A⁻¹ exists).
Note 2.28. If A is not invertible then Aˢ is undefined for s ≤ 0.
Proposition 2.29. Let A be a square matrix and let r, s be integers such that Aʳ and Aˢ are defined. Then
(i) A^{r+s} = Aʳ Aˢ;
(ii) (Aʳ)ˢ = A^{rs}.
Proof. Exercise (one needs to consider different cases depending on the signs of r and s).
Example 2.30. Let
A = [ 1  2
      3  4 ];
calculate A⁴ and A⁻⁴.
A² = A A = [ 1  2    [ 1  2       = [  7  10
             3  4 ]    3  4 ]         15  22 ].
A⁴ = A² A² = [  7  10    [  7  10       = [ 199  290       (by Prop. 2.29.(i))
               15  22 ]    15  22 ]         435  634 ].
CHAPTER 3
Systems of Linear Equations

This chapter introduces Gaussian elimination as a method for solving systems of linear equations, and then discusses how one can use row operations to find the rank and the inverse of a matrix.
For example,
x+y+z =5
3x − 2y = 6
is a system with 2 equations and 3 variables (unknowns).
The numbers aij are called coefficients of the system. The matrix of coefficients of the system
(3.1) is the m × n matrix
A = [ a11  a12  · · ·  a1n
      a21  a22  · · ·  a2n
      ...
      am1  am2  · · ·  amn ].
Note 3.1. The system (3.1) can be written as the single matrix equation A x = b:
[ a11  · · ·  a1n     [ x1          [ a11 x1 + a12 x2 + · · · + a1n xn
  ...                   ...     =     ...
  am1  · · ·  amn ]     xn ]          am1 x1 + am2 x2 + · · · + amn xn ]   = b,
where A is the matrix of coefficients, x = (x1, . . . , xn)ᵀ is the column vector of variables, and b = (b1, . . . , bm)ᵀ ∈ Rᵐ.
Definition 3.2 (Augmented matrix). The augmented matrix of system (3.1) is
(A | b) = [ a11  · · ·  a1n | b1
            ...              ...
            am1  · · ·  amn | bm ],
which can be thought of as an m × (n + 1) matrix.
Example 3.3. The augmented matrix for the system
x1 + x2 + x3 = 3
2x1 + x2 + x3 = 4
x1 − x2 + 2x3 = 5
is
[ 1   1  1 | 3
  2   1  1 | 4
  1  −1  2 | 5 ].
The effects of O1, O2, O3 on the augmented matrix correspond to row operations:
Definition 3.4 (Row equivalence). We will say that two matrices are row equivalent if one can
be obtained from the other by applying a finite sequence of row operations.
Example 3.5. Matrix
A = [  0  −3   3
       1   2  −1
      −2   4   7 ]
is row equivalent to matrix
B = [ 1  2  −1
      0  1  −1
      0  8   5 ].
This is because B can be obtained from A by applying 3 row operations as follows:
Note 3.6. Row operations are invertible and their inverses are also row operations.
For example, let us solve the system of Example 3.3, performing operations on the equations and the corresponding row operations on the augmented matrix in parallel. The system
x1 + x2 + x3 = 3,  2x1 + x2 + x3 = 4,  x1 − x2 + 2x3 = 5
corresponds to the augmented matrix
[ 1   1  1 | 3
  2   1  1 | 4
  1  −1  2 | 5 ].
After eliminating x1 from the second and third equations, and then x2 from the third equation, the system becomes
x1 + x2 + x3 = 3,  −x2 − x3 = −2,  3x3 = 6,
corresponding to
[ 1   1   1 |  3
  0  −1  −1 | −2
  0   0   3 |  6 ].
Multiplying the second equation by −1 and dividing the third equation by 3 (R2 → −R2, R3 → (1/3)R3) gives
x1 + x2 + x3 = 3,  x2 + x3 = 2,  x3 = 2,
corresponding to
[ 1  1  1 | 3
  0  1  1 | 2
  0  0  1 | 2 ].
Subtracting equation 3 from the first and the second equations (R1 → R1 − R3, R2 → R2 − R3) gives
x1 + x2 = 1,  x2 = 0,  x3 = 2,
corresponding to
[ 1  1  0 | 1
  0  1  0 | 0
  0  0  1 | 2 ],
and finally, subtracting equation 2 from equation 1 (R1 → R1 − R2) gives
x1 = 1,  x2 = 0,  x3 = 2,
corresponding to
[ 1  0  0 | 1
  0  1  0 | 0
  0  0  1 | 2 ].
Definition 3.8 (Row echelon form). The first non-zero entry of a row in a matrix is called a
pivot. A matrix is said to be in a (row) echelon form if it has the following two properties:
• if a row is non-zero, then its pivot is further to the right with respect to pivots of the rows
above it;
• zero rows are at the bottom of the matrix.
Definition 3.10 (Gaussian elimination). The method of solving a system of linear equations,
which starts with the augmented matrix, performs row operations with it to end up with a reduced
row echelon form, and then “reads off” the solution, is called Gaussian elimination.
Example 3.11. Use Gaussian elimination to solve the following system of linear equations:
2x1 + 3x2 + x3 = 6,  2x1 + 4x2 + x3 = 5,  x1 + x2 + x3 = 6.
The augmented matrix is
[ 2  3  1 | 6
  2  4  1 | 5
  1  1  1 | 6 ].
Applying R1 ↔ R3 gives
[ 1  1  1 | 6
  2  4  1 | 5
  2  3  1 | 6 ],
then R2 → R2 − 2R1, R3 → R3 − 2R1:
[ 1  1   1 |  6
  0  2  −1 | −7
  0  1  −1 | −6 ],
then R2 ↔ R3:
[ 1  1   1 |  6
  0  1  −1 | −6
  0  2  −1 | −7 ],
then R3 → R3 − 2R2:
[ 1  1   1 |  6
  0  1  −1 | −6
  0  0   1 |  5 ]   (a row echelon form of the augmented matrix),
then R1 → R1 − R3, R2 → R2 + R3:
[ 1  1  0 |  1
  0  1  0 | −1
  0  0  1 |  5 ],
and finally R1 → R1 − R2:
[ 1  0  0 |  2
  0  1  0 | −1
  0  0  1 |  5 ]   (the reduced row echelon form of the augmented matrix).
Hence the solution of the system is (x1, x2, x3) = (2, −1, 5). Geometrically, this is a single point in R³.
We can easily check the answer by substituting it back into the system:
2 · 2 + 3 · (−1) + 5 = 6 ✓,  2 · 2 + 4 · (−1) + 5 = 5 ✓,  2 − 1 + 5 = 6 ✓.
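For comparison, a numerical package solves such a system directly; the optional numpy sketch below cross-checks the elimination above.

    import numpy as np
    A = np.array([[2.0, 3.0, 1.0],
                  [2.0, 4.0, 1.0],
                  [1.0, 1.0, 1.0]])
    b = np.array([6.0, 5.0, 6.0])
    print(np.linalg.solve(A, b))   # [ 2. -1.  5.]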
Example 3.12. Find the general solution of a system of linear equations whose augmented matrix has been reduced to
[ 1  6  2  −5  −2 | −4
  0  0  2  −8  −1 |  3
  0  0  0   0   1 |  7 ].
Applying R1 → R1 + 2R3 and R2 → R2 + R3 gives
[ 1  6  2  −5  0 | 10
  0  0  2  −8  0 | 10
  0  0  0   0  1 |  7 ],
then R2 → (1/2) R2 gives
[ 1  6  2  −5  0 | 10
  0  0  1  −4  0 |  5
  0  0  0   0  1 |  7 ],
and finally R1 → R1 − 2R2 gives the reduced row echelon form
[ 1  6  0   3  0 | 0
  0  0  1  −4  0 | 5
  0  0  0   0  1 | 7 ]   ⇐⇒   x1 + 6x2 + 3x4 = 0,  x3 − 4x4 = 5,  x5 = 7.
We now have three equations and five variables, with x2 and x4 free variables: say, x4 = λ and x2 = µ, where λ, µ ∈ R. It follows that x1 = −6µ − 3λ, x3 = 5 + 4λ and x5 = 7. Thus the general solution of the system has the form
(x1, x2, x3, x4, x5) = (−6µ − 3λ, µ, 5 + 4λ, λ, 7) = (0, 0, 5, 0, 7) + µ (−6, 1, 0, 0, 0) + λ (−3, 0, 4, 1, 0),   λ, µ ∈ R.
So, there are infinitely many solutions. Geometrically, we see that the set of all solutions forms a plane in R⁵, passing through the point P = (0, 0, 5, 0, 7) and parallel to the vectors a = (−6, 1, 0, 0, 0) and b = (−3, 0, 4, 1, 0).
Example 3.13. Consider the system x − y + z = 2, 2x + y + 3z = 2, 4x − y + 5z = 1, with augmented matrix
[ 1  −1  1 | 2
  2   1  3 | 2
  4  −1  5 | 1 ].
Applying R2 → R2 − 2R1 and R3 → R3 − 4R1 gives
[ 1  −1  1 |  2
  0   3  1 | −2
  0   3  1 | −7 ],
and then R3 → R3 − R2 gives
[ 1  −1  1 |  2
  0   3  1 | −2
  0   0  0 | −5 ].
The last row is equivalent to the equation 0 · x + 0 · y + 0 · z = −5 ⇐⇒ 0 = −5, which is
impossible. Therefore this system has no solution. This is an example of an inconsistent system.
Definition 3.14 (Consistent system). A system of linear equations is said to be consistent if
it has at least one solution. Otherwise the system is said to be inconsistent.
We can now formulate the following important theorem:
Theorem 3.15. (Theorem about the existence and uniqueness of solutions in systems of linear
equations.)
a) A system of linear equations is consistent if and only if the corresponding augmented matrix in a row echelon form has no row of the form (0 0 . . . 0 | b), where b ≠ 0.
b) If the system is consistent then either there is a unique solution (i.e., there are no free
variables), or there are infinitely many solutions (i.e., there are free variables). In the
latter case the number of free variables is equal to the difference between the number of
variables and the number of non-zero rows in an echelon form of the augmented matrix of
the system.
Proof. This can be proved by a straightforward analysis of the possible echelon forms of the
augmented matrix. (Details are omitted – check yourself!)
The fact that the number of free variables is independent of the way we solve a system of linear
equations follows from
Theorem 3.16. Each matrix is row equivalent to a unique matrix in reduced row echelon form.
In particular, every row echelon form of a matrix has the same number of non-zero rows.
Proof. Omitted.
Theorem 3.17. Suppose that we are given a system of n linear equations with n unknowns:
(3.2)   a11 x1 + a12 x2 + . . . + a1n xn = b1
        ...
        an1 x1 + an2 x2 + . . . + ann xn = bn.
Let A = (a_{ij})_{n×n} be the corresponding matrix of coefficients and let b = (b1, . . . , bn)ᵀ. If A is invertible then the system has a unique solution x = (x1, . . . , xn)ᵀ = A⁻¹ b.
Proof. From Note 3.1, we know that system (3.2) is equivalent to the matrix equation
(3.3)   A x = b, where x = (x1, . . . , xn)ᵀ.
Let us first check that x = A⁻¹ b is indeed a solution of (3.3):
A x = A (A⁻¹ b) = (A A⁻¹) b = In b = b ✓   (by associativity, the definition of A⁻¹, and the properties of In).
To show that this solution is unique, multiply both sides of the equation A x = b by A−1 on
the left:
A−1 (A x) = A−1 b
⇐⇒ (A−1 A) x = A−1 b (by associativity of matrix multiplication)
−1
⇐⇒ In x = A b (by definition of A−1 )
⇐⇒ x = A−1 b (as In x = x).
Thus the solution is indeed unique.
Definition 3.18 (Elementary matrix). An elementary matrix E of size n is an n × n matrix
obtained from the identity matrix In by applying exactly one row operation.
E.g.,
[ 1  0  0  −5
  0  1  0   0
  0  0  1   0
  0  0  0   1 ]
is the elementary matrix of size 4 corresponding to the row operation R1 → R1 − 5R4,
[ 0  1  0
  1  0  0
  0  0  1 ]
is the 3 × 3 elementary matrix corresponding to the row operation R1 ↔ R2, and
[ 1  0   0   0
  0  1   0   0
  0  0  1/2  0
  0  0   0   1 ]
is the 4 × 4 elementary matrix corresponding to the row operation R3 → (1/2)R3.
On the other hand,
[ 1  0  0
  0  1  0
  1  0  2 ]
is not an elementary matrix, because in order to obtain it from I3 we need to perform at least two row operations.
Note 3.19. If E is an elementary matrix obtained from In by doing some row operation, and
A is any n × n matrix, then the product E A can be obtained from A by doing the exact same row
operation.
Proof. Exercise (see Example 3.20 for the idea).
a) Let E be the elementary matrix obtained from I4 by applying R1 ↔ R4. Then, for a 4 × 4 matrix A, the product E A is obtained from A by applying R1 ↔ R4.
b) The elementary matrix
E = [ 1  0  0
      2  1  0
      0  0  1 ]
is obtained from I3 by doing R2 → R2 + 2R1. Let
A = [ 1  2  3
      4  5  6
      7  8  9 ].
Then
E A = [ 1  2   3
        6  9  12
        7  8   9 ]
is obtained from A by doing R2 → R2 + 2R1.
Proposition 3.21. Every elementary matrix is invertible, and its inverse is an elementary
matrix (of a similar type).
Proof. Let E be an elementary matrix of size n. Then there are three possibilities.
Case 1: E is obtained from In by exchanging row i with row j (for some 1 ≤ i < j ≤ n). Then
E −1 = E as E E = In . Indeed, according to Note 3.19, E E is obtained from E by exchanging row
i with row j, thus we get In back.
Case 2: E is obtained from In by multiplying row i by a number λ 6= 0. Let F be the elemen-
tary matrix obtained from In by multiplying row i by 1/λ. The same argument as above (using
Note 3.19) shows that F E = In and E F = In , hence F = E −1 .
Case 3: E is obtained from In by adding λ times the i-th row to the j-th row, for some λ ∈ R and i ≠ j, 1 ≤ i, j ≤ n.
Let F be the elementary matrix obtained from In by applying Rj → Rj − λRi . Again, in view of
Note 3.19, F E can be obtained from E by applying the row operation Rj → Rj − λRi . However,
E was obtained from In by applying Rj → Rj + λRi . Thus the two row operations cancel each
other out, so, as a result, we get In back: F E = In . Similarly, E F = In . Therefore E is invertible
and E −1 = F .
Theorem 3.22 (Characterization of invertible matrices). If A is an n × n matrix then the
following are equivalent:
(i) A is invertible;
(ii) the equation A x = 0 only has the trivial solution x = 0, where x = (x1, . . . , xn)ᵀ;
(iii) the reduced row echelon form of A is In ;
(iv) A is a product of n × n elementary matrices.
Proof. We will show that (i) ⇒ (ii) ⇒ (iii) ⇒ (iv) ⇒ (i).
“(i) ⇒ (ii)” By Theorem 3.17, the only solution of A x = 0 is x = A−1 0 = 0.
“(ii) ⇒ (iii)” Since the system A x = 0 has a unique solution, there are no free variables. Hence,
by Theorem 3.15, the reduced echelon form of the augmented matrix (A | 0) has no zero rows and
no rows of the form 0 . . . 0 | b, where b 6= 0. In particular, every row in the reduced echelon form
of (A | 0) starts with 1. It follows that the only possibility for the reduced echelon form of this matrix is
[ 1  0  . . .  0 | ∗
  0  1  . . .  0 | ∗
  ...
  0  0  . . .  1 | ∗ ]   (of size n × (n + 1)).
As we see, the left-hand side of this reduced echelon form is
the identity matrix In . So, matrix A can be brought to In via a finite sequence of row operations.
Therefore, by Theorem 3.16, In is the reduced echelon form of A, i.e., (iii) holds.
"(iii) ⇒ (iv)" (iii) implies that we can get from A to In by applying finitely many row operations. By Note 3.19 this means that there exist elementary matrices E1, . . . , Ek (all n × n) such that Ek (E_{k−1} (. . . (E1 A) . . .)) = In. In view of the associativity of matrix multiplication, the latter is equivalent to (Ek E_{k−1} · · · E1) A = In.
By Proposition 3.21, for each i = 1, 2, . . . , k, Ei is invertible and Ei⁻¹ is an elementary matrix. Arguing as in the proof of Theorem 2.26.(c), one can show that Ek · · · E1 is invertible and (Ek · · · E1)⁻¹ = E1⁻¹ · · · Ek⁻¹. After multiplying both sides of the equation (Ek E_{k−1} · · · E1) A = In by (Ek · · · E1)⁻¹ on the left and using the standard properties of matrices from Theorem 2.13, we obtain A = (Ek · · · E1)⁻¹ = E1⁻¹ · · · Ek⁻¹, i.e., (iv) holds.
"(iv) ⇒ (i)" Suppose that A = E1 · · · Ek for some elementary matrices E1, . . . , Ek. Arguing as above (basically by Theorem 2.26.(c) and Proposition 3.21), we conclude that E1 · · · Ek is invertible and (E1 · · · Ek)⁻¹ = Ek⁻¹ · · · E1⁻¹. Hence A = E1 · · · Ek is invertible, i.e., (i) holds.
Theorem 3.22 will be very useful further on. As the first application, let us show that for a
square matrix to be invertible, it is sufficient only to have a left (or right) inverse of the same size.
(Recall that according to our Definition 2.21, an n × n matrix B is invertible if there is a matrix A
such that B is both the left and the right inverse of A: A B = In and B A = In .)
Theorem 3.23. If A and B are n × n matrices such that A B = In , then both A and B are
invertible and B = A−1 , A = B −1 .
Proof. Let x = (x1, . . . , xn)ᵀ and suppose that B x = 0. Since A B = In, we get A (B x) = A 0, so, by associativity, In x = 0, yielding that x = 0. Hence the equation B x = 0 has only the trivial
by associativity, In x = 0, yielding that x = 0. Hence the equation B x = 0 has only the trivial
solution x = 0. So, by Theorem 3.22, B is invertible.
Now, multiplying both sides of the equation A B = In by B −1 on the right, we get (A B) B −1 =
In B −1 . Using associativity and the fact that B B −1 = In , we obtain A In = B −1 , so A = B −1 (by
properties of In ). Therefore A is invertible and A−1 = (B −1 )−1 = B (see Theorem 2.26.(b)).
Example 3.24. The claim of Theorem 3.23 does not hold for non-square matrices. E.g., take
B = [ 1  0          A = [ 1  0  0
      0  1                0  1  0 ]   (a 2 × 3 matrix).
      0  0 ]   (3 × 2),
Then
A B = [ 1  0      = I2,   but   B A = [ 1  0  0
        0  1 ]                          0  1  0      ≠ I3
                                        0  0  0 ]
(in any case, a non-square matrix cannot be invertible, by definition).
Exercise. Let A be an n × n matrix. Prove that the following are equivalent:
(i) A is not invertible;
(ii) AT is not invertible;
(iii) there is a non-zero n × n matrix B such that A B = O, where O is the n × n zero matrix;
(iv) there is a non-zero n × n matrix C such that C A = O, where O is the n × n zero matrix.
We can now describe an algorithm for finding the inverse of a matrix using row operations.
Algorithm 3.25 (Matrix inverse via row operations). Start with an n × n matrix A and form
the new extended matrix (A | In ). Then apply row operations to bring A (the left part of the
extended matrix) to In .
If the above succeeds, then A is invertible and A−1 can be read off the resulting extended matrix,
which will have the form (In | A−1 ). On the other hand, if it is not possible to bring A to In via
row operations (that is, if some echelon form of A has a row of zeros), then A is not invertible.
Let us justify why the above algorithm is valid.
Proposition 3.26. Starting with a square matrix A, Algorithm 3.25 will indeed terminate,
either by showing that A is not invertible or by outputting the matrix A−1 if A is invertible.
Proof. First, we know (from Theorem 3.22) that if A cannot be brought to In via row oper-
ations then A is not invertible.
So, suppose that the algorithm is successful, i.e., we managed to transform the left-hand side
of the extended matrix to In . Recall that, according to Note 3.19, each row operation corresponds
to multiplication of A by an elementary matrix on the left, ending up in In . So, at the end of the
algorithm, the left half of the extended matrix becomes Ek . . . E2 E1 A = In . Since we do the same
row operations with the right half, it becomes Ek . . . E1 In (= Ek . . . E1 ). But, (Ek . . . E1 ) A = In
implies that A is invertible and Ek . . . E1 = A−1 by Theorem 3.23. Hence the right half of the
extended matrix ends up being A−1 .
Example 3.27. Find A⁻¹ using row operations for
A = [ 2  8  3
      1  3  2
      2  7  4 ].
First we form the extended matrix (A | I3), then use row operations to bring the left half to I3 column-by-column:
[ 2  8  3 | 1  0  0
  1  3  2 | 0  1  0
  2  7  4 | 0  0  1 ].
Applying R1 ↔ R2 gives
[ 1  3  2 | 0  1  0
  2  8  3 | 1  0  0
  2  7  4 | 0  0  1 ],
then R2 → R2 − 2R1, R3 → R3 − 2R1:
[ 1  3   2 | 0   1  0
  0  2  −1 | 1  −2  0
  0  1   0 | 0  −2  1 ],
then R2 ↔ R3:
[ 1  3   2 | 0   1  0
  0  1   0 | 0  −2  1
  0  2  −1 | 1  −2  0 ],
then R3 → R3 − 2R2:
[ 1  3   2 | 0   1   0
  0  1   0 | 0  −2   1
  0  0  −1 | 1   2  −2 ],
then R3 → −R3:
[ 1  3  2 |  0   1  0
  0  1  0 |  0  −2  1
  0  0  1 | −1  −2  2 ],
then R1 → R1 − 2R3:
[ 1  3  0 |  2   5  −4
  0  1  0 |  0  −2   1
  0  0  1 | −1  −2   2 ],
and finally R1 → R1 − 3R2:
[ 1  0  0 |  2  11  −7
  0  1  0 |  0  −2   1
  0  0  1 | −1  −2   2 ].
Hence A is invertible and
A⁻¹ = [  2  11  −7
         0  −2   1
        −1  −2   2 ].
Check:
A A⁻¹ = [ 2  8  3     [  2  11  −7        = [ 1  0  0
          1  3  2        0  −2   1            0  1  0     = I3 ✓.
          2  7  4 ]     −1  −2   2 ]          0  0  1 ]
(By Theorem 3.23 it is enough to check that A B = In to show that B = A⁻¹.)
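The same inverse can be cross-checked numerically (optional numpy sketch):

    import numpy as np
    A = np.array([[2.0, 8.0, 3.0],
                  [1.0, 3.0, 2.0],
                  [2.0, 7.0, 4.0]])
    A_inv = np.linalg.inv(A)
    print(A_inv)        # [[ 2. 11. -7.] [ 0. -2.  1.] [-1. -2.  2.]]
    print(A @ A_inv)    # the identity matrix I_3, up to rounding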
Consider a homogeneous system A x = 0, where A is an m × n matrix. Since the last column of its augmented matrix is zero, its row echelon form is (A′ | 0), where A′ is a row echelon form of A. Thus rank((A | 0)) = rank(A), and, by Theorem 3.15,
(3.4)   (number of free variables in the solution of A x = 0) = n − rank((A | 0)) = n − rank(A).
Note 3.29. We have used Theorem 3.16 to explain why rank(A) is well-defined, however, we
have not given a proof of Theorem 3.16. This can be partially remedied by noticing that, in view
of (3.4), the statement “rank(A) = n” is equivalent to saying that the number of free variables in
the solution of the system A x = 0 is zero, i.e., this system has a unique solution x = 0 ∈ Rn .
Obviously the uniqueness of the solution only depends on the system itself, and does not depend
on the way we solve the system. Hence the statement “rank(A) = n” is well-defined, i.e., any row
echelon form of A has exactly n non-zero rows.
Example 3.30. Find the rank of
A = [ 1  2  3
      4  5  6
      7  8  9 ].
First bring A to a row echelon form. Applying R2 → R2 − 4R1 and R3 → R3 − 7R1 gives
[ 1   2    3
  0  −3   −6
  0  −6  −12 ],
and then R3 → R3 − 2R2 gives
[ 1   2   3
  0  −3  −6
  0   0   0 ]   (a row echelon form of A).
As we see the number of non-zero rows in the above row echelon form of A is 2, so rank(A) = 2.
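Numerically, the rank can be obtained with numpy (optional sketch; numpy estimates the rank from the singular values, but for this small integer matrix the answer agrees with the row-operation computation).

    import numpy as np
    A = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
    print(np.linalg.matrix_rank(A))   # 2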
The next theorem characterizes invertible matrices as the square matrices of maximal possible
rank.
Theorem 3.31 (Characterization of invertibility in terms of rank). If A is an n × n matrix,
then the following statements are equivalent to each other:
(a) A is invertible;
0
..
(b) the equation A x = 0 has only the trivial solution x = . ∈ Rn ;
0
(c) rank(A) = n.
Proof. Note that (a) ⇔ (b) by Theorem 3.22. Now, if (b) holds then the same theorem implies
that A has In as its reduced echelon form. Therefore rank(A) = n, i.e., (b) ⇒ (c).
On the other hand, (c) ⇒ (b) because if rank(A) = n, then the number of free variables in the
solution of A x = 0 is n − rank(A) = 0 (by Theorem 3.15), i.e., x = 0 is the unique solution and
(b) is true.
Thus (a) ⇔ (b) ⇔ (c).
Example 3.32. Let A be the matrix from Example 3.30. Since rank(A) = 2 < 3 and A is a
3 × 3 matrix, using Theorem 3.31 we can conclude that A is not invertible.
Theorem 3.33 (Properties of matrix rank). Let A be an m × n matrix, let B be any matrix.
Then:
(i) if B is row equivalent to A (i.e., B can be obtained from A by applying a finite sequence
of row operations), then rank(A) = rank(B).
(ii) rank(λ A) = rank(A), for any λ ∈ R, λ 6= 0.
(iii) rank(A) ≤ min{m, n}.
(iv) rank(A + B) ≤ rank(A) + rank(B) (provided B is also an m × n-matrix).
(v) rank(A B) ≤ min{rank(A), rank(B)} (if B is an n × k matrix for some k).
Proof. (i) is obvious, as row equivalent matrices can be brought to the same row echelon form.
(ii) follows from (i), as λ A is row equivalent to A (one just needs to multiply each row of A by λ).
To prove (iii), note that a row echelon form of A is an m × n matrix of the form
A′ = [ ∗  ∗  . . .  . . .  ∗  ∗
       0  ∗  ∗  . . .  ∗  ∗
       ...
       0  0  0  . . .  ∗  ∗
       0  0  0  . . .  0  0 ].
The rank of A is equal to the number of non-zero rows in A′, by definition, which is also equal to the number of pivots in A′. But the number of rows in A′ is m, and the number of pivots in A′ cannot exceed the number n of columns in it, as each column contains at most one pivot (by the definition of a row echelon form). Hence rank(A) ≤ min{m, n}.
The proofs of (iv) and (v) are omitted, as they are beyond our present scope.
CHAPTER 4
Determinants
For any natural number n ∈ N, let Mn (R) denote the set of all n × n matrices with real entries.
Given a square matrix A ∈ Mn (R), various real numbers can be associated to it. For example,
one useful quantity is the trace of A, defined as the sum of all diagonal entries. This chapter will
introduce another such quantity, called the determinant. Determinants are an important tool in
Mathematics, and they naturally occur in many different subjects (e.g., in Calculus they appear as
Jacobians).
Indeed, if some row of A, say the i-th row, is the zero vector, then ai = 0 · ai, and property (D1).(i) allows us to take the scalar 0 out of that row:
D(A) = 0 · D(A) = 0.
Theorem 4.4 (Basic properties of determinant). Let A, B ∈ Mn (R) (i.e., A and B are n × n
matrices).
(a) If B is obtained from A by adding λ · j-th row to i-th row, for some λ ∈ R and i 6= j, then
D(A) = D(B).
(b) If B is obtained from A by interchanging two rows then D(A) = −D(B).
(c) If B is obtained from A by multiplying a row of A by some λ ∈ R, λ ≠ 0, then D(B) = λ D(A) (and so D(A) = (1/λ) D(B)).
Proof. Let A be the n × n matrix with row vectors a1, . . . , an ∈ Rⁿ.
(a) Without loss of generality, assume that j > i, as the case when i > j is similar. The matrix B has the same rows as A, except that its i-th row is ai + λ aj. Hence, writing D(a1, . . . , an) for the determinant of the matrix with rows a1, . . . , an,
D(B) = D(a1, . . . , ai + λ aj, . . . , aj, . . . , an)
= D(a1, . . . , ai, . . . , aj, . . . , an) + D(a1, . . . , λ aj, . . . , aj, . . . , an)   (by (D1).(ii))
= D(A) + λ D(a1, . . . , aj, . . . , aj, . . . , an)   (by (D1).(i))
= D(A) + 0 = D(A).   (by (D2), as the last matrix has two equal rows)
(b) Observe that B can be obtained from A by the following chain of row operations:
A  −(Ri → Ri + Rj)→  A′  −(Rj → Rj + (−1)Ri)→  A″  −(Ri → Ri + Rj)→  A‴  −(Rj → (−1)Rj)→  B.
Tracking only the i-th and j-th rows, they change as follows:
(ai, aj) → (ai + aj, aj) → (ai + aj, −ai) → (aj, −ai) → (aj, ai).
By part (a) we know that D(A) = D(A′) = D(A″) = D(A‴), and, by (D1).(i), D(B) = −D(A‴) = −D(A). Thus (b) holds.
Finally, (c) is true by (D1).(i).
Theorem 4.4 tells us that we can keep track of the determinant while performing row operations:
Corollary 4.5. Assume that an n × n matrix B is obtained from another n × n matrix A by
performing a single row operation. Then
D(B) = α · D(A), where
α = 1 if B is obtained from A by Ri → Ri + λRj (for some λ ∈ R),
α = −1 if B is obtained from A by Ri ↔ Rj,
α = µ if B is obtained from A by Ri → µRi (for some µ ∈ R, µ ≠ 0).
and, continuing with a sequence of similar row replacements, B can be brought to the identity matrix In.
Since In is obtained from B by applying row replacement operations only, we know that D(B) =
D(In ) by Theorem 4.4. But D(In ) = 1 by (D3), hence D(B) = 1, so D(A) = a11 a22 · · · ann .
Now suppose that some diagonal entry of A is 0. Let us show that then D(A) = 0 (= a11 · · · ann
as one of the aii = 0). Choose the maximal i, 1 ≤ i ≤ n, such that aii = 0. Then ajj ≠ 0 if i < j ≤ n.
Now, if i = n, i.e., ann = 0, then the n-th row of A is zero, so D(A) = 0 by Note 4.3. Otherwise, if ann ≠ 0, we can perform (n − 1) row replacements to make sure that all entries above ann become 0:
\[
A = \begin{pmatrix}
a_{11} & \cdots & a_{1i} & a_{1,i+1} & \cdots & a_{1n}\\
 & \ddots & \vdots & \vdots & & \vdots\\
0 & \cdots & 0 & a_{i,i+1} & \cdots & a_{in}\\
0 & \cdots & 0 & a_{i+1,i+1} & \cdots & a_{i+1,n}\\
\vdots & & \vdots & \vdots & \ddots & \vdots\\
0 & \cdots & 0 & 0 & \cdots & a_{nn}
\end{pmatrix}
\xrightarrow{\ R_j \to R_j - \frac{a_{jn}}{a_{nn}} R_n\ \ (\text{for every } j = 1, 2, \dots, n-1)\ }
\]
Now we can present an algorithm for computing the determinant of any square matrix:
Algorithm 4.7. Given any square matrix A, use row operations to bring A to an upper tri-
angular form, keeping track of how the determinant changes at each step (using Theorem 4.4).
Once this is done, the determinant of the upper triangular matrix can be calculated according to
Theorem 4.6 (by taking the product of all diagonal entries).
Example 4.8. (a) Let A = (a) be a 1 × 1 matrix. Then D(A) = a · D((1)) = a · 1 = a, as (1) = I_1 and D(I_1) = 1 by (D3).
(b) Let $A = \begin{pmatrix} a & b\\ c & d \end{pmatrix} \in M_2(\mathbb{R})$. Let us show that D(A) = ad − bc. To prove this we will consider several cases.
Case 1: a ≠ 0. Then
\[
D(A) = D\begin{pmatrix} a & b\\ c & d \end{pmatrix}
\overset{R_2 \to R_2 - \frac{c}{a} R_1}{=}
D\begin{pmatrix} a & b\\ 0 & d - \frac{c}{a}b \end{pmatrix}
= a\left(d - \frac{c}{a}b\right) = ad - bc,
\]
where the second equality uses Theorem 4.4 and the third uses Theorem 4.6.
Notation. Further on, instead of D(A) we will write det(A) or |A|. Thus, by Example 4.8,
\[
\det\begin{pmatrix} a & b\\ c & d \end{pmatrix} = \begin{vmatrix} a & b\\ c & d \end{vmatrix} = ad - bc.
\]
Example 4.9. Calculate the determinant of the given matrix using row operations (i.e., using
Algorithm 4.7).
(a) $A = \begin{pmatrix} 4 & 1 & 1\\ 1 & 2 & 3\\ -1 & 3 & 0 \end{pmatrix}$.
\[
\det(A) = \begin{vmatrix} 4 & 1 & 1\\ 1 & 2 & 3\\ -1 & 3 & 0 \end{vmatrix}
\overset{R_1 \leftrightarrow R_2}{=}
-\begin{vmatrix} 1 & 2 & 3\\ 4 & 1 & 1\\ -1 & 3 & 0 \end{vmatrix}
\overset{\substack{R_2 \to R_2 - 4R_1\\ R_3 \to R_3 + R_1}}{=}
-\begin{vmatrix} 1 & 2 & 3\\ 0 & -7 & -11\\ 0 & 5 & 3 \end{vmatrix}
\overset{R_3 \to R_3 + \frac{5}{7}R_2}{=}
-\begin{vmatrix} 1 & 2 & 3\\ 0 & -7 & -11\\ 0 & 0 & -\frac{34}{7} \end{vmatrix}
= -1\cdot(-7)\cdot\left(-\frac{34}{7}\right) = -34,
\]
using Theorem 4.4.(b), Theorem 4.4.(a) and Theorem 4.6.
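To complement Algorithm 4.7, here is a minimal Python sketch (not part of the original notes) of the row-reduction method; the function name det_by_row_reduction and the numerical tolerance are our own choices.

```python
import numpy as np

def det_by_row_reduction(A):
    """Algorithm 4.7: reduce to upper triangular form, tracking sign changes
    caused by row swaps (Theorem 4.4), then multiply the diagonal (Theorem 4.6)."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    sign = 1.0
    for i in range(n):
        # find a non-zero pivot in column i, swapping rows if necessary
        pivot = next((r for r in range(i, n) if abs(U[r, i]) > 1e-12), None)
        if pivot is None:
            return 0.0                     # no pivot in this column
        if pivot != i:
            U[[i, pivot]] = U[[pivot, i]]  # a row swap multiplies det by -1
            sign = -sign
        for r in range(i + 1, n):          # row replacements leave det unchanged
            U[r] -= (U[r, i] / U[i, i]) * U[i]
    return sign * np.prod(np.diag(U))

# Example 4.9.(a): expected value -34
print(det_by_row_reduction([[4, 1, 1], [1, 2, 3], [-1, 3, 0]]))
```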
Example 4.15. The matrices A, B and C from Example 4.9 are all invertible by Theorem 4.14,
as their determinants are non-zero.
Theorem 4.18 allows us to calculate determinants inductively, as each c_{kl} = (−1)^{k+l} · M_{kl} and M_{kl} is a determinant of a matrix of smaller size. A computation using cofactor expansions is especially effective when the matrix contains many zeros.
Example 4.19. Calculate det(A) using cofactor expansion for
\[
A = \begin{pmatrix} 0 & 4 & 0 & -2\\ 2 & -4 & 2 & 1\\ 0 & 3 & -1 & 0\\ 0 & 7 & 0 & 6 \end{pmatrix}_{4\times 4}.
\]
Since there are many zeros in the 1st column, we will use expansion by the 1st column:
\[
\begin{vmatrix} 0 & 4 & 0 & -2\\ 2 & -4 & 2 & 1\\ 0 & 3 & -1 & 0\\ 0 & 7 & 0 & 6 \end{vmatrix}
= 0\cdot(-1)^{1+1} M_{11} + 2\cdot(-1)^{2+1} M_{21} + 0\cdot(-1)^{3+1} M_{31} + 0\cdot(-1)^{4+1} M_{41}
= 2\cdot(-1)^{3}\begin{vmatrix} 4 & 0 & -2\\ 3 & -1 & 0\\ 7 & 0 & 6 \end{vmatrix}.
\]
Expanding the latter determinant by its 2nd column, we get
\[
= -2\left( 0\cdot(-1)^{1+2}\begin{vmatrix} 3 & 0\\ 7 & 6 \end{vmatrix}
+ (-1)\cdot(-1)^{2+2}\begin{vmatrix} 4 & -2\\ 7 & 6 \end{vmatrix}
+ 0\cdot(-1)^{3+2}\begin{vmatrix} 4 & -2\\ 3 & 0 \end{vmatrix}\right)
= -2\cdot\left(-\begin{vmatrix} 4 & -2\\ 7 & 6 \end{vmatrix}\right)
= 2\,(4\cdot 6 - (-2)\cdot 7) = 2\cdot 38 = 76,
\]
using Example 4.8.(b) for the remaining 2 × 2 determinant.
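The recursion behind Theorem 4.18 is easy to prototype. The following Python sketch (our own illustration, with the helper name det_by_cofactors chosen by us) expands along the first column; as the notes observe, this is only efficient when there are many zeros.

```python
import numpy as np

def det_by_cofactors(A):
    """Recursive cofactor expansion along the first column (as in Theorem 4.18)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for k in range(n):                      # k is a 0-based row index
        if A[k, 0] == 0:
            continue                        # zero entries contribute nothing
        minor = np.delete(np.delete(A, k, axis=0), 0, axis=1)
        # sign of the cofactor c_{k+1,1} is (-1)^{(k+1)+1} = (-1)^k
        total += (-1) ** k * A[k, 0] * det_by_cofactors(minor)
    return total

# Example 4.19: expected value 76
A = [[0, 4, 0, -2], [2, -4, 2, 1], [0, 3, -1, 0], [0, 7, 0, 6]]
print(det_by_cofactors(A))
```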
\[
\det(A^T) = (A^T)_{11}\,(-1)^{1+1} M'_{11} + (A^T)_{12}\,(-1)^{1+2} M'_{12} + \dots + (A^T)_{1n}\,(-1)^{1+n} M'_{1n}
= a_{11}(-1)^{1+1} M_{11} + a_{21}(-1)^{2+1} M_{21} + \dots + a_{n1}(-1)^{n+1} M_{n1} = \det(A),
\]
where M'_{1j} denotes the (1, j)-th minor of A^T, and the last equality follows from the expansion of det(A) by the 1st column.
Hence we have proved that det(AT ) = det(A) for n = k + 1.
Conclusion: by induction, det(AT ) = det(A) holds for all natural numbers n.
Combining equations (4.1) and (4.2) together, we see that det(A B) = det(A) det(B) holds in this
case as well.
Exercise. Prove Theorem 4.20 using elementary matrices, similarly to the proof of Theo-
rem 4.21 above.
Corollary 4.23. Let A be an n × n matrix. Then for any s ∈ N = {1, 2, 3, . . .}, det(As ) =
(det(A))s . If, in addition, A is invertible (i.e., det(A) 6= 0), then det(A−1 ) = (det(A))−1 and
det(As ) = (det(A))s for all s ∈ Z, where Z = {0, ±1, ±2, . . .} denotes the set of integers.
Calculate det(A2 B −2 ).
First, recall that det(A) = 76 (see Example 4.19). On the other hand, observe that B is a lower
triangular matrix, so B T is upper triangular. So, according to Theorem 4.20,
\[
\det(B) = \det(B^T) \overset{\text{Thm. 4.6}}{=} 1\cdot 2\cdot(-4)\cdot(-2) = 16,
\]
since B^T is upper triangular with diagonal entries 1, 2, −4, −2.
Now, det(A2 · B −2 ) = det(A2 ) det(B −2 ) = det(A)2 · (det(B))−2 , where we used Theorem 4.21
and Corollary 4.23 (note that B is invertible by Theorem 4.14, as det(B) = 16 6= 0).
Hence det(A² · B⁻²) = 76² · 16⁻² = 5776/256 = 361/16.
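Theorem 4.21 and Corollary 4.23 are also easy to check numerically. The NumPy snippet below is our own illustration (with randomly generated integer matrices); it is a sanity check, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 4)).astype(float)
B = rng.integers(-3, 4, size=(4, 4)).astype(float)

# det(AB) = det(A) det(B), up to floating-point rounding:
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))

# det(B^{-2}) = det(B)^{-2} whenever B is invertible:
if not np.isclose(np.linalg.det(B), 0):
    print(np.isclose(np.linalg.det(np.linalg.matrix_power(B, -2)),
                     np.linalg.det(B) ** (-2)))
```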
Note 4.27. The fact that det(A) ≠ 0 implies that A is invertible has already been proved in
Theorem 4.14. So, Theorem 4.26 only provides an explicit formula for the inverse matrix A−1 ,
which can be used as an alternative to Algorithm 3.25 (for finding the inverse matrix).
Proof of Theorem 4.26. By Theorem 3.23, it is enough to show that A (1/|A|) adj(A) = I_n, where |A| = det(A). This is obviously equivalent to
\[
A\,\operatorname{adj}(A) = |A|\, I_n =
\begin{pmatrix} |A| & 0 & \cdots & 0\\ 0 & |A| & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & |A| \end{pmatrix}.
\]
Take any pair of indices i, j, 1 ≤ i, j ≤ n.
Case 1: i = j. Then
\[
(A\,\operatorname{adj}(A))_{ij}
\overset{\text{Def. 2.8}}{=} \sum_{k=1}^{n} a_{ik}\,(\operatorname{adj}(A))_{kj}
\overset{\text{Def. 4.25}}{=} \sum_{k=1}^{n} a_{ik}\, c_{jk}
\overset{i=j}{=} \underbrace{\sum_{k=1}^{n} a_{ik}\, c_{ik}}_{\text{expansion of } |A| \text{ by row } i}
\overset{\text{Thm. 4.18}}{=} |A|.
\]
Case 2: i ≠ j. As we have just seen, $(A\cdot\operatorname{adj}(A))_{ij} = \sum_{k=1}^{n} a_{ik}\, c_{jk}$. Consider an auxiliary matrix B, which is obtained from A by replacing the j-th row of A by the i-th row (here the diagram assumes that i < j):
\[
B = \begin{pmatrix}
a_{11} & \cdots & a_{1n}\\
\vdots & & \vdots\\
a_{i1} & \cdots & a_{in}\\
\vdots & & \vdots\\
a_{i1} & \cdots & a_{in}\\
\vdots & & \vdots\\
a_{n1} & \cdots & a_{nn}
\end{pmatrix}_{n\times n},
\]
where rows i and j of B are both equal to the i-th row of A.
Since B differs from A only in its j-th row, the (j, k)-th minor of B coincides with the (j, k)-th minor of A for any k, so $\det(B) = \sum_{k=1}^{n} a_{ik}\, c_{jk}$ by Theorem 4.18 (expansion by row j). But det(B) = 0 by (D2) from Definition 4.1, hence $(A\,\operatorname{adj}(A))_{ij} = \sum_{k=1}^{n} a_{ik}\, c_{jk} = 0$ if i ≠ j.
Cases 1 & 2 together yield that A adj(A) = |A| I_n, thus
\[
A\cdot \frac{1}{|A|}\operatorname{adj}(A) = \frac{1}{|A|}\,(A\cdot \operatorname{adj}(A)) = I_n,
\]
as required.
Example 4.28. Let $A = \begin{pmatrix} 0 & 2 & 0\\ 1 & -1 & 1\\ 0 & 0 & -1 \end{pmatrix}$. Calculate A⁻¹ using the adjugate matrix.
First, let's find the matrix of cofactors:
\[
c_{11} = (-1)^{1+1}\begin{vmatrix} -1 & 1\\ 0 & -1 \end{vmatrix} = 1,\quad
c_{12} = (-1)^{1+2}\begin{vmatrix} 1 & 1\\ 0 & -1 \end{vmatrix} = 1,\quad
c_{13} = (-1)^{1+3}\begin{vmatrix} 1 & -1\\ 0 & 0 \end{vmatrix} = 0,
\]
\[
c_{21} = (-1)^{2+1}\begin{vmatrix} 2 & 0\\ 0 & -1 \end{vmatrix} = 2,\quad
c_{22} = (-1)^{2+2}\begin{vmatrix} 0 & 0\\ 0 & -1 \end{vmatrix} = 0,\quad
c_{23} = (-1)^{2+3}\begin{vmatrix} 0 & 2\\ 0 & 0 \end{vmatrix} = 0,
\]
\[
c_{31} = (-1)^{3+1}\begin{vmatrix} 2 & 0\\ -1 & 1 \end{vmatrix} = 2,\quad
c_{32} = (-1)^{3+2}\begin{vmatrix} 0 & 0\\ 1 & 1 \end{vmatrix} = 0,\quad
c_{33} = (-1)^{3+3}\begin{vmatrix} 0 & 2\\ 1 & -1 \end{vmatrix} = -2.
\]
So, $C = \begin{pmatrix} 1 & 1 & 0\\ 2 & 0 & 0\\ 2 & 0 & -2 \end{pmatrix}$ is the matrix of cofactors. Expanding det(A) by row 1, we get det(A) = 0 · c_{11} + 2 · c_{12} + 0 · c_{13} = 2. Since det(A) ≠ 0 we know that A⁻¹ exists. Therefore
\[
\operatorname{adj}(A) = C^T = \begin{pmatrix} 1 & 2 & 2\\ 1 & 0 & 0\\ 0 & 0 & -2 \end{pmatrix}
\quad\text{and}\quad
A^{-1} = \frac{1}{|A|}\operatorname{adj}(A) = \begin{pmatrix} 1/2 & 1 & 1\\ 1/2 & 0 & 0\\ 0 & 0 & -1 \end{pmatrix}.
\]
Check:
\[
A\, A^{-1} = \begin{pmatrix} 0 & 2 & 0\\ 1 & -1 & 1\\ 0 & 0 & -1 \end{pmatrix}\begin{pmatrix} 1/2 & 1 & 1\\ 1/2 & 0 & 0\\ 0 & 0 & -1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix} = I_3.\ \checkmark
\]
Note 4.29. While this method does allow us to compute the inverse of a matrix, it usually takes much longer than the method described in Algorithm 3.25. It is only efficient if sufficiently many entries of the matrix are zero.
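As an illustration of Theorem 4.26 (not taken from the notes), here is a Python sketch that builds adj(A) from cofactors; the function name inverse_via_adjugate is our own, and Note 4.29's warning about efficiency applies to it.

```python
import numpy as np

def inverse_via_adjugate(A):
    """A^{-1} = adj(A) / det(A), as in Theorem 4.26.
    A sketch for small matrices; row reduction (Algorithm 3.25) is faster."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.zeros_like(A)                            # matrix of cofactors
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    det_A = float(A[0] @ C[0])                      # expansion of det(A) by row 1
    if np.isclose(det_A, 0):
        raise ValueError("matrix is not invertible")
    return C.T / det_A                              # adj(A) = C^T

# Example 4.28: expected inverse [[1/2, 1, 1], [1/2, 0, 0], [0, 0, -1]]
print(inverse_via_adjugate([[0, 2, 0], [1, -1, 1], [0, 0, -1]]))
```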
CHAPTER 5
Linear transformations
Linear maps are particularly nice and amenable functions, which can be studied using matrices.
In this chapter all of the material we have learned so far comes together in developing the theory
of linear transformations.
Lemma 5.3 tells us that every matrix gives rise to a linear transformation. We will now formulate
and prove the converse statement:
Theorem 5.6. Suppose that T : Rn → Rm is a linear transformation. Let {e1 , . . . , en } be the
standard basis of Rn and let A be the m × n matrix with column vectors T (e1 ), . . . , T (en ), i.e.,
A = (T (e1 ) . . . T (en ))m×n .
Then T (x) = A x for all x ∈ Rn . Moreover, A is the unique matrix with this property (that
T (x) = A x for all x ∈ Rn ).
Proof. Let x = (x_1, . . . , x_n)^T be any vector in R^n. Then
\[
x = \begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix}
= x_1 \begin{pmatrix} 1\\ 0\\ \vdots\\ 0 \end{pmatrix}
+ x_2 \begin{pmatrix} 0\\ 1\\ \vdots\\ 0 \end{pmatrix}
+ \dots
+ x_n \begin{pmatrix} 0\\ 0\\ \vdots\\ 1 \end{pmatrix}
= x_1 e_1 + \dots + x_n e_n.
\]
Since T is a linear transformation, we have
\[
T(x) = T(x_1 e_1 + \dots + x_n e_n) = T(x_1 e_1) + \dots + T(x_n e_n)
= x_1 T(e_1) + \dots + x_n T(e_n)
\overset{\text{Lemma 5.5}}{=} \underbrace{\big(T(e_1)\ \dots\ T(e_n)\big)}_{m\times n\ \text{matrix } A}\, x = A\, x.
\]
Now let's prove the uniqueness of A. Suppose that A′ is another m × n matrix such that T(x) = A′ x for all x ∈ R^n. Then T(e_i) = A′ e_i = c′_i, where c′_i is the i-th column vector of A′,
Figure 5.1
Figure 5.2
Then $T(e_1) = \begin{pmatrix} \cos\theta\\ \sin\theta \end{pmatrix}$, $T(e_2) = \begin{pmatrix} \cos(\theta + \frac{\pi}{2})\\ \sin(\theta + \frac{\pi}{2}) \end{pmatrix} = \begin{pmatrix} -\sin\theta\\ \cos\theta \end{pmatrix}$. Hence, by Theorem 5.6, the matrix of T is $\begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{pmatrix}$.
Example 5.12 (Reflection in a line through the origin). Now, suppose that T : R² → R² is the reflection in the line L making angle θ ∈ R with the positive x-axis (so L : y = tan(θ) x, provided cos θ ≠ 0). Figure 5.3 assumes that L passes through the first quadrant and 0 < θ < π/4.
Then the angle between T(e_1) and the positive x-axis is 2θ, hence $T(e_1) = \begin{pmatrix} \cos(2\theta)\\ \sin(2\theta) \end{pmatrix}$. On the other hand, the angle between the positive x-axis and T(e_2) is −(π/2 − 2θ) = 2θ − π/2, as we have to measure angles anti-clockwise starting from the positive x-axis. Hence
\[
T(e_2) = \begin{pmatrix} \cos(2\theta - \frac{\pi}{2})\\ \sin(2\theta - \frac{\pi}{2}) \end{pmatrix} = \begin{pmatrix} \sin(2\theta)\\ -\cos(2\theta) \end{pmatrix}.
\]
Thus the matrix of T is $\begin{pmatrix} \cos(2\theta) & \sin(2\theta)\\ \sin(2\theta) & -\cos(2\theta) \end{pmatrix}$.
Figure 5.3
Note that $\det\begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{pmatrix} = \cos^2\theta + \sin^2\theta = 1$, i.e., the determinant of the matrix of a rotation is always equal to 1. On the other hand, $\det\begin{pmatrix} \cos(2\theta) & \sin(2\theta)\\ \sin(2\theta) & -\cos(2\theta) \end{pmatrix} = -\cos^2(2\theta) - \sin^2(2\theta) = -1$, i.e., the determinant of the matrix corresponding to a reflection is −1.
Example 5.13. Let T be the reflection in the line L, passing through the origin and the second quadrant, making angle 5π/6 with the x-axis (anti-clockwise). Thus L has equation y = tan(5π/6) x, i.e., y = −x/√3.
So, the image of any point (x, y)^T ∈ R² under T can be computed by
\[
T\begin{pmatrix} x\\ y \end{pmatrix} = A\begin{pmatrix} x\\ y \end{pmatrix}
= \begin{pmatrix} 1/2 & -\sqrt{3}/2\\ -\sqrt{3}/2 & -1/2 \end{pmatrix}\begin{pmatrix} x\\ y \end{pmatrix}
= \begin{pmatrix} x/2 - \sqrt{3}\,y/2\\ -\sqrt{3}\,x/2 - y/2 \end{pmatrix} \in \mathbb{R}^2.
\]
E.g., $T\begin{pmatrix} 2\\ -1 \end{pmatrix} = \begin{pmatrix} \frac{2+\sqrt{3}}{2}\\[2pt] \frac{1-2\sqrt{3}}{2} \end{pmatrix}$.
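The reflection matrix derived in Example 5.12 is easy to experiment with. The short NumPy sketch below is our own illustration; it rebuilds the matrix of Example 5.13 from the angle and reproduces the image of (2, −1) numerically.

```python
import numpy as np

def reflection_matrix(theta):
    """Matrix of the reflection in the line through the origin making
    angle theta with the positive x-axis (as derived in Example 5.12)."""
    return np.array([[np.cos(2 * theta),  np.sin(2 * theta)],
                     [np.sin(2 * theta), -np.cos(2 * theta)]])

A = reflection_matrix(5 * np.pi / 6)          # the line from Example 5.13
print(A @ np.array([2.0, -1.0]))              # approx. [ 1.866, -1.232]
print(np.isclose(np.linalg.det(A), -1.0))     # reflections have determinant -1
```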
Example 5.14. If T : R² → R² has matrix $\begin{pmatrix} \lambda & 0\\ 0 & \mu \end{pmatrix}$, for some λ, μ > 0, then T stretches/shrinks the plane by λ in the direction of the x-axis and by μ in the direction of the y-axis.
Figure 5.4
Diagram: T : R^n → R^m, S : R^m → R^k, and Q = S ◦ T : R^n → R^k.
Lemma 5.17. The composition of linear transformations is again a linear transformation.
Proof. Exercise. (Hint: one needs to check that if Q = S ◦ T , then for all u, v ∈ Rn , λ ∈ R,
Q(u + v) = Q(u) + Q(v) and Q(λu) = λQ(u)).
Theorem 5.18 (Matrix of the composition). Let T : Rn → Rm and S : Rm → Rk be linear
transformations with matrices A and B respectively. Then S ◦ T is a linear transformation with
matrix B A (product of B and A).
Proof. The fact that S ◦ T : Rn → Rk is a linear transformation is given by Lemma 5.17, thus
we only need to prove the claim about its matrix.
Note that according to the assumptions, T (x) = A x for all x ∈ Rn and S(y) = B y for all
y ∈ Rm . Therefore, for all x ∈ Rn we have
(S ◦ T )(x) = S T (x) (by definition of S ◦ T )
= S(A x) (as A is the matrix of T )
= B (A x) (as B is the matrix of S and y = A x ∈ Rm )
= (B A) x (by associativity of matrix multiplication).
Hence (S ◦ T )(x) = (B A) x for all x ∈ Rn , and we can apply Theorem 5.6 to conclude that B A is
the matrix of S ◦ T , as claimed.
Note 5.19. Theorem 5.18 is actually the reason why the multiplication of matrices is defined
as in Definition 2.8.
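Theorem 5.18 is easy to confirm numerically; the snippet below is our own illustration (a rotation chosen as T and a shear chosen as S), showing that composing the maps and multiplying their matrices give the same result.

```python
import numpy as np

theta = np.pi / 3
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])   # matrix of T
shear = np.array([[1.0, 2.0],
                  [0.0, 1.0]])                           # matrix of S

x = np.array([1.0, -1.0])

# (S o T)(x) computed two ways; by Theorem 5.18 they must agree.
print(shear @ (rotation @ x))
print((shear @ rotation) @ x)
```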
Example 5.20. Let S, T : R2 → R2 be reflections in lines L1 , L2 in R2 , making angles φ and θ
with the positive x-axis respectively. What is S ◦ T ?
(b) The inverse of the reflection in a line through the origin is itself (Exercise: check this).
(c) The inverse of the shear T with matrix $\begin{pmatrix} 1 & \lambda\\ 0 & 1 \end{pmatrix}$ is the shear S with matrix $\begin{pmatrix} 1 & -\lambda\\ 0 & 1 \end{pmatrix}$ (because $\begin{pmatrix} 1 & -\lambda\\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & \lambda\\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} = I_2$ and $\begin{pmatrix} 1 & \lambda\\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & -\lambda\\ 0 & 1 \end{pmatrix} = I_2$).
(d) If $A = \begin{pmatrix} 2 & 1\\ 0 & 0 \end{pmatrix}$ and T = T_A : R² → R², then T is not invertible by Proposition 5.24 below, as A is not invertible (because det(A) = 0).
Proposition 5.24. Let T : Rn → Rn be a linear transformation and let A ∈ Mn (R) be its
matrix. Then T is invertible if and only if A is invertible.
And if T is invertible then A−1 is the matrix of T −1 . (In particular, T −1 is unique!)
Proof. “⇒” Suppose that T is invertible, i.e., there is a linear transformation S : Rn → Rn
such that S ◦ T = Id = T ◦ S, where Id is the identity map from Rn to Rn . Let B be the matrix
of S (it exists by Theorem 5.6). Then Theorem 5.18 implies that B A = In and A B = In , as In is
the matrix of Id : Rn → Rn . Hence A is invertible and B = A−1 .
“⇐” Suppose that A is invertible. Then we can let S : Rn → Rn be the linear transformation
given by A−1 : S(x) = A−1 x for all x ∈ Rn (S is linear by Lemma 5.3). It follows that
assoc.
(S ◦ T )(x) = S(T (x)) = S(A x) = A−1 (A x) ===== (A−1 A) x = In x = x, for all x ∈ Rn .
Similarly, (T ◦ S)(x) = (A A−1 ) x = In x = x for all x ∈ Rn . Thus S ◦ T = Id and T ◦ S = Id. So,
T is invertible and S is its inverse.
Note 5.25. Let T : Rn → Rn and S : Rn → Rn be linear transformations such that S ◦ T = Id
where Id is the identity transformation from Rn to itself. Then both S and T are invertible and
S = T −1 , T = S −1 .
Proof. Exercise. (Hint: see Theorem 3.23.)
Example 5.26. Let T : R² → R² be the linear transformation given by the formula
\[
T\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} -2x + y\\ 5x - 3y \end{pmatrix}.
\]
Determine whether T is invertible and find the formula for its inverse (if it exists).
\[
T(e_1) = T\begin{pmatrix} 1\\ 0 \end{pmatrix} = \begin{pmatrix} -2\\ 5 \end{pmatrix}, \qquad
T(e_2) = T\begin{pmatrix} 0\\ 1 \end{pmatrix} = \begin{pmatrix} 1\\ -3 \end{pmatrix}.
\]
So the matrix of T is $A = \begin{pmatrix} -2 & 1\\ 5 & -3 \end{pmatrix}_{2\times 2}$. Observe that det(A) = (−2)·(−3) − 1·5 = 1 ≠ 0, so, by Theorem 2.25, A is invertible and
\[
A^{-1} = \frac{1}{1}\begin{pmatrix} -3 & -1\\ -5 & -2 \end{pmatrix} = \begin{pmatrix} -3 & -1\\ -5 & -2 \end{pmatrix}.
\]
Hence, by Proposition 5.24, T is invertible and the matrix of T⁻¹ is A⁻¹. Therefore
\[
T^{-1}\begin{pmatrix} x\\ y \end{pmatrix} = A^{-1}\begin{pmatrix} x\\ y \end{pmatrix}
= \begin{pmatrix} -3 & -1\\ -5 & -2 \end{pmatrix}\begin{pmatrix} x\\ y \end{pmatrix}
= \begin{pmatrix} -3x - y\\ -5x - 2y \end{pmatrix}.
\]
CHAPTER 6
Subspaces of Rn
Subspaces of Rn are the non-empty subsets that are closed under vector addition and scaling.
Basic examples of subspaces are lines and planes passing through the origin. Each subspace can
be thought of as a copy of Rm inside Rn , for some m ≤ n. Subspaces occur naturally in Linear
Algebra, as null spaces and ranges of linear transformations.
The goal of this chapter is to introduce the concepts of subspaces, spans and linear independence
in the case of Rn , preparing the reader for a more general and abstract discussion of these notions
in Linear Algebra II.
N(T ) = {x ∈ Rn | T (x) = 0 ∈ Rm }.
(b) Let A be an m × n matrix. The null space of A is the subset of Rn given by
N(A) = {x ∈ Rn | A x = 0 ∈ Rm }.
Note 6.5. Clearly, if T : Rn → Rm is a linear transformation and A is its matrix (of size m×n),
then N(T ) = N(A) ⊆ Rn .
Proposition 6.6. The null space of a linear transformation T : Rn → Rm (or of an m × n
matrix A) is a subspace of Rn .
Proof. Let us check the 3 conditions from Definition 6.1.
(i) 0 ∈ N(T ), as T (0) = 0 ∈ Rm by Proposition 5.9.(i). Therefore N(T ) 6= ∅.
(ii) If u, v ∈ N(T ), then
linearity of T as u, v ∈ N(T )
T (u + v) ========== T (u) + T (v) ========== 0 + 0 = 0.
Hence u + v ∈ N(T ).
Note 6.8. Let A be an n × n matrix. Then A is invertible if and only if N(A) = {0} (i.e., the
null space of A consists only of the zero vector).
Proof.
A is invertible
⇐⇒ x = 0 is the only vector in Rn satisfying A x = 0 (by Theorem 3.22)
⇐⇒ N (A) = {0} (by definition of N(A)).
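Note 6.8 suggests a quick numerical test. The sketch below is our own illustration: since N(A) = {0} exactly when the homogeneous system has no free variables, it compares the rank of A with the number of columns.

```python
import numpy as np

def null_space_is_trivial(A):
    """N(A) = {0} exactly when rank(A) equals the number of columns,
    which for a square matrix matches the criterion of Note 6.8."""
    A = np.asarray(A, dtype=float)
    return np.linalg.matrix_rank(A) == A.shape[1]

print(null_space_is_trivial([[1, 2], [3, 4]]))   # True: A is invertible
print(null_space_is_trivial([[1, 2], [2, 4]]))   # False: (2, -1)^T lies in N(A)
```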
(c) A line L, passing through the origin and parallel to a non-zero vector a ∈ Rn can be identified
with {α a | α ∈ R} = span{a}.
Proposition 6.12. For any k ∈ N and arbitrary vectors v1 , . . . vk ∈ Rn , the linear span
V = span{v1 , . . . , vk } is a subspace of Rn .
Proof. As before, we must show that the 3 defining conditions of a subspace are satisfied.
(i) Take c1 = c2 = . . . = ck = 0. Then 0 v1 + . . . + 0 vk = 0 ∈ span{v1 , . . . , vk } = V . Hence
V 6= ∅.
(ii) Let u, v ∈ V . Then there exist c1 , . . . ck , d1 , . . . , dk ∈ R such that u = c1 v1 + . . . + ck vk and
v = d1 v1 + . . . + dk vk . Therefore, using standard properties of vectors, we have
u + v = (c1 v1 + . . . + ck vk ) + (d1 v1 + . . . + dk vk )
= (c1 + d1 ) v1 + . . . + (ck + dk ) vk ∈ span{v1 , . . . , vk } = V, since c1 + d1 , . . . , ck + dk ∈ R.
(iii) Exercise.
Example 6.13. (a) Let e1 , . . . , en be the standard basis of Rn . Then for all x = (x1 , . . . , xn )T ∈
Rn , we have x = x1 e1 + . . . + xn en . It follows that span{e1 , . . . , en } = Rn .
(b) Let a, b ∈ R^n be non-zero vectors that are not parallel to each other. Then span{a, b} = {α a + β b | α, β ∈ R} gives rise to a plane in R^n (passing through the origin and parallel to a, b).
Note 6.14. Let v1 , . . . , vk be arbitrary vectors in Rn . If V is any subspace of Rn such that
vi ∈ V for each i = 1, . . . , k, then span{v1 , . . . , vk } ⊆ V .
It follows that span{v1 , . . . , vk } is the smallest subspace of Rn containing these vectors.
Proof. Exercise.
(c) Two vectors v1 , v2 ∈ Rn are linearly dependent if and only if they are parallel, i.e., if one of
them is a multiple of the other.
(d) Vectors $\begin{pmatrix} 0\\ 1 \end{pmatrix}$, $\begin{pmatrix} -3\\ 4 \end{pmatrix}$ and $\begin{pmatrix} 7\\ 2 \end{pmatrix}$ are linearly dependent in R², because, as we saw in Example 6.21,
\[
\frac{7}{3}\begin{pmatrix} -3\\ 4 \end{pmatrix} - \frac{34}{3}\begin{pmatrix} 0\\ 1 \end{pmatrix} + \begin{pmatrix} 7\\ 2 \end{pmatrix} = \begin{pmatrix} 0\\ 0 \end{pmatrix}.
\]
Here $c_1 = \frac{7}{3}$, $c_2 = -\frac{34}{3}$ and $c_3 = 1$.
Example 6.30. (a) To check whether the vectors $\begin{pmatrix} -1\\ 1\\ 1 \end{pmatrix}$, $\begin{pmatrix} 2\\ 4\\ 2 \end{pmatrix}$ and $\begin{pmatrix} -1\\ 4\\ 3 \end{pmatrix}$ are linearly dependent or independent, we can form the matrix $A = \begin{pmatrix} -1 & 2 & -1\\ 1 & 4 & 4\\ 1 & 2 & 3 \end{pmatrix}$ and calculate its determinant:
\[
\det(A) = \begin{vmatrix} -1 & 2 & -1\\ 1 & 4 & 4\\ 1 & 2 & 3 \end{vmatrix}
\overset{\substack{R_2 \to R_2 + R_1\\ R_3 \to R_3 + R_1}}{=}
\begin{vmatrix} -1 & 2 & -1\\ 0 & 6 & 3\\ 0 & 4 & 2 \end{vmatrix}
\overset{R_3 \to R_3 - \frac{2}{3}R_2}{=}
\begin{vmatrix} -1 & 2 & -1\\ 0 & 6 & 3\\ 0 & 0 & 0 \end{vmatrix}
\overset{\text{Thm. 4.6}}{=} 0.
\]
Since det(A) = 0, we can apply Corollary 6.29 to conclude that the vectors $\begin{pmatrix} -1\\ 1\\ 1 \end{pmatrix}$, $\begin{pmatrix} 2\\ 4\\ 2 \end{pmatrix}$ and $\begin{pmatrix} -1\\ 4\\ 3 \end{pmatrix}$ are linearly dependent.
(b) The standard basis e1 , . . . , en of Rn consists of linearly independent vectors. Indeed: A =
(e1 . . . en ) = In is the n × n identity matrix. Since det(In ) = 1 6= 0, the vectors e1 , . . . , en are
linearly independent (by Corollary 6.29).
More generally, when k 6= n we have the following:
Theorem 6.31 (Characterization of linear independence in terms of rank). Let v1 , . . . , vk be
some vectors in Rn , and let A = (v1 . . . vk )n×k be the matrix with column vectors v1 , . . . , vk .
Then v1 , . . . , vk are linearly independent if and only if rank(A) = k.
Proof.
v1 , . . . , vk ∈ Rn are linearly independent
⇐⇒ the equation A x = 0 has a unique solution x = 0 ∈ Rk (Theorem 6.28)
⇐⇒ the system A x = 0 is consistent and has no free variables
⇐⇒ the number of non-zero rows in a row echelon form of (A | 0) is k (Theorem 3.15)
⇐⇒ the number of non-zero rows in a row echelon form of A is k
⇐⇒ rank(A) = k (Definition 3.28).
Corollary 6.32. If v1 , . . . , vk ∈ Rn and k > n, then these vectors are linearly dependent.
Proof. Indeed, if k > n then the matrix A = (v1 . . . vk ) has size n × k, so rank(A) ≤ n by
the definition of the rank (see also Theorem 3.33.(iii)). Hence, k > n implies that rank(A) ≤ n < k,
so v1 , . . . , vk must be linearly dependent by Theorem 6.31.
Note 6.33. Corollary 6.32 can be re-formulated by saying that the size of a set of linearly
independent vectors in Rn cannot exceed n.
For example, the vectors $\begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix}$, $\begin{pmatrix} 3\\ 2\\ 0 \end{pmatrix}$, $\begin{pmatrix} -1\\ -4\\ 5 \end{pmatrix}$ and $\begin{pmatrix} 2\\ 5\\ 6 \end{pmatrix}$ are linearly dependent in R³, because 4 > 3 (by Corollary 6.32).
Example 6.34. Let $v_1 = \begin{pmatrix} 1\\ 2\\ -1\\ 5 \end{pmatrix}$, $v_2 = \begin{pmatrix} 2\\ -1\\ 3\\ 0 \end{pmatrix}$, $v_3 = \begin{pmatrix} 4\\ 1\\ 3\\ 6 \end{pmatrix}$ in R⁴.
(a) In order to check whether v_1, v_2, v_3 are linearly independent we need to calculate rank(A), where $A = \begin{pmatrix} 1 & 2 & 4\\ 2 & -1 & 1\\ -1 & 3 & 3\\ 5 & 0 & 6 \end{pmatrix}_{4\times 3}$. To this end, we need to find a row echelon form of this matrix:
\[
\begin{pmatrix} 1 & 2 & 4\\ 2 & -1 & 1\\ -1 & 3 & 3\\ 5 & 0 & 6 \end{pmatrix}
\xrightarrow{\substack{R_2 \to R_2 - 2R_1\\ R_3 \to R_3 + R_1\\ R_4 \to R_4 - 5R_1}}
\begin{pmatrix} 1 & 2 & 4\\ 0 & -5 & -7\\ 0 & 5 & 7\\ 0 & -10 & -14 \end{pmatrix}
\xrightarrow{\substack{R_3 \to R_3 + R_2\\ R_4 \to R_4 - 2R_2}}
\underbrace{\begin{pmatrix} 1 & 2 & 4\\ 0 & -5 & -7\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}}_{\text{row echelon form of } A}.
\]
Of course, if we chose a different value for x_3, we would get a different linear dependence relation.
Finally, note that the dependence relation (6.1) can be used to express any of the vectors involved in it as a linear combination of the others. For example, v_1 = −(7/6) v_2 + (5/6) v_3 and v_2 = −(6/7) v_1 + (5/7) v_3.
Proposition 6.35. Let v1 , . . . , vk ∈ Rn be any vectors. Then v1 , . . . , vk are linearly dependent
if and only if one of these vectors is a linear combination of the others.
Proof. Exercise.
Algorithm 6.36 (Finding a linear dependence relation). Suppose you need to check whether
some vectors v1 , . . . , vk ∈ Rn are linearly independent, and find a linear dependence relation among
them in the case if they are linearly dependent. Then proceed as in Example 6.34.(b). I.e., form
the augmented matrix (A | 0), where A = (v1 . . . vk )n×k , bring it to a row echelon form. If
this row echelon form has exactly k non-zero rows, then v1 , . . . , vk are linearly independent (by
Theorem 6.31).
Otherwise a non-zero solution of A x = 0 exists, say x_1 = c_1, . . . , x_k = c_k (you can find this solution by bringing A to a reduced row echelon form). Then the vectors v_1, . . . , v_k are linearly dependent and c_1 v_1 + . . . + c_k v_k = 0 is a dependence relation among them.
Algorithm 6.37 (Checking linear independence). If you are given a collection of vectors
v1 , . . . , vk ∈ Rn and are only asked to determine whether they are linearly independent, then the
following approach is most efficient:
(1) if k > n then the vectors are linearly dependent by Corollary 6.32.
(2) if k ≤ n, form the matrix A = (v1 . . . vk )n×k , and calculate its rank. Then, by Theo-
rem 6.31,
• if rank(A) = k then the vectors are linearly independent;
• if rank(A) < k then the vectors are linearly dependent.
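Algorithms 6.36 and 6.37 can be mirrored with exact arithmetic; the SymPy sketch below is our own illustration, reusing the vectors of Example 6.34. The rank decides (in)dependence, and a basis of the null space of A encodes a dependence relation among the columns.

```python
import sympy as sp

# The vectors v1, v2, v3 from Example 6.34, placed as the columns of A.
A = sp.Matrix([[1, 2, 4],
               [2, -1, 1],
               [-1, 3, 3],
               [5, 0, 6]])

print(A.rank())        # 2 < 3, so the columns are linearly dependent (Theorem 6.31)
print(A.nullspace())   # a basis of N(A); each vector gives a dependence relation
```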
Following Algorithm 6.36, we need to solve the system A x = 0, where A = (v_1 v_2 v_3 v_4) is the 4 × 4 matrix formed from the given vectors:
\[
(A\,|\,0) = \left(\begin{array}{cccc|c} -1 & -1 & 2 & 3 & 0\\ 3 & 3 & -6 & -7 & 0\\ -2 & -1 & 5 & 0 & 0\\ 1 & 4 & 1 & 2 & 0 \end{array}\right)
\xrightarrow{\substack{R_2 \to R_2 + 3R_1\\ R_3 \to R_3 - 2R_1\\ R_4 \to R_4 + R_1}}
\left(\begin{array}{cccc|c} -1 & -1 & 2 & 3 & 0\\ 0 & 0 & 0 & 2 & 0\\ 0 & 1 & 1 & -6 & 0\\ 0 & 3 & 3 & 5 & 0 \end{array}\right)
\xrightarrow{R_2 \leftrightarrow R_3}
\left(\begin{array}{cccc|c} -1 & -1 & 2 & 3 & 0\\ 0 & 1 & 1 & -6 & 0\\ 0 & 0 & 0 & 2 & 0\\ 0 & 3 & 3 & 5 & 0 \end{array}\right)
\]
\[
\xrightarrow{R_4 \to R_4 - 3R_2}
\left(\begin{array}{cccc|c} -1 & -1 & 2 & 3 & 0\\ 0 & 1 & 1 & -6 & 0\\ 0 & 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 23 & 0 \end{array}\right)
\xrightarrow{R_4 \to R_4 - \frac{23}{2}R_3}
\left(\begin{array}{cccc|c} -1 & -1 & 2 & 3 & 0\\ 0 & 1 & 1 & -6 & 0\\ 0 & 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 0 & 0 \end{array}\right)
\xrightarrow{\substack{R_1 \to -R_1\\ R_3 \to \frac{1}{2}R_3}}
\left(\begin{array}{cccc|c} 1 & 1 & -2 & -3 & 0\\ 0 & 1 & 1 & -6 & 0\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 \end{array}\right)
\]
\[
\xrightarrow{\substack{R_1 \to R_1 + 3R_3\\ R_2 \to R_2 + 6R_3}}
\left(\begin{array}{cccc|c} 1 & 1 & -2 & 0 & 0\\ 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 \end{array}\right)
\xrightarrow{R_1 \to R_1 - R_2}
\left(\begin{array}{cccc|c} 1 & 0 & -3 & 0 & 0\\ 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 \end{array}\right)
\iff
\begin{cases} x_1 - 3x_3 = 0\\ x_2 + x_3 = 0\\ x_4 = 0. \end{cases}
\]
So, taking x_3 = 1 we obtain x_1 = 3, x_2 = −1, x_4 = 0, i.e., the given vectors v_1, v_2, v_3, v_4 are linearly dependent, with the dependence relation
\[
3\begin{pmatrix} -1\\ 3\\ -2\\ 1 \end{pmatrix} - \begin{pmatrix} -1\\ 3\\ -1\\ 4 \end{pmatrix} + \begin{pmatrix} 2\\ -6\\ 5\\ 1 \end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0\\ 0 \end{pmatrix},
\quad\text{that is,}\quad 3v_1 - v_2 + v_3 = 0.
\]
Thus any of the vectors v_1, v_2 or v_3 can be expressed as a linear combination of the other vectors. For example, v_3 = −3 v_1 + v_2.
Example 6.39. Check whether the vectors v1 , v2 , v4 from Example 6.38 are linearly indepen-
dent or dependent. In the latter case express one of these vectors as a linear combination of the
others.
We proceed as before, following Algorithm 6.36.
\[
(A\,|\,0) = \left(\begin{array}{ccc|c} -1 & -1 & 3 & 0\\ 3 & 3 & -7 & 0\\ -2 & -1 & 0 & 0\\ 1 & 4 & 2 & 0 \end{array}\right)
\xrightarrow{\substack{R_2 \to R_2 + 3R_1\\ R_3 \to R_3 - 2R_1\\ R_4 \to R_4 + R_1}}
\left(\begin{array}{ccc|c} -1 & -1 & 3 & 0\\ 0 & 0 & 2 & 0\\ 0 & 1 & -6 & 0\\ 0 & 3 & 5 & 0 \end{array}\right)
\xrightarrow{R_2 \leftrightarrow R_3}
\left(\begin{array}{ccc|c} -1 & -1 & 3 & 0\\ 0 & 1 & -6 & 0\\ 0 & 0 & 2 & 0\\ 0 & 3 & 5 & 0 \end{array}\right)
\xrightarrow{R_4 \to R_4 - 3R_2}
\left(\begin{array}{ccc|c} -1 & -1 & 3 & 0\\ 0 & 1 & -6 & 0\\ 0 & 0 & 2 & 0\\ 0 & 0 & 23 & 0 \end{array}\right)
\xrightarrow{R_4 \to R_4 - \frac{23}{2}R_3}
\]
\[
\left(\begin{array}{ccc|c} -1 & -1 & 3 & 0\\ 0 & 1 & -6 & 0\\ 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 0 \end{array}\right).
\]
Thus we found a row echelon form of the matrix with 3 non-zero rows, which is precisely the
number of vectors v1 , v2 , v4 . Hence these vectors are linearly independent by Algorithm 6.36.
6.6. Bases
Definition 6.40 (Basis). Let V be a subspace of Rn and let v1 , . . . , vk ∈ V be some vectors
in V . The set {v1 , . . . , vk } is called a basis of V if the following two conditions are satisfied:
(i) v1 , . . . , vk are linearly independent;
(ii) V = span{v1 , . . . , vk } (i.e., v1 , . . . , vk span V ).
Example 6.41. (a) The standard basis e1 , . . . , en of Rn is a basis: e1 , . . . , en are linearly
independent by Example 6.30.(b) and Rn = span{e1 , . . . , en } by Example 6.13.(a).
(b) Let $v_1 = \begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix}$, $v_2 = \begin{pmatrix} 4\\ 5\\ 6 \end{pmatrix}$ and V = span{v_1, v_2}. Then (ii) is automatically satisfied.
To check if v_1, v_2 are linearly independent, we use Algorithm 6.37:
\[
\begin{pmatrix} 1 & 4\\ 2 & 5\\ 3 & 6 \end{pmatrix}
\xrightarrow{\substack{R_2 \to R_2 - 2R_1\\ R_3 \to R_3 - 3R_1}}
\begin{pmatrix} 1 & 4\\ 0 & -3\\ 0 & -6 \end{pmatrix}
\xrightarrow{R_3 \to R_3 - 2R_2}
\begin{pmatrix} 1 & 4\\ 0 & -3\\ 0 & 0 \end{pmatrix}.
\]
Thus $\operatorname{rank}\begin{pmatrix} 1 & 4\\ 2 & 5\\ 3 & 6 \end{pmatrix} = 2$, as it has two non-zero rows in a row echelon form. So, by Theorem 6.31, the vectors v_1, v_2 are linearly independent. (Alternatively, we could notice that v_1 is not parallel to v_2, so they are linearly independent by Example 6.27.(iii).) Therefore the set {v_1, v_2} is a basis of V = span{v_1, v_2}.
The following observation immediately follows from Definition 6.40:
Note 6.42. Suppose that v1 , . . . , vk are linearly independent. Then {v1 , . . . , vk } is a basis of
V = span{v1 , . . . , vk }.
Theorem 6.43 (Any n linearly independent vectors form a basis of Rn ). Let v1 , . . . , vn be n
vectors in Rn . Then {v1 , . . . , vn } is a basis of Rn if and only if these vectors are linearly independent
Proof. “⇒” If {v1 , . . . , vn } is a basis then v1 , . . . , vn must be linearly independent by the
definition of a basis.
“⇐” Suppose that the vectors v1 , . . . , vn are linearly independent. Let A ∈ Mn (R) be the n × n
matrix whose column vectors are v1 , . . . , vn . Then, according to Corollary 6.29, det(A) 6= 0, so A
is invertible (by Theorem 4.14).
Now, to show that {v1 , . . . , vn } is a basis of Rn we need to prove that these vectors span Rn ,
i.e., for any b ∈ Rn there exist x1 , . . . , xn ∈ R such that
Lemma 5.5
b = x1 v1 + · · · + xn vn ======== A x, where x = (x1 , . . . , xn )T .
Thus we see that Rn = span{v1 , . . . , vn } if and only if the equation Ax = b has a solution for every
b ∈ Rn . But the latter is true since A is invertible, by Theorem 3.17 (we can take x = A−1 b).
Therefore the vectors v1 , . . . , vn span Rn . Since they are linearly independent by the assump-
tion, we can conclude that {v1 , . . . , vn } is a basis of Rn .
The notion of a basis will play an important role in Linear Algebra II and will be studied in
much more detail in that module. In particular, the following two important statements will be
proved.
Theorem 6.44. If V is a subspace of Rn then V has a basis and any two bases of V have the
same number of vectors (i.e., if {v1 , . . . , vk } and {u1 , . . . , ul } are bases of V then k = l).
Theorem 6.44 tells us that the number of vectors in any basis of a subspace V is equal to the
same integer k. This k is called the dimension of V .
Theorem 6.45. Any basis of Rn has exactly n vectors. In fact, if v1 , . . . , vk ∈ Rn and k < n,
then v1 , . . . , vk do not span Rn (i.e., span{v1 , . . . , vk } is a proper subspace of Rn ).
The above theorems can be used to determine whether or not a given collection of vectors forms
a basis.
Example 6.46. Determine whether the following vectors form a basis of R^n.
(a) $\begin{pmatrix} 2\\ 0\\ 0 \end{pmatrix}$, $\begin{pmatrix} 5\\ 6\\ 0 \end{pmatrix}$, $\begin{pmatrix} -3\\ 7\\ 8 \end{pmatrix}$ in R³.
Let us start with checking linear independence: $A = \begin{pmatrix} 2 & 5 & -3\\ 0 & 6 & 7\\ 0 & 0 & 8 \end{pmatrix}$, so det(A) = 2·6·8 = 96 ≠ 0 (cf. Theorem 4.6). Therefore these vectors are linearly independent by Corollary 6.29. Now, applying Theorem 6.43, we see that $\begin{pmatrix} 2\\ 0\\ 0 \end{pmatrix}$, $\begin{pmatrix} 5\\ 6\\ 0 \end{pmatrix}$, $\begin{pmatrix} -3\\ 7\\ 8 \end{pmatrix}$ form a basis of R³.
(b) $\begin{pmatrix} -1\\ 1\\ 1 \end{pmatrix}$, $\begin{pmatrix} 2\\ 4\\ 2 \end{pmatrix}$ and $\begin{pmatrix} -1\\ 4\\ 3 \end{pmatrix}$ in R³.
These vectors are linearly dependent by Example 6.30.(a), so they cannot form a basis of R³ (by Definition 6.40).
(c) $\begin{pmatrix} -1\\ 1\\ 1 \end{pmatrix}$, $\begin{pmatrix} 2\\ 4\\ 2 \end{pmatrix}$ in R³.
These two vectors are clearly linearly independent, as they are not multiples of each other. However, they do not form a basis of R³ because any basis of R³ must consist of 3 vectors (see Theorem 6.45).
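The criterion used in Example 6.46 is easy to package in code; the Python sketch below is our own illustration (the function name is_basis_of_Rn is ours): n vectors form a basis of R^n exactly when the matrix with these columns is square and has non-zero determinant.

```python
import numpy as np

def is_basis_of_Rn(vectors):
    """n vectors form a basis of R^n iff the matrix with these columns is
    square with non-zero determinant (Theorem 6.43 with Corollary 6.29)."""
    A = np.column_stack(vectors)
    return A.shape[0] == A.shape[1] and not np.isclose(np.linalg.det(A), 0)

print(is_basis_of_Rn([[2, 0, 0], [5, 6, 0], [-3, 7, 8]]))    # True,  Example 6.46.(a)
print(is_basis_of_Rn([[-1, 1, 1], [2, 4, 2], [-1, 4, 3]]))   # False, Example 6.46.(b)
```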
CHAPTER 7
Eigenvalues, eigenvectors and applications
Eigenvalues and eigenvectors are important tools for studying matrices and linear transfor-
mations. They play a crucial role in differential equations, stochastic analysis, in the study of
networks, etc. The aim of this chapter is to introduce the first methods for computing eigenvalues
and eigenvectors of a matrix, and to describe an application of these notions in Google’s page
ranking algorithm.
Geometric meaning: v ∈ Rn is an eigenvector if A stretches (or shrinks) it, but preserves the
line L passing through v (see Figure 7.1).
Figure 7.1
Example 7.2. Let T : R² → R² be a shear (so its matrix A is of the form $\begin{pmatrix} 1 & \alpha\\ 0 & 1 \end{pmatrix}$ or $\begin{pmatrix} 1 & 0\\ \beta & 1 \end{pmatrix}$). Then T actually fixes one of the coordinate vectors. Figure 7.2 below assumes that α > 0. E.g.,
Figure 7.2
\[
\begin{pmatrix} 1 & \alpha\\ 0 & 1 \end{pmatrix}\begin{pmatrix} -1\\ 0 \end{pmatrix} = \begin{pmatrix} -1\\ 0 \end{pmatrix},
\quad\text{so } \begin{pmatrix} -1\\ 0 \end{pmatrix} \text{ is an eigenvector of } \begin{pmatrix} 1 & \alpha\\ 0 & 1 \end{pmatrix} \text{ with eigenvalue } \lambda = 1.
\]
Example 7.3. Let $A = \begin{pmatrix} 7 & -15\\ 2 & -4 \end{pmatrix}$. Then $\begin{pmatrix} 3\\ 1 \end{pmatrix}$ is an eigenvector of A corresponding to eigenvalue λ = 2. Indeed, this can be easily checked:
\[
\begin{pmatrix} 7 & -15\\ 2 & -4 \end{pmatrix}\begin{pmatrix} 3\\ 1 \end{pmatrix} = \begin{pmatrix} 6\\ 2 \end{pmatrix} = 2\begin{pmatrix} 3\\ 1 \end{pmatrix}.
\]
Question 7.4. Given a square matrix, how can we find its eigenvalues and eigenvectors?
Proposition 7.5 (Determinant criterion for eigenvalues). Let A be an n × n matrix. Then
λ ∈ R is an eigenvalue of A if and only if det(A − λIn ) = 0.
Proof. First observe that for any vector v ∈ Rn and λ ∈ R,
(7.1) A v = λ v ⇐⇒ (A − λ In ) v = 0.
Indeed, using standard properties of matrices, we have
(A − λ In ) v = A v − (λ In ) v = A v − λ (In v) = A v − λ v,
and the equation A v − λ v = 0 is obviously equivalent to A v = λ v.
We can now prove the claim of the proposition:
λ is an eigenvalue of A
⇐⇒ ∃ v ∈ R^n, v ≠ 0, such that A v = λ v (by Definition 7.1.(a))
⇐⇒ ∃ v ∈ R^n, v ≠ 0, such that (A − λ I_n) v = 0 (by (7.1))
⇐⇒ ∃ v ∈ R^n which is a non-zero solution of (A − λ I_n) v = 0
⇐⇒ the matrix B = A − λ I_n is not invertible (by Theorem 3.22)
⇐⇒ det(B) = det(A − λ I_n) = 0 (by Theorem 4.14).
Proposition 7.5 suggests a method for finding eigenvalues of a matrix.
Example 7.6. Find the eigenvalues of $A = \begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}$.
We start with forming the matrix
\[
A - \lambda I_2 = \begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2-\lambda & 1\\ 1 & 2-\lambda \end{pmatrix}.
\]
Next, we calculate the determinant of this matrix, set it equal to zero and solve the resulting equation:
\[
\det(A - \lambda I_2) = (2-\lambda)(2-\lambda) - 1 = \lambda^2 - 4\lambda + 3.
\]
Hence
\[
\det(A - \lambda I_2) = 0 \iff \lambda^2 - 4\lambda + 3 = 0 \iff \lambda = 1 \text{ or } \lambda = 3.
\]
Thus, by Proposition 7.5, A has two eigenvalues λ_1 = 1 and λ_2 = 3.
(We can check that det(A − λ I_2) = 0 for λ = 1, 3:
λ_1 = 1: $A - \lambda_1 I_2 = A - I_2 = \begin{pmatrix} 1 & 1\\ 1 & 1 \end{pmatrix}$, so det(A − λ_1 I_2) = 0 ✓.
λ_2 = 3: $A - \lambda_2 I_2 = A - 3 I_2 = \begin{pmatrix} -1 & 1\\ 1 & -1 \end{pmatrix}$, so det(A − λ_2 I_2) = 0 ✓.)
As we see from Example 7.6, det(A−λI2 ) = λ2 −4λ+3 turned out to be a quadratic polynomial
in λ. In fact, for any n × n matrix A, det(A − λ In ) is always a polynomial in λ, of degree n.
Definition 7.7 (Characteristic polynomial). Let A be an n × n matrix. The characteristic
polynomial of A is defined as pA (λ) = det(A − λ In ).
The equation pA (λ) = 0 (⇔ det(A − λ In ) = 0) is called the characteristic equation of A.
So, to find the eigenvalues we first compute the characteristic polynomial, and then we find its
roots.
Note 7.8. If A ∈ Mn (R), the characteristic polynomial of A will have degree n. So it cannot
have more than n distinct roots, hence A cannot have more than n eigenvalues.
Observe that λ = −1 is a root (usually in the case of a 3 × 3 matrix you will be given one of the
eigenvalues, so you will know one of the roots. Also, for integer roots you can test possible divisors
of the last term of the polynomial: 2 in this case. Its divisors are ±1, ±2).
To find the remaining roots, factorize the polynomial, for example by using long division: −1 is a root, so λ³ + 4λ² + 5λ + 2 must be divisible by λ + 1. Carrying out the division gives
\[
\lambda^3 + 4\lambda^2 + 5\lambda + 2 = (\lambda + 1)(\lambda^2 + 3\lambda + 2).
\]
Now, we factorize λ² + 3λ + 2. Since λ = −1, −2 are its roots, we have λ² + 3λ + 2 = (λ + 1)(λ + 2). Therefore p_A(λ) = −(λ + 1)²(λ + 2).
Hence λ1 = −1 and λ2 = −2 are the eigenvalues of A. Note that there are only two of them,
but λ1 = −1 is of multiplicity 2.
Let us now find the corresponding eigenvectors.
λ_1 = −1. $A - \lambda_1 I_3 = \begin{pmatrix} -2 & -1 & -2\\ 1 & 0 & 1\\ 1 & 1 & 1 \end{pmatrix}$. Thus we need to solve $(A - \lambda_1 I_3)\begin{pmatrix} x\\ y\\ z \end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0 \end{pmatrix}$:
\[
\left(\begin{array}{ccc|c} -2 & -1 & -2 & 0\\ 1 & 0 & 1 & 0\\ 1 & 1 & 1 & 0 \end{array}\right)
\xrightarrow{R_1 \leftrightarrow R_2}
\left(\begin{array}{ccc|c} 1 & 0 & 1 & 0\\ -2 & -1 & -2 & 0\\ 1 & 1 & 1 & 0 \end{array}\right)
\xrightarrow{\substack{R_2 \to R_2 + 2R_1\\ R_3 \to R_3 - R_1}}
\left(\begin{array}{ccc|c} 1 & 0 & 1 & 0\\ 0 & -1 & 0 & 0\\ 0 & 1 & 0 & 0 \end{array}\right)
\xrightarrow{R_3 \to R_3 + R_2}
\left(\begin{array}{ccc|c} 1 & 0 & 1 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & 0 & 0 \end{array}\right)
\iff
\begin{cases} x + z = 0\\ -y = 0. \end{cases}
\]
Taking z = −1, we get that $\begin{pmatrix} x\\ y\\ z \end{pmatrix} = \begin{pmatrix} 1\\ 0\\ -1 \end{pmatrix}$ is an eigenvector corresponding to the eigenvalue λ_1 = −1.
λ_2 = −2. $A - \lambda_2 I_3 = \begin{pmatrix} -1 & -1 & -2\\ 1 & 1 & 1\\ 1 & 1 & 2 \end{pmatrix}$, and we look for a non-zero solution of $(A - \lambda_2 I_3)\begin{pmatrix} x\\ y\\ z \end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0 \end{pmatrix}$. We can guess that $\begin{pmatrix} x\\ y\\ z \end{pmatrix} = \begin{pmatrix} 1\\ -1\\ 0 \end{pmatrix}$ works, but, to be rigorous, we should check it:
\[
\begin{pmatrix} -3 & -1 & -2\\ 1 & -1 & 1\\ 1 & 1 & 0 \end{pmatrix}\begin{pmatrix} 1\\ -1\\ 0 \end{pmatrix} = \begin{pmatrix} -2\\ 2\\ 0 \end{pmatrix} = -2\begin{pmatrix} 1\\ -1\\ 0 \end{pmatrix}.\ \checkmark
\]
So $\begin{pmatrix} 1\\ -1\\ 0 \end{pmatrix}$ is an eigenvector corresponding to the eigenvalue λ_2 = −2.
Example 7.13. Instead of long division you can use other ways to factorize the polynomial
λ3 + 4λ2 + 5λ + 2 (knowing that λ = −1 is a root):
(1) Work backwards (keeping in mind that the polynomial is divisible by λ + 1):
λ3 + 4λ2 + 5λ + 2 = λ3 + 4λ2 + 3λ + 2(λ + 1)
= λ3 + λ2 + 3(λ2 + λ) + 2(λ + 1)
= λ2 (λ + 1) + 3λ(λ + 1) + 2(λ + 1)
= (λ + 1)(λ2 + 3λ + 2) = (λ + 1)(λ + 1)(λ + 2).
where n_j is the number of outgoing links from page j and L_k is the set of pages which link to page k. In our example, if k = 1, the equation (7.2) becomes $x_1 = \frac{x_3}{n_3} + \frac{x_4}{n_4} = \frac{x_3}{1} + \frac{x_4}{2}$. Similarly, $x_2 = \frac{x_1}{3}$, $x_3 = \frac{x_1}{3} + \frac{x_2}{2} + \frac{x_4}{2}$, $x_4 = \frac{x_1}{3} + \frac{x_2}{2}$. This is equivalent to the following matrix equation:
matrix equation:
1
0 0 1 2 x1
1
3 0 0 0 x2
(7.3) A x = x, where A =
1 1 1
x .
and x =
3 2 0 2 3
1 1
3 2 0 0 x4
The matrix A = (a_{ij}) constructed this way is called the link matrix of the web. Thus, by definition,
\[
a_{ij} = \begin{cases} 0 & \text{if page } j \text{ does not link to page } i,\\[2pt] \dfrac{1}{n_j} & \text{if page } j \text{ links to page } i. \end{cases}
\]
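One way to find a vector x with A x = x for the link matrix (7.3) — our own illustration, not necessarily the method developed elsewhere in this section — is repeated multiplication by A ("power iteration"), which in practice converges to an eigenvector for the eigenvalue 1.

```python
import numpy as np

# Link matrix from equation (7.3).
A = np.array([[0,   0,   1, 1/2],
              [1/3, 0,   0, 0  ],
              [1/3, 1/2, 0, 1/2],
              [1/3, 1/2, 0, 0  ]])

x = np.full(4, 1 / 4)            # start from equal ranks
for _ in range(100):
    x = A @ x
    x = x / x.sum()              # keep the entries summing to 1

print(x)                         # approximate page ranks
print(np.allclose(A @ x, x))     # check: x is (numerically) fixed by A
```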
\[
\begin{pmatrix}
u_1\cdot(\lambda_1 v_1) & u_1\cdot(\lambda_2 v_2) & \cdots & u_1\cdot(\lambda_n v_n)\\
u_2\cdot(\lambda_1 v_1) & u_2\cdot(\lambda_2 v_2) & \cdots & u_2\cdot(\lambda_n v_n)\\
\vdots & \vdots & & \vdots\\
u_n\cdot(\lambda_1 v_1) & u_n\cdot(\lambda_2 v_2) & \cdots & u_n\cdot(\lambda_n v_n)
\end{pmatrix}
\overset{\text{SP3}}{=}
\begin{pmatrix}
\lambda_1\, u_1\cdot v_1 & \lambda_2\, u_1\cdot v_2 & \cdots & \lambda_n\, u_1\cdot v_n\\
\lambda_1\, u_2\cdot v_1 & \lambda_2\, u_2\cdot v_2 & \cdots & \lambda_n\, u_2\cdot v_n\\
\vdots & \vdots & & \vdots\\
\lambda_1\, u_n\cdot v_1 & \lambda_2\, u_n\cdot v_2 & \cdots & \lambda_n\, u_n\cdot v_n
\end{pmatrix}
\overset{(7.6)}{=}
\begin{pmatrix}
\lambda_1 & 0 & \cdots & 0\\
0 & \lambda_2 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \lambda_n
\end{pmatrix},
\]
as claimed.
Example 7.31. (a) Let $A = \begin{pmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{pmatrix}$ be as in Example 7.25.(b).
Then $\begin{pmatrix} -1\\ 1\\ 0 \end{pmatrix}$, $\begin{pmatrix} -1\\ 0\\ 1 \end{pmatrix}$ (corresponding to λ_1 = 0) and $\begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix}$ (corresponding to λ_2 = 3) are eigenvectors of A. It is easy to check that these vectors are linearly independent, so we set $P = \begin{pmatrix} -1 & -1 & 1\\ 1 & 0 & 1\\ 0 & 1 & 1 \end{pmatrix}$. Calculating P⁻¹ (using Algorithm 3.25), we get $P^{-1} = \begin{pmatrix} -\frac13 & \frac23 & -\frac13\\[2pt] -\frac13 & -\frac13 & \frac23\\[2pt] \frac13 & \frac13 & \frac13 \end{pmatrix}$. A simple computation now shows that $P^{-1} A\, P = \begin{pmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 3 \end{pmatrix}$, which is in line with the claim of Theorem 7.30.
(b) Let $A = \begin{pmatrix} 4 & 0 & 4\\ 0 & 4 & 4\\ 4 & 4 & 8 \end{pmatrix}$ be the symmetric matrix from Example 7.18. The eigenvectors of A are $v_1 = \begin{pmatrix} 1\\ -1\\ 0 \end{pmatrix}$ (corresponding to λ_1 = 4), $v_2 = \begin{pmatrix} 1\\ 1\\ -1 \end{pmatrix}$ (corresponding to λ_2 = 0) and $v_3 = \begin{pmatrix} 1\\ 1\\ 2 \end{pmatrix}$ (corresponding to λ_3 = 12). The vectors v_1, v_2, v_3 are linearly independent by Corollary 7.29, so, for $P = (v_1\ v_2\ v_3)_{3\times 3} = \begin{pmatrix} 1 & 1 & 1\\ -1 & 1 & 1\\ 0 & -1 & 2 \end{pmatrix}$, Theorem 7.30 claims that P is invertible and
\[
P^{-1} A\, P = \begin{pmatrix} \lambda_1 & 0 & 0\\ 0 & \lambda_2 & 0\\ 0 & 0 & \lambda_3 \end{pmatrix} = \begin{pmatrix} 4 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 12 \end{pmatrix}.
\]
Note that v_i · v_j = 0 if i ≠ j, by Theorem 7.27. This fact can simplify finding P⁻¹, because it immediately tells us that
\[
P^T P = \begin{pmatrix} 1 & -1 & 0\\ 1 & 1 & -1\\ 1 & 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & 1 & 1\\ -1 & 1 & 1\\ 0 & -1 & 2 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0\\ 0 & 3 & 0\\ 0 & 0 & 6 \end{pmatrix},
\]
where 2 = v_1 · v_1 = ‖v_1‖², 3 = v_2 · v_2 = ‖v_2‖² and 6 = v_3 · v_3 = ‖v_3‖². This shows that P^T is very close to being the inverse of P. Namely, we can get P⁻¹ by dividing each row vector of P^T by the square of its norm. Indeed, $P^{-1} = \begin{pmatrix} \frac12 & -\frac12 & 0\\[2pt] \frac13 & \frac13 & -\frac13\\[2pt] \frac16 & \frac16 & \frac13 \end{pmatrix}$ by Theorem 3.23, as
\[
\begin{pmatrix} \frac12 & -\frac12 & 0\\[2pt] \frac13 & \frac13 & -\frac13\\[2pt] \frac16 & \frac16 & \frac13 \end{pmatrix}\begin{pmatrix} 1 & 1 & 1\\ -1 & 1 & 1\\ 0 & -1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix} = I_3.
\]
CHAPTER 8
Orthonormal sets and quadratic forms
This chapter will introduce the notions of orthogonal and orthonormal sets of vectors and
will discuss diagonalization of symmetric matrices in orthonormal bases. It will also introduce
the Gram-Schmidt process for obtaining an orthonormal set from any linearly independent set of
vectors. The chapter will conclude with discussions of applications of Linear Algebra in the study
of quadratic forms and in the theory of conic sections.
\[
= \begin{pmatrix} 1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1 \end{pmatrix} = I_n,
\]
because $v_i \cdot v_j = \begin{cases} 0 & \text{if } i \neq j,\\ \|v_i\|^2 & \text{if } i = j. \end{cases}$ By Theorem 3.23, P is invertible and $P^{-1} = \begin{pmatrix} \frac{1}{\|v_1\|^2}\, v_1^T\\ \vdots\\ \frac{1}{\|v_n\|^2}\, v_n^T \end{pmatrix}$.
Corollary 8.5. If {v_1, . . . , v_n} is an orthonormal set of vectors and P = (v_1 . . . v_n)_{n×n}, then P is invertible and P⁻¹ = P^T.
Proof. By Proposition 8.4, $P^{-1} = \begin{pmatrix} v_1^T\\ \vdots\\ v_n^T \end{pmatrix} = P^T$ (as ‖v_i‖² = 1 and, in particular, v_i ≠ 0, for each i = 1, 2, . . . , n).
Definition 8.6 (Orthogonal matrix). An n × n matrix P is called orthogonal if P T = P −1
(i.e., if P T P = In ).
Note 8.7. An n×n matrix P is orthogonal if and only if its column vectors form an orthonormal
set.
Proof. If v1 , . . . , vn is an orthonormal set of vectors in Rn then the n×n matrix P = (v1 . . . vn )
is orthogonal by Corollary 8.5. The proof of the opposite implication is an exercise.
\[
u_2 = v_2 - \frac{v_2 \cdot u_1}{u_1 \cdot u_1}\, u_1 = v_2 - (\text{projection of } v_2 \text{ along } u_1).
\]
We need to check that u_1 is orthogonal to u_2, these vectors are non-zero and span{v_1, v_2} = span{u_1, u_2}.
1) Using standard properties of scalar product (see Theorem 1.8), we obtain
\[
u_2 \cdot u_1 = \left(v_2 - \frac{v_2 \cdot u_1}{u_1 \cdot u_1}\, u_1\right)\cdot u_1
= v_2 \cdot u_1 - \frac{v_2 \cdot u_1}{u_1 \cdot u_1}\,(u_1 \cdot u_1) = v_2 \cdot u_1 - v_2 \cdot u_1 = 0.
\]
Hence u_1 · u_2 = 0.
2) Observe that u_1 ≠ 0 as u_1 = v_1 ≠ 0 (because otherwise v_1, . . . , v_k would be linearly dependent by Example 6.27.(i)).
Arguing by contradiction, suppose that u_2 = 0. Since $u_2 = v_2 - \frac{v_2 \cdot u_1}{u_1 \cdot u_1}\, u_1 = 0$, we see that c v_1 + 1 v_2 + 0 v_3 + ⋯ + 0 v_k = 0, where $c = -\frac{v_2 \cdot u_1}{u_1 \cdot u_1} \in \mathbb{R}$. Thus v_1, . . . , v_k are linearly dependent, contradicting our assumption. Therefore u_2 ≠ 0.
3) Note that, by construction, u_1 = v_1 and u_2 is a linear combination of v_2 and u_1 = v_1, so span{u_1, u_2} ⊆ span{v_1, v_2} (by Note 6.14, as span{v_1, v_2} is a subspace of R^n by Proposition 6.12).
On the other hand, v_1 = u_1 ∈ span{u_1, u_2} and $v_2 = u_2 + \frac{v_2 \cdot u_1}{u_1 \cdot u_1}\, u_1 \in \operatorname{span}\{u_1, u_2\}$, so span{v_1, v_2} ⊆ span{u_1, u_2} by Note 6.14 (as span{u_1, u_2} is a subspace of R^n by Proposition 6.12). Therefore span{v_1, v_2} = span{u_1, u_2}, as claimed.
Thus we have shown that {u1 , u2 } is an orthogonal set of non-zero vectors in Rn satisfying
span{u1 , u2 } = span{v1 , v2 }, so the theorem is proved in the case k = 2. The general case (for any
k ∈ N) can be proved by induction on k.
Now, let us proceed to stage 2. From stage 1 we know that u_i · u_j = 0 if i ≠ j. So, when we set w_i = û_i, i = 1, . . . , k, we have ‖w_i‖ = 1 (w_i · w_i = ‖w_i‖² = 1) and $w_i \cdot w_j = \hat{u}_i \cdot \hat{u}_j = \frac{1}{\|u_i\|\,\|u_j\|}(u_i \cdot u_j) = 0$ if i ≠ j, 1 ≤ i, j ≤ k. Thus {w_1, . . . , w_k} is an orthonormal set in R^n.
Clearly span{w_1, . . . , w_k} = span{u_1, . . . , u_k}, so span{w_1, . . . , w_k} = span{v_1, . . . , v_k} = V.
Therefore {w1 , . . . , wk } is an orthonormal basis of V (by Note 8.10).
Example 8.12. Let $V = \operatorname{span}\left\{\begin{pmatrix} 1\\ 0\\ 2 \end{pmatrix}, \begin{pmatrix} 3\\ -1\\ 1 \end{pmatrix}, \begin{pmatrix} 2\\ 2\\ 1 \end{pmatrix}\right\}$ in R³. Use the Gram-Schmidt process to find an orthonormal basis of V.
We start with the vectors $v_1 = \begin{pmatrix} 1\\ 0\\ 2 \end{pmatrix}$, $v_2 = \begin{pmatrix} 3\\ -1\\ 1 \end{pmatrix}$ and $v_3 = \begin{pmatrix} 2\\ 2\\ 1 \end{pmatrix}$. First we check that these vectors are linearly independent (exercise). Next, we apply the Gram-Schmidt process (Algorithm 8.8).
Stage 1. $u_1 = v_1 = \begin{pmatrix} 1\\ 0\\ 2 \end{pmatrix}$, $u_2 = v_2 - \frac{v_2 \cdot u_1}{u_1 \cdot u_1}\, u_1 = \begin{pmatrix} 3\\ -1\\ 1 \end{pmatrix} - \frac{5}{5}\begin{pmatrix} 1\\ 0\\ 2 \end{pmatrix} = \begin{pmatrix} 2\\ -1\\ -1 \end{pmatrix}$ (check: u_2 · u_1 = 2 − 2 = 0 ✓).
\[
u_3 = v_3 - \frac{v_3 \cdot u_1}{u_1 \cdot u_1}\, u_1 - \frac{v_3 \cdot u_2}{u_2 \cdot u_2}\, u_2
= \begin{pmatrix} 2\\ 2\\ 1 \end{pmatrix} - \frac{4}{5}\begin{pmatrix} 1\\ 0\\ 2 \end{pmatrix} - \frac{1}{6}\begin{pmatrix} 2\\ -1\\ -1 \end{pmatrix}
= \begin{pmatrix} \frac{13}{15}\\[2pt] \frac{13}{6}\\[2pt] -\frac{13}{30} \end{pmatrix}
= \frac{13}{30}\begin{pmatrix} 2\\ 5\\ -1 \end{pmatrix}.
\]
To simplify calculations, let us take $u_3 = \begin{pmatrix} 2\\ 5\\ -1 \end{pmatrix}$ (since it is still orthogonal to u_1, u_2: u_1 · u_3 = 0 = u_2 · u_3).
Stage 2. Now we normalize: $w_1 = \hat{u}_1 = \frac{1}{\sqrt{5}}\begin{pmatrix} 1\\ 0\\ 2 \end{pmatrix}$, $w_2 = \hat{u}_2 = \frac{1}{\sqrt{6}}\begin{pmatrix} 2\\ -1\\ -1 \end{pmatrix}$ and $w_3 = \hat{u}_3 = \frac{1}{\sqrt{30}}\begin{pmatrix} 2\\ 5\\ -1 \end{pmatrix}$. Thus $\left\{\frac{1}{\sqrt{5}}\begin{pmatrix} 1\\ 0\\ 2 \end{pmatrix}, \frac{1}{\sqrt{6}}\begin{pmatrix} 2\\ -1\\ -1 \end{pmatrix}, \frac{1}{\sqrt{30}}\begin{pmatrix} 2\\ 5\\ -1 \end{pmatrix}\right\}$ is an orthonormal basis of V.
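A compact Python version of Algorithm 8.8 reproduces Example 8.12; this is our own sketch (the function name gram_schmidt is ours), assuming the input vectors are linearly independent.

```python
import numpy as np

def gram_schmidt(vectors):
    """Stages 1 and 2 of Algorithm 8.8 for linearly independent input vectors."""
    orthogonal = []
    for v in map(np.asarray, vectors):
        u = v.astype(float)
        for w in orthogonal:
            u = u - (v @ w) / (w @ w) * w   # subtract the projection of v along w
        orthogonal.append(u)                # stage 1: orthogonal, non-zero vectors
    return [u / np.linalg.norm(u) for u in orthogonal]   # stage 2: normalize

# The vectors from Example 8.12.
w1, w2, w3 = gram_schmidt([[1, 0, 2], [3, -1, 1], [2, 2, 1]])
print(w1, w2, w3)
print(np.isclose(w1 @ w2, 0), np.isclose(np.linalg.norm(w3), 1))
```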
To prove (b), let P = (w_1 . . . w_n)_{n×n}. Then P is an orthogonal matrix by Corollary 8.5 and
\[
P^{-1} A\, P = \begin{pmatrix} \lambda_1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_n \end{pmatrix}
\]
by Theorem 7.30. Thus (b) follows, as P⁻¹ = P^T by definition of an orthogonal matrix.
Example 8.15. (a) Let $A = \begin{pmatrix} 5 & -3\\ -3 & 5 \end{pmatrix}$. Find an orthonormal basis of R² formed of eigenvectors of A and find an orthogonal 2 × 2 matrix P such that P^T A P is a diagonal matrix.
First we need to find the eigenvalues of A:
\[
p_A(\lambda) = \det(A - \lambda I_2) = \begin{vmatrix} 5-\lambda & -3\\ -3 & 5-\lambda \end{vmatrix}
= (5-\lambda)^2 - 9 = \lambda^2 - 10\lambda + 16 = (\lambda - 8)(\lambda - 2).
\]
So λ_1 = 2, λ_2 = 8 are the eigenvalues of A, and we proceed by finding the corresponding eigenvectors.
λ_1 = 2. $(A - 2 I_2)\, v = 0 \iff \begin{pmatrix} 3 & -3\\ -3 & 3 \end{pmatrix}\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} 0\\ 0 \end{pmatrix}$. Evidently we can take $\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} 1\\ 1 \end{pmatrix}$. So $v_1 = \begin{pmatrix} 1\\ 1 \end{pmatrix}$ is an eigenvector of A corresponding to λ_1 = 2. (Check: $A v_1 = \begin{pmatrix} 5 & -3\\ -3 & 5 \end{pmatrix}\begin{pmatrix} 1\\ 1 \end{pmatrix} = \begin{pmatrix} 2\\ 2 \end{pmatrix} = 2\begin{pmatrix} 1\\ 1 \end{pmatrix}$ ✓.)
λ_2 = 8. $(A - 8 I_2)\, v = 0 \iff \begin{pmatrix} -3 & -3\\ -3 & -3 \end{pmatrix}\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} 0\\ 0 \end{pmatrix}$. Clearly we can take $\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} 1\\ -1 \end{pmatrix}$. Thus $v_2 = \begin{pmatrix} 1\\ -1 \end{pmatrix}$ is an eigenvector of A corresponding to λ_2 = 8. (Check: $A v_2 = \begin{pmatrix} 5 & -3\\ -3 & 5 \end{pmatrix}\begin{pmatrix} 1\\ -1 \end{pmatrix} = \begin{pmatrix} 8\\ -8 \end{pmatrix} = 8\begin{pmatrix} 1\\ -1 \end{pmatrix}$ ✓.)
Observe that v_1 · v_2 = 0 (as expected from Theorem 7.27), so we let $w_1 = \hat{v}_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1\\ 1 \end{pmatrix}$ and $w_2 = \hat{v}_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1\\ -1 \end{pmatrix}$. Then {w_1, w_2} will be an orthonormal basis of R² formed out of the eigenvectors of A.
Finally, from Theorem 8.14 we know that $P = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\[2pt] \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}$ is an orthogonal matrix such that
\[
P^T A\, P = \begin{pmatrix} 2 & 0\\ 0 & 8 \end{pmatrix}.
\]
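NumPy's routine for symmetric matrices returns exactly the data of Theorem 8.14: ascending eigenvalues and an orthogonal matrix of orthonormal eigenvectors. The following check of Example 8.15.(a) is our own illustration.

```python
import numpy as np

A = np.array([[5.0, -3.0],
              [-3.0, 5.0]])

eigenvalues, P = np.linalg.eigh(A)          # for symmetric A only

print(eigenvalues)                          # [2. 8.]
print(np.allclose(P.T @ P, np.eye(2)))      # P is orthogonal
print(np.round(P.T @ A @ P, 10))            # diag(2, 8), as in Example 8.15.(a)
```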
(b) $B = \begin{pmatrix} 2 & 1 & 1\\ 1 & 2 & 1\\ 1 & 1 & 2 \end{pmatrix}$.
The standard calculations show that p_B(λ) = −(λ − 1)²(λ − 4), so λ_1 = 1 and λ_2 = 4 are the eigenvalues (λ_1 is of multiplicity 2).
The following fact (which we present without proof) will be useful:
Fact 8.16. If A is a symmetric matrix and λ is a root of the characteristic polynomial pA (λ)
of multiplicity k, then there are precisely k linearly independent eigenvectors of A corresponding to
λ.
Let us find 3 eigenvectors of B.
λ_1 = 1. $(B - I_3)\, v = 0 \iff \begin{pmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} x\\ y\\ z \end{pmatrix} = \begin{pmatrix} 0\\ 0\\ 0 \end{pmatrix} \iff x + y + z = 0$. So the solution of this system has 2 free variables y and z, which means that we can find two linearly independent eigenvectors (as expected from Fact 8.16). If y = −1 and z = 0 we get x = 1, so $v_1 = \begin{pmatrix} 1\\ -1\\ 0 \end{pmatrix}$. If y = 0 and z = −1, we have x = 1, so $v_2 = \begin{pmatrix} 1\\ 0\\ -1 \end{pmatrix}$. Evidently v_1, v_2 are linearly independent, and both of them are eigenvectors of B corresponding to the eigenvalue λ_1 = 1.
Problem: v_1 · v_2 = 1 ≠ 0, i.e., these vectors are not orthogonal. We need to apply the Gram-Schmidt process (Algorithm 8.8) to them (it is a fact that applying this process will result in w_1, w_2, which are still eigenvectors of B for the same eigenvalue λ_1 = 1).
Thus we set $u_1 = v_1 = \begin{pmatrix} 1\\ -1\\ 0 \end{pmatrix}$, $u_2 = v_2 - \frac{v_2 \cdot u_1}{u_1 \cdot u_1}\, u_1 = \begin{pmatrix} 1\\ 0\\ -1 \end{pmatrix} - \frac{1}{2}\begin{pmatrix} 1\\ -1\\ 0 \end{pmatrix} = \begin{pmatrix} \frac12\\[2pt] \frac12\\[2pt] -1 \end{pmatrix}$. (Check: u_1 · u_2 = 0 ✓)
Normalizing u_1 and u_2, we obtain
\[
w_1 = \hat{u}_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1\\ -1\\ 0 \end{pmatrix}
\quad\text{and}\quad
w_2 = \hat{u}_2 = \frac{2}{\sqrt{6}}\begin{pmatrix} \frac12\\[2pt] \frac12\\[2pt] -1 \end{pmatrix} = \frac{1}{\sqrt{6}}\begin{pmatrix} 1\\ 1\\ -2 \end{pmatrix}.
\]
(It is not hard to check that w_1 and w_2 are linearly independent eigenvectors of B corresponding to the eigenvalue λ_1 = 1.)
λ_2 = 4. $(B - 4 I_3)\, v = 0 \iff \left(\begin{array}{ccc|c} -2 & 1 & 1 & 0\\ 1 & -2 & 1 & 0\\ 1 & 1 & -2 & 0 \end{array}\right) \xrightarrow{\text{row operations}} \left(\begin{array}{ccc|c} 1 & 0 & -1 & 0\\ 0 & 1 & -1 & 0\\ 0 & 0 & 0 & 0 \end{array}\right)$. As expected from Fact 8.16, the solution has 1 free variable (as the multiplicity of λ_2 is 1), so there is only one linearly independent eigenvector for λ_2 = 4. E.g., $v_3 = \begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix}$. From Theorem 7.27 we know that v_3 is orthogonal to eigenvectors corresponding to λ_1 = 1, hence v_3 must be orthogonal to w_1, w_2 (v_3 · w_1 = 0, v_3 · w_2 = 0 ✓). So, it is enough to set $w_3 = \hat{v}_3 = \frac{1}{\sqrt{3}}\begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix}$.
Thus $\left\{\frac{1}{\sqrt{2}}\begin{pmatrix} 1\\ -1\\ 0 \end{pmatrix}, \frac{1}{\sqrt{6}}\begin{pmatrix} 1\\ 1\\ -2 \end{pmatrix}, \frac{1}{\sqrt{3}}\begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix}\right\}$ is an orthonormal basis of R³, made out of eigenvectors of B. Finally, we take
\[
P = (w_1\ w_2\ w_3) = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}}\\[2pt] -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}}\\[2pt] 0 & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{pmatrix}.
\]
From Theorem 8.14 we know that P is an orthogonal matrix such that
\[
P^T B\, P = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 4 \end{pmatrix}.
\]
(v) A and QA are indefinite if and only if A has both positive and negative eigenvalues.
Proof. Recall that, by Theorem 7.24, all eigenvalues of the symmetric matrix A are real
numbers, so (i)–(v) cover all possibilities. We will only prove claim (i), as proofs of the other
statements are similar.
(i) “⇒” Suppose that Q_A is positive definite. Let λ ∈ R be an eigenvalue of A. This means that there is some v ∈ R^n, v ≠ 0, such that A v = λ v. Then, using standard properties of matrix multiplication, we have
\[
Q_A(v) = v^T A\, v = v^T (\lambda v) = \lambda\,(v^T v) = \lambda \|v\|^2,
\quad\text{so}\quad \lambda = \frac{1}{\|v\|^2}\, Q_A(v),
\]
as ‖v‖² > 0 because v ≠ 0 (see property SP4 of scalar product). Now, Q_A(v) > 0 as Q_A is positive definite and v ≠ 0, hence λ > 0, as claimed.
“⇐” Suppose that each eigenvalue of A is strictly positive. By Theorem 8.14, there exists an orthogonal matrix P ∈ M_n(R) such that
\[
(8.1)\qquad P^T A\, P = \begin{pmatrix} \lambda_1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_n \end{pmatrix}_{n\times n},
\]
where λ_1, . . . , λ_n are eigenvalues of A (so λ_i > 0 for each i, 1 ≤ i ≤ n, by our assumption). Let us multiply both sides of equation (8.1) by P on the left and by P^T on the right:
\[
P\,(P^T A\, P)\, P^T = P \begin{pmatrix} \lambda_1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_n \end{pmatrix} P^T.
\]
Recall that P is orthogonal, so P^T P = P P^T = I_n. Consequently, by associativity of matrix multiplication, P (P^T A P) P^T = (P P^T) A (P P^T) = I_n A I_n = A. Thus
\[
A = P \begin{pmatrix} \lambda_1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_n \end{pmatrix} P^T.
\]
Now, for every x ∈ R^n we have
\[
Q_A(x) = x^T A\, x
= x^T P \begin{pmatrix} \lambda_1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_n \end{pmatrix} P^T x
= (x^T P) \begin{pmatrix} \lambda_1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_n \end{pmatrix} (P^T x)
= (P^T x)^T \begin{pmatrix} \lambda_1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_n \end{pmatrix} (P^T x)
= y^T \begin{pmatrix} \lambda_1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_n \end{pmatrix} y,
\]
using associativity and properties of the transpose, where y = P^T x ∈ R^n. So, if y = (y_1, . . . , y_n)^T, then
\[
Q_A(x) = (y_1\ \dots\ y_n) \begin{pmatrix} \lambda_1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_n \end{pmatrix} \begin{pmatrix} y_1\\ \vdots\\ y_n \end{pmatrix}
= \lambda_1 y_1^2 + \dots + \lambda_n y_n^2 > 0,
\]
provided y ≠ 0, because λ_i > 0 for all i by the assumption.
Now, it remains to check that if x ≠ 0, then y ≠ 0. Indeed: y = P^T x, and P^T is invertible (P^T = P⁻¹, so (P^T)⁻¹ = (P⁻¹)⁻¹ = P). Hence the only solution of P^T x = 0 is x = 0 (see
Intersections of the cone C with a plane in R³ are curves, called conic sections. They can be of several types: ellipses, parabolas, hyperbolas, two lines, one line, one point.
The three main types are:
• Parabolas, given by equation y = ax² + bx + c or x = ay² + by + c, where a, b, c ∈ R, a ≠ 0. For example, y = (x − 2)² + 1 = x² − 4x + 5 and x = −(1/5)(y + 3)² + 2 = −(1/5)y² − (6/5)y + 1/5.
• Hyperbolas: $\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$ or $-\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$, for some a, b > 0 (in standard form).
• Ellipses: $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$, for some a, b > 0 (in standard form); the accompanying figure illustrates the case a > b.
For every quadratic form Q : R² → R and every α ∈ R the level set {x ∈ R² | Q(x) = α} represents some conic section (or the empty set) in R².
For instance, if Q(x) = x² − 5y² and α = 2, then
\[
Q(x) = \alpha \iff x^2 - 5y^2 = 2 \iff \frac{x^2}{(\sqrt{2})^2} - \frac{y^2}{(\sqrt{2/5})^2} = 1.
\]
Algorithm 8.24 (Finding the standard form of a conic section). Given a quadratic form
Q(x) = ax2 + 2bxy + dy 2 , we want to find the standard form of the conic section Q(x) = α
(⇔ ax2 + 2bxy + dy 2 = α), for some α ∈ R.
Find the symmetric matrix $A = \begin{pmatrix} a & b\\ b & d \end{pmatrix}$ corresponding to Q. Find an orthogonal 2 × 2 matrix P such that $P^T A\, P = \begin{pmatrix} \lambda & 0\\ 0 & \mu \end{pmatrix}$, where λ, μ are the eigenvalues of A (as in Example 8.15).
Define a new set of coordinates $\tilde{x} = \begin{pmatrix} \tilde{x}\\ \tilde{y} \end{pmatrix}$ by setting $\begin{pmatrix} \tilde{x}\\ \tilde{y} \end{pmatrix} = P^T \begin{pmatrix} x\\ y \end{pmatrix}$ (so that $\begin{pmatrix} x\\ y \end{pmatrix} = P \begin{pmatrix} \tilde{x}\\ \tilde{y} \end{pmatrix}$, as P^T = P⁻¹ since P is orthogonal).
Then
\[
Q(x) = x^T A\, x = (P\tilde{x})^T A\,(P\tilde{x}) = (\tilde{x}^T P^T)\, A\,(P\tilde{x})
= \tilde{x}^T (P^T A\, P)\, \tilde{x}
= (\tilde{x}\ \tilde{y})\begin{pmatrix} \lambda & 0\\ 0 & \mu \end{pmatrix}\begin{pmatrix} \tilde{x}\\ \tilde{y} \end{pmatrix}
= \lambda \tilde{x}^2 + \mu \tilde{y}^2,
\]
using properties of the transpose and associativity. So Q(x) = α ⟺ λ x̃² + μ ỹ² = α ⟺ (λ/α) x̃² + (μ/α) ỹ² = 1 (if α ≠ 0). This gives a standard form of the corresponding conic section in the new coordinates x̃.
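Algorithm 8.24 is easy to carry out numerically; the sketch below is our own illustration using the data of Example 8.26, and recovers λ = 2, μ = 4 together with an orthogonal P.

```python
import numpy as np

# Algorithm 8.24 for Q(x) = 3x^2 + 2xy + 3y^2 and the level set Q(x) = 8
# (Example 8.26): diagonalize the symmetric matrix of Q with an orthogonal P.
A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
eigenvalues, P = np.linalg.eigh(A)       # columns of P: orthonormal eigenvectors

lam, mu = eigenvalues
print(lam, mu)                           # 2.0 4.0, so the equation becomes 2*xt^2 + 4*yt^2 = 8
print(np.linalg.det(P))                  # +1 -> rotation, -1 -> reflection (Proposition 8.25)
print(np.allclose(P.T @ A @ P, np.diag(eigenvalues)))
```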
Proposition 8.25. If $P = \begin{pmatrix} p & q\\ r & s \end{pmatrix}$ is an orthogonal 2 × 2 matrix, then P is either the matrix of a rotation about the origin (if det(P) = 1) or P is the matrix of a reflection in a line through the origin (if det(P) = −1).
Proof. Since P is orthogonal, we know that its column vectors form an orthonormal set in R² (see Note 8.7). In particular p² + r² = 1. Hence there exists θ ∈ [0, 2π) such that p = cos(θ) and r = sin(θ).
Now, recall that P^T = P⁻¹, as P is orthogonal, and $P^{-1} = \frac{1}{\det(P)}\begin{pmatrix} s & -q\\ -r & p \end{pmatrix}$ by Theorem 2.25. Since det(P) = ±1 (exercise), we have two cases:
Case 1: det(P) = 1. Then $P^{-1} = \begin{pmatrix} s & -q\\ -r & p \end{pmatrix} = P^T = \begin{pmatrix} p & r\\ q & s \end{pmatrix}$, which implies that p = s and q = −r. Thus s = p = cos(θ) and q = −r = −sin(θ), so $P = \begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{pmatrix}$ is the matrix of the anti-clockwise rotation by θ about the origin.
Case 2: det(P) = −1. Then $P^{-1} = \begin{pmatrix} -s & q\\ r & -p \end{pmatrix} = P^T = \begin{pmatrix} p & r\\ q & s \end{pmatrix}$. Hence p = −s and q = r. So, s = −p = −cos(θ) and q = r = sin(θ). It follows that $P = \begin{pmatrix} \cos\theta & \sin\theta\\ \sin\theta & -\cos\theta \end{pmatrix}$ is the matrix of the reflection in the line making angle θ/2 with the positive x-axis.
Algorithm 8.24 and Proposition 8.25 together imply that the new coordinate axes for x̃ and ỹ are obtained from the old ones by applying either a rotation or a reflection. Indeed, suppose that $P = \begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{pmatrix}$ represents the rotation by θ. Since $\begin{pmatrix} x\\ y \end{pmatrix} = P \begin{pmatrix} \tilde{x}\\ \tilde{y} \end{pmatrix}$, the direction vector ẽ_1 of the x̃-axis will be equal to $P\begin{pmatrix} 1\\ 0 \end{pmatrix} = \begin{pmatrix} \cos\theta\\ \sin\theta \end{pmatrix}$ in the original coordinates $\begin{pmatrix} x\\ y \end{pmatrix}$. Thus the x̃-axis is obtained by rotating the x-axis about the origin by angle θ anti-clockwise. And $P\begin{pmatrix} 0\\ 1 \end{pmatrix} = \begin{pmatrix} -\sin\theta\\ \cos\theta \end{pmatrix} = \begin{pmatrix} \cos(\theta + \pi/2)\\ \sin(\theta + \pi/2) \end{pmatrix}$, showing that the ỹ-axis is obtained by rotating the y-axis about the origin by angle θ anti-clockwise.
Example 8.26. Find the standard form of the conic section given by 3x² + 2xy + 3y² = 8, and sketch its graph.
Here $Q\begin{pmatrix} x\\ y \end{pmatrix} = 3x^2 + 2xy + 3y^2$, so it corresponds to the symmetric matrix $A = \begin{pmatrix} 3 & 1\\ 1 & 3 \end{pmatrix}$. We start with "diagonalizing" A (as in Example 8.15).
The eigenvalues of A are λ_1 = 2, λ_2 = 4, and the corresponding eigenvectors are $v_1 = \begin{pmatrix} 1\\ -1 \end{pmatrix}$, $v_2 = \begin{pmatrix} 1\\ 1 \end{pmatrix}$. These vectors are clearly orthogonal, so we only need to normalize them: $w_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1\\ -1 \end{pmatrix}$, $w_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1\\ 1 \end{pmatrix}$, to obtain an orthonormal basis of R² consisting of the eigenvectors of A. Then $P = (w_1\ w_2) = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\[2pt] -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}$ is the matrix of rotation by −π/4.
Now we define new coordinates by $\begin{pmatrix} \tilde{x}\\ \tilde{y} \end{pmatrix} = P^T \begin{pmatrix} x\\ y \end{pmatrix}$, so that the new axes x̃, ỹ are obtained from the old axes x, y by applying the rotation by −π/4 about the origin.
Theorem 8.14 tells us that $P^T A\, P = \begin{pmatrix} 2 & 0\\ 0 & 4 \end{pmatrix}$, thus the original equation 3x² + 2xy + 3y² = 8 becomes 2x̃² + 4ỹ² = 8 in the new coordinates (see Algorithm 8.24). The standard form of the latter equation is $\frac{\tilde{x}^2}{2^2} + \frac{\tilde{y}^2}{(\sqrt{2})^2} = 1$, which represents the ellipse from Figure 8.1.
Figure 8.1
Example 8.27. Consider the equation x² + 8xy − 5y² = 1. Then $Q\begin{pmatrix} x\\ y \end{pmatrix} = x^2 + 8xy - 5y^2$, so $A = \begin{pmatrix} 1 & 4\\ 4 & -5 \end{pmatrix}$. Using the usual methods, we find that λ_1 = 3, λ_2 = −7 are the eigenvalues of A, and $v_1 = \begin{pmatrix} 2\\ 1 \end{pmatrix}$, $v_2 = \begin{pmatrix} -1\\ 2 \end{pmatrix}$ are the corresponding eigenvectors of A (A v_1 = 3 v_1, A v_2 = −7 v_2).
Again these vectors are orthogonal, so we only need to normalize them: $w_1 = \hat{v}_1 = \frac{1}{\sqrt{5}}\begin{pmatrix} 2\\ 1 \end{pmatrix}$, $w_2 = \hat{v}_2 = \frac{1}{\sqrt{5}}\begin{pmatrix} -1\\ 2 \end{pmatrix}$. Hence $P = (w_1\ w_2) = \begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}}\\[2pt] \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{pmatrix}$ is the matrix of rotation about the origin by angle $\theta = \cos^{-1}\left(\frac{2}{\sqrt{5}}\right) \approx 26.57^{\circ}$.
Now, in the new coordinates $\begin{pmatrix} \tilde{x}\\ \tilde{y} \end{pmatrix} = P^T \begin{pmatrix} x\\ y \end{pmatrix}$, the equation $Q\begin{pmatrix} x\\ y \end{pmatrix} = 1$ becomes
\[
3\tilde{x}^2 - 7\tilde{y}^2 = 1 \iff \frac{\tilde{x}^2}{(1/\sqrt{3})^2} - \frac{\tilde{y}^2}{(1/\sqrt{7})^2} = 1.
\]
Thus we get a hyperbola (see Figure 8.2).
Figure 8.2