Appendix F: MATRIX CALCULUS
TABLE OF CONTENTS
§F.1. Introduction
§F.2. The Derivatives of Vector Functions
  §F.2.1. Derivative of Vector with Respect to Vector
  §F.2.2. Derivative of a Scalar with Respect to Vector
  §F.2.3. Derivative of Vector with Respect to Scalar
  §F.2.4. Jacobian of a Variable Transformation
§F.3. The Chain Rule for Vector Functions
§F.4. The Derivative of Scalar Functions of a Matrix
  §F.4.1. Functions of a Matrix Determinant
§F.5. The Matrix Differential
§F.1. Introduction
In this Appendix we collect some useful formulas of matrix calculus that often appear in finite
element derivations.
§F.2. The Derivatives of Vector Functions
Remark F.1. Many authors, notably in statistics and economics, define the derivatives as the transposes of
those given above.¹ This has the advantage of better agreement of matrix products with composition schemes
such as the chain rule. Evidently the notation is not yet stable.
The foregoing definitions can be used to obtain derivatives of many frequently used expressions,
including quadratic and bilinear forms.
¹ One author puts it this way: “When one does matrix calculus, one quickly finds that there are two kinds of people in this
world: those who think the gradient is a row vector, and those who think it is a column vector.”
and if $\mathbf{A}$ is symmetric,
$$
\frac{\partial^2 y}{\partial \mathbf{x}^2} = 2\mathbf{A}. \tag{F.18}
$$
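$$
\begin{array}{c|c}
y & \dfrac{\partial y}{\partial \mathbf{x}} \\ \hline
\mathbf{A}\mathbf{x} & \mathbf{A}^T \\
\mathbf{x}^T\mathbf{A} & \mathbf{A} \\
\mathbf{x}^T\mathbf{x} & 2\mathbf{x} \\
\mathbf{x}^T\mathbf{A}\mathbf{x} & \mathbf{A}\mathbf{x} + \mathbf{A}^T\mathbf{x}
\end{array}
$$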
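The last row of the table can be checked numerically. The following is a minimal sketch in plain Python, assuming a central-difference approximation as the reference; the helper names (`matvec`, `transpose`, `quadratic_form`, `numerical_gradient`) and the sample data are illustrative and do not come from the appendix.

```python
def matvec(A, x):
    """Product A x for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def quadratic_form(A, x):
    """Scalar y = x^T A x."""
    return sum(xi * ai for xi, ai in zip(x, matvec(A, x)))


def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the column vector [dy/dx_i]."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


# Illustrative data (an assumption, not from the appendix); A is nonsymmetric
# on purpose so that A x and A^T x differ.
A = [[2.0, -1.0, 0.5],
     [3.0, 4.0, 1.0],
     [0.0, -2.0, 5.0]]
x = [1.0, -2.0, 0.5]

# Table entry: d(x^T A x)/dx = A x + A^T x.
analytic = [u + v for u, v in zip(matvec(A, x), matvec(transpose(A), x))]
numeric = numerical_gradient(lambda v: quadratic_form(A, v), x)
max_error = max(abs(a - n) for a, n in zip(analytic, numeric))

print("analytic gradient:", analytic)
print("largest finite-difference discrepancy:", max_error)
```

Because the quadratic form is a second-degree polynomial in the entries of x, the central difference should agree with the table entry up to rounding error.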
§F.3. The Chain Rule for Vector Functions
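Then
$$
\left(\frac{\partial \mathbf{z}}{\partial \mathbf{x}}\right)^{T}
=
\begin{bmatrix}
\sum_{q=1}^{r} \dfrac{\partial z_1}{\partial y_q}\dfrac{\partial y_q}{\partial x_1} &
\sum_{q=1}^{r} \dfrac{\partial z_1}{\partial y_q}\dfrac{\partial y_q}{\partial x_2} & \cdots &
\sum_{q=1}^{r} \dfrac{\partial z_1}{\partial y_q}\dfrac{\partial y_q}{\partial x_n} \\
\sum_{q=1}^{r} \dfrac{\partial z_2}{\partial y_q}\dfrac{\partial y_q}{\partial x_1} &
\sum_{q=1}^{r} \dfrac{\partial z_2}{\partial y_q}\dfrac{\partial y_q}{\partial x_2} & \cdots &
\sum_{q=1}^{r} \dfrac{\partial z_2}{\partial y_q}\dfrac{\partial y_q}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\sum_{q=1}^{r} \dfrac{\partial z_m}{\partial y_q}\dfrac{\partial y_q}{\partial x_1} &
\sum_{q=1}^{r} \dfrac{\partial z_m}{\partial y_q}\dfrac{\partial y_q}{\partial x_2} & \cdots &
\sum_{q=1}^{r} \dfrac{\partial z_m}{\partial y_q}\dfrac{\partial y_q}{\partial x_n}
\end{bmatrix}
$$
$$
=
\begin{bmatrix}
\dfrac{\partial z_1}{\partial y_1} & \dfrac{\partial z_1}{\partial y_2} & \cdots & \dfrac{\partial z_1}{\partial y_r} \\
\dfrac{\partial z_2}{\partial y_1} & \dfrac{\partial z_2}{\partial y_2} & \cdots & \dfrac{\partial z_2}{\partial y_r} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial z_m}{\partial y_1} & \dfrac{\partial z_m}{\partial y_2} & \cdots & \dfrac{\partial z_m}{\partial y_r}
\end{bmatrix}
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial y_r}{\partial x_1} & \dfrac{\partial y_r}{\partial x_2} & \cdots & \dfrac{\partial y_r}{\partial x_n}
\end{bmatrix}
$$
$$
= \left(\frac{\partial \mathbf{z}}{\partial \mathbf{y}}\right)^{T} \left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right)^{T}
= \left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}}\,\frac{\partial \mathbf{z}}{\partial \mathbf{y}}\right)^{T}. \tag{F.22}
$$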
On transposing both sides, we finally obtain
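$$
\frac{\partial \mathbf{z}}{\partial \mathbf{x}} = \frac{\partial \mathbf{y}}{\partial \mathbf{x}}\,\frac{\partial \mathbf{z}}{\partial \mathbf{y}}, \tag{F.23}
$$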
which is the chain rule for vectors. If all vectors reduce to scalars,
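$$
\frac{\partial z}{\partial x} = \frac{\partial y}{\partial x}\,\frac{\partial z}{\partial y} = \frac{\partial z}{\partial y}\,\frac{\partial y}{\partial x}, \tag{F.24}
$$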
which is the conventional chain rule of calculus. Note, however, that when we are dealing with
vectors, the chain of matrices builds “toward the left.” For example, if w is a function of z, which
is a function of y, which is a function of x,
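$$
\frac{\partial \mathbf{w}}{\partial \mathbf{x}} = \frac{\partial \mathbf{y}}{\partial \mathbf{x}}\,\frac{\partial \mathbf{z}}{\partial \mathbf{y}}\,\frac{\partial \mathbf{w}}{\partial \mathbf{z}}. \tag{F.25}
$$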
On the other hand, in the ordinary chain rule one can build the product either to the right or to
the left, because scalar multiplication is commutative.
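The ordering in (F.23) can be illustrated with a small numerical experiment. The sketch below assumes the layout implied by (F.22), in which entry (i, j) of ∂y/∂x is ∂y_j/∂x_i, and uses central differences as the reference. The maps `y_of_x` and `z_of_y` and all helper names are hypothetical choices made for this illustration.

```python
import math


def y_of_x(x):
    """Hypothetical inner map y(x): R^2 -> R^3."""
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]


def z_of_y(y):
    """Hypothetical outer map z(y): R^3 -> R^2."""
    return [y[0] + y[1] * y[2], math.exp(y[0]) - y[2]]


def derivative(f, x, h=1e-6):
    """Central-difference matrix D with D[i][j] ~ d f_j / d x_i."""
    rows = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        rows.append([(p - m) / (2 * h) for p, m in zip(f_plus, f_minus)])
    return rows


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


x0 = [0.7, -1.3]
y0 = y_of_x(x0)

dy_dx = derivative(y_of_x, x0)                               # n x r = 2 x 3
dz_dy = derivative(z_of_y, y0)                               # r x m = 3 x 2
dz_dx_direct = derivative(lambda x: z_of_y(y_of_x(x)), x0)   # n x m = 2 x 2

# Chain rule (F.23): the factor closest to x stands on the left.
dz_dx_chain = matmul(dy_dx, dz_dy)

chain_error = max(abs(a - b)
                  for row_a, row_b in zip(dz_dx_chain, dz_dx_direct)
                  for a, b in zip(row_a, row_b))
print("largest discrepancy:", chain_error)
```

With these dimensions the reversed product `matmul(dz_dy, dy_dx)` would be a 3 × 3 matrix, so it cannot equal the 2 × 2 derivative of z with respect to x; the order of the factors matters.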
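§F.4. The Derivative of Scalar Functions of a Matrix
For a scalar function of a matrix $\mathbf{X}$,
$$
y = f(\mathbf{X}), \tag{F.26}
$$
the derivative of $y$ with respect to $\mathbf{X}$ is denoted by
$$
\frac{\partial y}{\partial \mathbf{X}}, \tag{F.27}
$$
the gradient matrix $\mathbf{G}$ whose $(i,j)$ entry is $\partial y/\partial x_{ij}$.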
Example F.4. Find the gradient matrix if $y$ is the trace of a square matrix $\mathbf{X}$ of order $n$, that is,
$$
y = \operatorname{tr}(\mathbf{X}) = \sum_{i=1}^{n} x_{ii}. \tag{F.29}
$$
All off-diagonal partials vanish whereas the diagonal partials equal one; thus
$$
\mathbf{G} = \frac{\partial y}{\partial \mathbf{X}} = \mathbf{I}, \tag{F.30}
$$
where $\mathbf{I}$ denotes the identity matrix of order $n$.
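Example F.4 can be reproduced entry by entry. The sketch below assumes that the gradient matrix collects the partials ∂y/∂x_ij and approximates each one by a central difference; `trace`, `numerical_gradient` and the sample matrix are illustrative names and data, not part of the appendix.

```python
def trace(X):
    """Sum of the diagonal entries of a square matrix."""
    return sum(X[i][i] for i in range(len(X)))


def numerical_gradient(f, X, h=1e-6):
    """Central-difference estimate of G with G[i][j] ~ d f / d x_ij."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            G[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G


# Arbitrary illustrative matrix; the result should not depend on its entries.
X = [[1.5, -2.0, 0.3],
     [4.0, 0.7, -1.1],
     [2.2, 5.0, -3.0]]

G = numerical_gradient(trace, X)
G_rounded = [[round(g, 6) for g in row] for row in G]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

for row in G_rounded:
    print(row)
```

Perturbing an off-diagonal entry leaves the trace unchanged, so those partials come out as zero, while each diagonal entry contributes a unit slope.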
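§F.4.1. Functions of a Matrix Determinant
Let the entries $y_{ij}$ of a square matrix $\mathbf{Y}$ be functions of the entries $x_{rs}$ of a matrix $\mathbf{X}$. Expanding the determinant of $\mathbf{Y}$ along its $i$-th row gives
$$
|\mathbf{Y}| = \sum_{j} y_{ij}\, Y_{ij}, \tag{F.33}
$$
where $Y_{ij}$ is the cofactor of the element $y_{ij}$ in $|\mathbf{Y}|$. Since the cofactors $Y_{i1}, Y_{i2}, \ldots$ are independent
of the element $y_{ij}$, we have
$$
\frac{\partial |\mathbf{Y}|}{\partial y_{ij}} = Y_{ij}. \tag{F.34}
$$
It follows from the chain rule that
$$
\frac{\partial |\mathbf{Y}|}{\partial x_{rs}} = \sum_{i} \sum_{j} Y_{ij}\, \frac{\partial y_{ij}}{\partial x_{rs}}. \tag{F.35}
$$
* The elementary matrix $\mathbf{E}_{ij}$ of order $m \times n$ has all zero entries except for the $(i,j)$ entry, which is one.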
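Example F.5. If $\mathbf{X}$ is a nonsingular square matrix and $\mathbf{Z} = |\mathbf{X}|\mathbf{X}^{-1}$ is its adjugate (the transpose of its cofactor matrix), then
$$
\mathbf{G} = \frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = \mathbf{Z}^T. \tag{F.38}
$$
If $\mathbf{X}$ is also symmetric, and each pair $x_{ij} = x_{ji}$ is treated as a single independent variable,
$$
\mathbf{G} = \frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = 2\mathbf{Z}^T - \operatorname{diag}(\mathbf{Z}^T). \tag{F.39}
$$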
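Both formulas of Example F.5 can be checked on a 3 × 3 case. The sketch below is one possible check under stated assumptions: Z is assembled as the transposed array of cofactors, the product X Z is compared with |X| I to confirm that this Z equals |X| X⁻¹, and the gradients are approximated by central differences. For (F.39) the symmetric entries x_ij and x_ji are assumed to be perturbed together. All function names and matrices are illustrative.

```python
def det3(X):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = X
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cofactors(X):
    """Array C with C[i][j] = cofactor of x_ij (signed 2x2 minor)."""
    C = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            r = [k for k in range(3) if k != i]
            s = [k for k in range(3) if k != j]
            minor = X[r[0]][s[0]] * X[r[1]][s[1]] - X[r[0]][s[1]] * X[r[1]][s[0]]
            C[i][j] = (-1) ** (i + j) * minor
    return C


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def gradient(f, X, symmetric=False, h=1e-6):
    """Central-difference gradient; if symmetric, x_ij and x_ji move together."""
    n = len(X)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            if symmetric and i != j:
                X_plus[j][i] += h
                X_minus[j][i] -= h
            G[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G


def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# General (nonsymmetric) case, formula (F.38).
X = [[2.0, 1.0, 0.0], [-1.0, 3.0, 1.0], [0.5, 0.0, 4.0]]
Z = transpose(cofactors(X))
det_X = det3(X)
scaled_identity = [[det_X if i == j else 0.0 for j in range(3)] for i in range(3)]
adjugate_error = max_abs_diff(matmul(X, Z), scaled_identity)  # X Z = |X| I
general_error = max_abs_diff(gradient(det3, X), transpose(Z))

# Symmetric case, formula (F.39).
S = [[4.0, 1.0, 0.5], [1.0, 3.0, -1.0], [0.5, -1.0, 2.0]]
Zt = cofactors(S)  # equals Z^T for this matrix
expected = [[Zt[i][j] if i == j else 2.0 * Zt[i][j] for j in range(3)]
            for i in range(3)]
symmetric_error = max_abs_diff(gradient(det3, S, symmetric=True), expected)

print(det_X, adjugate_error, general_error, symmetric_error)
```

The factor of two on the off-diagonal entries in the symmetric case arises because a single variable then appears in two positions of the matrix.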
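§F.5. The Matrix Differential
If $\mathbf{X}$ and $\mathbf{Y}$ are product-conforming matrices, it can be verified that the differential of their product
is
$$
d(\mathbf{X}\mathbf{Y}) = (d\mathbf{X})\,\mathbf{Y} + \mathbf{X}\,(d\mathbf{Y}), \tag{F.43}
$$
which is an extension of the well-known rule $d(xy) = y\,dx + x\,dy$ for scalar functions.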
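Example F.6. Let $\mathbf{X} = [x_{ij}]$ be a square nonsingular matrix of order $n$, and denote $\mathbf{Z} = |\mathbf{X}|\mathbf{X}^{-1}$. Find the
differential of the determinant of $\mathbf{X}$:
$$
d|\mathbf{X}| = \sum_{i,j} \frac{\partial |\mathbf{X}|}{\partial x_{ij}}\, dx_{ij} = \sum_{i,j} X_{ij}\, dx_{ij} = \operatorname{tr}\!\left(|\mathbf{X}|\mathbf{X}^{-1}\, d\mathbf{X}\right) = \operatorname{tr}(\mathbf{Z}\, d\mathbf{X}), \tag{F.44}
$$
where $X_{ij}$ denotes the cofactor of $x_{ij}$ in $|\mathbf{X}|$. The trace form follows because $X_{ij}$ is the $(j,i)$ entry of $\mathbf{Z}$.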
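As a quick sanity check of the cofactor sum in Example F.6, the 2 × 2 case can be written out in full. The entries below are generic symbols chosen for illustration, not values taken from the appendix.

```latex
\begin{align*}
\mathbf{X} &= \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix},
\qquad |\mathbf{X}| = x_{11}x_{22} - x_{12}x_{21}, \\
X_{11} &= x_{22}, \quad X_{12} = -x_{21}, \quad X_{21} = -x_{12}, \quad X_{22} = x_{11}, \\
d|\mathbf{X}| &= x_{22}\,dx_{11} - x_{21}\,dx_{12} - x_{12}\,dx_{21} + x_{11}\,dx_{22}
= \sum_{i,j} X_{ij}\,dx_{ij}.
\end{align*}
```

Differentiating the explicit determinant term by term reproduces the sum of cofactors times entry differentials.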
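Example F.7. With the same assumptions as above, find $d(\mathbf{X}^{-1})$. The quickest derivation follows by differentiating
both sides of the identity $\mathbf{X}^{-1}\mathbf{X} = \mathbf{I}$ with the product rule (F.43):
$$
d(\mathbf{X}^{-1})\,\mathbf{X} + \mathbf{X}^{-1}\,d\mathbf{X} = \mathbf{0}, \tag{F.45}
$$
from which
$$
d(\mathbf{X}^{-1}) = -\mathbf{X}^{-1}\, d\mathbf{X}\, \mathbf{X}^{-1}. \tag{F.46}
$$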
If $\mathbf{X}$ reduces to the scalar $x$ we have
$$
d\left(\frac{1}{x}\right) = -\frac{dx}{x^2}. \tag{F.47}
$$
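The first-order character of (F.46) can be observed numerically. The sketch below is a minimal check on a 2 × 2 matrix, assuming a small but finite perturbation dX stands in for the differential; the closed-form inverse `inv2`, the other helper names and the numbers are illustrative only.

```python
def inv2(X):
    """Closed-form inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Illustrative matrix and a small perturbation playing the role of dX.
X = [[4.0, 1.0], [2.0, 3.0]]
dX = [[1e-6, -2e-6], [3e-6, 0.5e-6]]

X_inv = inv2(X)
X_perturbed = [[x + dx for x, dx in zip(rx, rdx)] for rx, rdx in zip(X, dX)]

# Actual change of the inverse under the perturbation.
actual = [[p - q for p, q in zip(rp, rq)]
          for rp, rq in zip(inv2(X_perturbed), X_inv)]

# First-order prediction from (F.46): -X^{-1} dX X^{-1}.
predicted = [[-v for v in row] for row in matmul(matmul(X_inv, dX), X_inv)]

residual = max(abs(a - p) for ra, rp in zip(actual, predicted)
               for a, p in zip(ra, rp))
scale = max(abs(p) for row in predicted for p in row)
print("residual:", residual, "relative to prediction:", residual / scale)
```

The residual is expected to be of second order in the size of dX, so it should be several orders of magnitude smaller than the predicted change itself.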