Matrix Decomposition and Its Application in Statistics
Nishith Kumar
Lecturer
Department of Statistics
Begum Rokeya University, Rangpur.
Email: [email protected]
Overview
• Introduction
• LU decomposition
• QR decomposition
• Cholesky decomposition
• Jordan Decomposition
• Spectral decomposition
• Singular value decomposition
• Applications
Introduction
This lecture covers relevant matrix decompositions, basic numerical methods, their computation, and some of their applications. Decompositions provide a numerically stable way to solve a system of linear equations, as shown already in [Wampler, 1970], and to invert a matrix. Additionally, they provide an important tool for analyzing the numerical stability of a system.
Easy to solve system (Cont.)
Some linear systems can be solved easily. If $A$ is diagonal, the system $Ax = b$ decouples, and the solution is
$$x = \left( b_1/a_{11},\; b_2/a_{22},\; \ldots,\; b_n/a_{nn} \right)^T.$$
Easy to solve system (Cont.)
Lower triangular matrix: a system $Lx = b$ with $L$ lower triangular is solved by forward substitution, computing $x_1$ first and working downward.
Easy to solve system (Cont.)
Upper triangular matrix: a system $Ux = b$ with $U$ upper triangular is solved by backward substitution, computing $x_n$ first and working upward.
LU Decomposition
LU decomposition was originally derived as a decomposition of quadratic and bilinear forms. Lagrange, in the very first paper in his collected works (1759), derived the algorithm we now call Gaussian elimination. Later, Turing introduced the LU decomposition of a matrix in 1948, which is used to solve systems of linear equations.
Reduce A to U (upper triangular) by elementary row operations: in general $U = E_k \cdots E_1 A$, hence $A = E_1^{-1} \cdots E_k^{-1} U$. For our example,
$$E_1 A = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -1/2 & 0 & 1 \end{pmatrix} \begin{pmatrix} 6 & 2 & 2 \\ 12 & 8 & 6 \\ 3 & 13 & 2 \end{pmatrix} = \begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 12 & 1 \end{pmatrix}$$
$$U = E_2 E_1 A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix} \begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 12 & 1 \end{pmatrix} = \begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 0 & -5 \end{pmatrix}$$
Calculation of L and U (cont.)
Now
$$E_1^{-1} E_2^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 3 & 1 \end{pmatrix} = L$$
Therefore,
$$A = \begin{pmatrix} 6 & 2 & 2 \\ 12 & 8 & 6 \\ 3 & 13 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 3 & 1 \end{pmatrix} \begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 0 & -5 \end{pmatrix} = LU$$
If A is a nonsingular matrix, then for each L (unit lower triangular matrix) the upper triangular matrix U is unique, but an LU decomposition is not unique; there can be more than one LU decomposition of the same matrix. For example,
$$A = \begin{pmatrix} 6 & 2 & 2 \\ 12 & 8 & 6 \\ 3 & 13 & 2 \end{pmatrix} = \begin{pmatrix} 6 & 0 & 0 \\ 12 & 4 & 0 \\ 3 & 12 & -5 \end{pmatrix} \begin{pmatrix} 1 & 2/6 & 2/6 \\ 0 & 1 & 1/2 \\ 0 & 0 & 1 \end{pmatrix} = LU$$
Calculation of L and U (cont.)
Thus the LU decomposition is not unique. Since we compute the LU decomposition by elementary transformations, if we change L then U changes so that A = LU still holds.
LU Decomposition in R:
library(Matrix)
x <- matrix(c(3,2,1, 9,3,4, 4,2,5), ncol=3, nrow=3)
expand(lu(x))   # returns the factors of the decomposition (P, L, U)
Calculation of L and U (cont.)
• Note: there are also generalizations of LU to non-square and singular
matrices, such as rank revealing LU factorization.
• [Pan, C.T. (2000). On the existence and computation of rank revealing LU
factorizations. Linear Algebra and its Applications, 316: 199-222.
• Miranian, L. and Gu, M. (2003). Strong rank revealing LU factorizations.
Linear Algebra and its Applications, 367: 1-16.]
Solving system of linear equations using LU decomposition
Suppose we would like to solve an m×m system AX = b. If we can find an LU decomposition of A, then to solve AX = b it is enough to solve the two triangular systems LY = b (forward substitution) and UX = Y (backward substitution). Take
$$A = \begin{pmatrix} 6 & 2 & 2 \\ 12 & 8 & 6 \\ 3 & 13 & 2 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} 8 \\ 14 \\ -17 \end{pmatrix}$$
Solving system of linear equations using LU decomposition
We have seen A = LU, where
$$L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1/2 & 3 & 1 \end{pmatrix}, \qquad U = \begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 0 & -5 \end{pmatrix}$$
Then, solving LY = b by forward substitution,
$$Y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 8 \\ -2 \\ -15 \end{pmatrix}$$
Solving system of linear equations using LU decomposition
Now, we solve UX = Y by backward substitution:
$$\begin{pmatrix} 6 & 2 & 2 \\ 0 & 4 & 2 \\ 0 & 0 & -5 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ -2 \\ -15 \end{pmatrix} \quad \text{then} \quad \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ -2 \\ 3 \end{pmatrix}$$
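The same two-step solve can be reproduced with base R's forwardsolve() and backsolve(); a minimal sketch, entering the L and U factors computed above by hand:
L <- matrix(c(1, 2, 1/2,  0, 1, 3,  0, 0, 1), ncol = 3)   # unit lower triangular factor
U <- matrix(c(6, 0, 0,  2, 4, 0,  2, 2, -5), ncol = 3)    # upper triangular factor
b <- c(8, 14, -17)
y <- forwardsolve(L, b)   # solves LY = b  ->  y = (8, -2, -15)
x <- backsolve(U, y)      # solves UX = Y  ->  x = (1, -2, 3)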
QR Decomposition
The QR decomposition originated with Gram (1883). Later, Erhard Schmidt (1907) proved the QR decomposition theorem.
[Portraits: Jørgen Pedersen Gram (1850–1916) and Erhard Schmidt (1876–1959)]
QR-Decomposition (Cont.)
Theorem: If A is an m×n matrix with linearly independent columns, then A can be decomposed as A = QR, where Q is an m×n matrix whose columns form an orthonormal basis for the column space of A and R is a nonsingular upper triangular matrix.
Writing the columns of A as $u_1, \ldots, u_n$, the Gram–Schmidt vectors as $q_1, \ldots, q_n$, and $v_i = \|\hat q_i\|$:
$$u_1 = v_1 q_1$$
$$u_2 = v_2 q_2 + \langle u_2, q_1 \rangle q_1$$
$$u_3 = v_3 q_3 + \langle u_3, q_1 \rangle q_1 + \langle u_3, q_2 \rangle q_2$$
$$\vdots$$
$$u_n = v_n q_n + \langle u_n, q_1 \rangle q_1 + \langle u_n, q_2 \rangle q_2 + \cdots + \langle u_n, q_{n-1} \rangle q_{n-1}$$
QR-Decomposition (Cont.)
Let Q = [q1 q2 . . . qn], so Q is an m×n matrix whose columns form an orthonormal basis for the column space of A. Now,
$$A = (u_1 \; u_2 \; \cdots \; u_n) = (q_1 \; q_2 \; \cdots \; q_n) \begin{pmatrix} v_1 & \langle u_2,q_1\rangle & \langle u_3,q_1\rangle & \cdots & \langle u_n,q_1\rangle \\ 0 & v_2 & \langle u_3,q_2\rangle & \cdots & \langle u_n,q_2\rangle \\ 0 & 0 & v_3 & \cdots & \langle u_n,q_3\rangle \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & v_n \end{pmatrix}$$
i.e., A = QR, where R is the upper triangular matrix above.
Calculation of QR Decomposition
Example: let $A = (a_1, a_2, a_3)$ with $a_1 = (1,1,1,0)^T$, $a_2 = (1,0,1,0)^T$, $a_3 = (1,0,0,1)^T$; Step 1 gives $r_{11} = \|a_1\| = \sqrt{3}$ and $q_1 = a_1/\sqrt{3}$.
2nd Step: $r_{12} = q_1^T a_2 = 2/\sqrt{3}$
3rd Step:
$$\hat q_2 = a_2 - q_1 q_1^T a_2 = a_2 - r_{12} q_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} - \frac{2}{3}\begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1/3 \\ -2/3 \\ 1/3 \\ 0 \end{pmatrix}$$
$$r_{22} = \|\hat q_2\| = \sqrt{2/3}, \qquad q_2 = \frac{\hat q_2}{\|\hat q_2\|} = \begin{pmatrix} 1/\sqrt{6} \\ -2/\sqrt{6} \\ 1/\sqrt{6} \\ 0 \end{pmatrix}$$
Calculation of QR Decomposition
4th Step: $r_{13} = q_1^T a_3 = 1/\sqrt{3}$
5th Step: $r_{23} = q_2^T a_3 = 1/\sqrt{6}$
6th Step:
$$\hat q_3 = a_3 - q_1 q_1^T a_3 - q_2 q_2^T a_3 = a_3 - r_{13} q_1 - r_{23} q_2 = \begin{pmatrix} 1/2 \\ 0 \\ -1/2 \\ 1 \end{pmatrix}$$
$$r_{33} = \|\hat q_3\| = \sqrt{6}/2, \qquad q_3 = \frac{\hat q_3}{\|\hat q_3\|} = \begin{pmatrix} 1/\sqrt{6} \\ 0 \\ -1/\sqrt{6} \\ 2/\sqrt{6} \end{pmatrix}$$
Calculation of QR Decomposition
Therefore, A = QR:
$$\begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{6} & 1/\sqrt{6} \\ 1/\sqrt{3} & -2/\sqrt{6} & 0 \\ 1/\sqrt{3} & 1/\sqrt{6} & -1/\sqrt{6} \\ 0 & 0 & 2/\sqrt{6} \end{pmatrix} \begin{pmatrix} \sqrt{3} & 2/\sqrt{3} & 1/\sqrt{3} \\ 0 & 2/\sqrt{6} & 1/\sqrt{6} \\ 0 & 0 & \sqrt{6}/2 \end{pmatrix}$$
R code for QR Decomposition:
x <- matrix(c(1,2,3, 2,5,4, 3,4,9), ncol=3, nrow=3)
qrstr <- qr(x)
Q <- qr.Q(qrstr)   # orthonormal factor
R <- qr.R(qrstr)   # upper triangular factor, x = Q %*% R
Application of QR: least squares. In regression, the normal equations are $X^T X b = X^T Y$. Substituting $X = QR$:
$$X^T X b = X^T Y \;\Rightarrow\; R^T Q^T Q R\, b = R^T Q^T Y \;\Rightarrow\; R^T R\, b = R^T Q^T Y$$
Therefore,
$$(R^T)^{-1} R^T R\, b = (R^T)^{-1} R^T Q^T Y \;\Rightarrow\; R\, b = Q^T Y = Z,$$
which is solved for b by backward substitution.
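A short R sketch of this least-squares computation (the simulated X and y are purely illustrative); the result matches lm():
set.seed(1)
X <- matrix(rnorm(20), ncol = 2)   # hypothetical design matrix
y <- rnorm(10)
qrstr <- qr(X)
b <- backsolve(qr.R(qrstr), crossprod(qr.Q(qrstr), y))   # solves R b = Q^T y
as.vector(b)
coef(lm(y ~ X - 1))   # same coefficients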
Cholesky Decomposition
Cholesky died from wounds received on the battlefield on 31 August 1918, at 5 o'clock in the morning, in the north of France. After his death, one of his fellow officers, Commandant Benoît, published Cholesky's method of computing solutions to the normal equations for some least squares data-fitting problems in the Bulletin géodésique in 1924. It is now known as the Cholesky decomposition.
[Portrait: André-Louis Cholesky (1875–1918)]
Cholesky Decomposition
Theorem: If A is an n×n real, symmetric and positive definite matrix, then there exists a unique lower triangular matrix G with positive diagonal elements such that $A = GG^T$.
Example of Cholesky Decomposition
Suppose
$$A = \begin{pmatrix} 4 & 2 & -2 \\ 2 & 10 & 2 \\ -2 & 2 & 5 \end{pmatrix}$$
The algorithm: for k from 1 to n,
$$l_{kk} = \left( a_{kk} - \sum_{s=1}^{k-1} l_{ks}^2 \right)^{1/2},$$
and for j from k+1 to n,
$$l_{jk} = \left( a_{jk} - \sum_{s=1}^{k-1} l_{js}\, l_{ks} \right) \Big/ \, l_{kk}.$$
The Cholesky factor is then
$$L = \begin{pmatrix} 2 & 0 & 0 \\ 1 & 3 & 0 \\ -1 & 1 & \sqrt{3} \end{pmatrix}$$
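The recursion above translates directly into R; chol_lower below is a minimal sketch (not a library routine), checked against base R's chol(), which returns the upper factor:
chol_lower <- function(A) {
  n <- nrow(A)
  L <- matrix(0, n, n)
  for (k in 1:n) {
    L[k, k] <- sqrt(A[k, k] - sum(L[k, seq_len(k - 1)]^2))
    for (j in seq_len(n)[-seq_len(k)]) {   # j = k+1, ..., n
      L[j, k] <- (A[j, k] - sum(L[j, seq_len(k - 1)] * L[k, seq_len(k - 1)])) / L[k, k]
    }
  }
  L
}
A <- matrix(c(4,2,-2, 2,10,2, -2,2,5), ncol = 3)
chol_lower(A)   # the lower factor G
t(chol(A))      # base R agrees (chol() returns the upper factor)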
R code for Cholesky Decomposition
x <- matrix(c(4,2,-2, 2,10,2, -2,2,5), ncol=3, nrow=3)
cl <- chol(x)   # upper triangular factor, so x = t(cl) %*% cl
The related square-root-free factorization $A = LDL^T$ of the same matrix has
$$L = \begin{pmatrix} 1 & 0 & 0 \\ 1/2 & 1 & 0 \\ -1/2 & 1/3 & 1 \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 3 \end{pmatrix}$$
Application of Cholesky Decomposition
Cholesky decomposition is used to solve the system of linear equations Ax = b, where A is real, symmetric and positive definite.
In regression analysis it can be used to estimate the parameters when $X^T X$ is positive definite.
Characteristic Roots and Characteristic Vectors
Any nonzero vector x is said to be a characteristic vector of a matrix A if there exists a number λ such that Ax = λx; λ is then called a characteristic root of A corresponding to x.
Jordan Decomposition
Camille Jordan (1870)
Let A be any n×n matrix. Then there exists a nonsingular matrix P such that
$$P^{-1} A P = \begin{pmatrix} J_{k_1}(\lambda_1) & 0 & \cdots & 0 \\ 0 & J_{k_2}(\lambda_2) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & J_{k_r}(\lambda_r) \end{pmatrix}$$
where $J_k(\lambda)$ is the k×k Jordan block
$$J_k(\lambda) = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda \end{pmatrix}$$
and $k_1 + k_2 + \cdots + k_r = n$. The $\lambda_i$, i = 1, 2, . . ., r, are the characteristic roots, and the $k_i$ are the algebraic multiplicities of the $\lambda_i$.
The Jordan decomposition is used in differential equations and time series analysis.
[Portrait: Camille Jordan (1838–1921)]
Spectral Decomposition
[Portrait: A. L. Cauchy (1789–1857)]
Let A be an m × m real symmetric matrix. Then there exists an orthogonal matrix P such that $P^T A P = \Lambda$, or $A = P \Lambda P^T$, where Λ is a diagonal matrix.
Spectral Decomposition and Principal Component Analysis (Cont.)
By using the spectral decomposition we can write $A = P \Lambda P^T$. For the principal components $Y = P^T X$, the variance of the i-th component is the i-th characteristic root, so
$$\sum_{i} V(Y_i) = \sum_{i} \lambda_i.$$
R code for Spectral Decomposition
x <- matrix(c(1,2,3, 2,5,4, 3,4,9), ncol=3, nrow=3)
eigen(x)   # characteristic roots ($values) and characteristic vectors ($vectors)
Applications (see the sketch following this list):
• Data reduction
• Image processing and compression
• K selection for K-means clustering
• Multivariate outlier detection
• Noise filtering
• Trend detection in the observations
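As a sketch of the data-reduction application, PCA can be run through the spectral decomposition of a covariance matrix; the iris data used here is just an illustrative choice:
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)   # centered data
S <- cov(X)                      # real symmetric, so S = P Lambda P^T
eig <- eigen(S)
Y <- X %*% eig$vectors           # principal component scores
eig$values / sum(eig$values)     # proportion of variance explained by each PC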
Historical background of SVD
There are five mathematicians who were responsible for establishing the existence of the singular value decomposition and developing its theory.
[Portraits: Eugenio Beltrami (1835–1899), Camille Jordan (1838–1921), James Joseph Sylvester (1814–1897), Erhard Schmidt (1876–1959), Hermann Weyl (1885–1955)]
The singular value decomposition was originally developed by two mathematicians in the mid-to-late 1800s: Eugenio Beltrami and Camille Jordan. Several other mathematicians took part in its final development, including James Joseph Sylvester, Erhard Schmidt and Hermann Weyl, who studied the SVD into the mid-1900s.
Singular Value Decomposition (Cont.)
Theorem (Singular Value Decomposition): Let X be m×n of rank r, r ≤ n ≤ m. Then there exist matrices U, V and a diagonal matrix Λ with positive diagonal elements such that $X = U \Lambda V^T$.
Proof: Since X is m×n of rank r, r ≤ n ≤ m, both $XX^T$ and $X^TX$ are of rank r (by using the concept of the Grammian matrix) and of dimension m×m and n×n respectively. Since $XX^T$ is a real symmetric matrix, by spectral decomposition we can write
$$XX^T = Q D Q^T$$
where Q and D are, respectively, the matrices of characteristic vectors and corresponding characteristic roots of $XX^T$. Again, since $X^TX$ is a real symmetric matrix, by spectral decomposition we can write
$$X^T X = R M R^T$$
Singular Value Decomposition (Cont.)
where R is the (orthogonal) matrix of characteristic vectors and M is the diagonal matrix of the corresponding characteristic roots.
Since $XX^T$ and $X^TX$ are both of rank r, only r of their characteristic roots are positive, the remaining being zero. Hence we can write
$$D = \begin{pmatrix} D_r & 0 \\ 0 & 0 \end{pmatrix}, \qquad M = \begin{pmatrix} M_r & 0 \\ 0 & 0 \end{pmatrix}.$$
Singular Value Decomposition (Cont.)
We know that the nonzero characteristic roots of $XX^T$ and $X^TX$ are equal, so $D_r = M_r$.
Partition Q, R conformably with D and M, respectively, i.e., $Q = (Q_r, Q_*)$, $R = (R_r, R_*)$, such that $Q_r$ is m × r, $R_r$ is n × r, and they correspond respectively to the nonzero characteristic roots of $XX^T$ and $X^TX$. Now take
$$U = Q_r, \qquad V = R_r, \qquad \Lambda = D_r^{1/2} = \mathrm{diag}(d_1^{1/2}, d_2^{1/2}, \ldots, d_r^{1/2})$$
where $d_i$, i = 1, 2, . . ., r, are the positive characteristic roots of $XX^T$ and hence those of $X^TX$ as well (by using the concept of the Grammian matrix).
Singular Value Decomposition (Cont.)
Now define $S = Q_r D_r^{1/2} R_r^T$. Then $S^T S = X^T X$; similarly, $SS^T = XX^T$.
From the first relation above we conclude that for an arbitrary orthogonal matrix, say $P_1$, $S = P_1 X$, while from the second we conclude that for an arbitrary orthogonal matrix, say $P_2$, we must have $S = X P_2$.
R Code for Singular Value Decomposition
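A minimal version in base R (a sketch, using the same matrix as in the earlier examples):
x <- matrix(c(1,2,3, 2,5,4, 3,4,9), ncol=3, nrow=3)
s <- svd(x)                     # s$u, s$d (singular values), s$v
s$u %*% diag(s$d) %*% t(s$v)    # reconstructs x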
Decomposition in Diagram
[Flowchart: a matrix A is classified as rectangular or square. Rectangular: LU decomposition (not always unique); with full column rank, QR decomposition. Square, symmetric and positive definite: Cholesky decomposition. Square with AM = GM: similar diagonalization $P^{-1}AP = \Lambda$ / spectral decomposition; with AM > GM: Jordan decomposition. SVD applies to any matrix.]
Properties of SVD
Low rank Approximation
Theorem: If $A = U \Lambda V^T$ is the SVD of A and the singular values are sorted as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, then for any l < r, the best rank-l approximation to A is
$$\tilde A = \sum_{i=1}^{l} \lambda_i u_i v_i^T, \qquad \|A - \tilde A\|^2 = \sum_{i=l+1}^{r} \lambda_i^2.$$
Low-rank Approximation
• SVD can be used to compute optimal low-rank approximations.
• The approximation of A is à of rank k such that
$$\tilde A = \arg\min_{X : \mathrm{rank}(X) = k} \|A - X\|_F \qquad \text{(Frobenius norm)}$$
where
$$\|A\|_F = \sqrt{ \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 }.$$
If $d_1, d_2, \ldots, d_n$ are the characteristic roots of $A^T A$, then $\|A\|_F^2 = \sum_{i=1}^{n} d_i$, and
$$\min_{X : \mathrm{rank}(X) = k} \|A - X\|_F^2 = \|A - \tilde A\|_F^2 = \sum_{i=k+1}^{n} d_i.$$
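A short R sketch of the truncated-SVD approximation (the random matrix is only an illustration); the squared Frobenius error equals the sum of the discarded squared singular values:
low_rank <- function(A, k) {
  s <- svd(A)
  s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], k, k) %*% t(s$v[, 1:k, drop = FALSE])
}
A <- matrix(rnorm(30), nrow = 6)
A2 <- low_rank(A, 2)
norm(A - A2, "F")^2       # equals...
sum(svd(A)$d[-(1:2)]^2)   # ...the sum of the discarded squared singular values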
Row approximation and column approximation
Suppose $R_i$ and $C_j$ represent the i-th row and j-th column of A. The SVDs of A and $\tilde A$ are
$$A = U \Lambda V^T = \sum_{k=1}^{r} \lambda_k u_k v_k^T, \qquad \tilde A = U_l \Lambda_l V_l^T = \sum_{k=1}^{l} \lambda_k u_k v_k^T.$$
We can approximate the i-th row by $R_i^l = \sum_{k=1}^{l} u_{ik} \lambda_k v_k^T$; we can also approximate $C_j$ by
$$C_j^l = \sum_{k=1}^{l} v_{jk} \lambda_k u_k, \qquad l < r.$$
Least square solution in an inconsistent system
By using the SVD we can solve an inconsistent system; this gives the least squares solution
$$\min_{x} \|Ax - b\|^2.$$
The least squares solution is $x = A^g b$, where $A^g$ is the generalized inverse of A. The SVD of $A^g$ is $A^g = V \Lambda^{-1} U^T$, where U, Λ and V are as in the SVD $A = U \Lambda V^T$.
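A sketch of this SVD-based least squares solve in R (the small inconsistent system is made up for illustration); it agrees with qr.solve():
svd_lsq <- function(A, b) {
  s <- svd(A)
  pos <- s$d > 1e-10 * max(s$d)   # keep only the numerically nonzero singular values
  s$v[, pos, drop = FALSE] %*% ((1 / s$d[pos]) * crossprod(s$u[, pos, drop = FALSE], b))
}
A <- matrix(c(1, 1, 1, 1, 2, 3), ncol = 2)   # overdetermined, inconsistent system
b <- c(1, 2, 2)
svd_lsq(A, b)
qr.solve(A, b)   # same least squares solution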
SVD based PCA
If $X = U \Lambda V^T$, we can write $XV = U\Lambda$. Suppose $Y = XV = U\Lambda$. Then the first column of Y contains the first principal component scores, and so on.
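A minimal R sketch of SVD-based PCA on centered data (again using iris purely as an illustration); the scores XV match UΛ:
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
s <- svd(X)
Y <- X %*% s$v                              # principal component scores
all.equal(unname(Y), s$u %*% diag(s$d))     # identical to U Lambda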
Origin of biplot
• Gabriel (1971): one of the most important advances in data analysis in recent decades.
• Currently: more than 50,000 web pages; numerous academic publications; included in most statistical analysis packages; still a very new technique to most scientists.
[Portrait: Prof. Ruben Gabriel, "the founder of the biplot". Courtesy of Prof. Purificación Galindo, University of Salamanca, Spain]
What is a biplot?
• “Biplot” = “bi” + “plot”
– “plot”
• scatter plot of two rows OR of two columns, or
• scatter plot summarizing the rows OR the columns
– “bi”
• BOTH rows AND columns
• 1 biplot >> 2 plots
Practical definition of a biplot
"Any two-way table can be analyzed using a 2D-biplot as soon as it can be sufficiently approximated by a rank-2 matrix." (Gabriel, 1971)
(Now 3D-biplots are also possible…)
Matrix decomposition: P(4×3) = G(4×2) E(2×3). For the G-by-E table P below, the genotype scores G and environment scores E are:

P (G-by-E table):
        e1   e2   e3
  g1    20    9    6
  g2    -6   12   15
  g3   -10    6    9
  g4     8   12   12

G (genotype scores):         E (environment scores):
        x    y                      e1   e2   e3
  g1    4    3                x      2    3    3
  g2    3   -3                y      4   -1   -2
  g3    1   -3
  g4    4    0

[Scatter plot: genotypes G1–G4 and environments E1–E3 plotted on the two axes (x, y)]
Singular Value Decomposition (SVD) & Singular Value Partitioning (SVP)
The "rank" of Y is the minimum number of PCs required to fully represent Y. The SVD factors Y into a matrix characterising the rows, the "singular values", and a matrix characterising the columns; the SVP allocates the singular values between the two:
$$y_{ij} = \sum_{k=1}^{r} (u_{ik} \lambda_k^{f}) (\lambda_k^{1-f} v_{kj})$$
with row scores $u_{ik}\lambda_k^f$ and column scores $\lambda_k^{1-f} v_{kj}$; f = 1 emphasises the rows, f = 0 the columns, and f = 1/2 treats them symmetrically.
The simplest biplot shows the first two PCs together with the projections of the axes of the original variables:
• the x-axis represents the scores for the first principal component, the y-axis the scores for the second;
• the original variables are represented by arrows which graphically indicate the proportion of the original variance explained by the first two principal components;
• the direction of the arrows indicates the relative loadings on the first and second principal components.
Biplot of Iris Data
[Biplot of the iris data: component scores on Comp. 1 (x-axis) and Comp. 2 (y-axis), with arrows for Sepal L., Sepal W., Petal L. and Petal W.; 1 = Setosa, 2 = Versicolor, 3 = Virginica]
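A biplot like this one can be reproduced in base R (a sketch; species are coded 1–3 as in the figure):
pc <- princomp(iris[, 1:4])
biplot(pc, xlabs = as.character(as.numeric(iris$Species)))   # 1 = Setosa, 2 = Versicolor, 3 = Virginica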
Image Compression Example
[Pansy flower image, collected from http://www.ats.ucla.edu/stat/r/code/pansy.jpg]
Singular values of flowers image
[Plot: singular values of the flower image]
Low rank Approximation to flowers image
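A sketch of how such low-rank reconstructions can be computed, assuming the jpeg package and a local copy of the image (the file name is hypothetical):
library(jpeg)
img <- readJPEG("pansy.jpg")   # hypothetical local copy of the flower image
red <- img[, , 1]              # compress one colour channel for illustration
s <- svd(red)
k <- 20                        # keep the 20 largest singular values
red_k <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])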
[Scatter plots of the Hawkins, Bradu and Kass data: (a) first two PCs and (b) first and third PC]
Outlier Detection Using SVD (Cont.)
MODIFIED BROWN DATA
Data set given by Brown (1980). Ryan (1997) pointed out that the original data on the 53 patients contain one outlier (observation number 24).
Climatic Variables
The climatic variables are:
1. Rainfall (RF), mm
2. Daily mean temperature (T-MEAN), °C
3. Maximum temperature (T-MAX), °C
4. Minimum temperature (T-MIN), °C
5. Day-time temperature (T-DAY), °C
6. Night-time temperature (T-NIGHT), °C
7. Daily mean water vapour pressure (VP), mbar
8. Daily mean wind speed (WS), m/sec
9. Hours of bright sunshine as percentage of maximum possible sunshine hours (MPS), %
10. Solar radiation (SR), cal/cm²/day
Consequences of SVD
Generally, many missing values may be present in the data. It may also contain unusual observations. Classical singular value decomposition cannot handle either type of problem.
The Alternating L1 Regression Algorithm for Robust Singular Value Decomposition
For the first layer of the SVD: starting from an initial right vector $v_1$, estimate each $d_i$ by minimizing $\sum_j \lvert x_{ij} - d_i v_{j1} \rvert$, i = 1, 2, …, n; then, holding d fixed, estimate the $v_{j1}$ by the analogous L1 regressions, and iterate until convergence.
For the second and subsequent layers of the SVD, we replace X by a deflated matrix obtained by subtracting the most recently found layer: $X \leftarrow X - \lambda_k u_k v_k^T$.
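A minimal R sketch of one layer of this alternating L1 scheme (the initialisation, iteration count and weighted-median L1 solver are assumed details, not the authors' exact algorithm):
# Weighted (lower) median: minimises sum(w * |x - m|) over m
wmedian <- function(x, w) {
  o <- order(x)
  x[o][which(cumsum(w[o]) >= sum(w) / 2)[1]]
}
# argmin over d of sum_j |y_j - d * v_j|, assuming all v_j are nonzero
l1_coef <- function(y, v) wmedian(y / v, abs(v))

robust_svd_layer <- function(X, iters = 25) {
  v <- rep(1 / sqrt(ncol(X)), ncol(X))   # initial unit right vector (assumed)
  for (it in 1:iters) {
    d <- apply(X, 1, l1_coef, v = v)     # L1 regressions for the left vector
    v <- apply(X, 2, l1_coef, v = d)     # L1 regressions for the right vector
    v <- v / sqrt(sum(v^2))
  }
  list(u = d / sqrt(sum(d^2)), lambda = sqrt(sum(d^2)), v = v)
}
# Deflation for the next layer: X <- X - fit$lambda * fit$u %*% t(fit$v)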
Clustering weather stations on Map Using RSVD
[Map: weather stations clustered using robust SVD]
References
• Brown, B.W., Jr. (1980). Prediction analysis for binary data. In Biostatistics Casebook, R.G. Miller, Jr., B. Efron, B.W. Brown, Jr., L.E. Moses (Eds.), New York: Wiley.
• Dhrymes, Phoebus J. (1984). Mathematics for Econometrics, 2nd ed. Springer-Verlag, New York.
• Gabriel, K.R. (1971). The biplot graphic display of matrices with application to principal component analysis. Biometrika, 58, 453–467.
• Hawkins, D.M., Bradu, D. and Kass, G.V. (1984). Location of several outliers in multiple regression data using elemental sets. Technometrics, 26, 197–208.
• Imon, A.H.M.R. (2005). Identifying multiple influential observations in linear regression. Journal of Applied Statistics, 32, 73–90.
• Kumar, N., Nasser, M. and Sarker, S.C. (2011). A new singular value decomposition based robust graphical clustering technique and its application in climatic data. Journal of Geography and Geology, Canadian Center of Science and Education, Vol. 3, No. 1, 227–238.
• Ryan, T.P. (1997). Modern Regression Methods. Wiley, New York.
• Stewart, G.W. (1998). Matrix Algorithms, Vol. 1: Basic Decompositions. SIAM, Philadelphia.
• Matrix Decomposition. http://fedc.wiwi.hu-berlin.de/xplore/ebooks/html/csa/node36.html